Commentary on Tenenbaum & Griffiths "Generalization, Similarity, and Bayesian Inference"
Abstract: 60 words
Main Text: 1015 words
References: 0 words
Total Text: 1098 words (including abstract, main text, and acknowledgments)
lera@psych.stanford.edu
http://www-psych.stanford.edu/~lera
Michael Ramscar
michael@dai.ed.ac.uk
http://www.dai.ed.ac.uk/homes/michael/
Tenenbaum & Griffiths (henceforth T&G) present an ambitious attempt at a computational framework encompassing generalization, similarity, and categorization. Although it would seem elegant to account for all of similarity and/or categorization in a simple unitary framework, the phenomena in question are almost certainly far too complex and heterogeneous to allow this. A framework this general will inevitably fail to capture much of the intricacy and sophistication of human conceptual processing. That is, it may turn out to be a theory about spherical cows rather than cow-shaped ones.
T&G propose to model similarity as generalization based on Bayesian inference. However, while T&G specify a framework (essentially, Bayes' rule and some ancillary equations), they fail to specify a procedure for generating, weighting, or constraining any of the input into this framework. At times, T&G base the representations in their hypothesis space on people's similarity judgments. It is hardly surprising that a model with people's similarity judgments built in can compute similarity. Further, the basis for T&G's claim that similarity is based on Bayesian generalization becomes unclear - in their model, generalization appears to be based on similarity and not the other way around. At present the framework relies solely on hand-coded and hand-tailored representations, while the few predictions it does make (relying on asymmetrical comparison and the size principle) are not borne out by data. We review just a few of the complications as illustrations below.
People's similarity judgments are based on a myriad of contextual, perceptual, and conceptual factors. In carrying out a comparison, people need to choose a way to represent the things to be compared as well as a strategy for comparing them. This means that a comparison between the same two items in different circumstances will yield different results. For example, in a replication of T&G's study shown in the left panel of Figure 6 (with right-left position counterbalanced), 62% of our subjects picked the object-match (a) as most similar to the top example. But, if subjects were first given the example shown in the right panel of Figure 6 and then the question in the left panel, then only 33% picked the object-match. Changing how likely people were to notice and represent the relational structure of the stimuli had a dramatic effect on the results of the comparison. In another example, subjects were asked to say which of AXX or QJN was most similar to AHM (a problem structurally similar to T&G's in Figure 6), and 43% chose QJN when the letters were presented in Chicago font (which makes all the letters look boxy). When the same letters were presented in Times font (which emphasized the pointy ends of the A's), only 17% chose QJN. Thus, trivially changing the perceptual properties of the stimuli can have a dramatic effect on how people choose to represent and compare the arrays.
Nothing inherent in T&G's framework predicts these kinds of results. Although T&G's framework might allow for perceptual similarity, effects of context, and other factors to be coded into the hypothesis space, it is disappointing that these back-door elements (coded in and not necessarily principled), rather than anything about the framework itself, carry all of the explanatory power. Moreover, at times the specifics of the framework can prevent the back-door solutions from working, even when these solutions are probably the psychologically correct ones. Consider the following example.
When subjects were asked which of 1-911-ANALOGY or 1-208-BKSDEMG was most similar to 1-615-QFRLOWY, 75% of the subjects chose 1-208-BKSDEMG (chi^2=5.00, p<.05), even though 1-911-ANALOGY shares 4 extra features with the base example, and the "1 in position 3, L in position 8, O in position 9, and Y in position 11" hypothesis is more than 72,000 times more restrictive than the "all different letters" hypothesis. Despite an advantage of more than 72,000 to 1, the size principle proposed by T&G as a new universal had no effect. We doubt that any one of our subjects even considered the "1 in position 3, L in position 8, O in position 9, and Y in position 11" hypothesis. Clearly the distinctive properties of 1-911-ANALOGY are responsible for the subjects' choices. Although T&G's model can discover distinctive features by means of the size principle, it is limited to discovering the distinctive features of the base of the comparison (in T&G's framework, similarity is based on the intrinsically asymmetrical function of generalization, which depends only on the distinctive features of the base and not the target). But for the subjects, the outcome of this problem depends on the distinctive features of the target (the opposite of what T&G predict). It seems unlikely, given the flexibility and sophistication of human thought, that all comparison processes will be bound by the asymmetrical properties of Bayesian inference. Further, if the model is extended to perform bi-directional comparisons, how will it decide which of the computations to choose as the measure of similarity? Unless some principled way is specified, the model will be able to predict anything (and as such will explain nothing). It would appear that the model's predictions (asymmetrical comparison and the size principle) are not borne out by data. Rather, the hand-coded hypothesis space (a kind of clairvoyant homunculus that can mysteriously assemble itself to fit any given occasion) carries most of the explanatory power.
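The "more than 72,000 to 1" figure in the telephone-number example can be checked with a short calculation. The sketch below is our reading of that arithmetic, not code from T&G: it assumes each stimulus is a string of 4 digits followed by 7 letters (hyphens ignored), that every digit and letter position is otherwise free, and that with a single example the size principle favors a hypothesis in inverse proportion to the number of strings it admits. The function names are hypothetical.

```python
from math import perm

DIGITS = 10   # possible values per digit position
LETTERS = 26  # possible values per letter position

def size_fixed_positions(n_digits=4, n_letters=7, fixed_digits=1, fixed_letters=3):
    """Strings consistent with '1 in position 3, L in 8, O in 9, Y in 11'.

    One digit and three letters are pinned; every other position is free.
    """
    return DIGITS ** (n_digits - fixed_digits) * LETTERS ** (n_letters - fixed_letters)

def size_all_different_letters(n_digits=4, n_letters=7):
    """Strings consistent with 'all different letters'.

    Digits are unconstrained; the letters are an ordered draw without repeats.
    """
    return DIGITS ** n_digits * perm(LETTERS, n_letters)

narrow = size_fixed_positions()
broad = size_all_different_letters()

# With one example, the likelihood ratio under the size principle is (1/narrow) / (1/broad).
ratio = broad / narrow
print(f"narrow hypothesis admits {narrow:,} strings")
print(f"broad hypothesis admits  {broad:,} strings")
print(f"size-principle advantage for the narrow hypothesis: about {ratio:,.0f} to 1")
```

Under these assumptions the ratio comes out between 72,000 and 73,000, in line with the figure quoted above. Treating the leading 1 as fixed in both hypotheses would shrink both counts by the same factor and leave the ratio unchanged.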
Finally, we should evaluate any model not only on whether or not it can be falsified, but also, importantly, on its usefulness. How much does it add to our understanding of cognition? T&G's model is only viable if we can somehow anticipate (and hand-code in) all the adjustments to the hypothesis space that will be required in any given situation (i.e., build in complete world knowledge). As such, the framework is either computationally unimplementable (if we can't build everything in) or psychologically uninformative (if we can).
A theory that applies equally well to all possible situations may apply poorly in each. This is especially true if generality requires us to disregard much of our hard-won understanding of the details of psychological processing. There is a vast literature documenting the complexity and diversity of representations and processes involved in similarity and categorization. The sheer variety of these psychological phenomena weighs heavily against any simple unitary account. Any such account can at best aspire to be a theory of spherical cows -- elegant, but of little use in a world filled with cows that stubbornly insist on being cow-shaped.