Below is the unedited preprint (not a quotable final draft) of:
Schyns, P., Goldstone, R.L., & Thibaut, J.-P. (1998). The development of features in object concepts. Behavioral and Brain Sciences 21 (1): 1-54.
The final published draft of the target article, commentaries and Author's Response are currently available only in paper.
For information about subscribing or purchasing offprints of the published version, with commentaries and author's response, write to: journals_subscriptions@cup.org (North America) or journals_marketing@cup.cam.ac.uk (All other countries).

THE DEVELOPMENT OF FEATURES IN OBJECT CONCEPTS

Philippe G. Schyns
University of Glasgow
Dept. of Psychology
Glasgow, G12 8QB
UNITED KINGDOM
philippe@psy.gla.ac.uk

Robert L. Goldstone
Indiana University
Dept. of Psychology
Bloomington, IN 47405
USA
rgoldsto@ucs.indiana.edu

Jean-Pierre Thibaut
Universite de Liege
Dept. of Psychology
Batiment B32
Sart-Tilman 4000 Liege
BELGIUM
jthibaut@vm1.ulg.ac.be

Keywords

Concept learning, conceptual development, perceptual learning, features, stimulus encoding

Abstract

According to an influential approach to cognition, our perceptual systems provide us with a repertoire of fixed features as input to higher-level cognitive processes. We present a theory of category learning and representation in which features, instead of being components of a fixed repertoire, are created under the influence of higher-level cognitive processes. When new categories need to be learned, fixed features face one of two problems: (1) High-level features that are directly useful for categorization may not be flexible enough to represent all relevant objects. (2) Low-level features consisting of unstructured fragments (such as pixels) may not capture the regularities required for successful categorization. We report evidence that feature creation occurs in category learning and we describe the conditions that promote it. Feature creation can adapt flexibly to changing environmental demands and may be the origin of fixed feature repertoires. Implications for object categorization, conceptual development, chunking, constructive induction and formal models of dimensionality reduction are discussed.


1. INTRODUCTION

We believe an influential and powerful idea in cognitive science must be revised in order to provide a full account of cognition. This idea is that cognitive processes such as categorization and object recognition operate on a fixed set of perceptual or conceptual features which are the building blocks for complex object representations. We will argue that categorization and object recognition often require the creation of new features. The featural repertoire, rather than being fixed, is dependent on situation demands, novel categorization requirements, and environmental contingencies.

In this target article, a "feature" will refer to any elementary property of a distal stimulus that is an element of cognition--an atom of psychological processing. This does not imply that people are consciously aware of these properties. Instead, features are identified by their functional role in cognition--e.g., they authorize new categorizations and perceptions. Stimulus dimensions are ordered sets of feature values such as size, brightness, hue. Two features can create a new stimulus dimension, for example by interpolating the intermediate values between poles defined by the two features.

1.1. Fixed Feature Vocabularies

In a typical application of the fixed features approach to categorization (e.g. Bruner, Goodnow, & Austin, 1956), subjects are shown simple objects, and are instructed to learn the rule for sorting them. Such rules are based on logical combinations of features that are manifestly present in the stimuli. For example, a subject might learn a rule that objects that are white and square should be put in the same category. The subject does not have to create the relevant features to be used for categorization. Instead, there is an implicit agreement between the experimenter and the subject about what features compose the stimuli.
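
The fixed-features setup described above can be sketched in a few lines. This is only an illustration: the feature names (color, shape), the three stimuli, and the hand-written rule are hypothetical and are not taken from Bruner et al.'s materials. The point is that every stimulus is fully described by an agreed feature set, and a category is a logical combination of those feature values.

```python
from typing import Dict, List

# Hypothetical stimuli: each one is fully described by a fixed, agreed-upon feature set.
Stimulus = Dict[str, str]

def white_and_square(stimulus: Stimulus) -> bool:
    """Conjunctive rule: an object is a member if it is white AND square."""
    return stimulus["color"] == "white" and stimulus["shape"] == "square"

stimuli: List[Stimulus] = [
    {"color": "white", "shape": "square"},
    {"color": "white", "shape": "circle"},
    {"color": "black", "shape": "square"},
]

# The learner only has to find the rule; the features themselves are given.
labels = [white_and_square(s) for s in stimuli]
```

Nothing in this sketch can represent a property that is not already a key of the stimulus description, which is the limitation the rest of the article addresses.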

Although categorization research has come a long way since these early experiments, many recent approaches to categorization have continued to use stimuli that "wear their features on their sleeves." Clear-cut dimensions with distinct values are often used for reasons of experimental hygiene. Researchers have used simple shapes (Murphy & Ross, 1994), line positions (Aha & Goldstone, 1992), colors (Bruner et al., 1956), and line orientations (Nosofsky, 1987) as the sources of variation in their experiments. This tendency to compose stimuli out of components has also influenced other fields. In the Recognition By Components (RBC) theory (Biederman, 1987), combinations of a fixed set of 36 geometric elements are used to account for the recognition of a very large set of objects. Theories of phoneme (Jakobson, Fant & Halle, 1963) and letter (Gibson, 1971; Selfridge, 1959) recognition are also based on a limited set of primitives. Schank's (1972) Conceptual Dependency theory likewise postulates a fixed set of about 20 semantic primitives such as PTRANS (physical transfer) and INGEST (cf. Katz & Fodor, 1962 for a related point of view). This work varies widely in the kinds of primitives used and how they are combined, but all of it assumes that representations are composed out of a fixed feature set.

Fixed features are the lowest building blocks of object representation and categorization. Any functionally important difference between objects must be representable as differences in their building blocks if it is to be represented within the system. It is typically assumed that these features are nondecomposable units or "atoms," although, if pressed, many researchers would concede that their atoms may be decomposable if necessary. All of the strengths of "mental chemistry" are inherited by this approach: A very large number of object descriptions can be generated from a finite set of elements and a set of combination rules. In addition, combinations of features allow for structured hierarchical representations (Palmer, 1977), as opposed to the template approach to recognition (Ullman, 1989) which typically does not assume a decomposition into building blocks. In addition, the systematic relations between different objects can be expressed in terms of their features and their combination rules (Fodor & Pylyshyn, 1988).

We wish to retain these powerful properties of componential representations, but we also wish to provide a framework for augmenting feature sets with new features. Componential theories of cognition should provide ways to develop new representations. Fixed feature theories limit new representations to new combinations of the fixed features. Consequently, all possible categorizations are bounded by the possible combinations of the features. If a categorization requires a feature that is neither present in nor derivable from the feature set, then the categorization cannot be learned. This is a rather limiting view of representational change. There may be occasions when features not originally present in the system are useful for distinguishing between important categories in the world that newly confront the organism. A system that is constructed flexibly enough to learn such features would be able to tailor its feature repertoire to the demands of categorization. In many situations, it is unrealistic to think that a system comes fully equipped to deal with all possible contingencies in a complex environment.
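
The bound that a fixed feature set places on learnable categorizations can be made concrete with a small sketch. The objects, the feature names (white, square, notch) and the labels below are hypothetical. The check relies on one assumption: a categorization can be learned from a feature set only if no two objects with the same feature description carry different labels.

```python
def learnable(objects, labels, features):
    """Return True if some rule over `features` could reproduce `labels`.

    Two objects that share a description on `features` but have different
    labels cannot be separated by any combination of those features.
    """
    seen = {}
    for obj, label in zip(objects, labels):
        description = tuple(obj[f] for f in features)
        if seen.setdefault(description, label) != label:
            return False
    return True

# Hypothetical objects; the category actually depends on the "notch" property.
objects = [
    {"white": True, "square": True, "notch": True},
    {"white": True, "square": True, "notch": False},
    {"white": False, "square": True, "notch": True},
]
labels = ["A", "B", "A"]

fixed_set_ok = learnable(objects, labels, ["white", "square"])
augmented_set_ok = learnable(objects, labels, ["white", "square", "notch"])
```

With the fixed set, the first two objects are indistinguishable and the categorization fails; adding the new feature makes it learnable.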

We will provide an account of feature learning in which the components of a representation have close ties to the categorization history of the organism. We will discuss the empirical evidence suggesting that such development occurs, and the reasons why learned features are necessary. Although we will not propose a specific implementation of flexible feature learning, we will discuss computational models that can account for learned features, and how current models must be supplemented. Our analysis is addressed to literatures on both object recognition and categorization. Although these fields have not traditionally been linked, both deal with the question: "What is this object?" To recognize an object as a cart is equivalent to placing the object into the category of things called "carts." In both cases, the problem is to detect the relevant features of the object in the visual array.

1.2. Empirical Evidence for Learned Features

Although it is not addressed by fixed feature approaches to categorization, there is some evidence that substantial changes occur to perceptual systems during learning. The most parsimonious explanation for some of these perceptual changes is that structures in the environment are discovered and incorporated as new features of psychological processing. This is what we mean by "feature creation."

Before we review the evidence, a few ground rules are needed. First, we distinguish between feature weighting and feature creation. A feature that is useful (diagnostic) for a categorization may be selectively attended. This selective attention may simply be a decisional strategy that does not affect the appearance of the object to be categorized (Elio & Anderson, 1981; Nosofsky, 1987). For example, to categorize efficiently, Elio and Anderson's subjects learned to base their judgments on diagnostic features even though they could still easily perceive the nondiagnostic features. Some researchers, however, have hypothesized that features are selectively weighted if they are diagnostic, and that this selective weighting affects perceptual, rather than only strategic or final decisional processes (Gibson, 1969). In both of these views, it is assumed that the changes are based on previously existing features or dimensions. A third view is possible in which new features or dimensions are created in the service of categorization requirements. The creation of new features is implicated if the required number of prespecified features would otherwise be implausibly large.

Second, the reported experiments we review differ in the level at which representational change is assumed to take place. Sometimes, representational changes are relatively late and strategic. Learning may consist of coming to use a previously diagnostic feature under new conditions (Lawrence, 1949). On other occasions, feature changes are relatively perceptual and nonstrategic. Relevance for categorization may influence relatively perception-based tasks. It is notoriously difficult to draw a sharp distinction between perceptual and conceptual tasks; we will argue that it is ill-advised to make the distinction. For example, same/different judgment tasks (tasks where subjects are required to judge whether two simultaneously presented stimuli are physically identical) have usually been thought of as providing relatively clear evidence for perceptual similarity. However, to the extent that subjects always have to represent, remember (albeit for a very short time), and attend to aspects of the compared stimuli, we cannot be certain that these tasks tap purely sensory representations. Still, by examining the particular stimuli and task demands, we might be able to assess the relative contributions of strategic and perceptual factors.

1.2.1. Preexposure

The simplest form of perceptual learning that has been studied is predifferentiation (Gibson & Walk, 1956). In predifferentiation, exposure to stimuli before testing results in heightened sensitivity to those stimuli. For example, human subjects are better able to distinguish between "doodles" (a contiguous concatenation of randomly selected complex curves) after repeated exposures to them. Researchers (e.g. Gibson, 1991) have interpreted preexposure results as perceptual differentiation, a process in which aspects of the stimuli that serve to distinguish them are made more salient. Feedback on the classification or use of stimuli is not required for sensitization; simple exposure to the stimuli suffices.

1.2.2. Diagnosticity Driven Learning

Although preexposure effects indicate that category feedback is not a prerequisite for learning new aspects of the stimuli, other studies have suggested that categorization exerts an additional influence on how subjects deal with the stimuli. Subjects become selectively attuned to diagnostic features that facilitate discrimination between categories. Lawrence (1949) described a theory of acquired distinctiveness in which the cues relevant to a task become more differentiated. For example, rats were rewarded for choosing one stimulus over another in a rough-smooth discrimination task. Subsequently, the rats were tested on a discrimination task in which, for example, rough patterns required left responses and smooth patterns required right responses. Rats learned this second discrimination more quickly than rats who were first given a black-white discrimination.

Although experiments of this sort show that dimensions can be selectively sensitized, they provide little evidence for perceptual changes per se. One simple explanation of these results is that the organism simply generalizes the usefulness of a dimension from one situation to another. However, other recent data suggest that categorization diagnosticity influences an object's representation in terms of features. Categorization diagnosticity refers to the predictability of a category from its building blocks. Categorization diagnosticity can influence perceptual changes in at least two ways. First, it can influence the discriminability of values within existing dimensions, or the discriminability of entire preexisting dimensions. For example, Goldstone (1994a) gave human subjects categorization training on squares varying in size or brightness. After prolonged training, subjects were tested in a same/different task. When a dimension had been relevant for categorization, same/different judgments along this dimension were more accurate (using the d' measure from signal detection theory) than those of subjects for whom the dimension had been irrelevant, or control subjects who had not undergone categorization training. The greatest acuity increase along the categorization-relevant dimension was found between those points that had served as the boundaries between the learned categories. However, this sensitization of the relevant dimension also extended to other points along that dimension even though those were originally placed in the same category. In addition, one case of acquired equivalence was found in which discrimination along a dimension that was irrelevant for categorization became less acute than that of control subjects. Because same/different judgments involve "cognitive" factors such as (very) short term memory, attention, and encoding, these results do not guarantee that the changes were perceptual, but at least it can be said that the categorization training influences a task that many researchers have assumed to tap relatively low-level perceptual processes. Andrews, Livingston and Harnad (in press) have found similar influences of categorization on similarity judgments.
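
For readers unfamiliar with the d' measure, the following is a minimal sketch of its simplest textbook form, d' = z(hit rate) - z(false-alarm rate). The response rates are invented for illustration and are not data from Goldstone (1994a). Treating a "different" response on a different-pair trial as a hit is an assumption of this sketch, and same/different designs are sometimes analyzed with other decision models.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises an error at the boundaries.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates: "different" responses on different trials (hits)
# and on same trials (false alarms).
relevant_dimension = d_prime(0.85, 0.15)
control_group = d_prime(0.70, 0.30)
```

A larger d' indicates better discrimination independently of any overall bias toward responding "same" or "different".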

Category diagnosticity can also influence perception by participating in the creation of new features for object categorization. For example, Schyns & Murphy (1991, 1994) provide evidence for such a process. In a typical experiment, subjects had to learn to label new objects and were later tested on the features used to encode the categories. The stimuli were continuous three-dimensional rock-like "blobs" (see Figure 2, picture a). The stimuli had a complex blob structure so that naive subjects showed little agreement in how they decomposed them before categorization training. The categories were defined by a coherent group of a few contiguous parts present in each category member; the rest of the blobs of an object had random shapes. After learning to categorize them, subjects were instructed to decompose the objects into parts that they thought were relevant. These parts tended to be the ones that were diagnostic for categorization. This parsing differed from what it had been before the training and occurred despite a strong bottom-up constraint (the minima rule, Hoffman & Richards, 1984) on object parsing that would predict parsings other than those obtained in the experiments. Schyns and Murphy's subjects did their parsing by outlining the parts of each object (using either a computer mouse or a pen). Although it is not free of cognitive influences, this technique has the advantage of leaving subjects free to report any fragment of a stimulus they wish (independently of whether it has an easily expressible name). Braunstein, Hoffman and Saidpour (1989) found that an outlining method gave the strongest evidence for parsing with the minima rule, hence Schyns and Murphy should have found evidence of physically determined parsing with this task; instead, parsing was determined mostly by categorization constraints.

For the Martian rocks experiments, hypothesizing that the effects are due to shifts of attention to existing features would require positing an implausibly large number of dimensions or features. Explanations in terms of mechanisms for dynamically creating new features seem more parsimonious in these cases.

1.2.3. Differentiation

Several researchers have suggested that experience with stimuli results in subjects differentiating stimulus dimensions that were originally processed together. There is substantial developmental evidence that children are more likely to perceive stimuli in an undifferentiated manner whereas adults analyze the stimuli into distinct dimensions. For adults, some pairs of dimensions, like size and brightness, are called "separable" (Garner, 1974). They are processed separately, attention can be selectively placed on just one of the dimensions, and similarities between stimuli are computed by summing their separately determined dimension differences. Other pairs of dimensions, like the saturation and brightness of a color, are called "integral." Such dimensions appear to be psychologically "fused" in that it is difficult to attend selectively to just one of them, and similarities between stimuli are computed by considering the two dimensional differences simultaneously. Several studies have indicated that children process separable dimensions in the same way that adults process integral dimensions (Smith & Kemler, 1978; Ward, 1983). One explanation of these results is that part of the maturation process is to separate dimensions that were not originally separated. Such a process has also been implicated for learning distinctions between more abstract dimensions such as heat and temperature (Smith, Carey, & Wiser, 1985). The differentiation of dimensions seems to occur even in adulthood. Through training, the saturation of a color can be psychologically differentiated from its brightness (Goldstone, 1994a; Burns & Shepp, 1988).
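
The two ways of computing similarity described above are often formalized as two cases of the Minkowski distance. That mapping is an assumption of this sketch rather than something stated in the text: a city-block metric (r = 1) sums the separately determined dimension differences, and a Euclidean metric (r = 2) considers both differences jointly. The stimulus coordinates are hypothetical.

```python
def minkowski(a, b, r):
    """Minkowski distance between two stimuli; r=1 is city-block, r=2 is Euclidean."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)

# Two hypothetical stimuli described on two dimensions
# (e.g., size and brightness, or saturation and brightness).
s1 = (0.0, 0.0)
s2 = (3.0, 4.0)

separable_distance = minkowski(s1, s2, r=1)  # differences summed dimension by dimension
integral_distance = minkowski(s1, s2, r=2)   # both differences considered together
```

Under this formalization, a shift from integral to separable processing corresponds to a change in the exponent r rather than a change in the stimuli.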

There is a second type of differentiation in which categories, rather than dimensions, are split apart. Developmental studies find that the lexical categories of young children are often broader than the lexical categories of adults (Clark, 1973). For example, when children overgeneralize category labels, they may group together all round objects as instances of "ball" (Chapman, Leonard & Mervis, 1986). Eventually, after a progressive reorganization of their concepts, children's lexical categories narrow down and match those of adults. Adding features to an initially broad concept presumably allows it to be differentiated into more specific concepts. The acquisition of new features more specifically tuned to the categorization tasks at hand may also underlie the development of adults' conceptual expertise. Tanaka and Taylor (1991) studied categorizations by dog and bird experts. Experts were particularly adept at making fine discriminations within their category of expertise, suggesting that they acquire features that are specific to their domain of expertise. Schyns (1991) provided a neural network model of this type of conceptual differentiation. In a two-layered net, units initially representing a broad category became progressively specialized in representing finer categories on the basis of a feature extraction process.

1.2.4. Summary

The experimental evidence reviewed above indicates that our categorizations, rather than only being based on existing perceptual features, also determine the features that enter the representation of objects. Some perceptual changes may arise from mere exposure to objects, but others depend on the way in which objects of the environment are organized into categories. In addition, perceptual dimensions and categories both undergo a differentiation process based on environmental contingencies. These results provide an initial indication that categorical constraints could influence features: Rather than being fixed and unaffected by experience, features could be progressively extracted and developed as an organism categorizes its world.

2. A FUNCTIONAL APPROACH TO FEATURE CREATION

2.1. The Function of Features

The function of a feature is to mark commonalities between members of the same category, and to distinguish between categories. In fixed feature approaches to categorization and object recognition, a functional constraint guides the construction of the repertoire of features. Researchers develop their feature sets by keeping in mind the question: "What features would be required to solve this categorization task?" In many cases, the researchers then test their theories using stimuli that were constructed from these feature sets.

We agree that features should be functionally determined. However, their constraints should be defined by the environment and not simply by the experimenter. Even if the fixed feature researcher manages to draw upon a plausible feature set, it will probably be limited to a specific domain, and will not adapt to temporary or local environmental states. More importantly, restricting categorization to the problem of combining obvious, clearly demarcated features oversimplifies the task.

Consider the current object recognition literature in which Biederman's geon theory of object recognition (Biederman, 1987) is contrasted with the multiple-views approach (Edelman & Bulthoff, 1992; Poggio & Edelman, 1990; Tarr & Pinker, 1989). In Recognition By Components (Biederman, 1987), objects are represented by a set of geometric elements derived by taking various geometric slices through the possible transformations of a generalized cone. The resulting elements can be distinguished from each other on the basis of a few nonaccidental features--features that are invariant over a wide range of transformations (rotational, translational and scalar). Transformational invariance is a desirable property because telephones do not change their category membership (the fact that they are telephones) simply because they are rotated. However, Biederman's feature set is severely limited in its application to many natural objects (Kurbat, 1995; Ullman, 1989); it does not allow discriminations between many similar categories, and objects within the same category will not necessarily be represented by the same geon structure. These limitations are not problems for Biederman's theory alone, but for any approach that cannot adapt its building blocks flexibly to categorical constraints.

For example, recent object recognition research has demonstrated that the relationship between the observer and the object influences recognition performance (e.g. Edelman & Bulthoff, 1992; Palmer, Rosch & Chase, 1981; Tarr & Pinker, 1989). Viewpoint-dependent recognition was interpreted by Tarr and Pinker (1989) as evidence that objects are represented in memory by a collection of specific views (see also Poggio & Edelman, 1990). When views of an object are the basis of object representation, it is difficult to determine which subset of the set of all possible views best predicts categorizations. Categorizations are so diverse that there may not be a unique, canonical, and task-independent view-based representation of a particular object (Hill, Schyns & Akamatsu, in press). For example, any view of your face could reveal diagnostic information to distinguish it from a car, but fewer views would be well-suited to discriminate your face from a face of the other sex, and still fewer views would reliably distinguish your face from another face of the same gender. Viewpoint dependence appears to be relative to the diagnostic information in the task considered, and the location of the information on the object.

Hence, both the geon- and the view-based approaches to object recognition must tune their representations to the functional roles of their building blocks. That is, both theories must consider the possible categorizations of an object before considering the possible geometric elements or views that will be used to represent the object.

2.2. Categorical Constraints on Feature Creation

A categorical context is composed of the categories and features individuals know at a particular point of their conceptual development, plus the new category to be encoded. The individual knows what the categories are from external feedback--i.e., the consequences of their miscategorizations. Contrary to the classical assumption that category learning operates on fixed features, we wish to suggest that features are flexible--i.e., they adjust to the perceptual experience and the categorization history of the individual. Flexible features open the possibility that the same input is differently perceived and analyzed before being categorized. Hence, a complete theory of conceptual development should not only explain the ways in which object features are combined to form concepts, it should also explain the development of the features participating in the analysis of the input.

The role of categorization constraints on feature creation was explicitly studied by Schyns and Rodet (in press) using three categories of unknown stimuli called "Martian cells" (see Figure 1). Categories were defined by specific blobs common to all category members to which irrelevant blobs were added (to simulate various cell bodies). Figure 1 shows, from left to right, an exemplar of the XY category, an exemplar of X, and one of Y. The figure also shows the features x, y and xy defining each category. Note that xy is the conjunction of x and y.

Figure 1. This figure illustrates the design of Schyns and Rodet's (1996) feature creation experiment. From left to right, the top pictures are Martian cell exemplars from the XY, X, and Y categories, respectively. From left to right, the bottom pictures are the features xy, x, and y, defining the categories. Note that the feature xy is a combination of feature x and feature y. Subjects in the XY->X->Y (vs. X->Y->XY) group learned the categories in this order.

The main goal of Schyns and Rodet's experiment was to demonstrate that different categorization constraints could induce orthogonal perceptions of the defining component of XY--i.e., perceptions of xy as an x&y feature conjunction, or as an xy unitary feature. One group of subjects was asked to learn X before Y before XY (X->Y->XY); the other group learned the same categories in a different order (XY->X->Y). Reliable classifications of X, Y and XY stimuli in the testing phase indicated, without any doubt, that all subjects saw and attended to the components x and y. X-Y cells were used to understand the perceptual analysis of XY. X-Y cells were XY exemplars in which the x and y components were not adjacent to each other. The reasoning was that subjects should categorize X-Y cells as XY members if they perceived and represented XY as a conjunction of two individuated features. Results revealed that only one group (X->Y->XY) performed this categorization while the perception of XY in the other group prompted X or Y classifications of X-Y. In sum, orthogonal classifications of X-Y, when its component features were both clearly perceived and used in the experimental groups, suggested that different features were acquired to perceptually analyze and represent XY. Network simulations further suggested that the feature vocabularies were F_{X->Y->XY} = {x, y} and F_{XY->X->Y} = {xy, x, y}, respectively. Together, these results challenge the main claim of fixed feature approaches that category learning only consists of weighting the features of a fixed set that tend to characterize categories. It appears that category learning also changes the features that perceptually analyze the input.
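
The logic of the X-Y test can be sketched as follows. This is an illustration of the reasoning only, not the network simulation reported by Schyns and Rodet. Cells are simplified to one-dimensional sequences of blobs, the irrelevant blobs "a" and "b" are hypothetical, and the two vocabularies are the ones named in the text: {x, y} and {xy, x, y}.

```python
def has_adjacent_xy(cell):
    """Unitary xy feature: the x and y blobs must sit next to each other."""
    return any({left, right} == {"x", "y"} for left, right in zip(cell, cell[1:]))

def classify(cell, vocabulary):
    """Classify a cell given the feature vocabulary a learner has acquired."""
    has_x, has_y = "x" in cell, "y" in cell
    if "xy" in vocabulary:
        is_xy = has_adjacent_xy(cell)  # XY encoded as one unitary feature
    else:
        is_xy = has_x and has_y        # XY encoded as a conjunction of x and y
    if is_xy:
        return "XY"
    if has_x and has_y:
        return "X or Y"
    if has_x:
        return "X"
    if has_y:
        return "Y"
    return "unknown"

xy_cell = ["a", "x", "y", "b"]   # x and y adjacent
x_y_cell = ["x", "a", "b", "y"]  # X-Y test cell: x and y separated

conjunction_group = classify(x_y_cell, {"x", "y"})    # X->Y->XY learners
unitary_group = classify(x_y_cell, {"xy", "x", "y"})  # XY->X->Y learners
```

Both vocabularies classify the training exemplars identically; only the X-Y test cell separates them, which is why that stimulus is diagnostic of the underlying features.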

Rodet and Schyns (1994) also tested more specifically the role of the context of categorization on the perceived similarity of stimuli. In the first part of their Experiment 3, two groups of subjects learned two Martian cell categories that would later serve as the background context for learning a third category. The categories were designed so that the two groups would learn different concepts using the same learned features. Both groups learned that the feature x characterized the first category X. The two groups differed on the nature of the second category. The first group was exposed to an XY category defined by the x feature adjacent to the y feature. The category of the second group was defined by only the y feature. Subjects then learned a third XYZ category defined by adjacent x, y and z components. Subjects' encoding of the new category was tested with a sorting task and a same/different speeded judgment task. It was found that the first group of subjects, but not the second group, distinguished XY stimuli from XYZ stimuli. These results confirmed the hypothesis that different histories of categorization generate different feature spaces to encode similarities and contrasts between objects.

2.2.1. Two types of concept learning

The experiment just described indicates that a history of categorization can trigger different concept learning mechanisms. By the time the third concept is to be acquired, subjects of the second group have the necessary features x and y to identify the third category; subjects of the first group must create a third, novel feature z in order to identify the third category.

In the concept learning space fixed by x and y, Group 2 subjects represent XYZ as a combination of the two previously acquired features. This particular encoding illustrates what we call "fixed space learning," the familiar diagnosticity-driven learning that Gibson (1991), Lawrence (1949), and concept learning researchers have discussed. However, the combination of x and y already represents the second category of Group 1, so subjects must develop a new feature z to distinguish the third category (see Rodet & Schyns, 1994). We call this encoding "flexible space learning," to emphasize the expansion of the categorization space to include a new feature or dimension.
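
The contrast between the two kinds of learning can be sketched as a simple procedure. This is an illustrative assumption, not the authors' model: a new category is first encoded with the features already available, and a new feature is added only if that encoding collides with a category that is already known. Function and variable names are hypothetical.

```python
def learn_category(known, new_parts, feature_set):
    """Encode a new category, expanding the feature set only when necessary.

    known: dict mapping category names to the frozenset of features encoding them.
    new_parts: parts that define the new category.
    feature_set: features currently available to the learner.
    Returns (encoding, feature set after learning, kind of learning).
    """
    encoding = frozenset(p for p in new_parts if p in feature_set)
    if encoding and encoding not in known.values():
        # The existing features already single out the new category.
        return encoding, set(feature_set), "fixed space"
    expanded = set(feature_set)
    for part in new_parts:
        if part not in expanded:
            expanded.add(part)  # create a new feature
            encoding = encoding | {part}
            if encoding not in known.values():
                break
    return encoding, expanded, "flexible space"

group1_known = {"X": frozenset("x"), "XY": frozenset("xy")}
group2_known = {"X": frozenset("x"), "Y": frozenset("y")}
xyz_parts = ["x", "y", "z"]

enc1, feats1, mode1 = learn_category(group1_known, xyz_parts, {"x", "y"})
enc2, feats2, mode2 = learn_category(group2_known, xyz_parts, {"x", "y"})
```

In this sketch Group 1 ends with the features {x, y, z}, while Group 2 keeps {x, y} and encodes the third category as their combination.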

2.3. Functional Features and Primitives

The premise that features are created to subserve categorizations applies to the creation of functional features but is neutral as to their perceptual realization. For example, the object property "square" could be featurally represented as a concatenation of image pixels, as four line segments, as four corners, as four smaller squares, as two smaller rectangles, as a linear combination of sinusoids, and so forth. In short, there are many possible realizations of a functional feature. We have proposed that properties of objects that become diagnostic for important categorizations can become functional features of a system's repertoire. However, one potential problem that must be addressed is the degree to which these functional features are themselves based upon a (more) primitive set of features. If a primitive set of features can capture all the regularities and categorizations accommodated by the functional features, then the new functional features do not increase the representational capacity of the system. And if this is the case, then the hypothesis that feature creation is needed to allow a system to represent object properties it was previously incapable of representing cannot be maintained. We will accordingly argue that functional features are not always constructed out of a fixed catalog of primitive features.

A set of shape primitives that could ground categorization must satisfy at least three conditions: The primitives must exist prior to experience with the objects they represent, they must be sufficient to represent the entire set of representable objects, and they must be able to bootstrap complex recognition systems. Ultimately, there are two ways of conceptualizing these primitive features, each with its own problem. Either primitives are fine-grained and relatively unstructured, or they already represent complex structures of the environment.

2.3.1. Unstructured primitives

According to the unstructured approach, if one takes sufficiently fine-grained primitives (e.g., very small line segments, or even pixels) together with powerful combination rules, diagnostic compositions of the primitives could represent complex properties of objects. However, functionally important object regularities (e.g., symmetry, serif, beauty, and so forth) are often not captured by simple pixel-based features. It is unlikely that systems which hypothesize object properties such as symmetry as a primitive of object recognition (Gibson, 1969) can explain them by commonalities at the pixel level (though see Barlow, 1980; Barlow & Reeves, 1979). Moreover, as will be discussed in the section on formal models of feature extraction, it is not practically feasible (although logically possible) to extract relevant categorization features from pixel-based (or similarly unstructured) representations of the input.

2.3.2. Structured primitives

According to another approach to primitives, the catalog includes more complex primitives such as larger curves, corners, squares, circles, triangles, or even three-dimensional shapes such as cones and cylinders (see Biederman, 1987; Garner, 1974; Treisman & Gelade, 1980; among others). Complex (rather than simple) primitives would already mirror important structures of the visual environment and could therefore account for complex recognition by initially segmenting the visual environment into useful primitives for recognition. However, such preformed recognition systems are blind to structures that are not represented as primitives, and that are not compositions of simpler primitives.

To illustrate, in Fisher's (1986) influential model of letter recognition (cited in Czerwinski, Lightfoot & Shiffrin, 1992), a capital "A" is identified by composing three primitives (two diagonal bars and a horizontal one). Clearly, diagonal and horizontal bars were selected as primitives with the task of letter categorization in mind; the same primitives would be particularly clumsy in categorizing varieties of ellipses. One might imagine adding a second subset of primitives for distinguishing ellipses. However, any large-scale, highly structured set of primitives is bound to be too coarse to detect (and internally represent) all of the distinctions that might be required by different categories of objects.

2.3.3. Interactions between choice of primitives and task constraints

Task constraints almost always influence the primitives that investigators import into their componential theories of recognition. In our view, the task of the subject creating new functional features for categorization is not substantially different from the task of the scientist creating a componential theory of recognition: both must produce a catalogue of features that are defined by their role in recognition. If the investigators want to posit a complete fixed set of primitives, they must envision all possible recognition problems before conceiving of the features that would solve them. So, the envisioned set of tasks influences the primitives of recognition that will be selected by a theory of object recognition. Similarly, the particular categorization tasks confronting individuals influence the units of representation that they will adopt. Thus, rather than draw a correspondence between a particular theory of object recognition (with its static primitives) and an individual's object recognition capabilities, the proper correspondence may be between the individual and the meta-theoretic search for a proper object recognition theory.

2.4. Functions, Perceptions and their Interactions

One constraint on the creation of features is their usefulness for categorization. Our claim for functionally determined features does not mean that physiological or sensory facts are unimportant for defining the feature repertoires. Features are also based on general perceptual constraints such as contiguity, topological cohesion, changes of curvature, and perceptual salience. In many cases, these constraints are not a catalogue of shape primitives, but the constraints nonetheless exert strong pressures to create certain features. To illustrate, Hoffman and Richards (1984) have proposed that objects are segmented by creating parts with endpoints that are local minima of principal curvature. Instead of assuming that objects are segmented into primitive shapes, the authors suggest that a particular patch of shape will be identified as a part because it lies between two points of extreme curvature, not because it matches a primitive element. This approach does not limit possible shape features to the compositions of a catalog of primitives. Instead, as a sheet adjusts to the surface on which it is thrown, new features can be acquired to mirror the shapes lying between the segmentations suggested by the minima rule. Hoffman and Richards' constraint on object segmentation illustrates that the structures required for organizing complex representations are not necessarily structured primitives. Instead, general shape-processing constraints can produce segmentations that interact with structuring principles. As Hoffman and Richards (1984, p. 77) state it, "a boundary-based scheme, then, is to be preferred over a primitive-based scheme because of its greater versatility."
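As a rough illustration of a boundary-based scheme, the sketch below applies a simplified, two-dimensional analogue of the minima rule to a polygonal silhouette. It is not Hoffman and Richards' formulation, which concerns principal curvature on smooth surfaces. The assumptions here are ours: the contour is a counterclockwise polygon, curvature is approximated by the signed turning angle at each vertex, and the names `turning_angles`, `part_boundaries`, `segment`, and the toy shape `PEANUT` are hypothetical.

```python
import math

# Toy silhouette (an assumption for illustration): two lobes joined at a waist,
# listed counterclockwise. The notches at (3, 1) and (3, 3) are concave.
PEANUT = [(0, 0), (2, 0), (3, 1), (4, 0), (6, 0),
          (6, 4), (4, 4), (3, 3), (2, 4), (0, 4)]

def turning_angles(polygon):
    """Signed turning angle at each vertex of a counterclockwise polygon.

    Positive values are convex (left turns); negative values are concave.
    """
    n = len(polygon)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0  # incoming edge
        bx, by = x2 - x1, y2 - y1  # outgoing edge
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def part_boundaries(polygon):
    """Indices of concave vertices that are local minima of the turning angle."""
    a = turning_angles(polygon)
    n = len(a)
    return [i for i in range(n)
            if a[i] < 0 and a[i] <= a[i - 1] and a[i] <= a[(i + 1) % n]]

def segment(polygon):
    """Cut the contour at the boundaries; a part is the run of vertices between two cuts."""
    cuts = part_boundaries(polygon)
    if not cuts:
        return [list(polygon)]  # no concavity: the contour is a single part
    n = len(polygon)
    parts = []
    for k, start in enumerate(cuts):
        end = cuts[(k + 1) % len(cuts)]
        length = (end - start) % n or n
        parts.append([polygon[(start + j) % n] for j in range(length + 1)])
    return parts

for part in segment(PEANUT):
    print(part)
```

Note that the parts returned are whatever shapes happen to lie between the cuts; nothing in the procedure consults a catalog of primitive shapes.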

A very interesting aspect of Hoffman and Richards' proposal as it applies to the creation of new shape features is that it allows the feature repertoire to partially mirror the shapes that categorizers experience in their environment. This presents new challenges for effective procedures of feature creation. It is conceivable, even desirable, that several distinctive methods are used for developing features, depending on the idiosyncrasies of different object classes. For example, smooth objects such as faces could be parsed into their relevant component features using elastic 3D templates (e.g., Hinton, Williams & Revow, 1992). These templates would behave as elastic masks whose parameters would adjust to shape variations within the class. At the time of writing, there is no agreement on the features or feature configurations these masks would have. Class-specific variations (e.g., learning to categorize Caucasian faces) would result in class-specific features which would not be directly applicable to the shape variations of other classes (e.g., Asian faces). Mismatches between expected shapes and expected shape variations could give rise to the "other race effect" in which people perceive faces of their own race with greater facility than those of another race (Brigham, 1986).

While face stimuli are mostly smooth, many man-made objects are discontinuous. This imposes different biases on the eventual elastic templates (and also segmentation constraints other than Hoffman and Richards' minima rule, which operates on continuous surfaces). The templates could be biased so as to "break" at sharp discontinuities of the surfaces, if a categorization required such a break. Such templates could progressively evolve into a repertoire whose asymptotic state could resemble Biederman's (1987) geons, if they were exposed to many man-made object categories. The extraction of 2D shape features could also require distinct mechanisms and representations. For example, 2D patterns (letters, numbers, textures, and so forth) could use feature creation mechanisms based on "growth" (see, e.g., Marr, 1982; Ullman, 1984). Small 2D patches could locally grow from the interior of a 2D pattern until boundary edges stop the growth. New shapes could then be learned from correlations across category exemplars. To illustrate, consider a simple example of this process (adapted from Schyns & Murphy, 1994). Object 1 is a 2D pattern in which arrows show the cusps which are perceptual indicators of its parts (see Figure 2). Consider that Object 1 and Object 2 (or Object 3) form a category. If a 2D contiguous patch is grown in Object 1, its intersection with the patch grown in Object 2 will identify a part feature (indicated by dotted lines on Figure 2). A different feature would result from the intersection of Object 1 with Object 3.

Figure 2. This figure, adapted from Schyns and Murphy (1994), illustrates a possible interaction between perceptual and functional constraints in learning new features of object representation. The arrows in the target object indicate perceptual constraints on its segmentation. The Target and Object 1 (or Object 2) constitute a category. The dashed lines on the bottom objects illustrate that the shape features extracted on the target also depend on its category membership.
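The growth-and-intersection idea described above can be sketched on a toy grid. This is only an illustration under assumptions of our own: objects are binary patterns already aligned in a common frame, growth is 4-connected flood fill from a shared seed, and the three patterns and the names `cells`, `grow_patch`, and `shared_part` are hypothetical rather than taken from Schyns and Murphy (1994).

```python
from collections import deque

def cells(ascii_art):
    """Turn ASCII art into the set of (row, col) cells marked with '#'."""
    return {(r, c)
            for r, line in enumerate(ascii_art)
            for c, ch in enumerate(line) if ch == "#"}

def grow_patch(shape, seed):
    """Grow a 4-connected patch from `seed` until the shape's edges stop the growth."""
    if seed not in shape:
        return set()
    patch, frontier = {seed}, deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in shape and nxt not in patch:
                patch.add(nxt)
                frontier.append(nxt)
    return patch

def shared_part(shape_a, shape_b, seed):
    """Candidate part feature: overlap of the patches grown in two category members."""
    return grow_patch(shape_a, seed) & grow_patch(shape_b, seed)

# Toy patterns (assumptions for illustration only).
OBJECT_1 = cells(["##...", "##...", "#####", "...##", "...##"])
OBJECT_2 = cells(["##.##", "##.##", "#####", ".....", "....."])
OBJECT_3 = cells([".....", ".....", "#####", "##.##", "##.##"])
SEED = (2, 2)  # a cell interior to all three patterns

feature_with_2 = shared_part(OBJECT_1, OBJECT_2, SEED)  # upper-left block plus bar
feature_with_3 = shared_part(OBJECT_1, OBJECT_3, SEED)  # bar plus lower-right block
print(sorted(feature_with_2))
print(sorted(feature_with_3))
```

The same object yields a different part depending on which other object it is grouped with, which is the point of the figure: the extracted feature depends on category membership as well as on the object's own shape.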

In short, we are arguing that different object categories are likely to prompt the acquisition of different types of features. These different categories are likely to necessitate differently biased mechanisms. Perceptual biases should facilitate the extraction of features in the objects considered (e.g., smooth vs. discontinuous, 2D vs. 3D, and so forth). Categorical biases should tune the features for the categorizations performed. The examples discussed suggest the possibility of creating such features, but they do not provide detailed realizations. It will be a difficult (but necessary) task to extract class-specific perceptual biases to build task-specific feature extraction mechanisms. We will come back to this point when we discuss formal mechanisms of feature extraction.

Both functional (categorical) and perceptual constraints determine what features will be created. We see these constraints as mutually interactive rather than strictly sequential (see also Wisniewski & Medin, 1994). We might envision a system that first created a set of candidate features by applying perceptual constraints, and then selected the new feature from this set of candidates by applying functional constraints. Such a system would suffer from several problems, however. First, in many cases, an implausibly large number of candidates would need to be considered because objects are underdetermined by perceptual constraints (e.g. a 2D object silhouette with 20 bumps on it would have 380 possible parsings even if only contiguous segments were considered). If functional constraints are only considered secondarily, then processing will be inefficient in that too many candidate features that are not potentially useful will be considered; the constraining role of functionality would not be fully exploited. Second, if strong perceptual constraints are applied (e.g., shape primitives), then the relevant feature will often fail to be in the set of candidates. Third, there is substantial evidence that the functionality of a feature influences relatively low-level perceptual processing (Algom, 1992; Goldstone, 1994a; Goldstone, 1995; Oliva & Schyns, 1995; Rodet & Schyns, 1994; Schyns & Murphy, 1991, 1994; Schyns & Rodet, in press). The cumulative effect of this evidence makes it unlikely that functionality is only considered after perceptual processing has been completed.
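The figure of 380 can be reproduced under one reading of the example, which the text does not spell out: the 20 bumps lie on a closed contour, and a candidate part is any run of 1 to 19 adjacent bumps. The sketch below enumerates those runs and, for contrast, counts the candidates if contiguity were not required. The function name and the counting scheme are assumptions.

```python
def contiguous_segments(n_bumps):
    """All distinct runs of adjacent bumps on a closed contour, excluding the whole contour."""
    runs = set()
    for start in range(n_bumps):
        for length in range(1, n_bumps):
            runs.add(frozenset((start + k) % n_bumps for k in range(length)))
    return runs

n = 20
print(len(contiguous_segments(n)))  # 380, i.e. 20 * 19
print(2 ** n - 2)                   # 1048574 non-empty proper subsets without contiguity
```

Even the contiguity constraint alone leaves hundreds of candidates for a single silhouette, so a generate-then-select scheme would have to evaluate most of them before functional constraints removed any.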

Whereas we admit the intrinsic futility of searching for the boundary between perception and conception, we believe it is useful to describe a continuum from the perceptual to the conceptual. What varies along this continuum is how much and what sort of processing has been done to the input. Specifying exactly where experiential and categorical pressures influence processing along the perception/conception continuum is a real, although highly empirical, problem. One apparently fruitful approach to specifying how early conceptual factors exert their influence is to identify their influences on other processes. Thus, there is evidence that conceptual factors (knowledge of categories and attitudes) not only influence physical and immediate color judgments (Delk & Fillenbaum, 1965; Goldstone, 1995), but also exert an influence before the perceptual stage that creates color after-images has completed its processing (Moscovici & Personnaz, 1991). Similarly, there is evidence that conceptual factors related to one's knowledge of object categories exert an influence before the processing stage that produces figure/ground segregation (Peterson & Gibson, 1994).

Another approach to specifying locations of influence on a perceptual/conceptual continuum is to observe the time course for the use of particular types of information. For example, on the basis of priming evidence, Sekuler, Palmer, and Flynn (1992) argue that knowledge about what an occluded object would look like if it were completed influences perception after as little as 150 milliseconds. In general, there are experimental tools available that can identify when--absolutely and relative to other processes--conceptual factors modify information processing. Although the bulk of the work needed to specify the precise locus of influences has yet to be done, current evidence suggests a surprisingly early contribution of conceptual factors such as background knowledge and learned categories.

2.5. Feature Extraction and Experimental Materials

For reasons of control, many experiments in concept learning have used very simple stimuli varying on clearly demarcated dimensions. Real-world objects often vary along many dimensions, however, and in most cases, it is difficult to know what the relevant dimensions are. Although there are excellent reasons for using simple, easily described experimental materials, one major disadvantage with this approach is that it may systematically underestimate the importance of finding an appropriate encoding for the stimuli. It may even be that the traditional use of simple materials produces a bias against finding evidence for feature creation.

Table 1 illustrates some properties of different types of materials used in experiments. The properties listed in the left column characterize many typical stimuli used in concept learning experiments. The properties listed in the right column, "alternative materials," characterize materials that are likely to promote the creation of new features during concept learning. Conceptually, all the properties in the left column serve to make task-relevant features easy to isolate and identify. Conversely, the properties in the right column make it likely that the relevant features are not originally encoded, but allow for their derivation.

Table 1. Stimuli typically used in concept learning and stimuli likely to give rise to encoding new features

Traditional Materials                    Alternative Materials
-------------------------------------    -------------------------------------
Properties of Dimensions in Isolation
Discrete                                 Analog/Continuous
Symbolic                                 Sub-symbolic
Parts easy to delineate                  Parts difficult to delineate
Few features                             Large number of potential features
Relevant features are salient            Relevant features are not salient
No emergent properties                   Emergent properties
Single level of analysis                 Multiple levels of analysis
Large dimension value differences        Small dimension value differences

Properties of Dimensions in Context
A priori diagnostic features             A priori nondiagnostic features
Features have constant instantiations    Features are variably instantiated

Alternative materials are typically dense (Goodman, 1965) in that there is no limit on the amount of information that can be obtained from the input or the number of interpretations that can be made. So, alternative materials may contain many different levels of intrinsic structure, allowing for the potential relevance of highly diverse feature sets. Many blobby structures can be extracted from, for example, X-ray pictures that are not combinations of a priori diagnostic features (except to radiologists). Conversely, traditional materials embody a single level of analysis into features known a priori. The primary level of analysis for alternative materials is subsymbolic because they are designed to ensure that symbols (e.g., "square," "circle," "has-legs," etc.) are not easy to assign a priori to the important structures of the stimuli. Stimuli that are likely to be represented in an analog fashion may preserve topological relations which leave open the possibility of a stimulus reinterpretation if new categorizations require such a reinterpretation. Discrete stimuli do not allow this possibility because their interpretation is often unequivocal and automatic. Figure 3 presents several examples of alternative materials that are used in our experiments on feature creation. Picture (a) shows a Martian Rock (Schyns & Murphy, 1991, 1994), picture (b) some doodles (Goldstone, work in progress), picture (c) some Japanese hiragana characters (Ryner & Goldstone, work in progress), picture (d) shows, from left to right, an XY and an X Martian cell (Rodet & Schyns, 1994), picture (e) a Martian Lobster (Thibaut, 1995), and picture (f) a Martian landscape (Schyns & Thibaut, work in progress).

Figure 3. This figure illustrates examples of the alternative materials that are used in our experiments. Picture (a) shows a Martian Rock (Schyns & Murphy, 1991, 1994), picture (b) some doodles (Goldstone, work in progress), picture (c) some Japanese hiragana characters (Ryner & Goldstone, work in progress), picture (d) shows two Martian cells (Rodet & Schyns, 1994; Schyns & Rodet, in press), picture (e) a Martian Lobster (Thibaut, work in progress), and picture (f) a Martian landscape (Schyns & Thibaut, work in progress).

The task confronting subjects who are given what we are calling alternative materials is similar to the task confronting the child who must learn such concepts as dog, table, and father. The child must learn the features that comprise these objects in addition to learning the proper characterization of the concept. Many formal approaches to categorization explicitly avoid issues of feature representation. Researchers often adopt a stance of: "You tell me what the features are, and I will tell you how they are integrated to perform the categorization." Such formal approaches often place no constraints on what may count as a feature. In fact, the lion's share of the work in concept learning seems to be in finding the "right" description space for concept learning.

2.6. Evidence for Novel Functional Features

Novel features are sometimes created, and may not be reducible to previously existing features of the system. One version of this claim is certainly false. Novel visual features are certainly reducible to their retinal encodings, and possibly to existing structures at early, lower level representations. Thus, it is a conceptual challenge to characterize a "novel feature." Part of the difficulty is that novelty implies a reference point. At the level of the retina, different encodings of the same object are always novel owing to differences in the retinal projections of the input. However, conceptual encodings of this object are much more stable. Functional, high-level features presumably supply the basis for this stability in the cognitive architecture. The question thus becomes: When is a functional feature novel?

Functionally, a feature may be novel simply because it encodes a categorization that was not performed previously. Our conception of functionality is more constrained than this, however, referring to the synthesis of new elements from raw data. There are two difficulties with the latter variety of novelty. The first difficulty is pre-existence: How do we show empirically that a "created" functional feature did not exist prior to the categorization problem? The second difficulty is reduction: How can we ensure that a "created" functional feature does not result from the combination of pre-existing functional features?

An ideal empirical test of pre-existence would demonstrate that a functional feature fx not initially present in the feature repertoire becomes a member of the set as a result of learning a new categorization. The absence of fx from the initial repertoire, together with successful categorizations of the new objects would suggest that fx was created (instead of merely weighted for its diagnosticity), assuming that fx is required to perform the categorization. However, empirical evidence for the absence vs. presence of fx is limited to a behavioral manifestation of the new feature (for example, in a transfer or priming task). Unfortunately, a nonexistent feature is behaviorally equivalent to an existing feature with an "attentional weight" of 0. This makes it difficult to tease apart feature weighting from feature creation based on simple, direct tests of the existence of a feature in memory. Evidence of feature creation is, hence, necessarily indirect, testing the implications of foundational assumptions of fixed feature theories. Two of these assumptions are: (1) that objects are characterized by a pre-specified, fixed, unambiguous, and non-decomposable set of features, and (2) that learning always selects, combines, and weighs the fixed features that tend to characterize categories. An important implication of these two assumptions is that category learning is only strategic. That is, learning weighs features of the fixed set, but it does not change the perceptual analysis and the perceptual appearance of the input.
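The behavioral equivalence between an absent feature and a feature with zero attentional weight can be made concrete with a toy categorizer. The sketch below is an illustration only: it assumes a prototype model with an attention-weighted city-block distance, and the features x, y, z, the prototypes, and the function names are hypothetical.

```python
def weighted_distance(a, b, weights):
    """Attention-weighted city-block distance over the features named in `weights`."""
    return sum(w * abs(a[f] - b[f]) for f, w in weights.items())

def categorize(stimulus, prototypes, weights):
    """Assign the stimulus to the category whose prototype is nearest."""
    return min(prototypes,
               key=lambda cat: weighted_distance(stimulus, prototypes[cat], weights))

PROTOTYPES = {"A": {"x": 1, "y": 0, "z": 1},
              "B": {"x": 0, "y": 1, "z": 0}}
STIMULI = [{"x": x, "y": y, "z": z}
           for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# Observer 1 possesses feature z but gives it an attentional weight of 0.
zero_weight = {"x": 1.0, "y": 1.0, "z": 0.0}
# Observer 2 has no feature z in the repertoire at all.
no_feature = {"x": 1.0, "y": 1.0}

responses_1 = [categorize(s, PROTOTYPES, zero_weight) for s in STIMULI]
responses_2 = [categorize(s, PROTOTYPES, no_feature) for s in STIMULI]
print(responses_1 == responses_2)  # True: the two observers cannot be told apart
```

Because the two observers respond identically to every stimulus, no direct test of this kind can decide whether z was created or merely reweighted, which is why the evidence has to be indirect.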

One way to provide evidence for feature creation would be to show that category learning changes features that participate in the perceptual analysis of identical stimuli. This was the goal of the experiment of Schyns and Rodet (in press) described earlier. This experiment was controlled so that the features x and y were each diagnostic of one category in the two categorization conditions (X->Y->XY and XY->X->Y). Hence, they should in principle elicit identical featural analyses and identical perceptions of the same category exemplars--i.e., subjects in the two categorization conditions should equally see XY exemplars as feature conjunctions. However, the outcome was mutually exclusive perceptions of XY stimuli (a conjunctive and a configural perception), making a feature weighting interpretation of these data difficult to justify. Feature creation, as opposed to feature weighting, is the preferable account if category learning induces mutually exclusive perceptual analyses of an objectively identical object property, when the experimental design would predict identical perceptual analyses if the subjects used fixed features.

The other problem, feature reduction, is comparatively simple to address empirically. In principle, if a functional feature is the combination of two or more other features, these other features would become active each time the new feature was presented. Priming tests on these subfeatures would therefore indicate whether or not they participated in the perceptual encoding of the new feature.

It is always difficult to refute a feature weighting interpretation of categorization results. Part of the reason is that feature weighting is difficult to refute when it is used a posteriori to interpret patterns of data. Feature weighting is a form of curve fitting with free parameters (the weights assigned to features). Feature weighting therefore covers not one, but a potential infinity of models of categorization, and can potentially accommodate any pattern of experimental data if its features are not pre-specified. Attempting to explain features through the history of categorization allows the theorist to ask an important question: What counts as a feature? Most concept learning programs do not address this, but they nonetheless call for new features in different situations. We accept the need to generate different feature sets for different tasks, but we would like the theorist to explain how the features come to be generated instead of simply positing their existence.

2.7. Advantages of New Feature Learning

A system that allows for the creation of new features during concept learning offers several advantages over fixed feature set approaches.

2.7.1. The most basic advantage, as alluded to earlier, is that an ability to acquire new features allows flexible but constrained features. Unlike purely formal models of similarity and categorization, our approach places constraints on what can count as features: Features will be incorporated into a system to the extent that they distinguish between object categories; features should not be limited to the finite set of a priori features designed by a particular researcher for a particular domain.

2.7.2. A learned set may be equivalent to, but not limited to, other proposed fixed feature sets. Fixed feature sets are motivated by design considerations and psychological evidence. For example, Biederman (1987) suggests that evidence in favor of geons as primitive features comes from studies that delete line segments from objects. When line segments are deleted in a way that does not allow geons to be recovered, object recognition is particularly impaired. However, to the extent that geons are useful features for object categorization, it is reasonable to suppose that they might be generated from functional constraints applied to simpler building blocks such as line segments, or corners, or surfaces. Consequently, evidence in favor of a particular set of features does not entail that the set of features is hard-wired.

2.7.3. A learned set permits a near-optimal fit between categorization demands and the expressive repertoire. New features are created to represent new categorical commonalities or contrasts and can be optimally adjusted in number to a wide variety of task demands (e.g., expert categorizations and subcategorizations). To the extent that each new feature accommodates at least the categorization for which the feature was created, the repertoire should be free of useless features. A fixed feature approach is necessarily much less parsimonious: Many spurious features must exist in the feature repertoire to foresee new categorizations. Moreover, most features of the fixed set would never be used--they would keep waiting for their "Godot category." Fixed features necessarily have suboptimal fit outside the scope of the stimuli they were designed to represent.

2.7.4. A flexible set of features tuned to specific categorizations reduces the necessity of complex categorization rules. To illustrate that good representations often carry most of the burden of categorization, consider the XOR problem in learning theory. XOR is a binary function categorizing the pairs (0, 0) and (1, 1) as members of the "0" category and the pairs (1, 0) and (0, 1) as members of the "1" category. Categorization rules that separate the "0" from the "1" category are complex nonlinear rules because no linear solution (a straight line) achieves the separation. Complex learning problems often become simpler with better representations. Add a third input to XOR that is 1 whenever the two input numbers are (1, 1) or (0, 0), and 0 otherwise. This simple recoding simplifies the problem: There is now a linear solution. Although XOR is only a simple formal problem, it nonetheless illustrates the general point that carefully crafted representations often reduce the complexity of categorization processes.
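The XOR recoding can be checked directly. The sketch below searches a small grid of integer weights for a linear rule (respond "1" when the weighted sum plus a bias is positive). The search grid and the function names are assumptions made for illustration. A failed search over a finite grid is not in general a proof of non-separability, although for two-input XOR it is a known result that no linear rule exists.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linearly_separable(examples, grid=range(-3, 4)):
    """Search integer weights and a bias such that sum(w * x) + bias > 0 iff label == 1.

    Returns the first (weights, bias) found, or None if the grid holds no solution.
    """
    n_inputs = len(next(iter(examples)))
    for *weights, bias in product(grid, repeat=n_inputs + 1):
        if all((sum(w * x for w, x in zip(weights, xs)) + bias > 0) == bool(label)
               for xs, label in examples.items()):
            return tuple(weights), bias
    return None

def recode(xs):
    """Append the third input: 1 when the two inputs agree, 0 otherwise."""
    a, b = xs
    return (a, b, int(a == b))

RECODED = {recode(xs): label for xs, label in XOR.items()}

print(linearly_separable(XOR))      # None: no line separates the two categories
print(linearly_separable(RECODED))  # some (weights, bias) pair now exists
```

With the third input in place, a rule that looks only at that input suffices (for instance, a weight of -1 on it and a bias of 1), so the burden has moved from the categorization rule to the representation.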

Concept learning theories have frequently stressed the importance of learning categories by discovering complex rules that integrate several distinct stimulus features (Bruner, Goodnow, & Austin, 1956; Nosofsky, Palmeri, & McKinley, 1994). Concept learning certainly does sometimes require such integration. However, these problems have effortful, strategic solutions. They are rather unnatural; people are not particularly adept at explicitly combining psychologically separated sources of information. Our alternative is that new categorizations can be based on relatively few, specially tailored features.

2.7.5. In the flexible feature approach, categorizations can induce a decomposition of features into subfeatures. Consider the contrast between glasses and cans. Early in conceptual development, these objects may be indistinguishable because their memory representations correspond to a single, undifferentiated feature. Now assume that the organism needs to distinguish between these objects. This can be achieved by decomposing the undifferentiated feature into two specific features tailored to glasses and cans.

The acquisition of a new feature that segments an initially undifferentiated, unitary feature could account for conceptual differentiation, for example, the basic to subordinate shift (Tanaka & Taylor, 1991), the narrowing of children's lexical categories (Chapman, Leonard & Mervis, 1986) and the construction of conceptual hierarchies (Schyns & Murphy, 1994). Classical accounts of concept learning distinguish between features and concepts (which are combinations of features). However, there is little principled distinction between these constructs. Cars may be usefully represented with features such as wheels, but wheels are themselves concepts which may be decomposable into features (Schyns & Murphy, 1994). Even features such as color, which may appear unitary and unstructured, can be decomposed into sub-units (hue, saturation, and brightness) under certain conditions (Foard & Kemler, 1984; Goldstone, 1994a).

3. COMPARISON TO OTHER APPROACHES

We have argued that the feature space of object representations is often created to reflect the specific categorization requirements of an organism. We described some of the advantages of a feature set grounded in the organism's history of categorization (i.e., the categorizations the organism had to solve plus the corrective feedback it received) over the fixed feature sets proposed by many theories of object categorization and recognition. Our proposal for creating new features touches on several issues related to perceptual and conceptual change. The following sections discuss the similarities and contrasts between our proposal for feature creation and feature chunking, new features in constructive induction, developmental constraints on feature extraction, and formal models of feature extraction.

3.1. Chunking and Perceptual Unitization

Research in the visual search literature has supported perceptual changes similar to the types of changes that we have discussed. Training, or automatization, effects occur when people actively search for a particular target shape (for example, the letter "A") in a visual array of distracter letters (for example, "M" and "W"). In Fisher's (1986) model of visual search, letters are represented by simple features such as horizontal, vertical and diagonal line segments. Similarity between features generally makes it more difficult to find, for example, an "A" amongst "W"s than an "A" amongst "M"s; "A" and "W" share two diagonal bars but "A" and "M" have no common feature. However, even when featural descriptions are quite similar, extensive training significantly speeds up search times (e.g., Fisher, 1986).

Czerwinski, Lightfoot and Shiffrin (1992) suggested that a perceptual change called perceptual unitization could explain training effects in visual search. Perceptual unitization produces perceptual features from a set of more elementary components. These new features speed up visual search because they recode input objects in a more efficient feature repertoire--a repertoire tailored to the specifics of the search task.

Our feature creation theory has both similarities and differences with unitization and chunking theory. It is similar in that visual search may be framed as a categorization task of distracters and targets. Chunking can then be viewed as a context-dependent process influenced by the contrasts and similarities between targets and distracters. This reformulation of visual search emphasizes functional constraints that the chunking process must satisfy; units will be formed that allow members of the target category to be distinguished from distracters. It also allows specific predictions to be derived. For example, in Fisher's (1986) and Czerwinski et al.'s (1992) models, chunked features could represent any subpart of the capital letters, the subpart that reliably unifies and distinguishes the categories. Perceptual chunking is probably an important mechanism of feature creation. However, we believe that the principles governing chunking cannot be fully understood without the notion of category contrasts and similarities.

The influences of category contrasts and similarities on the segmentation of objects were specifically studied in Pevtzow and Goldstone (1994). Stick figures composed of six lines were categorized in one of two ways. Different arbitrary combinations of three contiguous lines were diagnostic for the different categorizations. After categorization training, subjects participated in part/whole judgments, indicating whether a particular set of three lines (a part) was present in a whole stick figure. Subjects were significantly faster in determining that a part was present in a whole when the part was previously diagnostic during categorization. The part/whole judgment task is arguably the most perceptually based task used by Palmer (1977) to explore the "naturalness" of a way of segmenting an object into parts. Although Palmer's model bases the naturalness of a particular segmentation on properties of the object (e.g. the proximities, similarities, and shapes of the line segments), the above results indicate that the subjects' experience also influences how they will segment an object into parts.

The differences between unitization and functional feature creation are mostly consequences of using discrete vs. continuous stimuli. As its name indicates, unitization requires the stimuli to be discretized before being unitized. However, it is frequently difficult to assess exactly what discretization the visual system initially applies to a stimulus before unitization occurs. Czerwinski et al.'s stimuli and Palmer's stick figures are designed to bias processing according to a particular discretization: line segments (but the authors acknowledge that they can only hope for this segmentation). These stimuli could give the impression that our perceptual systems initially segment the environment into little line segments and then construct complex task-dependent representations by unitization. However, the varieties of recognition tasks we face make it very likely that there is no single scale of representation.

Many psychophysical and computational models are converging on the observation that perception operates simultaneously at multiple spatial scales and that the coarser scales are often sufficient for effective processing of complex pictures (e.g., Burt & Adelson, 1983; De Valois & De Valois, 1990; Marr, 1982; Schyns & Oliva, 1994; Watt, 1987; Witkin, 1986). Multiscale representations suggest that the input stimuli are discretized at different scales, possibly using scale-specific feature repertoires. If line segments may serve as the discrete elements at the finer spatial scales (though even here there are serious difficulties), "blobs" or other image measurements are more appropriate for discretizing the coarser scales. A conjunction of high resolution edges often maps onto a single coarse-scale blob, suggesting that the input signal could initially be parsed into large components that do not result from fine scale unitizations. Hence, efficient parsings of real-world stimuli could initially operate with the scale-specific primitives closely corresponding to the relevant events of the input signal (e.g., Oliva & Schyns, 1995). These scale-specific primitives should be adjustable to scale-specific shapes and should therefore be sensitive to task contingencies. Scale-specific vocabularies could arise by applying our proposal for learning new features to the spatial scales made available by perception.

In summary, although chunking is probably an important mechanism for creating new perceptual features, we think there are alternatives. Chunking applies only to a priori discretized stimuli, but evidence suggests that stimuli are not unequivocally discretized into their smallest structures (or for that matter into a single, preferred scale). Large features may be registered without being composed out of smaller features, and small features may sometimes be created by decomposing larger features.

3.2. Constructive Induction

The idea of creating new featural descriptions has been a direct concern of a branch of machine learning called constructive induction (Matheus, 1991; Michalski, 1983). In constructive induction, new features are created by applying inductive operators to the existing set of features. For example, objects that belong to a category may originally be described as 74, 78 or 71 cm tall. With the "close interval" operator, a single new feature "any height between 70 and 80 cm" may be created. Generally, the operators that have been considered have been highly symbolic, including logical operators like "and" and "or," and hierarchical relations between category classes. As an example of the latter type of operator, a playing card that was originally represented as "diamond" may be recoded as "red" if the system knows that diamonds are red.
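As a rough illustration of the "close interval" operator, the following Python sketch builds one interval feature from a set of observed heights. The function name `close_interval`, the widening of the bounds to multiples of `step`, and the returned predicate are assumptions made for this example; they are not taken from Matheus (1991) or Michalski (1983).

```python
import math

def close_interval(values, step=10):
    """Return (low, high, feature) for an interval covering all values.

    The bounds are widened to multiples of `step`; this rounding rule is
    an assumption, chosen so that 71, 74 and 78 yield the interval 70..80.
    """
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step

    def feature(x):
        # The new binary feature: does x fall inside the closed interval?
        return low <= x <= high

    return low, high, feature

# Heights (in cm) of the category members mentioned in the text.
low, high, height_in_range = close_interval([74, 78, 71])
```

The three original height values are thus replaced by a single derived feature that generalizes to unseen heights within the interval, while remaining entirely dependent on the symbolic "height" attribute it started from.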

Hofstadter and his colleagues (French & Hofstadter, 1991; Mitchell, 1993) have also been concerned with computational systems that create new descriptions for input patterns. For example, Mitchell and Hofstadter's Copycat system, when processing the letter sequence "PPQRR," may develop either the description "P, followed by the series PQR, followed by R" or the description "a Q in the middle, flanked by a pair of Ps on the left and a pair of Rs on the right." The description that emerges will depend upon the other developing structures. Copycat creates new descriptions by establishing groups of related letters, and by relating these groups.

Wisniewski and Medin (1994) have recently provided evidence that people alter their verbal descriptions of objects to fit the category labels provided (see also Medin, Goldstone & Gentner, 1993). The same figure in a child's drawing may be interpreted as a tie or buttons, depending on how the drawing is labeled. The authors argue that new descriptions are created when links are established between abstract background knowledge (e.g. "creative children should show more detail in their drawings") and concrete object information.

Our proposal for learning new features is consistent with the above proposals. Although many of the ideas are similar, our stress is different in several respects. We have emphasized that relatively raw stimulus properties need to be preserved for new features to be created. As argued earlier, if distilled, symbolic representations are used to create new features then there will be severe limitations on what new object features are possible. Such is the case with typical constructive induction systems. Although they can produce an infinite number of new features by successive application of inductive operators, the new features are highly constrained by the object interpretation made by the primitive symbolic features. Both the original features and the new features in constructive induction algorithms are discrete symbols that are the product of an object interpretation process. Far greater flexibility in feature creation can be achieved by beginning with object representations in terms of raw features that have not undergone interpretation. The representation should be raw enough so that both symbolic interpretations of an "X" ("two crossing diagonal lines" and "a 'v' and an upside-down 'v' just touching") can be generated (McGraw, Rehling, & Goldstone, 1994). Harnad (1990) has made a similar point with respect to the need for grounding symbols in terms of representations that are non-symbolic.

By stressing the importance of early perceptual representations that implicitly preserve distal object properties, our approach to feature creation also stresses perceptual constraints on feature extraction. Whereas constructive induction techniques can create arbitrarily complex features, features that are generated by humans are constrained by perceptual factors such as topology, spatial proximity and global coherence. Thus, features that are generated by standard constructive induction techniques may be improperly constrained in opposing ways. They may be too constrained by the initial symbolic representations, and they may not be sufficiently constrained by properties of our actual perceptual systems.

Another difference is that we have stressed the perceptual changes that accompany feature creation. In standard feature creation techniques, new features are added to the system's repertoire, but there is little reason to suggest that the new features alter the appearance of the described objects. Rather, they alter the properties that will be inferred about the objects. There is a difference in immediacy between seeing and inferring that an object might be expressed in terms of a particular feature. The psychological evidence that we have reviewed suggests that the immediate appearance of objects (e.g., their discriminability and apparent organization) is altered by experience. Mitchell and Hofstadter's letter series may provide an intermediate case (see also Chalmers, French, & Hofstadter, 1992). When people interpret "PPQRR" in a particular way it may be a cognitive inference, an immediate perceptual phenomenon, or something in between. The same ambiguity seems to exist for the high-level features (e.g. forks, traps, and pawn support structures) that are used by chess experts but not novices (De Groot, 1965).

In sum, work in constructive induction is certainly relevant to the present theory of feature creation. Our approach differs from much of this work in focusing on the perceptual constraints and consequences of feature creation, and the importance of beginning with relatively raw object representations for developing novel interpretations of an object.

3.3. Developmental constraints on object feature extraction

There are in principle an infinite number of ways to represent real-world objects with features. This poses a serious problem for developmental psychologists who must explain how children acquire a particular featural object description from a limited data set. Similarly, in acquiring a new word meaning, children are "faced with an infinite set of possibilities about what a novel word might mean" (Markman, 1995, p.199; see also Landau, 1994; Markman, 1989; Quine, 1960; Jones & Smith, 1993). To reduce the indeterminacy of featural representations, it has been proposed that young learners come equipped with biases towards particular properties of stimuli that increase the speed and accuracy of learning (Landau, 1994; Markman, 1995; Eimas, 1994). These biases are of two sorts: theories and beliefs about objects in the real world, and perceptual structures and processes. We discuss how these biases constrain the development of functional features, and we argue that they must be supplemented by categorization constraints.

3.3.1. The role of theories in object parsing.

According to an influential account of conceptual development, new features and concepts are direct consequences of the development of theories--i.e., naive mental explanations of phenomena (Carey, 1985; Gelman, 1988; Keil, 1989; Murphy & Medin, 1985). Perceptual features (e.g., body_shape, length_of_legs, number_of_legs, and so forth) lie at the periphery of concepts whereas our theories about the causes of category membership (e.g., a genetic code) are at the core of conceptual organization. It has been suggested that the conceptual core exists prior to experience with the world and that it could bias the features young infants notice in objects (Spelke, 1994; Carey, 1985).

There are two different views on the development of theories, discontinuous and continuous. In the discontinuous view, the conceptual core develops through a differentiation process: New explanatory constructs (concepts and features) result from the differentiation of the existing constructs of an earlier theory (Carey, 1991). Children's theories may be incommensurable with corresponding theories in adults (Carey, 1985, 1991; Keil, 1989; Smith, Carey, & Wiser, 1985).

Spelke (1994) suggests that, contrary to the discontinuous view of theory development, there is continuity with respect to theories used during conceptual development. For Spelke, there is an innate constant core at the center of the (intuitive, naive) knowledge later used by older children and adults. Spelke argues that the constant core consists of general constraints that govern the way children perceive and reason about objects in different domains. As Spelke (1994, p.439) puts it: "learning systems require perceptual systems that parse the world appropriately." Among other constraints, Spelke suggests that an innate cohesion principle biases children to group parts that move together into single objects (see also Eimas, 1994, for a related point of view). This principle could facilitate the parsing of objects from their background and could bootstrap category learning.

In a continuous or discontinuous view of development, innate knowledge is important because it reduces the indeterminacy of featural descriptions to those dictated by pre-existing theories. In general, however, there is a conceptual difficulty with the idea that (innate) theoretical knowledge constrains perceptual information: Going from theories to predict perceptual data is underconstrained. To illustrate, if a categorizer is instructed that a set of objects with an unknown complex structure is a set of hammers, an existing theory of hammer would list the components representing these objects in memory. However, unless the theory also specifies all possible perceptual appearances of these components, a segmentation procedure would still have difficulties locating the actual parts in a new object: The perceptual realization of the parts depends on the new stimulus itself. This problem is analogous to the symbol grounding problem (Harnad, 1990).

Thibaut (1994) has recently investigated mappings of theories on perceptual features. In a feature circling task, subjects were instructed to parse the stimuli of a category of unfamiliar objects that displayed the same overall shape and structure (see Figure 2, picture e). All subjects were given a category name so that the corresponding general knowledge could assist their segmentations. When asked to name the segmented parts, subjects did not use the same name (e.g., the same part could be called "head", "leg", or "body" by different subjects). Thus, even when a theory provides a listing of the parts to be searched, the assignment of each part to a perceptual structure is not completely constrained by theories (Thibaut & Schyns, 1995).

3.3.2. The early role of perception in object parsing.

Theories are one source of constraints to reduce the perceptual indeterminacy of stimuli. However, it has recently been suggested that perception also biases children's predispositions towards objects. Experimental evidence has revealed that category inductions are guided by a bias for the shape of objects (see Jones & Smith, 1993; Landau, 1994, for reviews of the relevant data). In a typical design, children are presented with a novel three-dimensional object with a novel name (a count noun). Children are then asked which objects (of a set of objects that have, or do not have, the same shape, texture, and size) should be called by the same name. Their performance is compared with that of children who are simply asked to select objects that are like the novel object, with no name provided. Converging evidence suggests that children generalize from object names on the basis of shape and neglect large differences in other object properties. This bias appears to develop until the age of two; later, the shape bias predominates only when children are given a count noun (see Jones & Smith, 1993).

The shape bias is intended to reduce some of the indeterminacy of category induction. However, since complex shapes are decomposable into many different sets of components, a bias towards shape is only a first necessary step. Other constraints are required to guide the decomposition of a particular shape into its features. In other words, it remains to be explained how children learn to decompose a set of objects into their relevant object features. Such an explanation of parsing could extend the shape bias to specifying precisely which aspects of shape attract attention (and therefore bias segmentation) at different stages of development. It is conceivable that early biases for shape are later superseded by biases resulting from experience with particular object categories. As argued earlier, segmentation routines for different categories of geometrical objects (e.g., continuous vs. discontinuous surfaces) could develop and become more adept at making the fine segmentations required by conceptual expertise.

Thibaut (1995) explored the development of segmentation skills in different age groups. Adults, and children aged 4 and 6, were instructed to learn a category of unknown stimuli and were later tested on the parsing of its exemplars. The stimuli shared a global shape and were composed of a common set of shape features that varied slightly across exemplars (see Figure 2, picture e). Children's parsings were highly inconsistent compared to those of adults. For example, although component parts kept the same relative locations across exemplars, children's parsings often violated topological coherence. They changed the location of the same part across exemplars, and the number of segmentations was not constant across stimuli. Together, these inconsistencies stress that when children attend to shape, they can be biased toward local similarities between shape aspects at the expense of a consistent integration of shape aspects across instances. Consequently, the new shape features that children isolate could be structurally different from those that adults extract from identical materials.

This has important implications for category learning. Children's biases towards locally salient properties could impede, or even prevent, their learning of new categories, when these are defined by features that are comparatively less salient. Recent evidence in Thibaut (1995) showed that 6-year-old children could not learn a simple categorization (a first category defined by the perceptual cue "a-group-of-three-legs-plus-one" and a second category defined by "two-groups-of-two-legs") when the size and orientations of the legs that were irrelevant for categorization varied across exemplars. However, children of the same age experiencing the categories without variations across exemplars had no difficulty learning the categories. These results emphasize the interaction between the development of a feature repertoire and specific perceptual biases. Over the course of conceptual development, children must learn to neglect irrelevant features of the stimuli when they learn new categorizations. The processing differences that could explain the determinants of children's object segmentations should be an important area of future research.

In summary, we have presented theories and perceptual biases as possible predispositions of children towards specific object properties. These biases were not sufficiently specific to predict the actual segmentation of an object belonging to a category. The structure of the categorization problem itself could be an important constraint on the featural descriptions of objects, but it remains to be explained exactly how young children utilize this structure to discover relevant object features.

3.4. Formal Models of Feature Extraction

The understanding of the mechanisms and biases constraining the discovery of relevant structures in data is not only the province of developmental psychologists. For decades, mathematicians and statisticians have been confronted with the issue of structure, as the following quote from a textbook on morphology illustrates (Serra, 1982, pp. 57-58): "The universe of all possible object shapes is vast, even when it is reduced to equivalence classes... There is therefore a huge offering of potential structuring elements. Thus, analyzing the same object X by two dissimilar structuring elements results in two profoundly different pieces of information on its geometric structure... only the interaction of X with structuring element B has an objective meaning." Formal techniques for finding relevant structures in data could provide useful analogies for theories of feature creation.

Mathematically, an object is often expressed as an n-dimensional feature vector. Each component of the vector encodes the presence vs. absence, or the values of the n attributes describing the object (e.g., its parts, their shapes, colors and textures). Geometrically, different points in n-dimensional space encode different objects, and categories of similar objects form clouds of points. There are many ways to encode objects, ranging from the raw pixel intensities of digitized pictures, to sophisticated properties that are known to be diagnostic for classification--e.g., number_of_legs, has_wings, has_fur, has_feathers, and hibernates. Although the latter representation would describe animals in an appropriate feature space, pixel-arrays would require extensive processing before diagnostic properties were captured. Our proposal for functional feature creation concerns the extraction of new structures from perceptual data. How could has_feathers be discovered from a training set of pixel arrays, or similarly unstructured representations?
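The geometric reading of feature vectors can be made concrete with a small Python sketch. The attribute order, the four animals and their attribute values are hypothetical and chosen only for illustration; Euclidean distance is one possible (assumed) measure of proximity in the feature space.

```python
import math

# Assumed attribute order; the names follow the examples in the text.
ATTRIBUTES = ("number_of_legs", "has_wings", "has_fur",
              "has_feathers", "hibernates")

# Hypothetical objects, each encoded as a point in 5-dimensional space.
objects = {
    "sparrow": (2, 1, 0, 1, 0),
    "eagle":   (2, 1, 0, 1, 0),
    "bear":    (4, 0, 1, 0, 1),
    "dog":     (4, 0, 1, 0, 0),
}

def distance(a, b):
    """Euclidean distance between two named objects in feature space."""
    return math.dist(objects[a], objects[b])

within_birds = distance("sparrow", "eagle")     # same cloud of points
within_mammals = distance("bear", "dog")        # differ on one attribute
between_clouds = distance("sparrow", "bear")    # differ on every attribute
```

With such a diagnostic encoding, members of a category fall close together and categories fall far apart. A raw pixel encoding of the same animals would offer no such guarantee, which is the problem taken up in the following subsections.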

3.4.1. Properties of high dimensional spaces and the bias/variance dilemma.

Many models of concept learning have successfully shown that category representations can be learned from exemplars when they are composed of a small, prespecified feature set (e.g., Gluck & Bower, 1988; Kruschke, 1992; Rumelhart, Williams & Hinton, 1986; Widrow & Hoff, 1960); the task is not to discover the feature set from high-dimensional raw data. However, it could be argued that the discovery of features from such high-dimensional spaces is not substantially different from standard mechanisms of category learning. Both concern the extraction of task-dependent invariants. Standard concept learning models operating in low-dimensional spaces could simply be scaled up to operate in high-dimensional spaces.

One of the problems with this idea is that high-dimensional spaces are mostly empty. To illustrate, imagine discretizing a line, a squared plane, a cube and a hypercube with tiles of equal size (e.g., 10 tiles per side). There is a geometric increase (in this example, 10^1, 10^2, 10^3, 10^4) in the number of tiles that cover the objects. If each tile is represented by an n-dimensional data point, the example shows that one needs approximately 10^n tiles to cover an n-dimensional space. If the input distribution varies along many degrees of freedom, a learning problem in high-dimensional space may require an unrealistically large training set to discover robust features, even if an asymptotic solution exists in principle.
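The tiling argument can be checked with a few lines of Python. The function names and the fixed training-set size of 1000 exemplars are assumptions introduced for this sketch.

```python
def tiles_needed(n_dims, tiles_per_side=10):
    """Number of equal-sized tiles covering an n_dims-dimensional cube."""
    return tiles_per_side ** n_dims

def coverage(n_exemplars, n_dims, tiles_per_side=10):
    """Upper bound on the fraction of tiles holding at least one exemplar."""
    return min(1.0, n_exemplars / tiles_needed(n_dims, tiles_per_side))

# Line, plane, cube, hypercube: the geometric increase from the text.
counts = [tiles_needed(n) for n in (1, 2, 3, 4)]

# A hypothetical training set of 1000 exemplars can fill a cube,
# but at best one tile in a thousand of a 6-dimensional space.
cube_coverage = coverage(1000, 3)
six_d_coverage = coverage(1000, 6)
```

Even under the optimistic assumption that every exemplar lands in a different tile, the fraction of the space that is sampled shrinks by a factor of ten with each added dimension.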

This curse of dimensionality (Bellman, 1961) imposes severe limitations on the idea of directly applying simple supervised categorization models to discover perceptual features. Typical concept learning models learn category decision boundaries from a set of pairings of exemplars and their respective category labels. Formally, this consists of finding a function f which successfully approximates the desired category name y from an input x. Often, f is chosen so as to minimize a cost function. Popular concept learning networks minimize the sum of the square of the error between the estimated and the desired category labels (e.g., Rumelhart, Williams & Hinton, 1986; Widrow & Hoff, 1960).
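As a minimal sketch of such error minimization, the Python code below trains a single linear unit with the least-mean-squares (delta) rule in the spirit of Widrow and Hoff (1960). The toy data set, the learning rate and the number of epochs are assumptions chosen for illustration; the first input component is a constant bias term.

```python
def predict(w, x):
    """Linear estimate f(x) of the category label."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sum_squared_error(w, data):
    """Cost function: sum over exemplars of (y - f(x))^2."""
    return sum((y - predict(w, x)) ** 2 for x, y in data)

def lms_train(data, lr=0.1, epochs=200):
    """Adjust the weights after each exemplar to reduce the squared error."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical exemplars: (bias, feature) paired with a 0/1 category label.
data = [((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0),
        ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

initial_error = sum_squared_error([0.0, 0.0], data)
weights = lms_train(data)
final_error = sum_squared_error(weights, data)
```

Here the single input feature is already diagnostic, so learning is easy. The difficulty discussed in the text arises when x is a high-dimensional raw input and the diagnostic features themselves must be discovered.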

Generally speaking, "error-based" categorization models such as backpropagation are nonparametric statistical models (Geman, Bienenstock & Doursat, 1992). They are nonparametric because the networks are not biased to particular classes of solutions. Instead, the architectures are unbiased so as to flexibly discover structures from data. Mathematical analysis has shown that the error term (specifically the expected mean square error) of these networks can be algebraically decomposed into a bias and a variance term (see Geman et al., 1992, pp. 9-10). These two terms summarize the bias/variance dilemma (Geman et al., 1992). Networks make a bias error when they are dedicated to a class of solutions that is not appropriate for the categorizations at hand. Such networks may be too rigid, and flexibility (low bias) would be needed to extract task-specific features. However, low bias comes at the cost of high variance, the second component of the error (where variance means the extent to which the network's categorizations change with the particular training set). There is high variance because a flexible system is too sensitive to the data: It learns many idiosyncrasies of the exemplars (e.g., differences in lighting conditions, rotation in depth, translation in the plane, and so forth) before learning the invariants of a category. Consequently, experience with many exemplars is necessary for the network to "forget" idiosyncrasies and learn relevant abstractions. Only with great experience is the system able to categorize accurately (keep the variance low). The curse of dimensionality is such that unbiased machines designed so as to flexibly discover many types of new perceptual features will often need implausibly large training sets to achieve good categorizations. Note that this problem does not greatly affect fixed feature models which usually operate in smaller spaces for which sufficient exemplars can be generated. The bias/variance dilemma addresses practical computability, not principled limitations.
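For reference, the decomposition can be written in its standard form as follows. The notation is an assumption of this sketch: D denotes a training set, f(x; D) the function learned from D, E_D an average over possible training sets, and E[y | x] the best possible estimate of the label y given the input x.

```latex
E_D\Big[\big(f(x; D) - E[y \mid x]\big)^2\Big]
  = \underbrace{\Big(E_D\big[f(x; D)\big] - E[y \mid x]\Big)^2}_{\text{(squared) bias}}
  + \underbrace{E_D\Big[\big(f(x; D) - E_D\big[f(x; D)\big]\big)^2\Big]}_{\text{variance}}
```

The first term is large when the learned function is, on average over training sets, far from the target. The second term is large when the learned function changes substantially from one training set to another.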

An ideally flexible system should be constructed so as to keep bias and variance low, using a reasonable training set. The bias/variance dilemma is somewhat analogous to the contrast between structured and unstructured features discussed earlier. By analogy, fixed sets of structured features make it difficult to learn new categorizations (and therefore raise the bias error). In contrast, unstructured systems will tend to capture irrelevant aspects of the input set that have little relation to the actual basis of categorization (and therefore raise the variance).

3.4.2. Dimensionality reduction

Complex supervised categorization problems in high-dimensional spaces would be simplified if it were possible to reduce the dimensionality of the input. Several linear and nonlinear dimensionality reduction techniques have been designed to achieve this goal. Underlying dimensionality reduction is the idea that information processing is divided into two distinct stages. A first stage constructs a representation of the environment and a second stage uses this representation for higher-level cognition such as categorization and object recognition. It is hoped that the constructed representation in a smaller dimensional space is more useful than the raw input representation.

To illustrate, consider the popular technique called Principal Components Analysis (PCA). If redundancies exist in the input data, there should be fewer sources of variation than there are dimensions (i.e., p sources for n dimensions, with p << n). PCA finds the first k orthogonal directions of highest variation in a data set. If each input vector of a high-dimensional space is recoded in terms of a linear combination of the first k sources of variation, the intrinsic structure of the data will be preserved to a first approximation (see Oja, 1982; Sanger, 1989). In general, however, the featural interpretation of principal components is often difficult because orthogonal directions of highest variance have little connection to the best projections for categorization. That is, there are no psychological constraints on the principal components. Principal components need not be spatially or topologically coherent (perceptual constraints), or summarized by a single explanation (conceptual constraints).
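The gap between directions of highest variance and directions useful for categorization can be shown with a deliberately simple two-dimensional sketch. The data points and function names are hypothetical; the closed-form orientation formula applies only to the 2 x 2 covariance case and is not a general PCA routine.

```python
import math

def first_principal_axis(points):
    """Unit vector along the direction of highest variance (2-D data only)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def project(point, axis):
    """Coordinate of a point along a unit axis."""
    return point[0] * axis[0] + point[1] * axis[1]

# Two hypothetical categories: wide spread on the first dimension,
# small but perfectly diagnostic difference on the second.
cat_a = [(x, 0.5) for x in (-4, -2, 0, 2, 4)]
cat_b = [(x, -0.5) for x in (-4, -2, 0, 2, 4)]

pc1 = first_principal_axis(cat_a + cat_b)
pc2 = (-pc1[1], pc1[0])  # the orthogonal, lower-variance direction

a_on_pc1 = sorted(project(p, pc1) for p in cat_a)
b_on_pc1 = sorted(project(p, pc1) for p in cat_b)
a_on_pc2 = [project(p, pc2) for p in cat_a]
b_on_pc2 = [project(p, pc2) for p in cat_b]
```

In this constructed case the two categories receive identical coordinates on the first principal component and are separated only on the second, which a reduction to k = 1 would discard.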

Other dimensionality reduction techniques aim at reproducing the intrinsic structure of the input space. Examples of these range from Shepard's (1957) early Multidimensional Scaling and Sammon's Nonlinear Mapping (1969), to more recent Kohonen maps (Kohonen, 1984) and a promising extension to Kohonen maps called Curvilinear Component Analysis (Demartines, 1994). These algorithms project an n-dimensional space on a smaller p-dimensional space, while keeping most of the information about the organization of the input space. To illustrate, consider two distinct clouds of points forming two categories in a "high"-dimensional space composed of four dimensions (n = 4). Assume further that exemplars of the first category are identical on two dimensions, while exemplars of the second category have only one dimension in common. Whereas the points of the first cloud lie on a plane (p = 2), the points of the other category have three degrees of freedom (p = 3) and therefore are in three-dimensional space. This simple example illustrates that data sets may have local distributions with different intrinsic dimensions of variation (in the example, 2 and 3). Projections of high-dimensional inputs onto lower-dimensional spaces should account for these intrinsic characteristics if they want to preserve the important degrees of freedom of the distribution. Unfortunately, techniques for discovering the intrinsic dimensionality of a data distribution are also plagued by high dimensionality. The number of data points necessary to reliably estimate the structure of a distribution may be enormous if the intrinsic dimensionality is high. Dimensionality reduction techniques also need to give up generality for biases, at the expense of possibly missing "important" structures in the data.
Nevertheless, the existence of low-dimensional somatosensory maps in cortex clearly demonstrates that brain structures are particularly adept at reducing high-dimensional inputs to lower-dimensional representations (see Kaas, 1995, for a review). Furthermore, there is now growing support for the notion that these natural processes of dimensionality reduction are flexible, allowing different types of reorganizations of cortical maps following different forms of sensory deprivation (Kaas, 1995).
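The four-dimensional example of two categories with different intrinsic dimensions can be reproduced with a short Python sketch. It estimates the number of degrees of freedom of each cloud as the rank of its mean-centered data, which is an assumption that holds only for noise-free points lying on a flat subspace; it is not the method of any of the algorithms cited above, and the exemplar coordinates are hypothetical.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix by Gaussian elimination with pivoting."""
    m = [list(map(float, r)) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no variation left along this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def intrinsic_dimension(points):
    """Degrees of freedom of a noise-free cloud: rank of the centered data."""
    n = len(points)
    means = [sum(p[d] for p in points) / n for d in range(len(points[0]))]
    centered = [[p[d] - means[d] for d in range(len(means))] for p in points]
    return matrix_rank(centered)

# First category: exemplars identical on the last two of four dimensions.
category_1 = [(0, 0, 5, 5), (1, 0, 5, 5), (0, 1, 5, 5),
              (1, 1, 5, 5), (2, 3, 5, 5)]
# Second category: exemplars identical on the last dimension only.
category_2 = [(0, 0, 0, 7), (1, 0, 0, 7), (0, 1, 0, 7),
              (0, 0, 1, 7), (1, 1, 1, 7)]

dim_1 = intrinsic_dimension(category_1)
dim_2 = intrinsic_dimension(category_2)
```

With noisy or curved distributions the rank would simply equal the number of input dimensions, which is one reason why estimating intrinsic dimensionality from limited data is difficult in practice.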

In analogy to the functional (re)organization of somatosensory maps, we would like the formal definition of "important lower-dimensional structures" to be closer to the categorization task the system needs to solve. Recent approaches to dimensionality reduction have incorporated measures of "feature goodness" in the algorithm for determining good dimensions of recoding. For example, Intrator (1994; Intrator and Gold, 1993) discusses a technique in which input data are projected onto dimensions that have many distinct clusters of data points (multimodal distributions). This unsupervised technique is more likely to discover dimensions useful for distinguishing categories under the assumption that different categories produce clusters within the data. Intrator (1994) reports that his technique worked on stimuli with 3969 and 5500 dimensions and that few training data were necessary for extracting robust features. This and related techniques based on projection pursuit (Friedman & Stuetzle, 1981) provide methods with interesting biases for exploring high-dimensional data spaces.

In the reviewed dimensionality reduction techniques, the feature extraction stage operates independently of higher-level processes; thus there is no guarantee that the extracted features will be useful for higher-level processes (Mozer, 1994). The functionality principle suggests that the categorizations being learned should influence the features that are extracted. In other words, top-down information should constrain the search for relevant dimensions/features of categorization. Thus, we believe the serial process of (1) projecting high dimension space onto a new lower dimension space, then (2) determining categorization with new dimensions, will have to be modified such that the second process informs the first (see also Intrator, 1993). However, computational considerations make it likely that different aspects of perceptual feature extraction need strong biases that do not trivialize the categorization problem (as fixed features often do), but that are sufficiently constraining to allow the learning of general features from a reasonable number of data points (a similar opinion is defended in Anderson & Rosenfeld, 1988; Geman et al., 1992; Shepard, 1989; among others). It is conceivable, for example, that different constraints will be needed to model the categorization of intrinsically different object classes such as faces, man-made vs. natural objects and textures, natural and artificial scenes, and so forth. The empirical study of these psychological constraints and biases should explicitly account for the reported interactions of categorization and perception, even if they significantly complicate the problem.

4. CONCLUSIONS

The function of a feature is to detect and internally represent commonalities between members of the same category, and differences between categories. Either people come equipped with a complete set of features that account for all present and future categorizations, or, working backwards, people sometimes create new features to represent new categorizations. We argued for an approach in which people create features in order to subserve the categorization and representation of objects. We presented psychological evidence and theoretical arguments for the necessity of flexible features in object categorization theories. Flexible features allow the learning of new but perceptually constrained features when new categorizations must be represented. Thus, given an appropriate history of categorization, a learned set of features may be equivalent to, but not limited to, proposed sets of fixed features. As new features are created to detect and represent new categorical contrasts and similarities, a learned set permits an efficient fit between categorization demands and the feature repertoire, which should then be free of useless features. Flexible features are inherently linked to categorization tasks and therefore reduce the need for complex categorization rules by providing efficient representations. In addition, advantages can be accrued by decomposing features into subfeatures, without representing all possible decompositions of a holistic feature a priori. In our view, there is little difference between concepts and features: Someone's unitary concept may be someone else's decomposable structure, depending on the individuals' histories of categorization.

Experimental materials are more likely to promote feature creation when they are not designed with a priori diagnostic features that lead to obvious feature decompositions. These alternative materials (see Figures 2 and 3) (1) do not limit the information that can be obtained from the input, (2) have many distinct intrinsic structures, and (3) are not exhausted by their symbolic descriptions such as "has-legs," "square," "circle," and so forth. In short, alternative stimuli evoke a representation of their structure in a raw, analog form, a form that allows the stimulus to be reinterpreted if new categorizations require it.

In our view, two types of category learning should be distinguished. Fixed space category learning occurs when new categorizations are representable with the available feature set. Flexible space category learning occurs when new categorizations are not representable with the available features. Whether fixed or flexible space learning occurs depends on the requirements of a particular categorization task. That is, it depends on the featural contrasts and similarities between the new category to be represented and the individual's concepts in memory. Fixed feature approaches face one of two problems when they are confronted with tasks that require new features. If the fixed features are fairly high-level and directly useful for categorizations (such as Biederman's geons), then they will have insufficient flexibility to represent all objects that may be relevant for a new task. If the fixed features are small, subsymbolic fragments (such as pixels), then regularities at the level of functional features, regularities that are required to predict categorizations, will not be captured by these primitives.

Flexible features and the perceptual learning they allow have important similarities to, differences from, and implications for various fields of cognitive science. Perceptual unitization similarly implies that recoding proximal stimuli with new features affects the perceptual appearance of the distal object. However, unitization assumes that stimuli are initially analyzed into components before being unitized, whereas evidence suggests that stimuli are not unequivocally discretized into their smallest structures (or for that matter into a single, preferred scale). Functional constraints influence the scale of discretization. The field of constructive induction in artificial intelligence is concerned with creating new object descriptions to assist categorization. In many cases, the new descriptions are simple symbolic transformations of existing symbolic descriptions. Instead, we have stressed the need to create object features from relatively raw, unprocessed, perceptual representations, and to create new features by incorporating perceptual rather than purely formal constraints. Developmental biases (both theory-based and perceptual) that could constrain feature extraction were reviewed. We argued that neither the shape bias nor a priori theories were sufficiently constraining to predict the actual perceptual features that are discovered in objects. These features are also provided by the structuring role of learned categories. Formal analogies with the principles we discuss are found in statistical techniques of dimensionality reduction and their network implementations. These techniques also attempt to reduce what is initially a high-dimensional categorization space to a lower-dimensional space representing important features. Supervised learning is closer to the principles we discuss since it explicitly provides feedback to constrain the search for categorization features. However, it needs to be properly constrained to be practically feasible. We believe properly constrained dimensionality reduction techniques (techniques constrained by perceptual and categorical factors) come closest to the principles we discuss.

5. ACKNOWLEDGEMENTS

This work was funded in part by an NSERC Grant awarded to Philippe Schyns and by National Science Foundation grant SBR-9409232 awarded to Robert Goldstone.

6. REFERENCES

Aha, D. W., and Goldstone, R. L. (1992). Concept learning and flexible weighting. Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society. (pp. 534-539). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Algom, D. (1992). Psychophysical approaches to cognition. Amsterdam: North Holland.

Anderson, J. A., & Rosenfeld, E. (1988). Neurocomputing: Foundations of Research. Cambridge, MA: MIT Press.

Andrews, J., Livingston, K., Harnad, S., & Fisher, U. (in press). Learned categorical perception in human subjects: Implications for symbol grounding.

Barlow, H. B. (1980). The absolute efficiency of perceptual decisions. Proceedings of the Royal Society of London: Biology, 290, 71-82.

Barlow, H. B., & Reeves, B. C. (1979). The versatility and absolute efficiency of detecting mirror symmetry in random dot displays. Vision Research, 19, 783-793.

Bellman, R. E. (1961). Adaptive Control Processes. Princeton University Press, Princeton, NJ.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.

Brigham, J.C. (1986). The influence of race on face recognition. In H.D. Ellis, M.A. Jeeves, F. Newcombe, & A.W. Young (Eds.), Aspects of face processing, Dordrecht: Martinus Nijhoff.

Braunstein, M. L., Hoffman, D. D., & Saidpour, A. (1989). Parts of visual objects: An experimental test of the minima rule. Perception, 18, 817-826.

Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. New York: Wiley.

Burns, B., & Shepp, B. E. (1988). Dimensional interactions and the structure of psychological space: The representation of hue, saturation, and brightness. Perception and Psychophysics, 43, 494-507.

Burt, P., & Adelson, E. H. (1983). The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31, 532-540.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Carey, S. (1991). Knowledge acquisition: enrichment or conceptual change? In S. Carey & R. Gelman (Eds.). The epigenesis of mind. Hillsdale, NJ: Lawrence Erlbaum.

Chalmers, D. J., French, R. M., & Hofstadter, D. R. (1992). High-level perception, representation and analogy. Journal of Experimental and Theoretical Artificial Intelligence, 4, 185-211.

Chapman, K. L., Leonard, L. B., & Mervis, C. B. (1986). The effect of feedback on young children's inappropriate word usage. Journal of Child Language, 13, 101-107.

Clark, E. V. (1973). What's in a word? On the child's acquisition of semantics in his first language. In T.E. Moore (Ed.), Cognitive development and the acquisition of language (pp. 65-110). New York: Academic Press.

Czerwinski, M., Lightfoot, N., & Shiffrin, R. M. (1992). Automatization and training in visual search. American Journal of Psychology, 105, 271-315.

De Groot, A. D. (1965). Thought and choice in chess. The Hague: Mouton.

De Valois, R. L., & De Valois, K. K. (1990). Spatial Vision. Oxford University Press: New York.

Delk, J. L., & Fillenbaum, S. (1965). Differences in perceived color as a function of characteristic color. American Journal of Psychology, 78, 290-293.

Edelman, S., & Bülthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385-2400.

Eimas, P. (1994). Categorization in early infancy and the continuity of development. Cognition, 50, 83-93.

Elio, R., & Anderson, J. R. (1981). The effects of category generalizations and instance similarity on schema abstraction. Journal of Experimental Psychology: Human Learning and Memory, 7, 397-417.

Fisher, D. L. (1986). Hierarchical models of visual search: Serial and parallel processing. Paper presented at the meeting of the Society for Mathematical Psychology, Cambridge, MA.

Foard, C. F., & Kemler Nelson, D. G. (1984). Holistic and analytic modes of processing: The multiple determinants of perceptual analysis. Journal of Experimental Psychology: General, 113, 94-111.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.

French, R. M., & Hofstadter, D. (1991). Tabletop: An emergent, stochastic model of analogy-making. In Proceedings of the Thirteenth Annual Cognitive Science Society Conference, 708-713. Hillsdale, NJ: Lawrence Erlbaum Associates.

Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Lawrence Erlbaum.

Gelman, S. A. (1988). The development of induction within natural kind and artifact categories. Cognitive Psychology, 20, 65-95.

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.

Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Appleton-Century-Crofts.

Gibson, E. J. (1971). Perceptual learning and the theory of word perception. Cognitive Psychology, 2, 351-368.

Gibson, E. J. (1991). An odyssey in learning and perception. MIT Press: Cambridge.

Gibson, E. J., & Walk, R. D. (1956). The effect of prolonged exposure to visually presented patterns on learning to discriminate them. Journal of Comparative and Physiological Psychology, 49, 239-242.

Gluck, M. A., & Bower, G. H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.

Goldstone, R. L. (1994a). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123, 178-200.

Goldstone, R. L. (1994b). The role of similarity in categorization: Providing a groundwork. Cognition, 52, 125-157.

Goldstone, R. L. (1995). Effects of categorization on color perception. Psychological Science, 6, 298-304.

Goodman, N. (1965). Fact, Fiction, and Forecast. 2d ed. Indianapolis: Bobbs-Merrill.

Harnad, S. (1987). Category induction and representation. In S. Harnad (Ed.), Categorical perception: The groundwork of cognition (pp. 535-565). Cambridge: Cambridge University Press.

Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335-346.

Hill, H., Schyns, P. G., & Akamatsu, S. (in press). Information and viewpoint dependence in face recognition. Cognition.

Hinton, G., Williams, K., & Revow, M. (1992). Adaptive elastic models for handprinted character recognition. In Moody, J., Hanson, S., and Lippmann, R. (Eds.) Advances in Neural Information Processing Systems, IV, San Mateo, CA. Morgan Kaufmann. 341-376.

Hoffman, D. D. & Richards, W. A. (1984). Parts of recognition. Cognition, 18, 65-96.

Intrator, N. (1993). Combining exploratory projection pursuit and projection pursuit regression. Neural Computation, 5, 443-455.

Intrator, N. (1994). Feature extraction using an unsupervised neural network. Neural Computation, 4, 98-107.

Intrator, N. & Gold, J. (1993). Three dimensional object recognition using an unsupervised BCM network: The usefulness of distinguishing features. Neural Computation, 5, 61-74.

Jakobson, R., Fant, G., & Halle, M. (1963). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: MIT Press.

Jones, S. & Smith, L. (1993). The place of perception in children's concepts. Cognitive Development, 8, 113-139.

Kaas, J. H. (1995). The reorganization of sensory and motor maps in adult mammals. In M. S. Gazzaniga (Ed.), The Cognitive Neurosciences. Cambridge, MA: MIT Press.

Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170-210.

Keil, F. C. (1989). Concepts, kinds and cognitive development. Cambridge, MA: MIT Press.

Kohonen, T. (1984). Self-organization and associative memory. Berlin: Springer-Verlag.

Koza, J. R. (1994). Genetic Programming II. Cambridge, MA: MIT Press.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.

Kurbat, M. A. (1995). Structural description theories: is RBC/JIM a general purpose theory of human entry-level object recognition. Perception.

Landau, B. (1994). Object shape, object name, and object kind: Representation and development. In Medin (Ed.). The Psychology of Learning and Motivation, 31, 253-304. Academic Press: San Diego, CA.

Lawrence, D. H. (1949). Acquired distinctiveness of cues: I. Transfer between discriminations on the basis of familiarity with the stimulus. Journal of Experimental Psychology, 39, 770-784.

Markman, E. (1989). Categorization and naming in children. Problems of induction. Cambridge, MA: MIT Press, Bradford Books.

Markman, E. (1995). Constraints on word meaning in early language acquisition. In L. Gleitman & B. Landau (Eds). The acquisition of the lexicon, 199-227. Cambridge, MA: MIT Press.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.

Matheus, C. J. (1991). The need for constructive induction. In L. Birnbaum and G. Collins (Eds.) Proceedings of the eighth international workshop on Machine Learning. Morgan Kaufmann: San Mateo, CA.

McGraw, G., Rehling, J., & Goldstone, R. L. (1994). Letter perception: Toward a conceptual approach. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. (pp. 613-618). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254-278.

Michalski, R. S. (1983). A theory and methodology of inductive learning. Artificial Intelligence, 20, 111-161.

Mitchell, M. (1993). Analogy-making as perception. Cambridge: MIT Press.

Moscovici, S., & Personnaz, B. (1991). Studies in social influence: VI. Is Lenin orange or red? Imagery and social influence. European Journal of Social Psychology, 21, 101-118.

Mozer, M. (1994). Computational approaches to functional feature learning. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. (pp. 975-976). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Murphy, G.L., & Medin, D. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.

Murphy, G. L., & Ross, B. (1994). Predictions from uncertain categories. Cognitive Psychology, 27, 148-193.

Nosofsky, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 87-108.

Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.

Oja, E. (1982). A simplified neuron model as a principal components analyzer. Journal of Mathematical Biology, 15, 267-273.

Oliva, A., & Schyns, P. G. (1995). Mandatory scale perception promotes flexible scene categorization. Proceedings of the XVII Meeting of the Cognitive Science Society, 159-163, Lawrence Erlbaum: Hillsdale, NJ.

Palmer, S. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology, 9, 441-474.

Palmer, S., Rosch, E., & Chase, P. (1981). Canonical perspective and the perception of objects. In J. Long & A. Baddeley (Eds.), Attention and Performance IX. Hillsdale, NJ: Lawrence Erlbaum.

Peterson, M. A., & Gibson, B. S. (1994). Must figure-ground organization precede object recognition? An assumption in peril. Psychological Science, 5, 253-259.

Pevtzow, R., & Goldstone, R. L. (1994). Categorization and the parsing of objects. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. (pp. 717-722). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263-266.

Quine, W. (1960). Word and object. Cambridge, MA: MIT Press.

Rodet, L., & Schyns, P. G. (1994). Learning features of representation in conceptual context. Proceedings of the XVI Meeting of the Cognitive Science Society, 766-771, Lawrence Erlbaum: Hillsdale, NJ.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: Bradford Books.

Sammon, J. W. (1969). A nonlinear mapping algorithm for data structure analysis. IEEE Transactions on Computers, C-18, 401-409.

Sanger, T. D. (1989). Principal components, minor components, and linear neural networks. Neural Networks, 2, 459-473.

Schank, R. (1972). Conceptual dependency: A theory of natural language understanding. Cognitive Psychology, 3, 552-631.

Schyns, P. G. (1991). A neural network model of conceptual development. Cognitive Science, 15, 461-508.

Schyns, P. G., & Murphy, G. L. (1991). The ontogeny of units in object categories. Proceedings of the XIII Meeting of the Cognitive Science Society, 197-202, Lawrence Erlbaum: Hillsdale, NJ.

Schyns, P. G., & Murphy, G. L. (1994). The ontogeny of part representation in object concepts. In Medin (Ed.). The Psychology of Learning and Motivation, 31, 305-354. Academic Press: San Diego, CA.

Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time and scale dependent scene recognition. Psychological Science, 5, 195-200.

Schyns, P. G., & Rodet, L. (in press). Categorization creates functional features. Journal of Experimental Psychology: Learning, Memory & Cognition.

Sekuler, A. B., Palmer, S. E., & Flynn, C. (1992). Local and global processes in visual completion. Psychological Science, 5, 260-267.

Selfridge, O. G. (1959). Pandemonium: A paradigm for learning. In Symposium on the mechanization of thought processes. Proceedings of a Symposium held at the National Physical Laboratory, November 1958, Vol. 1. London: H.M. Stationery Office.

Shepard, R. N. (1957). Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325-345.

Shepard, R. N. (1989). Internal representations of universal regularities: A challenge for connectionism. In Neural Connections, Mental Computation, L. Nadel, L. A. Cooper, P. Culicover, and R. M. Harnish, (Eds.), 104-134. Bradford/MIT Press, Cambridge, MA, London, England.

Serra, J. (1982). Image analysis and mathematical morphology: Vol 1. London: Academic Press.

Smith, C., Carey, S., & Wiser, M. (1985). On differentiation: a case study of the development of the concepts of size, weight and density. Cognition, 21, 177-237.

Smith, L. B., & Kemler, D. G. (1978). Levels of experienced dimensionality in children and adults. Cognitive Psychology, 10, 502-532.

Spelke, E. (1994). Initial knowledge: six suggestions. Cognition, 50, 431-445.

Tanaka, J., & Taylor, M. (1991). Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23, 457-482.

Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 233-282.

Thibaut, J.P. (1994). Role of variation and knowledge on stimuli segmentation: developmental aspects. Paper Presented at the Sixteenth Annual Meeting of the Cognitive Science Society. Atlanta.

Thibaut, J. P. (1995). The development of features in children and adults: The case of visual stimuli. Proceedings of the Seventeenth Annual Meeting of the Cognitive Science Society, 194-199. Hillsdale, NJ: Lawrence Erlbaum.

Thibaut, J.P., & Schyns, P.G. (1995). The development of feature spaces for similarity and categorization. Psychologica Belgica, 35, 167-185.

Treisman, A., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.

Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.

Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193-254.

Ward, T. B. (1983). Response tempo and separable-integral responding: Evidence for an integral-to-separable processing sequencing in visual perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 103-112.

Watt, R. (1987). Scanning from coarse to fine spatial scales in the human visual system after the onset of a stimulus. Journal of Optical Society of America, A 4, 2006-2021.

Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, 4, 96-104.

Wisniewski, E. J., & Medin, D. L. (1994). On the interaction of theory and data in concept learning. Cognitive Science, 18, 221-281.

Witkin, A. (1986). Scale-space filtering. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 1019-1022. Los Altos, CA: Morgan Kaufmann.