Notes

* The author wishes to thank Jerry Fodor, Ilona Kovacs, Thomas Papathomas, Zoltan Vidnyanszky, as well as Tom Carr and several anonymous BBS referees for their comments and advice. Brian Scholl was especially helpful in carefully reading each draft and providing useful additions. This work was initially supported by a grant from the Alfred P. Sloan Foundation and recently by the Rutgers Center for Cognitive Science.

  1. This thesis is closely related to what Fodor (1983) [see also BBS multiple book review: BBS(8) 1985] has called the "modularity of mind" view and this paper owes much to Fodor's ideas. Because there are several independent notions conflated in the general usage of the term "module", we shall eschew the use of this term to designate cognitively impenetrable systems in this paper.
  2. Although my use of the term "early vision" generally corresponds to common usage, there are exceptions. For example, some people use "early vision" to refer exclusively to processes that occur in primary visual cortex. Our usage is guided by an attempt to distinguish a functionally distinct system, regardless of its neuroanatomy. By placing focal attention outside (and prior to) the early vision system we depart somewhat from the use of the term in neuropsychology.
  3. We sometimes use the term "rational" in speaking of cognitive processes or cognitive influences. This term is meant to indicate that in characterizing such processes we need to refer to what the beliefs are about - to their semantics. The paradigm case of such a process is inference, where the semantic property truth is preserved. But we also count various heuristic reasoning and decision-making strategies (e.g. satisficing, approximating, or even guessing) as rational because, however suboptimal they may be by some normative criterion, they do not transform representations in a semantically arbitrary way: they are in some sense at least quasi-logical. This is the essence of what we mean by cognitive penetration: It is an influence that is coherent or quasi-rational when the meaning of the representation is taken into account.
  4. I use the technical term content (as in "the content of perception") in order to disambiguate two senses of "what you see". "I see a dog" can mean either that the thing I am looking at is a dog, regardless of how it appears to me, or that I see the thing before me as a dog, regardless of what it actually is. The second (opaque) sense of "see" is what I mean when I speak of the content of one's percept.
  5. Bruner (1957) characterized his claim as a "bold assumption" and was careful to avoid claiming that perception and thought were "utterly indistinguishable". In particular, he explicitly recognized that perception "appear(s) to be notably less docile or reversible" than "conceptual inference." This lack of "docility" will play a central role in the present argument for the distinction between perception and cognition.
  6. Note that not all cases of Gestalt-like global effects need to involve top-down processing. A large number of global effects turn out to be computable without top-down processing by arrays of elements working in parallel, with each element having access only to topographically local information (see, for example, the network implementations of such apparently global effects as those involved in stereo fusion (described in Marr, 1982) and apparent motion (Dawson & Pylyshyn, 1986)). Indeed many modern techniques for constraint propagation rely on the convergence of locally-based parallel processes onto global patterns.
  7. An independent system may contain its own proprietary (local) memory - as we assume is the case when recent visual information is stored for brief periods of time or in the case of the natural language lexicon, which many take to be stored inside the language "module" (Fodor, 1983). A proprietary memory is one that is functionally local (as in the case of local variables in a computer program). It may, of course, be implemented as a subset of long-term memory.
  8. A number of studies have shown a reliable effect due to the lexical item in which the phoneme is embedded (e.g. Connine & Clifton, 1987; Elman & McClelland, 1988; Samuel, 1996). This is perfectly compatible with the independence of perception thesis since, as pointed out by Fodor (1983), it is quite likely that the lexicon is stored in a local memory that resides within the language system. Moreover, the network of associations among lexical items can also be part of the local memory since associations established by co-occurrence are quite distinct from knowledge, whose influence, through inference from the sentential context, is both semantically compliant and transitory. Since we are not concerned with the independence of language processing in this essay, this issue will not be raised further.
  9. Of course, because it is really the bias for the <Fi; Ci> pair that is being altered, the situation is symmetrical as between the Fis and the Cis, so it can also be interpreted as a change in the sensitivity to a particular context in the presence of the phoneme in question - a prediction that may not withstand empirical scrutiny.
  10. We cannot think of this as an "absolute" change in sensitivity to the class of stimuli since it is still relativized not only to the class but also to properties of the perceptual process, including constraints on what properties it can respond to. But it is not relativized to the particular choice of stimuli, or the particular response options with which the subject is provided in an experiment.
  11. A view similar to this has recently been advocated by Barlow (1997). Barlow asks where the knowledge that appears to be used by vision comes from and answers that it may come from one of two places: "…through innately determined structure [of the visual system] and by analysis of the redundancy in sensory messages themselves". We have not discussed the second of these but the idea is consistent with our position in this paper, so long as there are mechanisms in early vision that can exploit the relevant redundancies. The early visual system does undergo changes as a function of statistical properties of its input, including co-occurrence (or correlational) properties, thereby in effect developing redundancy analyzers.
  12. The "rigidity" constraint is not the only constraint operative in motion perception, however. In order to explain the correct perception of "biological motion" (e.g. Johansson, 1950) or the simultaneous motion and deformation of several objects, additional constraints must be brought to bear.
  13. There has been at least one reported case where the usual "natural constraint" of typical direction of lighting, which is known to determine perception of convexity and concavity, appears to be superseded by familiarity of the class of shapes. This is the case of human faces. A concave human mask tends to be perceived as convex in most lighting conditions, even ones that result in spherical shapes changing from appearing concave to appearing convex (Ramachandran, 1990) - a result that leads many people to conclude that having classified the image as that of a face, knowledge over-rides the usual early vision mechanisms. This could indeed be a case of cognitive over-ride. But one should note that faces present a special case. There are many reasons for believing that computing the shape of a face involves special-purpose (perhaps innate) mechanisms (e.g. Bruce, 1991) with a distinct brain locus (Kanwisher, McDermott & Chun, 1997).
  14. The question of what constitutes similarity of appearance is being completely begged in this discussion. We simply assume that something like similar-in-appearance defines an equivalence class that is roughly coextensive with the class of stimuli that receive syntactically similar (i.e., overlapping-code) outputs from the visual system. This much should not be problematic since, as we remarked earlier, the output necessarily induces an equivalence class of stimuli and this is at least in some rough sense a class of "similar" shapes. These classes could well be coextensive with basic-level categories (in the sense of Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). It also seems reasonable that the shape-classes provided by vision are ones whose names can be learned by ostension - i.e., by pointing, rather than by providing a description or definition. Whether or not the visual system actually parses the world in these ways is an interesting question, but one that is beyond the scope of this essay.
  15. One of the useful consequences of recent work on connectionist architectures has been the recognition that perhaps more cognitive functions than had been expected might be accomplished by table-lookup, rather than by computation. Newell (1990) recognized early on the important tradeoff between computing and storage that a cognitive system has to face. In the case of the early vision system, where speed takes precedence over generality (cf. Fodor, 1983), this could take the form of storing a forms-table or set of templates in a special internal memory. Indeed, this sort of compiling of a local shape-table may be involved in some perceptual learning and in the acquisition of visual expertise (see also note 7).
  16. Needless to say, not everyone agrees on the precise status of subjective experience in visual science. This is a question that has been discussed with much vigor ever since the study of vision became an empirical science. For a recent revival of this discussion see Pessoa, Thompson & Noë (in press) and the associated commentaries.