Evan Thompson[1]
Department of Philosophy
York University
4700 Keele Street
North York, Ontario
Canada M3J 1P3
evant@yorku.ca
Alva Noë
Department of Philosophy
University of California, Santa Cruz
Santa Cruz, CA 95064
anoe@cats.ucsc.edu
http://www2.ucsc.edu/people/anoe/
Blind spot, bridge locus, brightness, consciousness, filling-in, Gestalt theory, illusory contours, isomorphism, linking propositions, perceptual completion, vision
In visual science the term "filling-in" is used in different ways, often leading to confusion. This target article presents a taxonomy of perceptual completion phenomena to organize and clarify theoretical and empirical discussion. Examples of boundary completion (illusory contours) and featural completion (color, brightness, motion, texture, and depth) are examined, and single-cell studies relevant to filling-in are reviewed and assessed. Filling-in issues have to be understood in relation to theoretical issues about neural-perceptual isomorphism and linking propositions. Six main conclusions are drawn: (1) Visual filling-in comprises a multitude of different perceptual completion phenomena. (2) Certain forms of visual completion seem to involve spatially propagating neural activity (neural filling-in) and so, contrary to Dennett's (1991, 1992) recent discussion of filling-in, cannot be described as results of the brain's "ignoring an absence" or "jumping to a conclusion." (3) In certain cases perceptual completion seems to have measurable effects that depend on neural signals representing a presence rather than ignoring an absence. (4) Neural filling-in does not imply either "analytic isomorphism" or "Cartesian materialism." The notion of the bridge locus--a particular neural stage that forms the immediate substrate of perceptual experience--is problematic and should be abandoned. (5) To reject the representational conception of vision in favor of an "enactive" or "animate" conception reduces the importance of filling-in as a theoretical category in the explanation of vision. (6) The evaluation of perceptual content should not be determined by "subpersonal" considerations about internal processing, but rather by considerations about the task of vision at the level of the animal or person interacting with the world.
Figure 1 illustrates the so-called neon color illusion: a red diamond is seen where there is only a lattice of red line-segments. The term "neon color spreading" (van Tuijl 1975) is often used to describe this phenomenon: the figure seems to result from the color having spread between the line-segments into the background. The color is said to "fill in" the background, thereby forming the figure. Nevertheless, one does not see any spreading or filling-in process; one sees only the figure. What line of reasoning, then, lies behind the description of such figures in the terminology of "filling-in"?
Visual scientists use the terms "filling-in" and "perceptual completion" to refer to situations where subjects report that something is present in a particular region of visual space when it is actually absent from that region, but present in the surrounding area. The idea is easiest to understand in the case of the blind spot. We have a blind spot in each eye corresponding to the region where the optic nerve leaves the retina and there are no photoreceptors (Figure 2). In everyday perception we are never aware of the blind spot. The blind spots of the two eyes do not overlap, and something that falls on the blind spot of one retina will fall outside the blind spot of the other retina. Even under monocular viewing the blind spot is not easily revealed. Close one eye and fixate a point on a uniformly colored piece of paper: there is no experience of any gap or discontinuity in the visual field. Now follow the instructions for the blind spot demonstration in Figure 3. The left dot disappears and one perceives a uniform expanse of brightness and color. This is an example of perceptual completion or visual filling-in: the color and brightness surrounding the area corresponding to the blind spot are said to "fill in" that area so that a uniform expanse is perceived.
The existence of such perceptual completion phenomena at the subject-level is uncontroversial. The term "filling-in," however, is often used in a controversial sense that goes beyond what subjects report. In this controversial sense, the term "filling-in" suggests that certain kinds of subject-level perceptual completion phenomena are accomplished by the brain's providing something to make up for an absence--by the brain's actively filling in the missing information. Whether there is neural filling-in, however, is a matter of great debate in visual science. We will argue later that there is considerable evidence for neural filling-in, but that great care needs to be taken in thinking about the relation between neural filling-in and subject-level perceptual completion.
To appreciate the debates about filling-in it is first necessary to review certain facts about vision and second to discuss certain conceptual and methodological points. Many of the objects we perceive have roughly uniform regions of surface color and lightness. Now consider two facts. First, cells in the visual cortex in general do not respond to uniform regions, but rather to discontinuities (Hubel & Wiesel 1962, 1968). In other words, many neurons respond more strongly to boundaries than to regions or surfaces. Second, psychophysical experiments--for example, with stabilized images--have shown the importance of boundaries for proper surface perception (Krauskopf 1963; Yarbus 1967). In the classic study by Krauskopf (1963), an inner green disk was surrounded by a red annulus (Figure 4). When the red-green boundary was stabilized on the retina (so that it always maintained a fixed position on the eye), subjects reported that the central disk disappeared and the whole target, disk plus annulus, appeared red. (This is another case of visual filling-in: Krauskopf's observers perceived the central area as having the red of the surround even though "green" light was striking the corresponding region of the retina.) These and other results (e.g., Land 1977) suggest that even under natural viewing conditions the perceived color of a surface depends not only on the light reflected from the surface but also on the change in light across the boundary of the surface.
If boundaries--in fact, transients--are so important, how is the brain able to determine the color and lightness of continuous regions? Is there an active filling-in process at the neural level? Some visual scientists have developed theories and models based on the idea of a neural filling-in process that involves activity-spreading, diffusion, or other forms of neural completion. Others have argued against this idea, suggesting, for example, that contrast measures at borders can be used to assign values of surface featural qualities, such as brightness or color. We will argue later that in the case of brightness perception there is good evidence for neural filling-in that involves spatially propagating activity.
The filling-in controversy is not only empirical, however, for it involves fundamental conceptual and methodological issues. To appreciate these issues it is necessary to introduce the concepts of the bridge locus and neural-perceptual isomorphism. The best way to introduce these concepts is by way of an example. Figure 5 presents another case often discussed in connection with filling-in, the Craik-O'Brien-Cornsweet effect (Craik 1940; O'Brien 1958; Cornsweet 1970). Two largely uniform regions of different brightness are seen while most of the corresponding stimulus regions have exactly the same luminance. In fact, the two regions differ only in the luminance distribution at the "cusp" edge separating the two regions. Why, then, do we see a brightness step?
Here is one route leading to an answer to this question that appeals to neural filling-in (Todorović 1987). Suppose one assumes that activity of a particular type in a specific set of neurons is necessary and sufficient for the occurrence of the Craik-O'Brien-Cornsweet effect. These neurons would form the "immediate substrate" for the perceptual effect. Visual scientists use the term bridge locus to refer to this idea of a particular set of neurons "whose activities form the immediate substrate of visual perception" (Teller & Pugh 1983, p. 581). Now suppose one also assumes that there must be a one-to-one correspondence between the perceived spatial distribution of brightness in the effect and the neural activity at the bridge locus. In other words, just as the perceptual content consists of two uniform regions with a brightness step, so too the immediate neural substrate must consist of spatially continuous activity and a step difference. In short, suppose one assumes that there has to be an isomorphism between the perceived brightness distribution and the neural activity at the bridge locus (Todorović 1987). One would then arrive at the following sort of explanation of the effect: the brain takes the local edge information and uses it to fill in the two adjacent regions so that the region with the luminance peak (left) becomes brighter than the region with the luminance trough (right). The end result is the perception of a brightness step in the absence of any corresponding luminance step.
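The filling-in explanation just described can be made concrete with a toy one-dimensional simulation. The sketch below is only an illustration under stated assumptions, not the model of any author cited here: the function names (cornsweet_profile, fill_in), the exponential shape of the cusp, the parameter values, and the clamped-diffusion rule are all hypothetical choices made for the example. It builds a luminance profile whose two sides differ only near the central cusp, takes the signed contrast at that edge as the only input, and lets activity spread within each region without crossing the edge.

```python
import math


def cornsweet_profile(n=40, base=0.5, amp=0.1, tau=3.0):
    """Toy 1-D Cornsweet stimulus (all parameter values are assumptions).

    Luminance equals `base` far from the centre. Just left of the edge it
    rises to a peak, just right of the edge it dips to a trough, and both
    deviations decay exponentially with distance from the edge.
    """
    c = n // 2  # the edge lies between cells c-1 and c
    lum = []
    for i in range(n):
        if i < c:
            lum.append(base + amp * math.exp(-(c - 1 - i) / tau))
        else:
            lum.append(base - amp * math.exp(-(i - c) / tau))
    return lum


def fill_in(lum, steps=4000, rate=0.25):
    """Spread the signed edge contrast into the two adjacent regions.

    The two cells flanking the largest luminance jump are clamped to
    +contrast/2 and -contrast/2 and act as sources. Every other cell starts
    at zero and repeatedly moves toward the mean of its neighbours. The
    outer ends are reflecting, so each region settles to its source value.
    """
    n = len(lum)
    # Locate the boundary as the largest jump between neighbouring cells.
    c = max(range(1, n), key=lambda i: abs(lum[i] - lum[i - 1]))
    contrast = lum[c - 1] - lum[c]
    act = [0.0] * n
    act[c - 1], act[c] = contrast / 2, -contrast / 2
    for _ in range(steps):
        new = act[:]
        for i in range(n):
            if i in (c - 1, c):
                continue  # clamped source cells; nothing crosses the edge
            left = act[i - 1] if i > 0 else act[i]
            right = act[i + 1] if i < n - 1 else act[i]
            new[i] = act[i] + rate * (left - 2 * act[i] + right)
        act = new
    return act


if __name__ == "__main__":
    lum = cornsweet_profile()
    act = fill_in(lum)
    print("luminance at far left / far right:", lum[0], lum[-1])
    print("filled-in activity at far left / far right:", act[0], act[-1])
```

Under these assumptions the luminance at the two far ends is nearly identical, while the spread activity settles to a positive plateau on the peak side and a negative plateau on the trough side: a step in activity with no corresponding luminance step. Whether anything of this kind happens in the brain is exactly the empirical question at issue.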
Two basic ideas are involved here. The first is that the way things seem to the subject must be represented neurally in the subject's brain. The second is the idea that, in analyzing visual perception, one must arrive at a "final stage" in the brain--a bridge locus--where there is an isomorphism between neural activity and how things seem to the subject. [The isomorphism can arise at some earlier stage of visual processing, as long as it is preserved up to the bridge locus (see Teller & Pugh 1983, p. 586; Teller 1984, p. 1242; Todorović 1987, p. 550).] We will refer to this idea as analytic isomorphism. When applied to perceptual completion phenomena, such as the Craik-O'Brien-Cornsweet effect, analytic isomorphism entails that there must be neural filling-in to make up the difference between how things are and how they seem to the subject.
Analytic isomorphism is essentially a conceptual or methodological doctrine about the proper form of explanation in cognitive neuroscience. The doctrine is that it is a condition on the adequacy of an explanation that there be a bridge locus where an isomorphism obtains between neural activity and the subject's experience. Furthermore, the isomorphism is typically taken to hold for spatial or topographic properties, thus suggesting that vision involves representations having the form of an "internal screen" or "scale model" that preserves the metric properties of the external world (O'Regan 1992). In this article, we argue that analytic isomorphism should be rejected. Nevertheless, we believe that the empirical case for neural filling-in remains strong.
Enter the philosophers. Dennett (1991, 1992) has tried to brand "filling-in" the "F-word" in cognitive science. He thinks that the sort of reasoning epitomized by analytic isomorphism, and hence the idea that there must be neural filling-in, depend on a fundamentally mistaken conception of consciousness. Dennett calls this conception "Cartesian materialism." In the stereotypical version of Cartesian materialism, there is a place in the brain--a "Cartesian theater"--where contents become conscious as a result of being presented to an inner "audience" or homunculus--a viewer of the panoramic "internal screen" (O'Regan 1992). Everybody agrees that this idea is totally wrong. Nevertheless, Dennett thinks that not everybody understands exactly why it is wrong. In Dennett's assessment, the real mistake is a conceptual one: the mistake is to assume that consciousness is a property of individual contents in the way that truth can be considered a property of individual sentences. Given this concept of consciousness, it would seem that there must be a determinate spatio-temporal point in the brain where a content "enters consciousness." Dennett thinks that this concept of consciousness is incoherent and offers in its place the idea that "consciousness is a species of mental fame" (Dennett 1996b). Just as it is impossible to be famous for a second, or to become famous in a second, or to be famous when there are no other people around, so it is impossible for a single, momentary, isolated content to become conscious: for a content to become conscious it has to persist long enough to achieve certain effects on memory and the control of behavior. In short, for Dennett, consciousness is constituted through the joint interaction of spatially and temporally distributed information-processing systems.
Dennett argues that to claim that there is neural filling-in "is a dead giveaway of vestigial Cartesian materialism" (1991, p. 344). Like a number of visual scientists, he believes that there is no reason to suppose that the brain fills in the regions; the brain simply represents the fact that regions are filled-in without itself doing any filling in. For Dennett, perceptual completion is a case of the brain's "finding out" or "judging" that certain features are present, without the brain's having to "present" or fill in those features.
In visual science, O'Regan (1992) has put forward a similar position. He suggests that the need to appeal to neural filling-in would "evaporate if we abandon the idea that 'seeing' involves passively contemplating an internal representation of the world that has metric properties like a photograph or scale model" (1992, p. 483). Instead, he argues that "seeing constitutes an active process of probing the external environment as though it were a continuously available external memory" (p. 484). Seeing depends not on the filling-in of a metric representation, but rather on "interrogating" the external environment directly through eye movements, and then integrating the altered retinal sensations into one's cognitive framework (p. 475).
In the past few years, there has been a growing literature on filling-in among psychologists, neuroscientists, theoretical modelers, and philosophers. Yet the term "filling-in" continues to be used in different ways. Sometimes it is used to describe what the subject perceives; sometimes it is used to refer to what the brain does. The term is also used to describe different sorts of perceptual completion. For example, although illusory contours and brightness perception probably involve different processes, "filling-in" is often used in association with both: a line segment is said to fill in between the inducers, and brightness is said to fill in across regions. Although one need not argue which usage is preferable, it should be obvious that without conceptual and terminological clarification there is room for considerable confusion. For example, do these two types of completion involve common principles and mechanisms? Or are they distinct? Given this situation, it is hardly surprising that in the more theoretical debates the participants often seem to be talking past one another.[2]
In this target article we provide a taxonomy of perceptual completion phenomena with an overview of some recent psychophysical results pertinent to filling-in. The taxonomy is meant as a step toward conceptual and terminological clarification. We would like to emphasize at the outset that our taxonomy is based on salient examples of perceptual completion; it is in no way an exhaustive survey.
In addition we argue that the filling-in issues are best understood in relation to issues about neural-perceptual isomorphism and "linking propositions." Linking propositions are statements that relate perceptual states to physiological states (Teller 1980, 1984, 1990; Teller & Pugh 1983). The concept of a neural-perceptual isomorphism relies on a certain sort of linking proposition, to be discussed later. Nonisomorphic approaches reject the idea that there has to be any structural one-to-one correspondence between perceptual experience and neural processes.
Dennett's discussion of Cartesian materialism is directly relevant to these issues. Dennett has done a service by showing how the filling-in idea often depends on Cartesian materialism (see also O'Regan 1992). But we disagree with Dennett's positive view that perceptual completion is always just a matter of the brain's "finding out." We think that the idea of neural filling-in has to be separated from Cartesian materialism and analytic isomorphism. We will show that there is evidence in visual science to support the idea of neural filling-in. As discussed below, filling-in is not always just finding-out. Nevertheless, we agree that it is mistaken to invoke filling-in as a theoretical category in the explanation of vision. As we see it, such invocations depend on a mistaken representational account of the task of vision. We argue later that the representational account should be replaced by the sort of account variously described as "active" (Aloimonos et al. 1988; Bajcsy 1988), "animate" (Ballard 1991, 1996; Ballard et al. 1997), or "enactive" (Varela et al. 1991; Thompson et al. 1992).
The next section discusses in more detail the notion of linking propositions and the concept of neural-perceptual isomorphism. Section 3 offers a taxonomy of filling-in phenomena to help clarify and organize the empirical findings, plus a detailed look at examples of boundary completion involving illusory contours, and featural completion involving color, brightness, motion, texture, and depth. Section 4 examines some neurophysiological data about perceptual completion. Section 5 summarizes Dennett's position on filling-in and connects it to earlier discussions of isomorphism in visual science. Section 6 presents evidence for neural completion that is inconsistent with Dennett's position. Section 7 looks at some studies that assess the measurable effects of perceptual completion. Section 8 shows how neural filling-in does not entail either Cartesian materialism or analytic isomorphism. Section 9 introduces the personal/subpersonal distinction--the distinction between, on the one hand, the perceiving animal or person as a whole interacting with its environment, and on the other hand, the animal's internal functional organization and processing--and shows its relevance to the filling-in controversy. Section 10 concludes with a statement of directions for further research.
In this section we briefly review some important conceptual and methodological issues about explanation in visual science.
In 1865 Mach stated what has since become known as "Mach's principle of equivalence":
Every psychical event corresponds to a physical event and vice versa. Equal psychical processes correspond to equal physical processes, unequal to unequal ones. When a psychical process is analyzed in a purely psychological way, into a number of qualities a, b, c, then there corresponds to them just as great a number of physical processes α, β, γ. To all the details of psychological events correspond details of the physical events (Mach 1865/1965, pp. 269-70).
Thirteen years later, in 1878, Hering (1878/1964) asserted that the neural-perceptual parallelism was a necessary condition of all psychophysical research. Müller (1896) then gave a more explicit description of the neural-perceptual mapping. He proposed five "psychophysical axioms" that postulated a one-to-one correspondence between neural and perceptual states (see Teller 1984; Scheerer 1994). In particular, his second axiom stated that perceptual equalities, similarities, and differences correspond to neural equalities, similarities, and differences. This axiom was not offered as a solution to the so-called mind-body problem, but rather as a methodological principle that could be a guide in inferring neural processes from perceptual experiences (Scheerer 1994, p. 185).
Köhler accepted this idea, but thought that Müller's axioms were not comprehensive enough because they did not include occurrent perceptual states, but covered only the logical order between neural and perceptual states (Scheerer 1994, p. 185). In 1920 he proposed what he would later call the principle of isomorphism (Köhler 1920), building on Müller's earlier formulation, as well as Wertheimer's (1912). In his 1947 book Gestalt Psychology, Köhler wrote: "The principle of isomorphism demands that in a given case the organization of experience and the underlying physiological facts have the same structure" (Köhler 1947, p. 301).
There are several points about Köhler's principle of isomorphism that deserve mention. First, by the phrase "have the same structure" Köhler had in mind structural properties that are topological. Although the concept of neural-perceptual isomorphism has often been taken to mean a geometrical one-to-one mapping, Köhler clearly intended the isomorphism concept to have a topological sense. For example, he argued that spatial relationships in the visual field cannot correspond to geometrical relationships in the brain; they must correspond rather to functional relationships among brain processes (Köhler 1929, pp. 136-141; 1930, pp. 240-249).
Second, Köhler did not hypothesize that neural-perceptual isomorphism obtained for all properties of perceptual experience. In particular, he did not extend the principle of isomorphism to sensory qualities, such as brightness and color (Köhler 1969, pp. 64-66, as quoted in Pomerantz & Kubovy 1981, p. 428). The principle was restricted to "structural properties" of the perceptual field, that is, to characteristics of perceptual organization, such as grouping and part-whole relationships.
Finally, it is not clear whether Köhler espoused what we are calling analytic isomorphism. Two considerations suggest that he did not. First, Köhler upheld a non-localizationist view of brain function, in which field physics was the main analogy for the underlying physiology; hence the notion of a privileged site of perceptual experience in the brain seems foreign to his way of thinking about the neural-perceptual relation.[3] Second--and this is more telling--he seems to have held (at least according to one interpreter) that the isomorphism principle "is not an a priori postulate, but 'remains an hypothesis which has to undergo one empirical test after the other'" (Scheerer 1994, p. 188).
In 1969, Weisstein provided a clear discussion of the relationship between neural states and perceptual states in the context of experimental studies based on recordings from single cells: "Axiomatically, it can be assumed that any visual event has some corresponding neural circuitry, and that a good deal of neural circuitry in the visual system has a corresponding function in producing a perceptual event" (Weisstein 1969, p. 159). She noted that it is important "to choose some aspect of the behavior shown in single units which appears to have an analogous psychophysical effect in humans," but also that "the corresponding psychophysical effect cannot be strictly analogous to a single unit recording," for there will be more than one single unit activated for almost any conceivable stimulus. As Weisstein rightly insisted, both a spatial and a temporal characterization of neural and perceptual data are essential in establishing a bridge between the two domains that goes beyond identifying rough similarities (see Section 4.5).
In recent years Teller has reintroduced some of these issues into visual science (Teller 1980, 1984, 1990; Teller & Pugh 1983). According to Teller, acceptable explanations within visual science have the following form:
If the question is, what is it about the neural substrate of vision that makes us see as we do, the only acceptable kind of answer is, we see X because elements of the substrate Y have the property Z or are in the state S (Teller 1990, p. 12).
This formulation leaves open another question about form: what is the relation between the form of a given neural response and the form of the corresponding visual appearance? Answers to this question invoke linking propositions--propositions that relate neural states to perceptual states. By analyzing how visual scientists reason, Teller (1984) formulated five families of linking propositions called Identity, Similarity, Mutual Exclusivity, Simplicity, and Analogy.
The Analogy family is the one that concerns us here. It is a "less organized" family of propositions whose form is as follows:
Φ "Looks Like" Ψ → Φ Explains Ψ
where "Φ" stands for physiological terms, and "Ψ" stands for perceptual terms. The arrow-connective "→" has a conditional sense, thus the formulation reads: If the physiological processes (events, states) "look like" the perceptual processes (events, states), then the physiological processes explain the perceptual processes.
The arrow is not the connective of logical entailment. It is heuristic, and is meant to guide the search for the major causal factors involved in a given perceptual phenomenon. Thus the term "explains" on the right-hand side is really too strong--the idea is that "Φ" is the major causal factor in the production of "Ψ": "if psychophysical and physiological data can be manipulated in such a way that they can be plotted on meaningfully similar axes, such that the two graphs have similar shapes, then that physiological phenomenon is a major causal factor in producing the psychophysical phenomenon" (Teller 1984, p. 1240).
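The heuristic character of the Analogy proposition can be illustrated with a small sketch. Teller does not specify how similarity of shape is to be measured; the code below simply assumes that the physiological and psychophysical data have already been sampled on a common axis and uses the Pearson correlation, with an arbitrary threshold, as a stand-in for "similar shapes." The function names (pearson, suggests_causal_factor) and the threshold value are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant curve has no shape to compare")
    return sxy / math.sqrt(sxx * syy)


def suggests_causal_factor(physiological, psychophysical, threshold=0.9):
    """Heuristic reading of the Analogy proposition.

    Returns True when the two curves, sampled on the same axis, have
    similar shapes by the assumed criterion. True marks the physiological
    process as a candidate major causal factor worth investigating; it is
    a guide for search, not a logical entailment.
    """
    return pearson(physiological, psychophysical) >= threshold
```

A positive result here licenses only further investigation of the physiological process, which matches the point that the arrow is heuristic and that "explains" is too strong.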
The Analogy family of linking propositions is similar to Köhler's principle of isomorphism but more general. Isomorphism in Köhler's sense can be seen as a particular instance of the Analogy idea, one in which "looks like" is taken in the sense of structural correspondence. In visual science today, this idea of a one-to-one structural correspondence is often taken to mean a spatial correspondence, so that, for example, spatial variations of brightness in the visual field are explained by analogous spatial variations of neural activity (Todorović 1987, p. 548).
To argue that the brain doesn't really fill in, it only "finds out," means that one rejects the principle of isomorphism applied to perceptual completion. In other words, one rejects the hypothesis that perceptual completion depends on neural completion processes that are structurally isomorphic to the perceptual phenomena. We agree that the doctrine of analytic isomorphism should be rejected, but we think there is evidence for neural filling-in. One problem with Dennett's treatment is that he applies his "filling-in is finding out" point across the board, without considering the different kinds of perceptual completion. Before we go further, then, we need to review the different sorts of perceptual completion phenomena.
The following working taxonomy is meant as a step toward conceptual and terminological clarification. The term "perceptual completion" refers to what subjects report and should be taken in a theory-neutral sense. It is not meant to have any implications for whether there are, or are not, neural filling-in mechanisms in the operation of the visual system. This is a matter to be taken up later. All that the term "perceptual completion" is meant to imply is that subjects report that something seems to be present in a particular region of visual space when it is actually absent from that region, but present in the surrounding area.
There are two general divisions in the classification:
(1) Amodal Completion versus Modal Completion
(2) Boundary Completion versus Featural Completion
These two divisions cross-classify each other, and so there is no hierarchical organization implied in this listing. We will introduce each division briefly using examples and then consider them in depth in the discussion section.
Michotte et al. (1964) distinguished between two types of perceptual completion, modal and amodal. In modal completion, the completed parts display the same type of attributes or "modes" (e.g., brightness) as the rest of the figure. Illusory figures provide a particularly interesting example. Figure 6 shows the famous "Kanizsa triangle" (Kanizsa 1955, 1979). Here there are illusory contours--clear boundaries where there is no corresponding luminance gradient--and a brightening within the figure. The illusory contours and the central brightening are modal in character: they are perceptually salient and appear to belong to the figure rather than the ground.[4]
"Amodal completion" refers to the completion of an object that is not entirely visible because it is covered or occluded by something else (Kanizsa & Gerbino 1982). Thus "amodal completion" denotes the perception of parts of objects--the completed regions--that entirely lack visible attributes. For example consider Figure 7. Although the circles are occluded, they are easily recognized, and are seen as lying underneath the rectangles. The parts of the circles occluded by the rectangles are said to be amodally present.
The distinction between boundary completion and featural completion was first proposed in the theoretical work of Grossberg and Mingolla (Grossberg & Mingolla 1985; Grossberg 1987a, 1987b).[5] Illusory figures too can serve to introduce the distinction. In the Kanizsa triangle, there is boundary completion--the illusory contours complete to form a triangular outline--and there is featural completion, an illusory brightening within the figure compared to the background in the absence of any luminance difference.
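Because the two divisions cross-classify each other, the taxonomy can be pictured as a two-by-two table rather than a hierarchy. The sketch below is one hypothetical way to encode it; the class and function names are our own, and the only entries are the two Kanizsa-triangle cases classified in the text above. Cells left empty simply have no example recorded in this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Presence(Enum):
    MODAL = "modal"
    AMODAL = "amodal"


class Content(Enum):
    BOUNDARY = "boundary"
    FEATURAL = "featural"


@dataclass(frozen=True)
class Completion:
    name: str
    presence: Presence
    content: Content


# Only the cases explicitly classified in the text are listed here.
EXAMPLES = [
    Completion("Kanizsa triangle: illusory contours",
               Presence.MODAL, Content.BOUNDARY),
    Completion("Kanizsa triangle: central brightening",
               Presence.MODAL, Content.FEATURAL),
]


def cross_classify(examples):
    """Group examples into the four cells formed by the two divisions."""
    table = {cell: [] for cell in product(Presence, Content)}
    for example in examples:
        table[(example.presence, example.content)].append(example.name)
    return table


if __name__ == "__main__":
    for (presence, content), names in cross_classify(EXAMPLES).items():
        print(presence.value, content.value, names)
```

Keeping the two divisions as independent fields, rather than nesting one inside the other, reflects the point that neither division is subordinate to the other.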
In the next section (Section 3.3) we review both boundary and featural completion phenomena. Before doing so we wish to state explicitly that we are not at all committed to the currently popular "feature-based" approach to visual perception, that is, to the idea that features such as color, brightness, texture, and so on, are the "visual primitives" out of which visual perception is composed. Our intention is not to subscribe to this paradigm; it is simply to review some psychophysical studies that provide evidence for various sorts of perceptual completion, and these studies typically focus on the visual attributes just mentioned.
Amodal completion is the perceptual completion of occluded objects; modal completion is perceptual completion in the foreground. What does it mean to say that the amodally present parts of the figure are "seen" or "recognized"?
Kanizsa and Gerbino (1982) have explored these questions in terms of the relation between seeing and thinking. They describe the amodal presence of the occluded parts as having an "encountered" character (their translation of Metzger's angetroffen), and they say that "The name 'amodal presence' is reserved for the 'encountered' presence of parts not directly visible" (pp. 171-2). They contrast this encountered presence with a "purely mental completion of an inferential kind," saying that "amodal completion transforms a collection of pieces into a reality of complete things of a phenomenal 'encountered' character" (p. 173). Kanizsa and Gerbino also try to differentiate between "cognitive completion" and "perceptual completion" by arguing that the latter always has a functional effect on the visual aspect of a situation, whereas the former does not suffice to produce such effects. Of course, these two types of completion are not mutually exclusive--they can act together. But it seems they can be dissociated too: the cognitive sort can be present in the total absence of the perceptual sort (pp. 173-4).
Cognitive explanations have also been proposed for modal completion phenomena. For example, during the 1970s, the preferred explanation of illusory contours appealed mainly to cognitive-like processes of postulation and hypothesis formation (Gregory 1972; Rock & Anson 1979) (see Sections 4.1, 6.1, 7.3).
We think that a profitable approach to these issues would be to determine to what extent modal and amodal completion involve common mechanisms. For example, in a series of investigations with illusory contours and occluded figures (among other stimuli), Kellman, Shipley and colleagues have gathered evidence that common interpolation mechanisms are involved (Kellman & Shipley 1991). On the other hand, Anderson (1995) suggests that two kinds of boundary interpolation are involved in modal and amodal completion phenomena.
Another way to probe completion mechanisms is to test current theories of surface perception. For example, Grossberg's FACADE theory tries to explain many challenging completion phenomena involving form, color, and depth (Grossberg 1994).
Several studies of the blind spot have recently appeared (Brown & Thurmond 1993; Komatsu & Murakami 1994; Tripathy & Levi 1994; Murakami 1995; Tripathy et al. 1995). Here we mention a study related to amodal and modal perceptual completion. Durgin et al. (1995) argue that filling-in of the blind spot should be considered a case of surface interpolation. They compared percepts involving the blind spot to percepts involving occlusion (for example, a disk lying on a thick line). In all the tasks they investigated, including motion stimuli and amodal completion, the percepts were similar in detail.
Durgin et al. interpret their results as "consistent with the null hypothesis that the blind spot is treated visually as a region of little or no information" (1995, p. 837). Although the statement in this particular form is uncontroversial--there are no photoreceptors in the blind spot region--Durgin et al. go on to state: "we do not consider that the content of visual perception in the blind spot must directly reflect the activity of 'filled in' visual maps" (p. 837).
Durgin et al.'s perceptual demonstrations support the notion that the blind spot may be treated as an "occluded" region of vision "without an occluder." But at present their demonstrations do not speak to the nature of the underlying neural processes. Durgin et al. propose that blind spot completion is similar to amodal completion. But the perceptual similarities between blind spot completion and amodal completion are not enough to draw conclusions about the neural processes involved in blind spot filling-in.
Experimental studies in psychophysics provide evidence for the existence of two separate types of perceptual completion--boundary completion and featural completion. Two examples can be given--illusory contours and neon color spreading.
There are three defining properties of illusory contours: clarity (or sharpness of the contours), brightness (of the illusory figure), and depth (the "depthfulness" of the illusory figure) (Lesher 1995). Illusory figures need not exhibit all of these properties. For example, illusory contours can appear without an accompanying illusory figure (no depth), such as in offset-grating stimuli. Most important in the present context are illusory figures without clarity (Figure 8a), and illusory figures having clarity but without any accompanying brightening or darkening effect (Figure 8b). [Another example of clarity without brightness is the phenomenon of spontaneously splitting figures (Koffka 1935; Petter 1956; Kellman & Shipley 1991).] We find these two situations (Figure 8) useful for illustrating the distinction between boundary completion and featural completion. In particular, the relative independence of contour clarity and figure brightness (see Lesher 1995) suggests that independent neural mechanisms subserve boundary completion and featural completion (Grossberg & Mingolla 1985).
Watanabe and Sato (1989) studied neon color spreading in the Ehrenstein-plus-cross configuration. They varied the luminances of the cross and the outer segments (Ehrenstein inducers) and were able to show conditions where no illusory contours formed while color spreading occurred. Illusory contours did not occur when the cross (yellow) and the outer segments (white) were equiluminant. In these conditions color spreading was possible. Watanabe and Sato conclude that separate mechanisms subserve the two aspects of the neon color spreading effect. In their view, a luminance difference between the outer segments and their surroundings (the inner segments plus the background) induces the illusory contours. Color spreading depends on the color difference between the inner and outer segments.[6]
Two major types of illusory contours can be distinguished--edge-induced and line-induced. Edge-induced illusory contours consist of solid inducing elements containing edges, or gaps, locally consistent with an occluding figure of the same luminance as the background. In such cases, the illusory contour is collinear with the inducer edges consistent with the occlusion. Line-induced illusory contours, on the other hand, can be seen as the limiting case of edge-induced figures, where the inducers are typically "thin." In this case, the associated illusory contours are not parallel to the inducers, but instead roughly perpendicular to them. Figure 6 (the Kanizsa triangle) presents both types of inducers. The three black, circular "pac-men" act as edge inducers, while the thin lines work as line-end inducers. Figure 8 presents two other line-end-induced illusory figures. Figure 8b also illustrates that there is no sharp separation between edge- and line-end induced illusory contours--a line always has some width and hence provides some edge information for very small spatial scales.
The determinants of illusory contour strength are varied and include both low- and high-level factors. Low-level factors include the spatial extent and proximity of inducers, number of inducers, inducer luminance and contrast, and inducer alignment. High-level factors include perceptual set and memory, depth modifications, and inducer completeness (Lesher 1995). The strength of edge- and line-end-induced illusory contours is enhanced by providing additional stereo and motion cues. Remarkably, illusory contours can be generated without any luminance discontinuities whatsoever.
Random dot stereograms (Julesz 1971) give rise to strong, sharp illusory contours at the (depth-induced) edges. In such cases, the contours are associated with local depth disparity cues. Illusory contours also occur without local disparity (and luminance) discontinuities. If only the edge inducers of a Kanizsa figure are defined by disparity information in a random dot stimulus, contour completion ensues (Prazdny 1985; Mustillo & Fox 1986). In this case, stereo cues give rise to inducers which then behave as luminance-defined inducers, resulting in contour formation across regions with no local stereo (and luminance) cues. Analogously, illusory contours can be specified by local motion cues. (See Lesher 1995 for an in-depth treatment of the above issues.)
Illusory contour boundary formation is accompanied by modal completion--the presence of perceptually salient figure-like (as opposed to ground-like) elements. Amodal completion also involves boundary completion. In other words, contours are formed in regions that are not visible. Although many studies of amodal completion involve both the completion of contours and the completion of regions, we include amodal completion in this section because several of the issues related to boundary formation (e.g., good continuation) are critical for the study of amodal completion.
Amodal completion is governed by global and local factors. Global factors include symmetry and simplicity (Buffart et al. 1981, 1983). Defenders of the primacy of global factors see the whole figure as important in determining the percept, which is often postulated to be the simplest organization of the stimulus (compare the Gestalt notion of Prägnanz). Local factors include contour continuity and curvature (Shipley & Kellman 1992). Those who espouse the importance of local factors stress the autonomy of relatively low-level contour processing mechanisms as governing amodal completion. The debate continues on the relative merits of global (Buffart et al. 1981) and local (Kellman & Shipley 1991; Boselie 1994; Takeichi et al. 1995) factors in determining amodal completion, with several hybrid schemes having been proposed (Wouterlood & Boselie 1992; Boselie 1988; Sekuler 1994). Recent studies have also attempted to dissect potential representational stages involved in amodal completion, such as a mosaic stage--where a literal description of the visible parts of an occluded surface (called "mosaic") would be produced--and a completion stage (Sekuler & Palmer 1992; Bruno et al. 1996). Finally, as previously discussed, current research targets the critical issue of whether modal and amodal completion phenomena are subserved by common mechanisms.
For what kinds of feature can there be modal perceptual completion? Experimental studies indicate that featural completion can involve brightness, color, texture, motion, and depth.
We have already seen examples of completion for brightness and color--illusory figures and neon color spreading, as well as the Craik-O'Brien-Cornsweet effect discussed in Section 1. In general, brightness and color filling-in are related to the role played by contours in surface perception. The nature of the underlying neural mechanisms is a matter of considerable debate. Several researchers have advanced models based on the idea of filling-in at the neural level (Fry 1948; Walls 1954; Gerrits et al. 1966; Gerrits & Vendrik 1970; Davidson & Whiteside 1971; Hamada 1984; Cohen & Grossberg 1984; Grossberg & Todorović 1988; Arrington 1996; Pessoa et al. 1995; Neumann 1996; Ross & Pessoa 1997). Other researchers have proposed mechanisms that do not actively fill in regions, but use other processes to assign features (see the discussion in Ratliff & Sirovich 1978; Kingdom & Moulden 1989; Pessoa 1996).
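To make the idea of filling-in at the neural level concrete, the following minimal sketch shows a feature signal spreading laterally until it meets a boundary signal. It is only loosely in the spirit of diffusion-type accounts and is not an implementation of any of the models cited above; the function name fill_in, the one-dimensional layout, and the parameters steps and rate are all illustrative assumptions.

```python
def fill_in(feature, boundary, steps=500, rate=0.25):
    """Spread feature activity laterally, blocking flow across boundaries.

    feature:  list of numbers, one per spatial position (hypothetical
              feature signals such as brightness or color).
    boundary: list of bools with len(feature) - 1 entries; boundary[i] is
              True when a contour separates position i from position i + 1.
    steps, rate: illustrative simulation parameters (rate must stay at or
              below 0.5 for this simple update to remain stable).
    """
    if len(boundary) != len(feature) - 1:
        raise ValueError("boundary needs one entry per adjacent pair")

    values = list(feature)
    for _ in range(steps):
        updated = list(values)
        for i in range(len(values) - 1):
            if boundary[i]:
                continue  # a contour stops activity from spreading further
            # Activity flows from the more active neighbor to the less active one.
            flow = rate * (values[i] - values[i + 1])
            updated[i] -= flow
            updated[i + 1] += flow
        values = updated
    return values


if __name__ == "__main__":
    # One active position on the left; a contour between positions 3 and 4.
    signal = [1.0, 0, 0, 0, 0, 0, 0, 0]
    contours = [False, False, False, True, False, False, False]
    print(fill_in(signal, contours))
```

In this toy setting, activity placed at one position spreads evenly through its own compartment and does not cross the contour, which is the sense in which boundaries are said to contain featural spreading. Accounts that deny active filling-in would dispense with the spreading loop altogether and assign a value to the region by other means.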
Watanabe and Cavanagh (1991, 1993) investigated whether attributes such as texture can make up an illusory surface in a configuration similar to one that elicits neon color spreading. Subjects viewed a display consisting of crosses filled with black and white textures inserted into the gaps of the Ehrenstein figure (Figure 9). They were asked to report "whether the texture of the crosses appeared to spread outside of the cross." Of the subjects, 83.3% reported the perception of texture outside of the cross, compared to 0% when the central cross regions were viewed in isolation. In the case of neon color spreading, the spreading effect decreases when the inner cross is disconnected from the Ehrenstein figure--for example, when the cross is rotated around its center (Redies & Spillmann 1981). Watanabe and Cavanagh report that their demonstrations with texture spreading show the same tendency. Unlike neon color spreading, however, texture spreading totally disappeared when a textured cross was foveated (which the reader can confirm in Figure 9a).
Studies by Kawabata (1982, 1984, 1990) also provide evidence about the perceptual completion of texture. Kawabata (1982) investigated the stimulus conditions necessary for perceptual completion across the blind spot. Several types of pattern were investigated, including dotted lines (each half to one side of the blind spot), parallel lines (grating pattern), and concentric circles. Grating patterns completed only when they covered two quadrants around the blind spot--for example, covering the top hemi-field or the left hemi-field. Concentric circle patterns also completed across the blind spot as long as the pattern consisted of more than three circles. Perceptual completion of dotted lines occurred as long as there was a small spatial separation between the line ends and the borders of the blind spot, otherwise they were perceived as two independent lines. In a subsequent study, Kawabata (1990) showed that types of completion similar to those obtained for the blind spot were obtained in peripheral vision (15 degrees or more).
A recent series of studies by Ramachandran and colleagues on perceptual completion effects has also attracted considerable attention (Ramachandran & Gregory 1991; Ramachandran 1992a, 1992b, 1993a; Ramachandran et al. 1993). In one study, a homogeneous gray (or pink) square was displayed on a dynamic 2-D noise pattern (twinkling noise). Three effects were reported: (1) the gray square faded after around 5 seconds; (2) the square region appeared filled-in with the surrounding noise pattern; and (3) the noise pattern in the region originally occupied by the gray square persisted. Ramachandran and colleagues described (2) as a filling-in effect and discussed possible neural substrates and mechanisms (see Ramachandran et al. 1993). Finally, Hardage and Tyler (1995) recently compared the actual filling-in (Ramachandran's second effect) with the twinkle aftereffect (the third effect). They showed that the two effects are sensitive to different spatial and temporal parameters. Their results indicate that different mechanisms are involved in the generation of these two percepts.
Watanabe and Cavanagh (1993) (in the study previously mentioned) also employed the Ehrenstein-figure paradigm to investigate motion spreading. The texture inside the crosses was made to move while the crosses themselves remained stationary. All of their subjects reported that motion appeared to spread outside the crosses.
Apparent motion (the phi phenomenon) is a well known effect. Motion perception is induced by stimuli presented at distinct spatial positions and with an appropriate temporal interval between them. Lockhead et al. (1980) investigated a variation of apparent motion known as "sensory saltation" (Geldard & Sherrick 1972). In this perceptual effect (for proper display parameters) the subject reports discrete points of stimulation between the two actual stimulation points. Lockhead et al. attempted to determine whether illusory stimulation points in sensory saltation could be spatially assigned to the (receptorless) blind spot region. The results for three subjects were similar and showed that the saltation crossed the blind spot, with the illusory stimulation points often localized within the blind spot region.
The completion of depth is vividly illustrated by the stereogram shown in Figure 10 (Nakayama & Shimojo 1990a). For such an untextured figure, it is not clear what classical stereopsis would predict for the perception of the center of the cross. Should the percept assume the depth of the nearby vertical segment, or that specified by the horizontal limbs? For the fusion of left and center images, although the vertical limb is closer to the center, one perceives a horizontal bar in front of a vertical bar, illustrating that disparity information can "propagate" when necessary.
Another example of depth completion is given by the phenomenon of Da Vinci stereopsis (Gillam & Borsting 1987; Nakayama & Shimojo 1990b). When we view a farther surface that is partly occluded by a nearer surface, one eye typically registers more of the farther surface than the other eye does. The perception of the farther surface is often derived from the view of the eye that registers more of this surface. Da Vinci stereopsis is illustrated in Figure 11. Observers see the right-eye view of the surface BD in depth, although the region that lies between the vertical lines B and C is registered monocularly by only the right eye. An outstanding question in the study of 3-D vision is how the monocularly viewed region BC inherits the depth of the binocularly viewed region CD (Grossberg 1994; Grossberg & McLoughlin in press).
A final example is a stereogram by Julesz (1971, p. 336) in which each image contains 5% black dots on a 95% white background. A portion of the black dots has disparity, while the remaining ones have zero disparity. When the left and right images are stereoscopically viewed, the black dots with disparity are, as expected, seen in front. These black dots, however, cause the white surround that they enclose to be seen, as a whole, as a planar surface lying in front of another planar surface containing the zero disparity black dots and the white region that they enclose.
We offer a taxonomy of perceptual completion whose divisions are amodal versus modal completion, and boundary versus featural completion. Within these divisions there are numerous different sorts of perceptual completion and care must be taken to distinguish among them. In particular, there seems to be an important difference between boundary completion and featural completion, with featural completion occurring for brightness and color, texture, motion, and depth. This means that propositions about "filling-in" as a whole are of limited use and should be greeted with suspicion. For example, in advance of further research, there is no reason to group together illusory contour formation, for which there is strong evidence for neural completion (see Sections 4.1, 6.1, and 7.3.2), and the texture completion studies by Watanabe and Cavanagh (1991) and Kawabata (1990), where peripheral stimulation is necessary to elicit the completion effects. Indeed, without additional careful experiments to assess the effects of perceptual completion (see Section 7), it is not clear what these studies reveal about perceptual completion at the neural level.
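The two divisions of the taxonomy can be read as independent axes. The sketch below records a few of the phenomena reviewed above along those axes; the class and field names (Visibility, Kind, Phenomenon, group) are hypothetical labels chosen for illustration, and the handful of entries is a sample rather than a complete classification.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    MODAL = "modal"    # completion in the foreground, perceptually salient
    AMODAL = "amodal"  # completion of occluded objects


class Kind(Enum):
    BOUNDARY = "boundary"  # completion of contours
    FEATURAL = "featural"  # completion of brightness, color, texture, motion, depth


@dataclass(frozen=True)
class Phenomenon:
    name: str
    visibility: Visibility
    kind: Kind


# A sample of phenomena discussed in this section (illustrative, not exhaustive).
EXAMPLES = [
    Phenomenon("illusory contours", Visibility.MODAL, Kind.BOUNDARY),
    Phenomenon("contours of occluded figures", Visibility.AMODAL, Kind.BOUNDARY),
    Phenomenon("color spreading in the neon effect", Visibility.MODAL, Kind.FEATURAL),
    Phenomenon("texture spreading", Visibility.MODAL, Kind.FEATURAL),
]


def group(phenomena):
    """Collect phenomenon names under each (visibility, kind) cell."""
    cells = {}
    for p in phenomena:
        cells.setdefault((p.visibility, p.kind), []).append(p.name)
    return cells


if __name__ == "__main__":
    for (visibility, kind), names in group(EXAMPLES).items():
        print(visibility.value, kind.value, names)
```

Keeping the two axes separate reflects the point made above: a claim that holds for one cell, such as modal boundary completion, should not be assumed to carry over to another.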
We now turn to examine some recent physiological data relevant to perceptual completion. All the data come from single-cell studies of the response properties of cortical neurons. We wish to state here that our aim is to examine some important representative studies, not to give an exhaustive review.
In an influential paper, von der Heydt and colleagues (von der Heydt et al. 1984) presented results from single cell recordings suggesting neural correlates of illusory contours in area V2 of the macaque monkey. Almost half the cells examined exhibited sizeable responses to drifting bars or edges and also to the illusory contour induced by drifting line gratings. Cells were not simply responding to individual line-ends, however, since the typical cell would not respond to a grating with only 2 or 3 bars, but would respond with increasing strength as other bars were added, until a saturated level of activity was reached.
Von der Heydt et al. (1984) also studied neural responses to notch stimuli--dark rectangles with parts missing, forming an illusory rectangle. Cellular activity fell off with increasing notch separation and was greatly reduced when only a single notch was present, in parallel with the perceptual disappearance of the illusory figure. In all, the cellular recordings of von der Heydt et al. revealed cells whose responses to illusory contour variations resembled human psychophysical responses to similar variations (see also Redies et al. 1986 for similar results in cat visual cortex). Although some have described these findings as the discovery of "illusory contour cells" (Lesher 1996), von der Heydt et al. (1984) tried to draw a clear distinction between the stimulus-response relationship, on the one hand, and perceived entities, on the other. For instance, they used the term "illusory contour stimuli," rather than "illusory contour cells," and they borrowed the term "anomalous contours" from Kanizsa (1955, 1979) to define a stimulus property without reference to perception.
In a recent study, Grosof et al. (1993) suggested the existence of neurons in V1 of macaque that respond to line-end stimuli similar to offset gratings. The results remain controversial, however, and await further controls to establish the role these neurons play in illusory contour perception (see Lesher 1995).
Fiorani and his colleagues recorded the activity from single neurons in parts of area V1 corresponding to the blind spot (Fiorani et al. 1992). A cortical region that corresponds to the blind spot for the contralateral eye (the eye on the opposite hemispheric side) will also correspond to a normal area of the retina for the ipsilateral eye (the eye on the same hemispheric side). How do neurons corresponding to the blind spot for the contralateral eye respond when a stimulus is presented across this area and the ipsilateral eye is closed? One might predict that the neurons wouldn't respond at all, that they would respond only to stimuli presented to the non-blind region of the ipsilateral eye. But this isn't what happens for 20% of the cells. In other words, some neurons do respond to stimuli presented through the blind spot. Especially interesting is a sub-population of cells that Fiorani et al. call "completion neurons." These neurons, whose receptive fields are located inside the blind spot, respond to a bar longer than the diameter of the blind spot when it is swept across the blind spot, but they respond poorly or not at all to bars restricted to one side of the blind spot. Since these neurons retinotopically map the blind spot (of the contralateral eye) where there are no receptors, Fiorani et al. describe them as having "interpolated receptive fields" (see Figure 12).
Two studies involving dynamic changes in receptive field size in primary visual cortex come from Gilbert and colleagues (Gilbert & Wiesel 1992; Pettet & Gilbert 1992; see also Gilbert 1992). Gilbert and Wiesel (1992) recorded from neurons in V1 in the monkey both before and after retinal lesions. They found that, over a period of minutes, neurons whose receptive field centers are originally located near the edge of the retinal scotoma have greatly enlarged receptive field sizes. They also found that, over a period of two months, neurons with receptive fields originally located within the lesioned area regain visual activity, but now corresponding to retinotopic positions outside of the lesioned area, thus providing an enlarged representation of the area surrounding the retinal scotoma. In other words, the retinal lesion temporarily silences cortical areas--in effect creating a cortical scotoma--but over a period of months the cortical area regains activity due to the dynamic changes in the receptive fields.
In a second study Pettet and Gilbert masked areas covering receptive fields of neurons in V1 of the cat, thereby creating an artificial scotoma (Pettet & Gilbert 1992). They found that when the area of the visual field surrounding the scotoma is stimulated, the receptive field can expand an average of 5-fold in area over a period of around 10 minutes. Gilbert suggests that this expansion may help to explain perceptual filling-in, such as color and texture filling-in, and even illusory contours (Gilbert 1992, p. 8). The idea is that the expansion allows stimuli located near the boundary of the original receptive field to drive the cell. The cells would then fire as if the stimuli were close to their receptive field centers--leading to a shifted percept of the location of the stimulus--so that the unstimulated region would appear to fill in.
In a more recent study, DeAngelis et al. (1995a) investigated receptive field plasticity in V1 under artificial scotoma stimulation similar to that in the Pettet and Gilbert experiments. Unlike Pettet and Gilbert, however, DeAngelis et al. did not encounter changes in receptive field size (but see Chapman & Stone 1996). Instead, they report short-term changes in responsiveness (gain changes) for some cells. Even so, they too conclude that the sort of receptive field changes they observed could account for psychophysical phenomena such as filling-in.
A recent study by De Weerd, Gattass, Desimone, and Ungerleider (De Weerd et al. 1995) takes a major step toward bridging the neural and perceptual levels in the case of perceptual completion. De Weerd et al. discovered cells in extrastriate cortex whose responses correlate well with the perceptual experience of texture filling-in of the sort studied by Ramachandran and colleagues (Ramachandran & Gregory 1991; Ramachandran et al. 1993). De Weerd et al. first determined the time-course of perceptual filling-in for human subjects. They used a large texture with an equiluminant hole in the middle, located 8 degrees from a fixation spot (Figure 13). Subjects were instructed to indicate when they saw the hole fill in. As the hole size was increased from 1 to 12.8 degrees, the time required to see it fill in steadily increased. De Weerd et al. then recorded from two awake behaving rhesus monkeys that viewed the same patterns while they were rewarded for maintaining fixation. For each cell, the hole was centered over the receptive field. There were two main experimental conditions. In the hole condition, the cell responses were recorded for the texture with a hole (the same condition used for the human subjects). The other, no-hole condition served as a control to establish the responses to the same texture without a hole.
Cell responses in areas V2 and V3 revealed neurons whose firing rate in the hole condition was initially lower than in the no-hole condition, but that gradually increased their responses to a similar level, exhibiting what the authors term "climbing activity" (Figure 14). In other words, after a few seconds of fixation these extrastriate cells responded to the texture with the hole as if it were a texture without a hole. De Weerd et al. suggest, therefore, that the perceptual filling-in results from a minimization of the response differences in the hole and no-hole conditions.
In this section, we would like to raise several questions about the Pettet and Gilbert explanation of how receptive field plasticity might subserve completion phenomena, assuming for the moment that the experimental results they obtained are valid, despite being contested by the DeAngelis et al. (1995a) study (see Chapman & Stone 1996, for a discussion that reconciles these two studies). [Similar proposals, though less specific, have been advanced by Fiorani et al. (1992), Ramachandran (1992a, 1992b), Churchland & Ramachandran (1993), and DeAngelis et al. (1995a).]
The first issue is methodological. The proposal relies on a problematic use of the Analogy family of linking propositions:
[Phi] "Looks Like" [Psi] --> [Phi] Explains [Psi]
In Gilbert's proposal, "[Phi]" refers to the activities of single neurons and "[Psi]" refers to subjective report data. The two sides are then connected through a sort of "resemblance" or "analogy"--the physiological data "look like" the perceptual phenomena because they both involve kinds of completion.
Building on Teller's discussion (1980, 1984), we can raise three worries about this sort of linking proposition when "[Phi]" stands for the activities of single neurons:
(1) What is the proposition's intended range of applicability?
Applied to the present proposal: for how many of the filling-in phenomena is it supposed to hold?
(2) Is there sufficient homogeneity at the neural level on which to base the proposition?
Applied to the present proposal: are there many "completion neurons" and are their responses homogeneous enough to support the link to the perceptual level?
(3) What about interactions between the putatively privileged set of neurons and neural activity elsewhere in the visual system?
Applied to the present proposal: does activity elsewhere in the visual system (for example back-projections to V1 from other cortical areas) affect the response patterns in V1?[7]
To this third worry we can add a fourth related one. In supposing that the activity of single cells is reflected more or less directly in the psychophysically measured response, one thereby assumes that, in the highly controlled conditions of an experiment, nothing else in the system interferes with the influence those cells have on the animal's response. [Teller (1980, p. 164) calls this assumption the "nothing mucks it up proviso."] In the Fiorani et al. experiments, however, the experimental conditions involve the animal's being anaesthetized. One often finds scientists making hypotheses about the physiological correlates of perception based on findings in animals that are not consciously perceiving anything due to anaesthesia. [Zeki's studies of color perception and V4 are a well known case in point (Zeki 1983a, 1983b).] Here the "nothing mucks it up" proviso amounts to assuming that consciousness--in the sense of being awake and alert--makes no difference to what the rest of the visual system is doing. We see no reason to believe this, and good reason not to: experiments have shown that when animals are awake and behaving in normal sensory surroundings many kinds of neuronal response in visual cells become highly dependent on behavioral factors such as the bodily tilt of the animal (Horn & Hill 1969), the animal's posture (Abeles 1984), and auditory stimulation (Morell 1972; Fishman & Michael 1973). Moreover, studies in alert, unparalyzed monkeys reveal that both attention and the relevance of a stimulus for the performance of a behavioral task can considerably modulate the responses of visual neurons (Moran & Desimone 1985; Haenny et al. 1988; Chelazzi et al. 1993; Treue & Maunsell 1996).
The second problem is empirical and has to do with the observed time-courses for filling-in. There are two sorts of time-course for receptive field expansion in the Gilbert data: the first is on the order of minutes; the second is on the order of months. These physiological time-courses do not match the perceptual ones, which typically seem to be on the order of seconds. For artificial scotomata, Ramachandran and Gregory (1991) report that perceptual filling-in happens on the order of 2-3 seconds. Similarly Gerrits reported that filling-in took several seconds for stabilized images (Gerrits et al. 1966); in contrast, Gerrits reported several years later that filling-in for patients with retinal scotomata happened "instantaneously" (Gerrits & Timmerman 1969). Finally, to add one more wrinkle, Paradiso and Nakayama (1991), in a study of the temporal dynamics of brightness perception to be discussed in Section 6.2, timed the speed of brightness filling-in on the order of milliseconds--brightness signals appear to propagate at a rate of 110-150 deg/sec (6.7-9.2 msec/deg). Clearly, to establish a closer link between the perceptual data and the physiological data we must await the application of new techniques for assessing receptive field changes that might occur on the order of seconds or even shorter time scales (but see DeAngelis et al. 1995b).
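The two units quoted for the Paradiso and Nakayama estimate are reciprocals of one another, and working the conversion through gives a feel for the time scale involved. The 5-degree region in the second line is a hypothetical example chosen for illustration, not a value taken from the study.

```latex
\[
t\ [\mathrm{msec/deg}] \;=\; \frac{1000}{v\ [\mathrm{deg/sec}]},
\qquad \frac{1000}{150} \approx 6.7,
\qquad \frac{1000}{110} \approx 9.1
\]
\[
5\ \mathrm{deg} \times 6.7\ \mathrm{msec/deg} \approx 33\ \mathrm{msec},
\qquad
5\ \mathrm{deg} \times 9.1\ \mathrm{msec/deg} \approx 45\ \mathrm{msec}
\]
```

The slower bound comes out at about 9.1 rather than the quoted 9.2 msec/deg, a difference small enough to be consistent with the speeds having been rounded. On either figure, spreading across a region of a few degrees would take tens of milliseconds, which is orders of magnitude faster than receptive field changes that unfold over minutes or months.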
The study by De Weerd et al. (1995) addresses the two issues just raised. It carefully tries to correlate the time-course of perceptual completion with the cell responses investigated. The authors show that the time-course of climbing activity follows that of perceptual completion as the hole size is increased, and they clearly delimit the range of applicability of their linking proposition to encompass only Ramachandran-type texture filling-in. Finally, the study employed awake, behaving monkeys under stimulation conditions that paralleled those used for the human subjects.
To conclude this section, at the present time the neural processes involved in the different sorts of perceptual completion are largely unknown. On the basis of the foregoing considerations, however, we think that it is best not to view perceptual completion as directly reducible to "atomic" neural properties at the single cell level. In other words, at the neural level perceptual completion might be better described using concepts such as "cell assemblies" or other forms of distributed coordinated activity.[8]
We turn now to Dennett's criticism of the filling-in idea, in particular to his claim that filling-in is really just finding out.
Dennett makes two kinds of points about filling-in, one conceptual and the other empirical.
The conceptual points depend on distinguishing clearly between the content of a representation and the vehicle or medium of representation. Suppose one sees a colored region. This is one's perceptual content. Dennett assumes that there must be states or processes in the brain that bear this very content. But he observes that this could be accomplished by the brain in a number of different ways. First, there could be a representation of that region as colored, or a representation of that region could be absent, but the brain ignores that absence. The point here is to distinguish between the presence of a representation and ignoring the absence of a representation (Dennett 1992, p. 48). Second, suppose there is a representation of that region as colored. This too could be accomplished by the brain in different ways: for example, the representation could be spatially continuous or pictorial, or it could be symbolic.
These conceptual points show up the main mistake made by analytic isomorphism. Analytic isomorphism holds that there must be an isomorphic neural representation for each conscious perceptual content. As Dennett correctly observes, however, there need be no isomorphism between perceptual contents and neural representations, for some perceptual contents might correspond to neural processes that ignore the absence of neural representations, or they might correspond to symbolic representations.
Take the blind spot, for example. From the fact that one has no awareness of a gap in one's visual field, it does not follow that there must be a neural representation of a gapless visual field, for the brain might simply be ignoring the absence of receptor signals at the blind spot. Nor does it follow that the blind spot must be completed with spatially continuous representations, for the region might simply be designated by a symbol.
Dennett's conceptual points still leave open the empirical matter of just what the brain does to accomplish perceptual completion. Here Dennett is not entirely clear about what he means when he says that the brain "jumps to a conclusion." In the case of the blind spot, Dennett asserts that the visual cortex has no precedent of getting information from that retinal region, and so it simply ignores the absence of signals from that area. The moral of this story is that the brain does not need to provide any representation for perceptual completion to occur; completion can be accomplished by ignoring the absence of a representation (cf. Creutzfeldt 1990, p. 460). The contrast, then, is between providing a representation and jumping to a conclusion, in the sense of ignoring the absence of a representation. According to this story, the brain does not need to fill in the blind spot in the sense of providing a roughly continuous spatial representation, nor does it need to label the blind-spot region--the absence of any representation for the blind-spot region is simply ignored or not noticed in subsequent visual processing. Notice that this is a case of providing content; the point is that there is no representational vehicle specifically devoted to the blind spot.
On the other hand, Dennett sometimes contrasts providing a roughly continuous spatial representation with labeling a region. And he says that "filling in" means the former. Here the contrast is between, on the one hand, providing a spatial representation of each sub-area within a region--filling-in--and, on the other hand, jumping to a conclusion, in the sense of attaching a label to the region all at once. In this story the brain provides both content for the blind-spot region and a representational vehicle devoted to that region, namely, a label.[9]
Dennett's slogan is: "The brain's job is not 'filling in.' The brain's job is finding out" (1992, p. 47). The principle of brain function being assumed here Dennett calls "the thrifty producer principle": "If no one is going to look at it, don't waste effort providing it." For example, to see a region as colored, all the brain needs to do is to arrive at the judgement that the region is colored. Whether Dennett thinks that the brain accomplishes this by ignoring the absence of a representation or by providing a label ("color by number"), he clearly thinks that filling in the color of each sub-area ("color by bit map") is not the thriftiest way to do it.[10]
In visual science there has been a great deal of debate about neural-perceptual isomorphism in relation to filling-in, and the debates all pre-date Dennett's treatment (Ratliff & Sirovich 1978; Grossberg 1983; Bridgeman 1983; Todorović 1987; Kingdom & Moulden 1989; see also O'Regan 1992). In fact, in 1978 Ratliff and Sirovich argued against the need for a neural filling-in process in a way similar to Dennett. They argued that to assume that there must be neural filling-in to account for the homogeneous appearance of bounded regions is to misinterpret Mach's principle of equivalence as requiring that there be an isomorphic mapping from the form of the neural process to the form of the perceptual response. But such an isomorphism is not logically necessary. Therefore, neither is a neural filling-in process (see also Bridgeman 1983; and Kingdom & Moulden 1989).
Ratliff and Sirovich went on to make some remarks that are interesting in relation to Dennett's discussion of Cartesian materialism:
The neural activity which underlies appearance must reach a final stage eventually. It may well be that marked neural activity adjacent to edges [rather than neural filling-in between the edges]... is, at some level of the visual system, that final stage and is itself the sought-for end process. Logically nothing more is required (1978, p. 847).
This point is similar to Dennett's that, once discriminations have been made, they do not need to be re-presented to some central consciousness system--a "Cartesian theater" (1991, p. 344). But there is a dissimilarity too: as Dennett's critique of Cartesian materialism and his alternative "multiple drafts" model of consciousness make plain, the notion of a "final stage" may have no application at all. In fact, given the dense connectivity of the brain, with reciprocal forward and backward projections, it is not clear what "final stage" could mean in any absolute sense (see Section 8.1). For this reason, Dennett's discussion of filling-in represents an advance over Ratliff and Sirovich's.
Although neural filling-in may not be logically necessary, whether there is neural filling-in has to be an empirical question. Ratliff and Sirovich admitted this: "we cannot by any reasoning eliminate a priori some higher-order stage or filling in process... But parsimony demands that any such additional stage or process be considered only if neurophysiological evidence for it should appear" (1978, p. 847). Dennett too admits this (1991, p. 353; 1992, pp. 42-43). What sort of evidence is there, then, for neural filling-in?
In Section 3 we reviewed a large number of perceptual completion phenomena. We would like to draw attention to two cases here--illusory contours and the temporal dynamics of brightness and color induction. Both strike us as counterexamples to the idea that perceptual completion is accomplished by the brain's ignoring an absence.
Several researchers have suggested cognitive theories of illusory contour perception, most notably Gregory (1972) and Rock (Rock & Anson 1979). In these theories, illusory contour formation is largely the result of a cognitive-like process of postulation. Illusory contours are viewed as solutions to a perceptual problem: "What is the most probable organization that accounts for the stimulus?" Although there is ample evidence for the role of cognitive influences in illusory contours, current studies point to the importance of relatively low-level processes in the formation of illusory contours.
Two lines of evidence point to an early neural mechanism for illusory contour completion: (1) neurophysiological data, and (2) psychophysical studies of the similarities between real and illusory contours.
As we discussed in Section 4.1, von der Heydt and colleagues have shown that figures in which we see illusory contours evoke responses in a large number of cells in V2 of alert monkeys (von der Heydt et al. 1984; von der Heydt & Peterhans 1989; Peterhans & von der Heydt 1989). The cells respond as if the illusory contours were formed by real edges or lines, and they respond to variations in the figure in a way that resembles human psychophysical responses to the same variations. Although making a link between single cell activities and perceptual phenomena is problematic for the reasons reviewed in Section 4.5, the evidence here seems to suggest that the perceptual completion of boundaries involves the neural completion of a presence, rather than "ignoring an absence."
Many psychophysical studies have provided evidence for a common early treatment of both real and illusory contours by the visual system (see Lesher 1996; Spillmann & Dresp 1995). For example, Smith and Over (1975, 1976, 1977, 1979) have revealed similarities between the two types of contours in the realm of motion aftereffects, tilt aftereffects, orientation discrimination, and orientation masking.
Tilt aftereffects are particularly interesting. A tilt aftereffect will occur if one adapts for a few seconds by looking at lines oriented counterclockwise from the vertical, and then one is exposed to a test stimulus of vertical lines. The latter will appear to be tilted clockwise, away from the adapting orientation. There is compelling evidence from recent studies showing that tilt aftereffects cross over between real and illusory contours (Berkeley et al. 1994; Paradiso et al. 1989). Thus adaptation with real lines can affect the perception of illusory contour orientation and vice-versa (see Section 7.3.2).
An important question concerns the level at which real and illusory contours have similar status. Motion and tilt aftereffects are often attributed to short-term habituation in early visual stages (Barlow and Hill 1963; Movshon et al. 1972). Thus the evidence from psychophysics is that real and illusory contours share internal processes at an early level of the visual system. In fact, there is considerable evidence pointing to the functional equivalence of real and illusory contours in the operation of the visual system (see Table 1 of Lesher 1995; Spillmann & Dresp 1995, p. 1347).
There is an enormous literature on the spatial variables determining brightness and color induction. In contrast, there are considerably fewer studies investigating temporal variables (Boynton 1983; Kinney 1967; see Heinemann 1972). But there are a few studies with results that speak directly to the question of evidence for filling-in.
Paradiso and Nakayama (1991) used a visual masking paradigm to investigate two issues--first, the role of edge information in determining the brightness of homogeneous regions, and second the temporal dynamics of perceptual filling-in. They reasoned that if the filling-in process involves some form of activity-spreading, it may be possible to demonstrate its existence by interrupting it. If boundaries interrupt filling-in, what happens when new borders are introduced? Is the filling-in process affected before it is complete?
Figure 15 shows the paradigm they used as well as the basic result. The target is presented first and is followed at variable intervals by a mask. For intervals on the order of 50-100 msec the brightness of the central area of the disk is greatly reduced. If the mask is presented after 100 msec, the brightness of the central region is largely unaffected. The most striking result was that the brightness suppression depended on the distance between target and mask. In particular, for larger distances maximal suppression occurred at later times.
Paradiso and Nakayama's results are consistent with the hypothesis that brightness signals are generated at the borders of their target stimuli and propagate inward at a rate of 110-150 deg/sec (6.7-9.2 msec/deg). The idea that contours interrupt the propagation is perhaps clearest for the case where a circular mask is introduced, resulting in a dark center, for the brightness originating from the target border seems to be "blocked." Paradiso and Nakayama discuss several alternative accounts, such as lateral inhibition processes, but do not consider them to be plausible explanations of their findings.
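The relation between the two ways of stating the propagation rate, and what such a rate would imply for the time a border signal needs to reach the center of a target, can be made explicit with a small worked calculation. The following Python sketch is illustrative only: the function names and the 2-degree radius are our own assumptions rather than values from Paradiso and Nakayama's study, and the calculation assumes a constant propagation speed.

```python
def msec_per_deg(speed_deg_per_sec: float) -> float:
    """Convert a propagation speed (deg/sec) into a delay per degree (msec/deg)."""
    return 1000.0 / speed_deg_per_sec


def time_to_center_msec(radius_deg: float, speed_deg_per_sec: float) -> float:
    """Time for a signal starting at the border to reach the center,
    assuming (hypothetically) a constant inward propagation speed."""
    return radius_deg * msec_per_deg(speed_deg_per_sec)


if __name__ == "__main__":
    radius = 2.0  # hypothetical disk radius in degrees of visual angle
    for speed in (110.0, 150.0):  # range of rates quoted in the text
        print(
            f"{speed:.0f} deg/sec -> {msec_per_deg(speed):.1f} msec/deg; "
            f"center of a {radius:.0f}-deg-radius disk reached after "
            f"{time_to_center_msec(radius, speed):.1f} msec"
        )
```

On this simple reciprocal conversion the faster rate corresponds to about 6.7 msec/deg and the slower rate to about 9.1 msec/deg, which is close to the range quoted above. The sketch also makes plain why, on the propagation hypothesis, maximal suppression should occur later when the mask contour lies farther from the target border.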
Some of these results were anticipated in an earlier study by Stoper and Mansfield (1978). They employed a masking paradigm in which the masks were varied systematically in time. They interpreted their "area suppression" effects as resulting from the interference of a mask with the process of filling-in of target brightness. Their paradigm enabled them to show that brightness suppression could not be due simply to contour suppression, thereby indicating that brightness and contour processes are subserved by independent systems.
The filling-in model of brightness perception proposed by Grossberg and Todorović (1988) has been shown by Arrington (1994) to produce excellent fits to the data from both Stoper and Mansfield (1978) and Paradiso and Nakayama (1991). This sort of close link between psychophysics, neurophysiology, and modeling seems especially promising for investigating the mechanisms responsible for perceptual completion.
Another relevant study comes from De Valois, Webster, and De Valois (1986). They employed center-surround standard (reference) and matching (variable) stimuli, similar to the ones used in classic contrast studies. They compared the results of direct changes in brightness or color where the center of the standard pattern was modulated (as was the matching pattern), to the changes that occurred when the surround was modulated sinusoidally while the center was kept constant at the mean level. These two conditions were referred to as "direct" and "induced" respectively. The purpose of the experiments was to measure the brightness and color changes produced by oscillations at various temporal frequencies between 0.5 and 8 Hz. Their studies revealed two main findings: (1) The temporal frequencies studied had little effect on the apparent brightness change in the direct condition; color variations in the direct condition were present but small. (2) In the induced condition, the amount of brightness change fell drastically as the temporal frequency increased (around 2.5 Hz).[11]
These results can be interpreted in terms of a spreading mechanism of induction that occurs over time, one that would provide a spatially continuous representation for filling-in. Brightness and color signals would be generated at the edges between center and surround, and would propagate inside the center region determining the appearance. An optimal temporal frequency would reflect the time interval necessary for the signal to propagate from the edges. The drastic fall-off found by De Valois et al. would result from a change in the surround before the edge signal was able to reach the middle of the center region.
Rossi and Paradiso (1996) have replicated the brightness induction results of De Valois et al. (1986) and have studied the role of pattern size on the effect by varying the spatial frequency of the inducing pattern. The correlation found between spatial scale, degree of induction, and cutoff frequency indicates that there is a limited speed at which induction proceeds and that larger areas take more time to induce. Rossi and Paradiso conclude that the limits on the rate of induction are consistent with an active filling-in mechanism initiated at the edges and propagated inward.
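The qualitative claim that larger areas take more time to fill in can be illustrated with a toy simulation. The sketch below is not the Grossberg and Todorović model, nor any model used in the studies cited; it is a minimal one-dimensional diffusion scheme of our own devising, in which two edge cells are clamped to a fixed signal value and the signal spreads into an initially empty interior. The parameter names (`rate`, `threshold`) and their values are hypothetical and carry no physiological interpretation.

```python
def steps_to_fill(width, rate=0.25, threshold=0.9, max_steps=100_000):
    """Count the update steps needed before the center of a region of
    `width` interior cells reaches `threshold`, when a signal diffuses
    inward from two edge cells clamped at 1.0.

    `rate` is the diffusion coefficient per step (kept at or below 0.5
    so that the explicit update stays numerically stable).
    Returns None if the threshold is not reached within `max_steps`.
    """
    values = [1.0] + [0.0] * width + [1.0]  # edges at both ends, empty interior
    center = (width + 1) // 2
    for step in range(1, max_steps + 1):
        updated = values[:]
        for i in range(1, width + 1):
            # Each interior cell moves toward the average of its neighbors.
            updated[i] = values[i] + rate * (
                values[i - 1] - 2.0 * values[i] + values[i + 1]
            )
        values = updated
        if values[center] >= threshold:
            return step
    return None


if __name__ == "__main__":
    for width in (5, 11, 21):
        print(f"region width {width:2d} cells: center filled after "
              f"{steps_to_fill(width)} steps")
```

In this toy scheme the number of steps grows with the width of the region, which is the qualitative pattern that an edge-initiated spreading mechanism would predict. Nothing here bears on whether the actual neural process is diffusive, and the step counts should not be read as estimates of real time.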
In a remarkable study, Rossi, Rittenhouse, and Paradiso (Rossi et al. 1996) showed that a significant percentage of neurons in cat primary visual cortex respond in a manner that correlates with perceived brightness, rather than responding strictly to the light level in the receptive field of the cells. Rossi et al. studied cell responses in conditions analogous to the direct and induced conditions studied psychophysically by De Valois et al. (1986) and Rossi & Paradiso (1996). In the induced condition, neural responses were largest at low temporal frequencies and decreased as the rate of modulation increased over 1.0 Hz. In the direct condition, however, response amplitudes progressively increased with increasing temporal frequencies. These results, as well as other findings, are closely paralleled by psychophysical findings, suggesting that such cell responses may contribute to the perception of brightness.[12]
The studies discussed in this section provide strong evidence for featural filling-in. In brightness filling-in the brain seems to be providing something, and it seems to be doing so through a roughly continuous propagation of signals, a process that takes time. On the other hand, ignoring a region by jumping to the conclusion that it has the same label as its surround doesn't take time in the same way: although labeling would involve brain processes with their own temporal limitations, there seems no reason to suppose that it would be subject to the same kind of temporal constraints as those involved in signals having to propagate through some spatially extended area.
Many studies of perceptual completion have provided results based directly on an observer's report of the percept. For example, an observer is asked to give a verbal report of what he sees in his blind spot given surrounding stimulation (e.g., Brown & Thurmond 1993). In other cases, subjects are asked to draw the shape of an occluded figure (Moravec & Beck 1986; Takeichi et al. 1995). Some studies simply ask subjects to indicate whether they perceive a given completion (as in Watanabe and Cavanagh 1995). Although many of these studies provide important information about the type of perceptual completion, how they are to be interpreted in relation to neural processes is unclear.
On the other hand, many experiments have probed perceptual completion by directly assessing the effects of the completion processes themselves. By investigating whether there are measurable effects of completion it becomes possible to evaluate more precisely the mechanisms involved. Dennett (1993, p. 208) raises this point in a useful way: "The way to test my hypothesis that the brain does not bother filling-in the 'evidence' for its conclusion is to see if there are any effects that depend on the brain's having represented the step, rather than just the conclusion... The detail would not just seem to be there; it would have to be there to explain some effect." In fact, visual scientists have investigated this issue of whether there are measurable effects of filling-in, as the following discussion of some relevant studies will demonstrate.
The motion aftereffect consists in the perception of motion when one views, say, a screen containing stationary dots, after being exposed to motion (of the opposite direction) during a previous adaptation phase. Murakami (1994) studied the motion aftereffect after monocular adaptation to filled-in motion at the blind spot. Does the region of the blind spot (which contains no photoreceptors and so is not stimulated) also generate an aftereffect?
Instead of directly assessing whether a regular aftereffect is produced, Murakami assessed the interocular transfer of the effect, that is, whether the motion aftereffect could be measured at the corresponding visual field of the other eye. It is well known that a standard motion aftereffect transfers interocularly. Murakami found that the aftereffect also transfers interocularly in the blind spot case. In other words, adaptation to filled-in motion at the blind spot of one eye can cause a motion aftereffect at the corresponding visual field of the other eye. This result provides evidence for the perception of real motion and the perception of filled-in motion sharing a common neural pathway in an early stage of the visual system (see also note 8). If the brain treated perceptually completed motion at the blind spot and real motion differently, then one would not expect the motion aftereffect to transfer. Murakami's study thus provides a measurable effect of what appears to be the brain's having taken the trouble to fill in the motion at the blind spot, though not necessarily in a topographic manner.
On the other hand, in an earlier study, Cumming and Friend (1980) compared the strength of the tilt aftereffect induced by (partial) gratings completed across the blind spot with control gratings. Gratings were seen as completed across the blind spot, but the magnitude of the tilt aftereffect they induced suggested that the perceptually completed portions of the gratings did not contribute to the aftereffect. This negative result has to be interpreted with care, however. One cannot rule out the possibility of completion contributing to the effect because the mechanisms involved may be at a "higher" processing level than the ones involved in the effect being probed (the tilt aftereffect).
Several studies have shown that the changes in perceived color associated with color filling-in due to stabilized retinal images produce indirect effects that can be measured psychophysically. Here we briefly review some recent studies.
The sensitivity to a small-field, flickering, blue test-light is significantly altered by adaptation to yellow light. Piantanida (1985) studied whether yellow-adaptation induced by the filling-in of a stabilized image would have the same effect on S-cone flicker sensitivity when compared to an actual yellow light illuminating the retina. His results showed that a yellow background induced by the filling-in of a stabilized image is as effective in reducing flicker sensitivity as an actual yellow background applied directly to the retina.
According to current thinking, flicker sensitivity can be affected by attenuation of the signals in the corresponding pathways. For example, a yellow background is known to reduce the flicker sensitivity of the S-cone system. Pugh and Mollon (1979) have proposed a two-stage model, where attenuation would occur at the S-cones themselves and at a site where S-cone signals interact antagonistically with other classes of cones. The precise mechanisms posited to account for these effects need not concern us here, but it is clear that an actual yellow light modifies the state of possibly several stages in the visual system. At the same time, Piantanida's results show that the perception of yellow in a region that is physically dark produces equivalent results. His results suggest that the perception of yellow due to perceptual filling-in is associated with the same types of changes that occur when the visual system is presented with an actual yellow light.
Nerger et al. (1993) investigated how hue cancellation and increment thresholds are affected for backgrounds that are retinally stabilized compared to non-stabilized backgrounds. They employed a disk-annulus stimulus where the outer border was always unstabilized, and the disk could be either stabilized or not. The disk and annulus were illuminated by either 575 or 640 nm light; when one was 575 nm, the other was 640 nm. In the non-stabilized condition, subjects saw, for example, a red disk surrounded by a yellow annulus. In the stabilized condition, the color of the 640 nm disk changed from red to yellow. When subjects were asked to perform a hue cancellation of the test probe (so that it appeared neither reddish nor greenish), the stabilized and non-stabilized conditions produced different settings. (Note, however, that in both conditions the same 640 nm disk was imaged on the retina.) This result shows that the color appearance of the background can influence the color of lights superimposed on it. Nerger et al. also evaluated increment thresholds (where an incremental test flash has to be detected) for both conditions; they showed that these did not differ.[13] Thus increment thresholds are unaffected by the appearance of the adapting field and depend only on its spectral energy distribution. Nerger et al. propose a two-site model, where filling-in affects the second (color-opponent) adaptation site, but not the first site where gain changes occur.
Dresp (1992) asked whether threshold elevation (for the detection of a small light spot) would be similar throughout an illusory Kanizsa square. She reasoned that if filling-in mechanisms were responsible for producing uniform brightness levels throughout the figure, thresholds should be similar at the center of a square and near inducing elements. The results instead showed that thresholds decreased at the center of the figure.
Although Dresp interprets her results as being at odds with a brightness filling-in mechanism responsible for illusory figure brightness, there are several critical points that have to be considered. "Increment sensitivity and subjective brightness are not necessarily related by any simple function" (Fiorentini 1972, p. 195; see also Cornsweet & Teller 1965). For example, increment thresholds follow (more or less) the brightness of light Mach bands (Békésy 1968), but not of dark Mach bands (see also Burkhardt 1966). Although a positive finding--constant threshold elevation throughout--would provide some indication of the levels within the visual system where contrast elevation and brightness processes are operative, a negative finding such as Dresp's is hard to interpret because threshold elevation might be probing rather early processes.
Paradiso et al. (1989) studied whether adaptation to illusory contours produces tilt aftereffects comparable to those obtained for regular real lines. They initially established that illusory contours used in both adaptation and test phases produce strong tilt aftereffects. Can adaptation to illusory contours induce an aftereffect when real lines are used in the test phase (or vice versa)? Paradiso and colleagues showed that the answer is yes. Adaptation to real lines induces a strong aftereffect when testing with illusory contours, but a significantly weaker aftereffect is obtained when adaptation to illusory contours is used and real lines are tested. The authors attribute this asymmetry to the corresponding asymmetry in the distribution of receptive field types in areas V1 and V2 (cells responding to illusory contour stimuli are typically found only in V2).
In another experiment, Paradiso et al. (1989) evaluated the degree of interocular transfer of the tilt aftereffect, which was found to be stronger when the test stimulus was illusory than when it was real. As the authors observe, these results are consistent with the idea that neural mechanisms activated by illusory contours are more binocular than those activated by real lines (assuming that cortical binocularity underlies interocular transfer).
In summary, the existence of a tilt aftereffect with illusory contours and its dependence on adaptation angle indicate the existence of orientation-selective neurons that respond to illusory contour stimuli. Moreover, the interocular transfer shows that real and illusory contours share an early visual pathway.
When two equiluminant colored fields abut, no clearly visible contour is seen. This occurs even for color differences of twice threshold (Eskew and Boynton 1987). This stimulus configuration can sometimes produce chromatic diffusion (Eskew and Boynton 1987; Eskew 1989). For small fields differentially exciting the S cones, the violet and green colors "bleed" across the (invisible) contour, thereby producing a larger uniform area after an initially apparent color difference sinks below threshold.
A remarkable property of the juxtaposed-color-patch stimulus configuration is that color discrimination may be severely impaired. When the two fields are separated slightly discrimination improves substantially. Boynton et al. (1977) named this effect the "gap effect." According to Boynton et al., the gap effect is related to a spatial averaging mechanism that integrates the two patches together. The small border or gap prevents such integration and improves sensitivity.
Eskew (1989, p. 717) suggested that chromatic diffusion (the perceptual bleeding of colors across an invisible contour) was related to a physical process of integration: "chromatic diffusion ... seemed as if it could be the visible appearance of such an integrating process, observed in real time." He determined the chromatic discrimination thresholds for juxtaposed fields as a function of stimulus duration--that is, the fields were flashed for a certain time. Discriminations were maximal at 400 milliseconds and declined linearly (on a log scale). Eskew interpreted the approximately exponential time-course of the decrease in sensitivity in terms of a diffusive mechanism. The optimal presentation time was linked to the lack of time for the integration process to reduce the color differences across the border. Note that for such exposure durations the introduction of the gap has little or no effect. This is not the case for longer exposure durations, where the introduction of the gap improves discrimination. Also consistent with a spatial diffusive mechanism is the finding by Eskew and Boynton (1987), in which the change in sensitivity as a function of time is reduced for a short wide stimulus when compared to a tall narrow one.[14]
Experimental investigations of the blind spot, stabilized images, illusory figures, and chromatic diffusion and the gap effect provide suggestive evidence about certain consequences of perceptual completion. In particular, they point to measurable effects that seem to depend on representing a presence rather than ignoring an absence.
We now return to the conceptual issues surrounding the neural-perceptual relation. We have seen that there is considerable evidence for neural filling-in. The main point of this section is that the existence of neural filling-in does not entail either analytic isomorphism or Cartesian materialism.
Discussions of neural filling-in have been closely tied to the doctrine of analytic isomorphism. Visual scientists sometimes interpret the evidence for neural filling-in within the framework of analytic isomorphism (see Section 8.1). On the other hand, Dennett rejects Cartesian materialism and with it neural filling-in: although he appears to concede that neural filling-in is an "empirical possibility," and says that he does not wish "to prejudge the question" (1991, p. 353), he nevertheless asserts that the "idea of filling in... is a dead giveaway of vestigial Cartesian materialism" (p. 344).
We agree that any argument for neural filling-in based on Cartesian materialism should be rejected. But the empirical case for neural filling-in as reviewed above can be separated from Cartesian materialism. Hence theories and models in visual science that appeal to neural filling-in on the basis of such evidence need not be motivated by Cartesian materialism. One must distinguish sharply between the existence of neural filling-in as an empirical matter and Cartesian materialist interpretations of filling-in. Visual scientists are mistaken when they interpret the evidence for neural filling-in within the framework of analytic isomorphism, but it is equally mistaken to say that talk of filling-in has to mean a commitment to Cartesian materialism.
As we discussed in Section 2, the term "isomorphism" first gained prominence in visual science through the work of Köhler. Although the isomorphism concept has often been interpreted to mean a spatial or topographic correspondence, Köhler held that neural-perceptual isomorphism should be thought of as topological or functional. Our view is that there is nothing conceptually wrong with these sorts of isomorphism as such. Whether there are either spatial/topographic or topological/functional neural-perceptual isomorphisms in any given case is an empirical question for cognitive neuroscience to decide.
What we find problematic is the doctrine of analytic isomorphism, which holds that cognitive neuroscientific explanation requires the postulation of a "final stage" in the brain--a bridge locus--where there is an isomorphism between neural activity and how things seem to the subject. There are two critical points to be made here, one concerning the role played by the concept of the bridge locus and the other concerning the concept of isomorphism.
In their framework for mapping between the neural and the perceptual domains, Teller and Pugh (1983) call the neural structure that "forms the immediate substrate of visual perception" the bridge locus. They write:
Most visual scientists probably believe that there exists a set of neurons with visual system input, whose activities form the immediate substrate of visual perception. We single out this one particular neural stage, with a name: bridge locus. The occurrence of a particular activity pattern in these bridge locus neurons is necessary for the occurrence of a particular perceptual state; neural activity elsewhere in the visual system is not necessary. The physical location of these neurons in the brain is of course unknown. However, we feel that most visual scientists would agree that they are certainly not in the retina. For if one could set up conditions for properly stimulating them in the absence of the retina, the correlated perceptual state would presumably occur (Teller & Pugh 1983, p. 581).
This passage expresses a number of different ideas that need to be disentangled. First, Teller and Pugh state explicitly that a particular pattern of activity at the bridge locus is necessary for the occurrence of a particular perceptual state. But at the end of the passage they also explicitly state that retinal stimulation is probably not necessary (assuming one could stimulate the bridge locus neurons directly), thereby suggesting that the bridge locus activity pattern is sufficient for the perceptual state. Therefore, it seems that part of what they mean by the "bridge locus" is a particular set of neurons having a particular pattern of activity that is necessary and sufficient for a particular perceptual state. Second, in calling the bridge locus a particular "neural stage," and in saying that this stage is not likely to be found in the retina, Teller and Pugh seem to be conceiving of the bridge locus in a localizationist manner as a particular cortical region or area.
Analytic isomorphism relies on the concept of the bridge locus. Consider the following statement by Todorović (1987, p. 549): "A logical consequence of the isomorphistic approach is that a neural activity distribution not isomorphic with the percept cannot be its ultimate neural foundation." By "ultimate neural foundation" Todorović indicates that he means the bridge locus. The doctrine of analytic isomorphism states that it is a condition on the adequacy of cognitive neuroscientific explanation that there be an ultimate neural foundation where an isomorphism obtains between neural activity and the subject's experience.
We are suspicious of this notion of the bridge locus. Why should there have to be one particular neural stage whose activity forms the immediate substrate of visual perception? Such a neural stage is not logically necessary; moreover--to borrow Ratliff and Sirovich's point about neural filling-in--parsimony demands that any such stage be considered only if neurophysiological evidence for it should appear. On this score, however, the evidence to date does not seem to favor the idea. First, brain regions are not independent stages or modules; they interact reciprocally due to dense forward and backward projections, as well as reciprocal cross-connections (Zeki & Shipp 1988). There is ample evidence from neuroanatomy, neurophysiology, and psychophysics of the highly interactive, context-dependent nature of visual processing (DeYoe & Van Essen 1995). Second, cells in visual areas are not mere "feature detectors," for they are sensitive to many sorts of attributes (Martin 1988; Schiller 1995). One of the main ideas to emerge from neuroscience in recent years is that the brain relies on distributed networks that transiently coordinate their activities (Singer 1995; Vaadia et al. 1995), rather than centralized representations. Finally, Dennett and Kinsbourne (1992) have argued that the notion of a single neural stage for consciousness hinders our ability to make sense of neural and psychophysical data about temporal perception.
Some of these critical points could perhaps be met by relying on a less localizationist conception of the bridge locus, which, as Todorović (1987, p. 550) observes, is probably an "oversimplified notion," for "there is no compelling reason to believe that the bridge locus is confined to neurons of a single type within a single cortical area."[15] Although this is a step in the right direction, the term "bridge locus"--defined as "the location [our emphasis] at which the closest associations between Φ [physiological] and Ψ [psychological] states occur" (Teller & Pugh 1983, p. 588)--does not strike us as particularly useful for thinking about the distributed neural correlates of perceptual experience. For example, such correlates might involve neural assemblies where membership is defined through a temporal code, such as response synchronization (Singer 1995; Varela 1995). For this reason, we think that the concept of the bridge locus should be abandoned.
To abandon the concept of the bridge locus means rejecting analytic isomorphism, for analytic isomorphism depends on this concept. Some visual scientists, however, reject analytic isomorphism while nevertheless adhering to the concept of the bridge locus. For example, Ratliff and Sirovich (1978) denied analytic isomorphism, but asserted that the neural processes involved in perception "must reach a final stage eventually." The notion of a "final stage" seems equivalent to the notion of the bridge locus. We would reject any framework that depends on the concept of the bridge locus, whether isomorphic or nonisomorphic.
We now return to the concept of isomorphism. A good example of what we object to in analytic isomorphism can be found in a statement made by Todorović (1987) in his discussion of "isomorphistic" versus "nonisomorphistic" theories of the Craik-O'Brien-Cornsweet effect. Todorović admits that any mapping from neural to perceptual states "is an aspect of the notorious mind-body problem," but then goes on to say: "conceptually the idea of an isomorphism between certain aspects of neural activity and certain aspects of percepts may be more acceptable [than a nonisomorphic mapping], at least within a general reductive stance that assumes that, at some level of description, perceptual states are neural states" (1987, p. 550). We disagree. On the one hand, as Todorović recognizes, and as Köhler himself observed over thirty years ago (Köhler 1960, pp. 80-81), the thesis of neural-perceptual isomorphism does not logically entail mind-brain identity. On the other hand, suppose one does assume that "at some level of description, perceptual states are neural states." Still, neural-perceptual analytic isomorphism would be plausible only if perceptual states are strictly identical to neural states (so that each type of perceptual state is identical to a particular type of neural state). But isomorphism would not be plausible if the identity is weak, that is, if perceptual states are multiply realizable with respect to neural states (so that, although every perceptual state is identical to some neural state, one and the same type of perceptual state can be realized in many different types of neural states, or in many different types of non-neural physical states for that matter). This issue of strong (or type) identity versus weak (or token) identity is indeed "an aspect of the notorious mind-body problem," and nothing that Todorović says favors the strong identity thesis.
Hence no basis has been given for the a priori claim that isomorphism is conceptually preferable to nonisomorphism in cognitive neuroscientific explanation.
The final matters we wish to discuss are open-ended and programmatic, for they concern some of the broad conceptual and methodological issues raised by our discussion of filling-in. We have seen that arguments for filling-in based on either analytic isomorphism or Cartesian materialism must be rejected. During the course of our discussion, a fundamental conceptual point emerged, namely, that one cannot infer anything about the nature of the neural representational medium of visual perception from the character of the subject's perceptual content (see Section 5.1). For example, suppose one has a perceptual experience that there is something red in front of one. It does not follow on logical, conceptual, or methodological grounds that there is a spatial or pictorial representation of the red region in one's brain.
We think that the full significance of this conceptual point has to do with an important distinction--the distinction between the personal and the subpersonal (this terminology comes from Dennett 1978, pp. 153-154). One has to distinguish between attributions of content to the person or animal, and attributions of content to the brain or nervous system (McDowell 1994). Personal-level attributions treat the animal as an embodied whole embedded in an environment, and as constrained by norms of rationality. In contrast, attributions of content to the brain (e.g., the visual system) involve hypotheses about the animal's internal functional organization. In this section, we wish to show the relevance of this distinction to the filling-in controversy in visual science.
9.1 The personal/subpersonal distinction and task-level conceptions of vision
During the past two decades there has been considerable research into the subpersonal mechanisms of visual perception. One prominent research program, based on the work of Marr (1982), Poggio, and their colleagues (Poggio et al. 1985), conceives of vision as a kind of "inverse optics"--a process of producing representations in the brain of the three-dimensional layout of objects from the limited information encoded in the two-dimensional retinal image. The central idea of this approach is that the visual system has to construct an accurate representation of the world on the basis of the limited information available to the retina. Different, non-representational, lines of research have also emerged in the past two decades, however. In particular, the "ecological approach" of Gibson (1979) and his followers (Turvey et al. 1981), as well as more recent "active" and "animate vision" approaches (Aloimonos et al. 1988; Bajcsy 1988; Ballard 1991, 1996; Ballard et al. 1997), emphasize not the information available to the retina, but rather the information available to the animal as it explores its environment.
We think that the distinction between the personal and the subpersonal has a direct bearing on the debate between representational and non-representational approaches to visual perception (McDowell 1994; Noë 1995; Thompson 1995, pp. 232-242), and in turn on the filling-in controversy. Because the representational approach holds that vision comprises a set of complex information-processing tasks, it concentrates on representational/computational processes underlying our perceptual capabilities. These processes are all subpersonal, occurring within the animal's brain. In contrast, Gibson's ecological approach aims to give an account, not of what goes on inside the animal, but rather of what the active, probing animal itself accomplishes in its environment. As Gibson put it: "In my theory, perception is not supposed to occur in the brain but to arise in the retino-neuro-muscular system as an activity of the whole system" (1972, p. 217). "Perceiving is an achievement of the individual, not an experience in the theatre of consciousness" (1979, p. 239). The central point made here is clear: the proper subject of perception is not the brain, but rather the whole embodied animal interacting with its environment. We believe that this point can be accepted even by those who reject the details of Gibson's specific hypotheses.
The subpersonal level is important if we wish to understand the neural mechanisms and processes that underlie our perceptual capabilities. But the subpersonal level has influenced visual science to such an extent that the perceptual subject--the person or animal--has been neglected (with notable exceptions such as Gibson). We find this neglect unacceptable. Attention to the animal as the subject of perception is important for two interconnected reasons: first, it corrects certain conceptual problems that often emerge in the subpersonal, representational understanding of vision; and second, it suggests a better kind of task-level analysis of vision than that found in the representational approach.
Most computational and neural network models of vision are "image-based" in the sense that they follow Marr's idea that "vision is the process of discovering from images what is present in the world, and where it is" (Marr 1982, p. 2). The images are patterns of light on the retinal array, and to represent what is present in the world and where it is, the content contained in the images must be extracted and reconstructed through complex internal processing. This account of vision is subpersonal because the animal, the perceptual subject, has no place in it. The problem with such subpersonal answers to the question, "what is vision?", is that they lead to conceptual confusions and thus to an unsatisfactory task-level account of vision. Consider, for example, another statement from Marr: "The purpose of these representations [the primal sketch and the 2½-D sketch] is to provide useful descriptions of aspects of the real world" (1982, p. 43). Who is reading the descriptions? Such an approach seems guilty of the fallacy of supposing that there is a homunculus in the head whose job it is to view the incoming information (Thompson 1995, pp. 234-235; for further discussion see Noë 1995). The animal, on the other hand, simply sees aspects of the world. At the level of the animal, there are no images, representations, or descriptions in visual perception (except of course when viewing something in the world that is an image, representation, or description); there is rather a perception-action system that enables the animal to visually guide its activity and thereby visually explore its environment. Thus by attending carefully to the level of the animal--the personal level--we arrive at a task-level conception of vision different from the representational one: the task of vision is not to produce representations from images, but rather to discover through a perceptual system what is present in the world, and where it is (Thompson 1995; McClamrock 1995).
Although this kind of task-level conception of visual perception derives from Gibson (1966, 1979), it is clearly evident in other recent cognitive science research programs, such as "active" and "animate" vision (Aloimonos et al. 1988; Bajcsy 1988; Ballard 1991, 1996; Ballard et al. 1997), embodied AI (Brooks 1991), autonomous systems (Varela & Bourgine 1992), and enactive perception and cognition (Varela et al. 1991; Thompson et al. 1992; Clark 1996). The main idea held in common by these research programs is that proper task-level analyses of perception and cognition are "activity-based" (Brooks 1991)--for example, the task of vision is to guide activity or behavior (such as hand-eye coordination in the manipulation of objects), rather than to construct an elaborate internal model of a scene. The need for representations is minimized through reliance on the perceptually guided action of the animal or system as a whole. As Brooks (1991, p. 139) observes, it is "better to use the world as its own model" than to suppose the world has to be represented in the head. O'Regan (1992, p. 484) expresses the same idea when he suggests that the environment provides an "external memory" for the animal to probe as the need arises.
What is the relevance of the personal/subpersonal distinction to the filling-in controversy? To a large extent, invocations of filling-in as a theoretical category, especially those based on either analytic isomorphism or Cartesian materialism, depend on the subpersonal, representational conception of the task of vision. Indeed, on this conception, filling-in provides a paradigm of the kind of construction on which vision depends: the job of filling-in is to complete images or representations in the brain. For example, Grossberg (1987a, p. 93) writes: "The images that reach the retina can be occluded and segmented by the veins in several places. Somehow, broken retinal contours need to be completed and occluded retinal color and brightness signals need to be filled-in. Holes in the retina, such as the blind spot or certain scotomas, are also not visually perceived... due to a combination of boundary completion and filling-in processes..." And, in another article with Mingolla: "Without featural filling-in, we would perceive a world of colored edges, instead of a world of extended forms" (Grossberg & Mingolla 1985, p. 175). But to reject the representational conception of vision in favor of an animal-centered and activity-based conception is to downgrade the importance of filling-in as a theoretical category in the explanation of vision. This point reinforces from a different angle our earlier points that propositions about filling-in as a whole need careful consideration, and that evidence for neural filling-in must be evaluated on a case-by-case basis. As we argued earlier (Sections 6 & 7), we believe that in particular cases there is evidence for neural filling-in. But our present point is that, once shorn of its connections to the representational conception of vision, such filling-in seems a shadow of its former self.
In the previous section, we argued for the importance of the distinction between the personal and the subpersonal in the understanding of vision. In this final section, we wish to discuss a particular assumption about perceptual content that plays a role in certain criticisms of filling-in (Dennett 1991; O'Regan 1992), and that results from neglecting the personal level. The assumption is that there is no difference in kind between perceptual content at the personal level and neural content at the subpersonal level. We reject this uniformity of content thesis. We hold that there is a difference in kind between the content of visual perception at the personal level and the content of neural states at the subpersonal level: perceptual content pertains to the animal as a whole interacting with its environment, and requires for its description an animal-centered task-level account of vision, whereas neural content pertains to the animal's internal functional organization, and requires for its description levels of explanation concerned with internal processing.
To see the uniformity thesis at work we need to consider some examples. Our first example is taken from Dennett's discussion of filling-in (see Section 5.1); this example will be supplemented by two others from visual science.
Suppose someone walks into a room covered with wallpaper whose pattern is a regular array of hundreds of identical images of Marilyn Monroe (Dennett 1991, pp. 354-355; 1992). The person would report seeing that the wall is covered with hundreds of identical Marilyns. But the person can foveate only a few Marilyns at a time and the resolution of parafoveal vision is not good enough to discriminate between Marilyns and colored shapes. One can conclude that the brain represents that there are hundreds of identical Marilyns, but not that there is a spatial or pictorial representation of each identical Marilyn (see O'Regan 1992, pp. 474-475, 481). Conceptually, this example is analogous to the filling-in cases discussed earlier: just as the experience of a gapless visual field does not entail neural filling-in of the blind spot, so seeing that the wall is covered with hundreds of identical Marilyns does not entail a neural representation of each individual Marilyn. Indeed, in putting forth this example, Dennett conjectures that the brain does not bother to fill in the Marilyns, in the sense of propagating a high-resolution, foveated Marilyn image "across an internal mapping of an expanse of wall"; rather, the brain just "jumps to the conclusion that the rest are Marilyns, and labels the whole region 'more Marilyns' without any further rendering of Marilyns at all" (Dennett 1991, p. 355). Yet he goes on to say: "it does not seem that way to you. It seems to you as if you are actually seeing hundreds of identical Marilyns." The implication is that in some sense the person's experience of the Marilyns is mistaken or illusory, and the reason seems to be that there is no picture in the person's brain that represents each Marilyn distinctly. The blind spot is treated in a similar way. Here too the hypothesis is that the brain jumps to a conclusion, but again "it certainly does not seem that way from the 'first-person point of view'" (Dennett 1992, p. 47). 
The person's experience of the blind spot being filled in is an illusion because there is no picture in the brain being filled in (the brain is really just jumping to a conclusion in the senses discussed in Section 5.1). In general, the moral is supposed to be that although our field of view seems to be full of detail, the detail is actually an illusion (see also pp. 366, 408, 467-468). It is this conclusion and the reasoning behind it that depend, we think, on the uniformity thesis.
There are two problematic steps in the reasoning, both of which depend on the uniformity of content thesis. First, it is assumed that in the absence of a brain-level pictorial representation of each of the identical Marilyns the person cannot have a percept with the content that there are hundreds of identical Marilyns. What is striking about this reasoning is that it relies on analytic isomorphism: the underlying assumption is that it is a necessary condition of a person's having an experience that there be states in the person's brain isomorphic to how things are represented as being (for similar criticism, see Sedivy 1995, p. 475). Such analytic isomorphism is also evident in Dennett's idea that for consciousness to be really continuous, the subpersonal neural processes would have to be continuous, but they are not, so the continuity of consciousness is an illusion: "One of the most striking features of consciousness is its discontinuity. Another is its apparent continuity. One makes a big mistake if one attempts to explain its apparent continuity by describing the brain as 'filling in' the gaps" (Dennett 1992, p. 48). In Dennett's case, the analytic isomorphism appears to be driven by the uniformity thesis--by the idea that perceptual content at the personal level just is the content of brain states at the subpersonal level. We accept the general thesis that facts about brain-level content determine what the person sees or experiences, but we deny that the general thesis entails the uniformity thesis, and that there must be an isomorphic neural representation of the perceptual content.
The second problem comes from making the following assumption about perceptual experience: in having an experience of (for example) hundreds of identical Marilyns on the wall, it seems to one that the Marilyns are all there in one's mind or brain. Thus Dennett says of someone who claims to see all the Marilyns: "The hundreds of Marilyns in the wallpaper seem to be present in your experience, seem to be in your mind, not just on the wall... But why should your brain bother importing all those Marilyns in the first place?" (Dennett 1991, pp. 359-360). Once again, the conclusion being drawn is that the person's experience of the Marilyns is mistaken, and the reason given is that there is no picture in the brain that represents each Marilyn distinctly. The reasoning depends on the assumption that it seems to the person that there is such a picture in his or her mind or brain. Put more explicitly: the reasoning depends on the idea that visual experience is pictorial, in the sense that to have a visual experience that is really of hundreds of identical Marilyns is to have a picture in the mind or brain with precisely that content. Clearly, to think of visual experience as being pictorial in this way depends on the uniformity of content thesis.
The assumption of the pictorial nature of visual experience is widespread in visual science, as we have seen in considering the analytic isomorphism argument for filling-in (see Todorović 1987, and our discussion of the Craik-O'Brien-Cornsweet effect in Section 1). But it also plays a role in the interpretations given to some experimental studies by researchers critical of filling-in. For example, building on Dennett's discussion, Blakemore et al. (1995) investigated the ability to register changes in visual scenes across saccadic eye movements. In the first experiment, they compared cases where the image changed (or did not change), and moved in an unpredictable direction (forcing a saccadic eye movement), with cases where the image stayed in the same place, and changed (or did not change). The image changes involved the appearance, disappearance, or rotation of an object in the scene. Blakemore et al. found that when the image did not move, subjects reliably detected the changes, but when the image moved their performance fell to chance. In the second experiment, they compared cases where the image changed and moved (as in the first experiment), with cases where the image changed but stayed in the same place, and a mid-gray interstimulus interval separated the two images in time. This "gray-out" condition was designed to mimic what happens during a saccade. In the gray-out condition the subjects' performance was considerably reduced, though not to chance levels. Blakemore et al. interpret their results as showing "the fragility of [transsaccadic] visual memory for a complex scene" (p. 1080). They write: "we believe that we see a complete, dynamic picture of a stable, uniformly detailed, and colourful world," but "[o]ur stable visual world may be constructed out of a brief retinal image and a very sketchy, higher-level representation along with a pop-out mechanism to redirect attention. The richness of our visual world is, to this extent, an illusion" (p. 1075).
O'Regan, Rensink, and colleagues have also contributed important studies on the ability to perceive changes in scenes, and have come to the same conclusion as Dennett and Blakemore et al. (Rensink et al. 1996; O'Regan et al. 1996; for a discussion of other relevant research, going back to the seventies, see Grimes 1996). In one study (Rensink et al. 1996), an image of a natural scene was continually alternated with a modified image, with a blank field inserted between each display. The duration of each image was 240 milliseconds; the blank field lasted 80 milliseconds. The modified image was the same as the original except for one change that involved either the removal of an object present in the original scene, or a change in the color or spatial position of an object. Subjects found the changes very difficult to notice under these "flicker conditions," even though the changes were large and easily observable under normal conditions. Rensink et al. interpret these results as indicating that attention is required