To be published in Behavioral and Brain Sciences (2001) 24:5 (issue number to be determined)
© Cambridge University Press 2001



Below is the unedited final draft of a BBS target article that has been accepted for publication. This preprint has been prepared for potential commentators who wish to nominate themselves for formal commentary invitation. Please do not write a commentary unless you receive a formal invitation.

A sensorimotor account of vision and visual consciousness

J. Kevin O'Regan
Laboratoire de Psychologie Expérimentale,
Centre National de Recherche Scientifique,
Université René Descartes,
92774 Boulogne Billancourt,
France
oregan@ext.jussieu.fr
http://nivea.psycho.univ-paris5.fr


Alva Noë

Department of Philosophy,
University of California at Santa Cruz,
Santa Cruz,
CA 95064
anoe@cats.ucsc.edu
http://www2.ucsc.edu/people/anoe/

Abstract:

Many current neurophysiological, psychophysical and psychological approaches to vision rest on the idea that when we see, the brain produces an internal representation of the world. The activation of this internal representation is assumed to give rise to the experience of seeing. The problem with this kind of approach is that it leaves unexplained how the existence of such a detailed internal representation might produce visual consciousness. An alternative proposal is made here. We propose that seeing is a way of acting. It is a particular way of exploring the environment. Activity in internal representations does not generate the experience of seeing. The outside world serves as its own, external, representation. The experience of seeing occurs when the organism masters what we call the governing laws of sensorimotor contingency. The advantage of this approach is that it provides a natural and principled way of accounting for visual consciousness, and for the differences in the perceived quality of sensory experience in the different sensory modalities. Several lines of empirical evidence are brought forward in support of the theory, in particular: evidence from experiments in sensorimotor adaptation, visual "filling in", visual stability despite eye movements, change blindness, sensory substitution, and color perception.


Keywords:

action, change blindness, consciousness, experience, perception, qualia, sensation, sensorimotor.


Kevin O'Regan moved to Paris in 1975 after studying theoretical physics in England, to work in experimental psychology at the Centre National de Recherche Scientifique. After his PhD on eye movements in reading he showed the existence of an optimal position for the eye to fixate in words. His interest in the problem of the perceived stability of the visual world led him to question established notions of the nature of visual perception, and to recently discover, with collaborators, the phenomenon of "change blindness". His current work involves exploring the empirical consequences of the new approach to vision.

Alva Noë is a philosopher at the University of California, Santa Cruz. He received a Ph.D. in philosophy from Harvard University and a B. Phil. from Oxford University, and he has been a Research Associate of the Center for Cognitive Studies at Tufts University. He has published articles on topics in the philosophy of perception, philosophy of mind, and other areas, including a previous Behavioral and Brain Sciences target article on perceptual completion. He is currently at work on a book on the relation between perception and action, and he is a co-editor of Vision and Mind: Selected Readings in the Philosophy of Perception (MIT Press, forthcoming).


1 INTRODUCTION

1.1 The puzzle of visual experience

What is visual experience and where does it occur?

It is generally thought that somewhere in the brain an internal representation of the outside world must be set up which, when it is activated, gives us the experience that we all share of the rich, three-dimensional, colorful world. Cortical maps -- those cortical areas where visual information seems to be retinotopically organized --  might appear to be good candidates for the locus of perception.

Cortical maps undoubtedly exist, and they contain information about the visual world. But the presence of these maps and the retinotopic nature of their organization cannot in itself explain the metric quality of visual phenomenology. Nor can it explain why activation of cortical maps should produce visual experience. Something extra would appear to be needed in order to make excitation in cortical maps provide, in addition, the subjective impression of seeing.

A number of proposals have come forth in recent years to suggest how this might come about. It has for example been suggested, from work with blindsight patients, that consciousness in vision may derive from a "commentary" system situated somewhere in the fronto-limbic complex (taken to include prefrontal cortex, insula and claustrum; cf. Weiskrantz [1997] p. 226). Crick & Koch [1990], Llinas & Ribary [1993] and Singer [1993]; [1995] suggest that consciousness might be correlated with particular states of the brain involving coherent oscillations in the 40-70 Hz range, which would serve to bind together the percepts pertaining to a particular conscious moment[1]. Penrose [1994] and Hameroff [1994] suggest that the locus of consciousness might be a quantum process in neurons' microtubules. Edelman [1989] holds that re-entrant signaling between cortical maps might give rise to consciousness. A variety of other possibilities that might constitute the "neural correlate of consciousness" has been compiled by Chalmers [1996b].

A problem with proposals of this kind is that they do little to elucidate the mystery of visual consciousness (as pointed out by, for example, Chalmers [1996b]). For even if one particular mechanism -- for example coherent oscillations in a particular brain area -- were proven to correlate perfectly with behavioral measures of consciousness, the problem of consciousness would simply be pushed back into a deeper hiding place: the question would now become, why and how should coherent oscillations ever generate consciousness? After all, coherent oscillations are observed in many other branches of science, where they do not generate consciousness. And even if consciousness is assumed to arise from some new, previously unknown mechanism, such as quantum-gravity processes in tubules, the puzzle still remains as to what exactly it is about tubules that allows them to generate consciousness, when other physical mechanisms do not.

1.2 What are sensory modalities?

In addition to the problem of the origin of experience discussed in the preceding paragraphs, a second problem concerns the differences in the felt quality of visual experience. Why is the experience of red more like the experience of pink than it is like that of black? And, more generally, why is seeing red very different from hearing a sound or smelling a smell?

It is tempting to think that seeing red is like seeing pink because the neural stimulation going on when we see something red is similar to that underlying our perception of pink: almost the same ratios of long, medium and short wavelength photoreceptors will be stimulated by red and pink. But note that though this seems reasonable, it does not suffice: there is no a priori reason why similar neural processes should generate similar percepts[2]. If neural activity is just an arbitrary code, then an explanation is needed of what sensory experience will be associated with each element of the code. Why, for example, should more intense neural activity provoke more intense experiences? And what exactly is the mapping function: is it linear, logarithmic, or a power function? And why is it one of these rather than another? Even these questions leave open the more fundamental question of how a neural code could ever give rise to experience at all.

Not very much scientific investigation has addressed this kind of question. Most scientists seem satisfied with some variant of Müller's [1838] classic concept of "specific nerve energy". Müller's idea, in its modern form,[3] amounts to the claim that what determines the particularly visual aspect of visual sensations is the fact that visual sensations are transmitted by particular nerve pathways (namely those originating in the retina and not in the cochlea) that project to particular cerebral regions (essentially cortical area V1). It is certainly true that retinal influx comes together in relatively circumscribed areas of the brain, and that this may provide an architectural advantage in the neural implementation of the calculations necessary to generate visual-type sensations. But what is it about these pathways that generates the different sensations? Surely the choice of a particular subset of neurons or particular cortical regions cannot, in itself, explain why we attribute visual rather than auditory qualities to this influx. We could suppose that the neurons involved are of a different kind, with, say, different neurotransmitters, but then why and how do different neurotransmitters give rise to different experiences? We could say that the type of calculation done in the different cortical areas is different, but then we must ask, how could calculations ever give rise to experience? The hard work is left undone. Much still needs to be explained.

1.3 An alternative approach: the sensorimotor contingency theory

The present paper seeks to overcome the difficulties described above by adopting a different approach to the problem of visual experience. Instead of assuming that vision consists in the creation of an internal representation of the outside world whose activation somehow generates visual experience, we propose to treat vision as an exploratory activity. We then examine what this activity actually consists in. The central idea of the new approach will be that vision is a mode of exploration of the world that is mediated by knowledge of what we call sensorimotor contingencies. We show that problems about the nature of visual consciousness, the qualitative character of visual experience, and the difference between vision and other sensory modalities, can now, from the new standpoint, all be approached in a natural way, without appealing to mysterious or arcane explanatory devices.

2 THE STRUCTURE OF VISION

As stated above, we propose that vision is a mode of exploration of the world that is mediated by knowledge, on the part of the perceiver, of what we call sensorimotor contingencies. We now explore this claim in detail.

2.1 Sensorimotor contingencies induced by the visual apparatus

Imagine a team of engineers operating a remote-controlled underwater vessel exploring the remains of the Titanic, and imagine a villainous aquatic monster that has interfered with the control cable by mixing up the connections to and from the underwater cameras, sonar equipment, robot arms, actuators and sensors. What appears on the many screens, lights and dials, no longer makes any sense, and the actuators no longer have their usual functions. What can the engineers do to save the situation? By observing the structure of the changes on the control panel that occur when they press various buttons and levers, the engineers should be able to deduce which buttons control which kind of motion of the vehicle, and which lights correspond to information deriving from the sensors mounted outside the vessel, which indicators correspond to sensors on the vessel's tentacles, etc.

There is an analogy to be drawn between this example and the situation faced by the brain. From the point of view of the brain, there is nothing that in itself differentiates nervous influx coming from retinal, haptic, proprioceptive, olfactory and other senses, and there is nothing to discriminate motor neurons that are connected to extraocular muscles, skeletal muscles or any other structures. Even if the size, the shape, the firing patterns, or the places where the neurons are localized in the cortex differ, this does not in itself confer them with any particular visual, olfactory, motor or other perceptual quality.

On the other hand, what does differentiate vision from, say, audition or touch is the structure of the rules governing the sensory changes produced by various motor actions, that is, what we call the sensorimotor contingencies governing visual exploration. Because the sensorimotor contingencies within different sensory domains (vision, audition, smell, etc.) are subject to different (in)variance properties, the structure of the rules that govern perception will be different in each modality.
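The engineers' strategy in the underwater-vessel example can be sketched in a few lines of code. This is only a toy illustration under strong assumptions that are not part of the original argument: each button is taken to affect exactly one indicator, and the names make_scrambled_vessel and infer_wiring are hypothetical. The point it shows is that the hidden wiring can be recovered purely from the changes that actions produce on the panel, without any prior labelling of the channels.

```python
import random

def make_scrambled_vessel(n_channels, seed=0):
    """Build a toy vessel whose button-to-indicator wiring has been permuted.

    Hypothetical setup: button i drives exactly one indicator, but which one
    is unknown to the engineers.
    """
    rng = random.Random(seed)
    wiring = list(range(n_channels))
    rng.shuffle(wiring)  # the monster's hidden permutation

    def press(button):
        # Pressing a button lights exactly one indicator on the panel.
        panel = [0] * n_channels
        panel[wiring[button]] = 1
        return panel

    return press, wiring

def infer_wiring(press, n_channels):
    """Recover the button -> indicator map from observed panel changes alone."""
    inferred = []
    for button in range(n_channels):
        panel = press(button)
        inferred.append(panel.index(1))
    return inferred

press, hidden = make_scrambled_vessel(6, seed=42)
recovered = infer_wiring(press, 6)
print(recovered == hidden)
```

In this sketch nothing about an indicator "in itself" identifies it; only its lawful relation to the buttons does.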

A first law distinguishing visual percepts from perception in other modalities is the fact that when the eyes rotate, the sensory stimulation on the retina shifts and distorts in a very particular way, determined by the size of the eye movement, the spherical shape of the retina and the nature of the ocular optics. In particular, as the eye moves, contours shift and the curvature of lines changes. For example, as shown in Figure 1, if you are looking at the midpoint of a horizontal line, the line will trace out a great arc on the inside of your eyeball. If you now switch your fixation point upwards, the curvature of the line will change; represented on a flattened-out retina, the line would now be curved. In general, straight lines on the retina distort dramatically as the eyes move, somewhat like an image in a distorting mirror.

Similarly, because of the difference in sampling density of the retinal photoreceptors in central and in peripheral vision, the distribution of information sensed by the retina changes drastically, but in a lawful way, as the eye moves. When the line is looked at directly, the cortical representation of the straight line is fat in the middle and tapers off to the ends. But when the eye moves off the line, the cortical representation peters out into a meager, banana-like shape, and the information about color is radically undersampled, as shown in the bottom right hand panel of Figure 1. Another law that characterizes the sensorimotor contingencies that are particular to visual percepts is the fact that the flow pattern on the retina is an expanding flow when the body moves forwards, and contracting when the body moves backwards. Visual percepts also share the fact that when the eyes close during blinks, the stimulation changes drastically, becoming uniform (i.e. the retinal image goes blank).

Figure 1. Top: The eye fixates the middle of a straight line and then moves to a point above the line. The retinal stimulation moves from a great arc on the equator of the eye to a different, smaller great arc. Bottom left: flattened out retina showing great arc corresponding to equator (straight line) and off-equator great arc (curved line). Triangles symbolize color-sensitive cone photoreceptors, discs represent rod photoreceptors. Size of photoreceptors increases with eccentricity from the center of the retina. Bottom right: cortical activation corresponding to stimulation by the two lines, showing how activation corresponding to a directly fixated straight line (large central oblong packet tapering off towards its ends) distorts into a thinner, banana shaped region, sampled mainly by rods, when the eye moves upwards. As explained in Section 2.2, if the eye moves along the straight line instead of upwards, there would be virtually no change at all in the cortical representation. This would be true even if the cortical representation were completely scrambled. This is the idea underlying the theory that shape in the world can be sensed by the laws obeyed by sensorimotor contingencies.

In contrast to all these typically visual sensorimotor contingencies, auditory sensorimotor contingencies have a different structure. They are not, for example, affected by eye movements or blinks. They are affected in special ways by head movements: rotations of the head generally change the temporal asynchrony between left and right ears. Movement of the head in the direction of the sound source mainly affects the amplitude but not the frequency of the sensory input.
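The auditory contingencies just described can be illustrated with a deliberately simplified model. The sketch below assumes a distant source, a straight-line path-length difference between the ears, an ear separation of 0.18 m, a sound speed of 343 m/s, and inverse-square amplitude falloff; none of these figures comes from the text, and the function names are hypothetical. It shows that head rotation changes the interaural time difference, that head position along the source direction changes intensity, and that eye position does not enter the model at all.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
EAR_SEPARATION = 0.18   # m, assumed value

def interaural_time_difference(source_azimuth_deg, head_azimuth_deg):
    """Time difference (s) between the ears for a distant source.

    Simple path-length model: the difference depends only on the angle of
    the source relative to the head, not on where the eyes are pointing.
    """
    relative = math.radians(source_azimuth_deg - head_azimuth_deg)
    return EAR_SEPARATION * math.sin(relative) / SPEED_OF_SOUND

def relative_intensity(distance_m):
    """Intensity relative to its value at 1 m, assuming inverse-square falloff."""
    return 1.0 / distance_m ** 2

# Rotating the head towards the source reduces the asynchrony between the ears.
for head in (0, 15, 30):
    print(head, interaural_time_difference(30, head))

# Moving towards the source raises the intensity.
print(relative_intensity(2.0), relative_intensity(1.0))
```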

We therefore suggest that a crucial fact about vision is that visual exploration obeys certain laws of sensorimotor contingency. These laws are determined by the fact that the exploration is being done by the visual apparatus.

In summary: the sensorimotor contingencies discussed in this section are related to the visual apparatus and to the way three-dimensional objects present themselves to the visual apparatus. These sensorimotor contingencies are distinctive of the visual sense modality, and differ from the sensorimotor contingencies associated with other senses.

2.2 Sensorimotor contingencies determined by visual attributes

Real objects have properties such as size, shape, texture, and color, and they can be positioned in the three-dimensional world at different distances and angles with respect to an observer. Visual exploration provides ways of sampling these properties that differ from sampling via other senses. What characterizes the visual mode of sampling object properties are such facts as that the retinal image of an object only provides a view of the front of an object, and that when we move around it, parts appear and disappear from view; and that we can only apprehend an object from a definite distance, so that its retinal projection has a certain size that depends on distance. Other characteristics of visual exploration of objects derive from the fact that color and brightness of the light reflected from an object change in lawful ways as the object or the light source or the observer move around, or as the characteristics of the ambient light change.

On the other hand, tactile exploration of an object, even though it may be sampling the same objective properties, obeys different sensorimotor contingencies: you do not touch an object from a "point of view" -- your hand can often encompass it more or less completely for example, and you don't apprehend it from different distances; its tactile aspect does not change with lighting conditions.

There is thus a subset of the sensorimotor contingencies that are engendered by the constraints of visual-type exploration, and which corresponds to visual attributes of sensed objects.

Note that unlike the sensorimotor contingencies that are visual-modality related, the sensorimotor contingencies that are visual-attribute related do nonetheless have strong links to the tactile sense: this is because attributes of three dimensional objects can also sometimes be apprehended via the tactile exploratory mode, where they present themselves as tactile shape, texture, size, distance. As shown eloquently by Piaget's work, the observer's conception of space in general will also have strong links to the laws of sensorimotor contingency discussed in the present section. Similar ideas were developed by Poincaré [1905, p. 47] who wrote: "To localize an object simply means to represent to oneself the movements that would be necessary to reach it. It is not a question of representing the movements themselves in space, but solely of representing to oneself the muscular sensations which accompany these movements and which do not presuppose the existence of space".

A good illustration of sensorimotor contingencies associated with one particular kind of visual attribute, namely visual shape, can be obtained from the records of patients whose vision has been restored after having been born blind with congenital cataract (cf. reviews by Morgan [1977], Jeannerod [1975] and Gregory [1973]). One such patient, cited by Helmholtz [1925/1909], is surprised that a coin, which is round, should so drastically change its shape when it is rotated (becoming elliptical in projection). The fact that objects also drastically change in extent as a function of distance is poignantly illustrated by the case of a 13-14 year old boy treated by Cheselden (1728, cited by Morgan [1977], p. 20): "Being shewn his father's picture in a locket at his mother's watch, and told what it was, he acknowledged a likeness, but was vastly surpriz'd; asking, how it could be, that a large face could be express'd in so little room, saying, it should have seem'd as impossible to him, as to put a bushel of any thing into a pint."

These examples make us realize how second nature it is for people with normal vision to witness the perspective changes that surfaces undergo when they are shifted or tilted, or when we move with respect to them. The idea we wish to suggest here is that the visual quality of shape is precisely the set of all potential distortions that the shape undergoes when it is moved relative to us, or when we move relative to it. Although this is an infinite set, the brain can abstract from this set a series of laws, and it is this set of laws which codes shape[4].
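Two of the simplest laws of this kind, those behind the coin and the locket examples, can be written down explicitly. The sketch below is an illustration under idealized assumptions that are not in the original text: the coin is a flat disc viewed from far enough away for orthographic projection to apply, and the object in the second function is viewed frontally. The function names are hypothetical.

```python
import math

def projected_aspect_ratio(tilt_deg):
    """Minor/major axis ratio of a tilted disc's projection.

    Idealized orthographic projection: a disc facing the observer (0 degrees)
    projects as a circle, and tilting it squashes one axis by cos(tilt).
    """
    return abs(math.cos(math.radians(tilt_deg)))

def angular_size_deg(width, distance):
    """Visual angle subtended by a frontally viewed object of a given width."""
    return math.degrees(2 * math.atan(width / (2 * distance)))

# The round coin becomes an increasingly thin ellipse as it is rotated.
for tilt in (0, 30, 60, 85):
    print(tilt, round(projected_aspect_ratio(tilt), 3))

# The same face subtends a smaller angle as the viewing distance grows.
for distance in (0.5, 1.0, 2.0):
    print(distance, round(angular_size_deg(0.2, distance), 2))
```

On the view proposed here, it is the whole family of such lawful changes, rather than any single projection, that would constitute the visual quality of roundness or of size.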

Another example of how sensorimotor contingencies can be used as indicators of visual attributes is illustrated in an aspect of Figure 1 we have not yet mentioned. We saw in Section 2.1 that movement of the eye away from a line creates a very strong distortion in its cortical and retinal representation. Under the classical view of what shape perception requires, it would be necessary to postulate that in order to see lines as straight despite eye movements, a transformation mechanism would have to exist that compensates for these distortions. This mechanism would take the cortical representation illustrated in the bottom right of the figure, and transform it so the two dissimilar packets of stimulated neurons shown in the figure now look identical.[5] There would additionally have to be another cortical locus where this new, corrected representation was projected. The view presented here does away with these unnecessary steps.

Consider the following fact: if the eye moves along the straight line instead of perpendicular to it, the set of photoreceptors on the retina which are stimulated does not change, since each photoreceptor that was on the image of the line before the eye moves is still on the image after the eye moves. This is due to an essential property of lines -- they are self-similar under translation along their length (we assume for simplicity that the line is infinite in length). Since exactly the same photoreceptors are being stimulated before and after eye movement along the line's length, the cortical representation of the straight line is therefore identical after such a movement: there is this time no distortion at all. Another interesting fact is that the argument we have just made is totally independent of the code used by the brain to represent the straight line. Even if the optic nerve had been scrambled arbitrarily, or if the retina were corrugated instead of spherical, thereby causing the image of the line to be wiggly instead of straight, or if the eye's optics gave rise to horrendous distortions, movement of the eye along the line would still not change the pattern of cortical stimulation. We see that this particular law of sensorimotor invariance is therefore an intrinsic property of straight lines, and is independent of the code used to represent them. Platt [1960] has extended such considerations to other geometrical invariants, and Koenderink [1984a] has considered the more general, but related problem of how spatiotemporal contingencies in the neural input can be used to deduce intrinsic geometrical properties independently of the code by which they are represented.
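The code-independence of this invariance can be checked on a toy model. The sketch below is an illustration only, with assumptions that go beyond the text: the retina is a small discrete grid, the horizontal axis wraps around to stand in for an infinitely long line, and the "neural code" is an arbitrary random permutation from receptors to cortical units. All names are hypothetical.

```python
import random

WIDTH, HEIGHT = 12, 9  # toy retina; the x axis wraps to mimic an infinite line

def line_image(row):
    """Set of receptors stimulated by a horizontal line imaged on one row."""
    return {(x, row) for x in range(WIDTH)}

def move_eye(image, dx, dy):
    """Shift the retinal image by (dx, dy); x wraps, rows leaving the grid are lost."""
    shifted = set()
    for x, y in image:
        new_y = y + dy
        if 0 <= new_y < HEIGHT:
            shifted.add(((x + dx) % WIDTH, new_y))
    return shifted

def make_scrambled_code(seed):
    """Arbitrary receptor -> cortical unit wiring (a random permutation)."""
    receptors = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
    units = list(range(len(receptors)))
    random.Random(seed).shuffle(units)
    return dict(zip(receptors, units))

def cortical_pattern(image, code):
    """Set of cortical units activated by a retinal image under a given code."""
    return frozenset(code[receptor] for receptor in image)

code = make_scrambled_code(seed=1)
before = cortical_pattern(line_image(4), code)
along = cortical_pattern(move_eye(line_image(4), dx=3, dy=0), code)
across = cortical_pattern(move_eye(line_image(4), dx=0, dy=2), code)
print(before == along)   # movement along the line leaves the pattern unchanged
print(before == across)  # movement perpendicular to the line changes it
```

The first comparison holds for any seed, because the same receptors are stimulated before and after the movement, whatever wiring follows them; the second fails for any seed, because a different row of receptors is stimulated.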

In general, it will be the case that the structure of the laws abstracted from the sensorimotor contingencies associated with flat, concave, and convex surfaces, corners, etc., will be a neural-code-independent indication of their different natures. In relation to this, some psychophysical work is being done, for example, to determine the respective importance, in determining shape, of cues derived from changes caused by movement of the object versus movement of the observer (e.g. Cornilleau-Peres & Droulez [1994], Dijkstra, Cornilleau-Peres, Gielen, & Droulez [1995], Rogers & Graham [1979]; [1992]). Nonetheless, though it is inherent in the approaches of a number of researchers (cf. Section 3.3), the idea that the laws of sensorimotor contingency might actually constitute the way the brain codes visual attributes has not so far been greatly developed in the literature. However, this idea is essential in the present theory.

2.3 Sensation and perception

Psychologists interested in perception have traditionally distinguished between sensation and perception. While it is difficult to make this distinction precise, perhaps the central point of the distinction is to differentiate between the way the senses are affected by stimuli (sensation) and the results of categorization of objects and events in the environment (perception). It is worthwhile to note that our distinction between two different classes of sensorimotor contingency roughly corresponds to this distinction between sensation and perception. Sensorimotor contingencies of the first sort -- those that are determined by the character of the visual apparatus itself -- are independent of any categorization or interpretation of objects and can thus be considered to be a fundamental, underlying aspect of visual sensation. Sensorimotor contingencies of the second sort -- those pertaining to visual attributes -- are the basis of visual perception.

In this way we can interpret the present theory as attempting to do justice to one of the working doctrines of traditional visual theory.

2.4 Perceivers must have mastery of patterns of sensorimotor contingency

Consider a missile guidance system allowing a missile to home in on an enemy airplane. As the missile zigzags around to evade enemy fire, the image of the target airplane shifts in the missile's sights. If the missile turns left, then the image of the target shifts rightwards. If the missile slows down, the size of the image of the airplane decreases in a predictable way. The missile guidance system must adequately interpret and adapt to such changes in order to track the target airplane efficiently. In other words, the missile guidance system is "tuned to" the sensorimotor contingencies that govern airplane tracking. It "knows all about" or "has mastery over" the possible input/output relationships that occur during airplane tracking.

Now consider what happens when the missile guidance system is out of order. The visual information is being sampled by its camera, it is getting into the system, being registered, but it is not being properly made use of. The missile guidance system no longer has mastery over airplane tracking.
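The two contingencies mentioned for the missile, and the contrast between a working and an out-of-order guidance system, can be sketched as follows. The sketch assumes a pinhole camera with unit focal length, headings and bearings measured in radians and counted positive to the left, and image positions counted positive to the right; these conventions, the numerical values, and all function names are assumptions made for illustration.

```python
import math

FOCAL_LENGTH = 1.0  # arbitrary units, assumed pinhole camera

def image_x(heading_left, bearing_left):
    """Horizontal image position of the target (positive = right of center)."""
    return FOCAL_LENGTH * math.tan(heading_left - bearing_left)

def image_size(target_width, distance):
    """Image size of the target; it shrinks as the distance grows."""
    return FOCAL_LENGTH * target_width / distance

def image_shift_after_turn(heading_left, bearing_left, turn_left):
    """Actual change in image position when the missile turns left."""
    return (image_x(heading_left + turn_left, bearing_left)
            - image_x(heading_left, bearing_left))

def is_tuned(predict, heading_left, bearing_left, turn_left, tol=1e-9):
    """A system counts as tuned here if its predicted shift matches the actual one."""
    actual = image_shift_after_turn(heading_left, bearing_left, turn_left)
    return abs(predict(heading_left, bearing_left, turn_left) - actual) < tol

def broken_predictor(heading_left, bearing_left, turn_left):
    # Out-of-order system: input is registered, but no shift is ever predicted.
    return 0.0

print(image_shift_after_turn(0.0, 0.2, 0.05) > 0)       # left turn, rightward shift
print(image_size(30.0, 2000.0) < image_size(30.0, 1000.0))
print(is_tuned(image_shift_after_turn, 0.0, 0.2, 0.05))
print(is_tuned(broken_predictor, 0.0, 0.2, 0.05))
```

In this toy setting "mastery" is reduced to correct prediction of a single contingency; the notion used in the text is broader, covering the whole set of input/output relationships that occur during tracking.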

We suggest that vision requires the satisfaction of two basic conditions. First, the animal must be exploring the environment in a manner that is governed by the two main kinds of sensorimotor contingencies (those fixed by the visual apparatus, and those fixed by the character of objects). Second, the animal, or the brain, must be "tuned to" these laws of sensorimotor contingencies. That is, the animal must be actively exercising its mastery of these laws.

Note that the notion of being tuned or having mastery only makes sense within the context of the behavior and purpose of the system or individual in its habitual setting. Consider again the missile guidance system. If exactly the same system were being used for a different purpose, say as an attraction in a fun fair, it might well be necessary for the system to have a different behavior, with scary lunges and strong acceleration and deceleration which would be avoided in a real system. Thus, "mastery" of the sensorimotor contingencies might now require a different set of laws[6]. In fact even the out-of-order missile guidance system has a kind of ineffectual mastery of its sensorimotor contingencies.

2.5 Important upshot: A sensory modality is a mode of exploration mediated by distinctive sensorimotor contingencies

The present view is able to provide an account of the nature of, and the differences among, sensory modalities. In the introduction we stressed the deficiencies of Müller's [1838] view as well as of its modern adaptation[7], according to which it is supposed that what determines the differences between the senses is something about the neural pathways that are involved: this view requires postulating some special extra property which differentiates the neural substrate of these pathways, or some special additional mechanism, whose nature then stands in need of further (and for now at least unavailable) explanation. The present approach obviates this difficulty by saying that what differentiates the senses are the laws obeyed by the sensorimotor contingencies associated with these senses[8]. Seeing and hearing are both forms of exploratory activity, but each is governed by different laws of sensorimotor contingency. Just as it is not necessary to postulate an intrinsic "essence" of horseriding to explain why it feels different from motorcycling, it is similarly unnecessary to postulate a Müller-type specific nerve energy to account for the difference between vision and other senses[9].

The sensory modalities, according to the present proposal, are constituted by distinct patterns of sensorimotor contingency. Visual perception can now be understood as the activity of exploring the environment in ways mediated by knowledge of the relevant sensorimotor contingencies. And to be a visual perceiver is, thus, to be capable of exercising mastery of vision-related rules of sensorimotor contingency.

We shall see that this approach, in which vision is considered to be a law-governed mode of encounter with the environment, opens up new ways of thinking about phenomena such as synesthesia, the facial vision of the blind, and in particular tactile-visual sensory substitution, where apparently visual experience can be obtained through arrays of vibrators on the skin.

2.6 Visual awareness: Integrating sensorimotor contingencies with reasoning and action-guidance

Thus far we have considered two important aspects of vision: the distinctively visual qualities that are determined by the character of the sensorimotor contingencies set up by the visual apparatus; and the aspect which corresponds to the encounter with visual attributes, that is, those features which allow objects to be distinguished visually from one another. These two aspects go some way towards characterizing the qualitative nature of vision.

We now turn to a third important aspect of vision, namely, visual awareness.

Suppose you are driving your car and at the same time talking to a friend. As you talk, the vista in front of you is impinging upon your eyes. The sky is blue, the car ahead of you is red, there is oncoming traffic, etc. Your brain is tuned to the sensorimotor contingencies related to these aspects of the visual scene. In addition, some of these sensorimotor contingencies are also being used to control your driving behavior, since you are continuously adjusting your steering and adapting your speed to the moment-to-moment changes in the road and the traffic. But, since you are talking to your friend, you do not attend to most of these things. You do not notice that the car ahead is red, you do not think about the sky being blue; you just drive and talk to your friend.

You lack, as we shall say, visual awareness of many of the aspects of the visual scene. For those scene aspects, you are no different from an automatic pilot controlling the flight of an airplane. Your behavior is regulated by the appropriate sensorimotor contingencies, but you remain visually unaware of the associated aspects of the scene.

But if you should turn your attention to the color of the car ahead of you, and think about it, or discuss it with your friend, or use the knowledge of the car's color to influence decisions you are making, then, we would say, you are aware of it. For a creature (or a machine for that matter) to possess visual awareness, what is required is that, in addition to exercising the mastery of the relevant sensorimotor contingencies, it must make use of this exercise for the purposes of thought and planning.[10]

When you not only visually track an environmental feature by exercising your knowledge of the relevant sensorimotor contingencies, but in addition integrate this exercise of mastery of sensorimotor contingencies with capacities for thought and action-guidance, then you are visually aware of the relevant feature. Then, we say, you see it.

Consider an important point about this view of what visual awareness is, namely that our possession of it is a matter of degree. In particular, in our view, all seeing involves some degree of awareness, and some degree of unawareness. For example, if you were to probe an unaware driver waiting at the light, there would probably be some aspects of the red light that were at least indirectly being integrated into the driver's current action-guidance, rational reflection and speech. Perhaps, though he did not notice the light's redness, the fact that the light was red may have made him realize that he was going to be late. Or, though not noting that the light was red, the driver could be noting that it was difficult to see because the sky was too bright. On the other hand, even the driver who was aware of seeing the red light may not have been aware of all its aspects, for example that the shape of the light was different from usual. A visual stimulus has a very large (perhaps infinite) number of attributes, and only a small number can at any moment be influencing one's action-guidance, rational reflection and speech behavior.

A further important fact about this account of visual awareness is that it treats awareness as something nonmagical. There is no need to suppose that awareness and seeing are produced by the admixture of some mysterious additional element. To see is to explore one's environment in a way that is mediated by one's mastery of sensorimotor contingencies, and to be making use of this mastery in one's planning, reasoning, and speech behavior.

2.7 Visual consciousness and experience: forms of awareness

It may be argued that there is still something missing in the present account of vision, namely an explanation of visual consciousness, or of the phenomenal experience of vision. Although there is a great deal of disagreement among philosophers about these notions, there is broad consensus, first, that seeing involves experience in the sense that there is something it is like to see, and second, that it is somehow mysterious how we can possibly explain this subjective character of experience, or, as it is sometimes put, the "raw feel" or the "qualia" of vision, in neural or other physical terms. Is there any reason to believe the sensorimotor contingency approach can succeed here where others have failed?

We will return to some of these issues in Section 6 of this paper. For now, let us note that the present sensorimotor contingency framework would seem to allow for the explanation and clarification, and certainly for the scientific study of, a good deal of what makes for the subjective character of experience. Thus, one important dimension of what it is like to see is fixed by the fact that there is a lawful relation of dependence between visual stimulation and what we do, and this lawful relation is determined by the character of the visual apparatus. A second crucial feature that contributes to what it is like to see is the fact that objects, when explored visually, present themselves to us as provoking sensorimotor contingencies of certain typically visual kinds, corresponding to visual attributes such as color, shape, texture, size, hidden and visible parts. Together, these first two aspects of seeing, namely the visual-apparatus-related sensorimotor contingencies and the visual-object-related sensorimotor contingencies, are what make vision visual, rather than, say, tactile or auditory. Once these two aspects are in place, the third aspect of seeing, namely visual awareness, would seem to account for just about all the rest of what goes into making up the character of seeing. For visual awareness is precisely the availability of the kinds of features and processes making up the first two aspects for the purposes of control, thought and action.

As said, the question of visual experience and consciousness is extremely controversial, and we will defer further discussion of our view until Section 6.

3 REFINEMENTS OF THE VIEW

Vision, we argue, requires knowledge of sensorimotor contingencies. To avoid misunderstanding, it is necessary to discuss this claim in greater detail.

3.1 Knowledge of sensorimotor contingencies is a practical, not a propositional form of knowledge

Mastery of the structure of the rules is not something about which we (in general) possess propositional knowledge. For example, we are not able to describe all the changes that a convex surface should suffer or the distortions that should occur on moving our eyes to all sorts of positions on the surface, or when we move or rotate it. Nevertheless our brains have extracted such laws, and any deviation from the laws will cause the percept of the surface's shape to be modified. Thus, for example, our brains register the fact that the laws associated with normal seeing are not being obeyed when, for example, we put on a new pair of glasses with a different prescription: for a while, distortions are seen when the head moves (because eye movements provoke displacements of unusual amplitudes); or when we look into a fish tank (now moving the head produces unusual kinds of distortions), or dream or hallucinate (now, for example, blinking has no effect). Our impression in such cases is then that something unusual is happening.

3.2 Mastery must be currently exercised

Another important condition that we need to impose, in order that sensorimotor contingencies properly characterize vision, is that the mastery of laws of sensorimotor contingency be exercised now. The reason we need this condition is the following.

Over the course of life, a person will have encountered myriad visual attributes and visual stimuli, and each of these will have associated with it particular sets of sensorimotor contingencies. Each such set will have been recorded and will be latent, potentially available for recall: the brain thus has mastery of all these sensorimotor sets. But when a particular attribute is currently being seen, then the particular sensorimotor contingencies associated with it are no longer latent, but are actualized, or being currently made use of. In the language of the missile guidance system: the system may have stored programs that are applicable to the task of following different kinds of planes with different speed and turning characteristics. All these programs are latent, and the system has mastery of them all. But only when the system is following a particular type of plane does it invoke and follow the particular recipe for that plane.

Again: among all previously memorized action recipes that allow you to make lawful changes in sensory stimulation, only some are applicable at the present moment. The set that are applicable now are characteristic of the visual attributes of the object you are looking at, and their being currently exercised constitutes the fact of your visually perceiving that object.

3.3 Historical note: relation to other similar ideas

Consider the following analogy with haptic perception, suggested by MacKay [1962]; [1967]; [1973]. Suppose you are a blind person holding a bottle with your hand. You have the feeling of holding a bottle, you feel the bottle. But what sensations do you really have? Without slight rubbing of the skin, tactile information is considerably reduced, and even temperature sensation will, through adaptation of the receptors, disappear after you have held the bottle for a while. In fact therefore, you may well have very little sensory stimulation coming from the bottle at the present instant. Yet you actually have the feeling of "having a bottle in your hand" at this moment. This is because your brain is "tuned" to certain potentialities: if you were to slide your hand very slightly, a change would come about in the incoming sensory signals which is typical of the change associated with the smooth, sliding surface of glass. Furthermore, if you were to move your hand upwards, the size of what you are encompassing with your hand would diminish (because you are moving onto the bottle's neck), and if you were to move downwards, your tactile receptors would respond to the roughness coming from the transition of glass to the paper label.

MacKay suggests that seeing a bottle is an analogous state of affairs[11]. You have the impression of seeing a bottle, if there is knowledge in your nervous system concerning a certain web of contingencies. For example, you have knowledge of the fact that if you move your eyes upwards towards the neck of the bottle, the sensory stimulation will change in a way typical of what happens when a narrower region of the bottle comes into foveal vision; you have knowledge expressing the fact that if you move your eyes downwards, the sensory stimulation will change in a way typical of what happens when the white label is fixated by central vision. Similarly, motions of an object created by manual manipulation can be part of what visually differentiates objects from one another. Unlike a bottle, an object like a pitcher with a handle can be rotated and the handle made to appear and disappear behind the body of the pitcher. It is the possibility of doing this, which is indicative of the fact that this is a pitcher and not a bottle. The visual nature of pitchers involves the knowledge that there are things that can be done to them which make a protrusion (the handle) appear and disappear.

Ryle [1949/1990] has made similar points. He says (p. 218), of a person contemplating a thimble: "Knowing how thimbles look, he is ready to anticipate, though he need not actually anticipate, how it will look, if he approaches it, or moves away from it; and when, without having executed any such anticipations, he does approach it, or move away from it, it looks as he was prepared for it to look. When the actual glimpses of it that he gets are got according to the thimble recipe, they satisfy his acquired expectation-propensities; and this is his espying the thimble."

Other authors have, over the last decades, expressed similar views. Hochberg [1970] (p. 323), for example, in the context of his notion of schematic maps, refers to: "the program of possible samplings of an extended scene, and of contingent expectancies of what will be seen as a result of those samplings...", and Sperry [1952] has the notion of "implicit preparation to respond". These ideas are also related to Neisser's [1976] perceptual cycle and to Noton & Stark's [1971] "scanpath" theory, and were also put forward in O'Regan [1992] in relation to the notion of the "world as an outside memory". Although, as noted by Wagemans & de Weert [1992], Gibson's notion of "affordance" (Gibson [1982], Turvey, Shaw, Reed, & Mace [1981], Kelso & Kay [1987]) is sometimes considered "mystical", it is undoubtedly strongly related to the present approach (on this see Noë [submitted]). The importance of action in perception has been stressed by Paillard [1971]; [1991] and Berthoz [1997]. Similar notions have also been found useful in "active vision" robotics (Ballard, Hayhoe, Pook, & Rao [1997]; Brooks [1987]; [1991]). Thomas [1999], in an excellent review, has advocated an "active perception" approach to perception and visual imagery, which corresponds very closely to our second, object-related type of sensorimotor contingency.

Another related viewpoint is to be found in the work of Maturana and Varela. Maturana and Varela also emphasize the importance of sensorimotor coupling for understanding the structure of the animal's cognitive and perceptual capacities and also for understanding the organization of the nervous system. Varela, Thompson, & Rosch [1991] present an "enactive conception" of experience according to which experience is not something that occurs inside the animal, but is something the animal enacts as it explores the environment in which it is situated (see also Thompson, Palacios, & Varela [1992]; Thompson [1995]; Pessoa et al. [1998]; Noë, Pessoa, & Thompson [2000]). A related approach has been put forward by Järvilehto [1998a]; [1998b]; [1999]; [2000], who, in a series of articles with an approach very similar to ours[12], stresses that perception is activity of the whole organism-environment system.

All these views of what it is to see, and particularly MacKay's and Ryle's, are based on the same notion of sensorimotor contingency that is so central to the view we are proposing in the present article. In particular, MacKay's work was the main source of inspiration of our theory. However it should be emphasized that our view contains several novel elements not to be found in the works of these authors.

The first point we have stressed is that there is an important distinction to be made between the two classes of sensorimotor contingencies, those which are particular to the visual apparatus, and those which are particular to the way objects occupy three-dimensional space and present themselves to the eye. Most of the workers cited in the previous paragraphs have been concerned mainly with the sensorimotor contingencies associated with visual object attributes. An exception may be the case of Gibson, who in different terms considered the more apparatus-related sensorimotor contingencies. In any case it seems to us that it is mainly, though not exclusively, through these latter contingencies that we can give a principled account of the qualitative differences in the experienced phenomenology of the different sensory modalities, thereby providing a more principled alternative to Müller's notion of "specific nerve energy".

A second innovative point in our approach will become more evident in Section 6. We shall see that by taking the stance that the experience of vision is actually constituted by a mode of exploring the environment, we escape having to postulate magical mechanisms to instill experience into the brain[13].

4 THE WORLD AS AN OUTSIDE MEMORY

4.1 The world as an outside memory

Under the present theory, visual experience does not arise because an internal representation of the world is activated in some brain area. On the contrary, visual experience is a mode of activity involving practical knowledge about currently possible behaviors and associated sensory consequences. Visual experience rests on know-how, the possession of skills.

Indeed there is no "re"-presentation of the world inside the brain: the only pictorial or 3D version required is the real outside version. What is required however are methods for probing the outside world -- and visual perception constitutes one mode via which it can be probed. The experience of seeing occurs when the outside world is being probed according to the visual mode, that is, when the knowledge being accumulated is of the three kinds described above, that are typical of the visual modality.

Thus, as argued in O'Regan [1992], it could be said that the outside world acts as an external memory that can be probed at will by the sensory apparatus.

To further clarify this, it is useful to draw a comparison with normal memory. You know many things about where you live. But as you sit in your office, you may not be thinking about them. If you should start doing so, you can conjure up in your mind all manner of things. Each thing can be thought about in detail, but meanwhile, the other things, though latent, are not being thought about. As you think about your kitchen, your bedroom is not in your mind, though you can cause it to come to mind by merely thinking about it. Remembering is casting one's awareness onto parts of latent memories.

Similarly, seeing is casting one's awareness onto aspects of the outside world made available by the visual apparatus. As you look at a visual scene, you can interrogate yourself about different aspects of the scene. As soon as you do so, each thing you ask yourself about springs into awareness, and is perceived -- not because it enters into a cortical representation, but because knowledge is now available about how sensations will change when you move your eyes, or move the object. However, before you actually wonder about some aspect of the scene, although the information is "out there", and although you know you can obtain it by making the appropriate eye movement or attention shift, it is not currently available. It is not currently available for being visually "chewed upon" or "manipulated", and cannot at this moment be used to control judgments and utterances: the third, "awareness" aspect of seeing is missing. Thus, even though the image of the object is impinging on your retina, and even though its aspects may be being analyzed by the feature-extracting modules of your visual system, under the current theory of seeing we must say that the object is not actually being seen.

As will be described in Section 5, this way of thinking about vision brings with it a number of consequences for some classic problems related to the apparent stability of the visual world despite eye movements, and to the problem of "filling-in" or compensating for "imperfections" of the visual apparatus such as the blind spot. It also provided the impetus for the change-blindness experiments described in Section 5.10.

4.2 The impression of seeing everything

A rather counter-intuitive aspect of the world-as-outside-memory idea, and the associated notion that there is no picture-like internal representation of the outside world, is that, in a certain sense, only what is currently being processed is being "seen". How then, if at any moment only a small fragment of the world is actually being seen, could we ever have that strong subjective impression that we continually have of seeing "everything"?

As pointed out by Noë et al. [2000] and Noë [in press], this paradox is actually only apparent, and rests on a misunderstanding of what seeing really is. It is true that normal perceivers take themselves to be aware of a detailed environment. But what this means is that they perceive the environment surrounding them as detailed. It does not mean that they think that inside their brains there is a detailed copy of the environment. It is only those perceivers -- and there are many scientists among them -- who make the mistake of thinking that "seeing" consists of making such a copy, who are led to think there is a problem.

Another way of understanding why our visual phenomenology is of seeing everything in front of us derives from the fact that, since the slightest flick of the eye or attention allows any part of a visual scene to be processed at will, we have the feeling of immediate availability about the whole scene. In other words, despite the fact that we are only currently processing a small number of details of the scene, under the present definition of seeing, we really are seeing the whole scene.

Suppose you should ask yourself, "Am I currently consciously seeing everything there is to see in the scene?" How could you check that you were seeing everything? You would check by casting your attention on each element of the scene, and verifying that you have the impression of consciously seeing it. But obviously as soon as you do cast your attention on something, you see it. Conclusion: you will always have the impression of consciously seeing everything, since everything you check on, you see. There is an interesting and unfortunate consequence of this: if for some reason you should not be able to mentally attend to some aspect of the scene, you will not be able to consciously see it. Some examples of this are given in Sections 5.10-5.12 on empirical evidence.

One could make the amusing analogy, referred to by Thomas [1999], of the refrigerator light. It seems always to be on. You open the refrigerator: it's on. You close the refrigerator, and then open it again to check, the light's still on. It seems like it's on all the time! Similarly, the visual field seems to be continually present, because the slightest flick of the eye or of attention renders it visible. Brooks [1991] has said that the world should be considered as its own best model, and Minsky [1988] has suggested the notion of "immanence illusion" in a similar vein.

4.3 Vividness through transients

In addition to the "slightest flick of attention" argument there is another, very important, factor which explains the particular vividness of the feeling we have of a rich external visual presence. The visual system is particularly sensitive to visual transients (Breitmeyer & Ganz [1976]; Stelmach, Bourassa, & di Lollo [1984]; Tolhurst [1975]). When a visual transient occurs, an automatic, "alerting" or "attention-grabbing" mechanism appears to direct processing to the location[14] where the transient occurred (Yantis [1998]; Theeuwes [1991]). This means that should anything happen in the environment, we will generally consciously see it, since processing will be directed to it. This gives us the impression of "having tabs" on everything that might change, and so of consciously seeing everything. Were there not the attention-grabbing mechanism, our visual impression would be more similar to the impression we have when we stand with our backs to a precipice: we keenly feel it is there, we know that we can turn and see more of the precipice, but the feeling of presence is much less vivid than when we are actually looking into the precipice. The knowledge of having tabs on any change that might occur in the visual field -- the fact that we know any change will attract our attention -- is another thing that makes the "outside memory" that provides vision different from other forms of memory. Whereas any change in the visual field is immediately visible to you, if, say, a Latin noun drops out of your memory overnight, no whistle blows that lets you know.

4.4 Dreaming and mental imagery

It is often claimed that dreaming, or other types of mental imagery, provide a counterexample to our denial that the brain must represent what is seen. Since dreams and mental images are apparently pictorial in nature, this seems to show that we are, after all, capable of creating an internal iconic image. Penfield's classic observations (e.g. Penfield & Jasper [1954]) of visual memories being created by stimulation of visual cortex might also be thought to indicate that there are internal pictorial representations.

It is easy to be misled by these arguments, which for some reason are peculiarly compelling. But it is important to appreciate that they are misleading. Whether dreams, hallucination, or normal vision are at stake, these arguments are another instance of the error of thinking that when we see things as picture-like (be it when we look at reality or when we have a dream), this must be because there is some kind of internal picture. But this is as misguided as the supposition that to see red, there must be red neurons in the brain. The supposed fact that things appear pictorial to us in no way requires there to be pictures in the head. Therefore the fact that we dream, hallucinate and imagine does not provide evidence in favor of the view that the brain contains pictures of the detailed environment[15].

A corollary of this confusion about dreams and mental imagery is the idea, expressed by a number of authors (e.g. Zeki [1993], Kosslyn [1994], Farah [1989]) that feedback from higher brain areas into the retinotopic cortical map of area V1 would be a good way of creating mental imagery. This argument is somewhat misleading. It could be taken to be based on the implicit assumption that mental imagery occurs because of activation in V1: the topographic, metric layout of V1 would make it a good candidate for the cortical areas that possess what Zeki [1993] has called an "experiential" quality -- i.e. the capacity to generate experience. But again, the metric quality of V1 cannot in any way be the cause for the metric quality of our experience. It is as though in order to generate letters on one's screen, the computer had to have little letters floating around in its electronics somewhere.

There may also be a second confusion at work in the argument from dreaming that we are considering. We have already noted that from the fact that dreams are pictorial, it does not follow that, when we dream, there are pictures in the head. But do we really have reason to believe that dreams are pictorial? People certainly do say that they are. But does this give us reason to believe it is so? Just as we have observed that the idea that seeing is pictorial reflects a kind of naïve phenomenology of vision, it may very well be that the claim that dreaming is pictorial is similarly ill-founded phenomenologically. Certainly it is not the case that when we dream, it is as if we were looking at pictures. A hallmark of dream-like experiences is the unstable and seemingly random character of dreamt detail. For example, the writing on the card is different every time you look at it in the dream[16]. This suggests that without the world to serve as its own external model, the visual system lacks the resources to hold an experienced world steady.

4.5 Seeing without eye movements

Under the theory presented here, seeing involves testing the changes that occur through eye, body and attention movements. Seeing without such movements is, under the theory, a subspecies of seeing: an exception. This would appear to be a rather dissident claim, given that psychologists studying visual perception have devoted a significant part of their energy precisely to the use of tachistoscopic stimulus presentation techniques, where stimuli are displayed for times shorter than the saccadic latency period of about 150 ms required for an eye movement to occur. Indeed the studies show that observers are perfectly able to see under these conditions. For example, Potter [1976], in now classic experiments, showed that observers could pick out a target picture in a series of pictures presented at rates as fast as one picture every 125 ms. Thorpe, Fize, & Marlot [1996] refined Potter's technique and showed by using event-related EEG potentials, that 150 ms after a stimulus is presented, that is, without any eye movement occurring, there is already information available in the cortex allowing the presence of an animal in a picture to be ascertained.

But because highly familiar stimuli (like words or animals) are used in these experiments, observers may be making use of a few distinctive features available in the images in order to accomplish the task. As argued by Neisser [1976] it probably cannot be said that observers are "seeing" the pictures in the normal sense of the word. As an illustration, consider an experiment in which observers were asked to learn to distinguish three previously unknown symbols resembling Chinese characters (Nazir & O'Regan [1990]). These were presented under the control of a computer linked to an eye movement measuring device. In one experiment, conditions were arranged so that observers could contemplate each Chinese symbol with their eyes fixated at the middle of the symbol, but as soon as the eyes moved, the symbol would disappear. Observers found this procedure extremely disruptive and irritating, and, contrary to what happens when the eye is free to move, hundreds of trials were necessary before they were able to distinguish the symbols. Furthermore, once the task was learnt, observers often failed when asked to recognize the learnt patterns at a new retinal location, only as little as half a degree away from the learnt position. Schlingensiepen, Campbell, Legge, & Walker [1986] also found that without eye movements, observers had difficulty distinguishing patterns composed of arrays of random black and white squares, and Atkinson, Campbell, & Francis [1976] showed by using an after-image technique that it is impossible to count more than four dots that are fixed with respect to the retina: a rather surprising fact. In a task of counting assemblies of debris-like pixel clumps, Kowler & Steinman [1977] found that observers had difficulties when eye movements were not permitted[17]. Because the stimuli used in these experiments were well above the acuity limit, the results are not explicable by acuity drop-off in peripheral vision. Even though a portion of the results may be due to lateral interaction effects (e.g. Toet & Levi [1992]), it seems clear that observers are not at ease when doing a recognition task when eye movements are prohibited. It is like trying to recognize by touch an object laid on your hand without manipulating it.

A further suggestion of the need for visual exploration concerns the phenomenon of fading that occurs when the retinal image is immobilized artificially by use of an optical stabilization device. Under these circumstances a variety of perceptual phenomena occur, ranging from loss of contrast, to fragmentation, to the visual field becoming gray or "blacker than black" (Ditchburn [1973]; Gerrits [1967]). A portion of these phenomena can undoubtedly be accounted for in terms of the temporal response of the first stages of the visual system. Kelly [1982] for example has suggested that detectors sensitive to oriented lines such as those discovered by Hubel and Wiesel actually are silent unless the oriented line stimulation is temporally modulated. Laming [1986]; [1988] has stressed that neural transmission of external stimulation is always differentially coupled, so that, for example, the response of the retina to static stimulation is weak, and temporal modulation is necessary for optimal response (Arend [1973]; Krauskopf [1963]; Kelly [1981]; Gerrits [1978]).

From the point of view of the present theory, these phenomena are compatible with the idea that sensing of the visual world is a dynamic probing process. It could be that even the presence of a static external stimulus is not registered by a static sensory input, but by the dynamic pattern of the inputs that would potentially be produced by changes in the sensor position.

4.6 Why we don't see behind ourselves, but we do see partially occluded objects

Consider objects behind you, or in a box on your desk. Though you know that turning around or opening the box will cause certain changes in your sensory stimulation, some of which are indeed visual in nature, you do not have the feeling of seeing things behind you or in the box. The reason is that while the objects are behind you or in the box, the knowledge you have does not include certain essential visual aspects, namely the knowledge that, say, blinking or moving your eyes will modify the sensations in a way typical of things that you see.

On the other hand, closer to normal seeing, consider an object which is partially occluded by another object. As you move your head, previously occluded parts appear, and previously unoccluded parts may disappear behind the occluder. This ability to make parts of the occluded object appear and disappear is similar to the ability to make objects appear and disappear by blinking, or to make their retinal projections change by moving the eye towards and away from them. This kind of ability is typical of what it is to see, so, even though the object is partially occluded, you nevertheless have the impression of seeing it, or at least "almost" seeing it. Furthermore, if you suddenly close your eyes and ask yourself exactly how much of the object was actually visible just before you closed your eyes, you will not generally know, and indeed, as suggested by results of Intraub & Richardson [1989], you will generally think you saw more than you did (see Figure 2). This demonstrates that seeing is not directly related to having a retinal image, but to being able to manipulate the retinal image.

Figure 2. Subjects tend to remember having seen a greater expanse of a scene than was shown in a photograph. For example, when drawing the close-up view in Panel A from memory, the subject's drawing (Panel C) contained extended boundaries. Another subject, shown a more wide-angle view of the same scene (Panel B), also drew the scene with extended boundaries (Panel D). (Note: to evaluate the drawings in the figure, it is important to study the boundaries of each drawing and its associated stimulus). (Figure and caption from Intraub, http://www.udel.edu/psych/fingerle/intraub.htm)

5 EMPIRICAL DATA.

5.1 Introduction

In this section we will lay out a number of empirical results which are related to the theory of visual experience we have sketched. Before beginning, however, it should be stressed that the empirical data to be presented are not intended as a test of the theory in the everyday sense that theories are tested in science. We are providing a general framework for the study of vision and it is not possible to subject a general framework to direct verification. Our new framework provides scientists with new problems and it makes some old problems appear as non-problems (like the problem of visual stability despite eye movements, and the problem of filling-in the blind spot -- see below). The framework highlights links between previously unrelated research streams, and creates new lines of research (like the work on change blindness, which was initiated by the idea of the world as an outside memory). Of course, local, alternative theories are possible within each of these domains, but the advantage of the present approach is that it brings together all the results so they can be seen from a single viewpoint.

In understanding the epistemological role of the present theory, an analogy can be made with the situation facing physicists in the 19th century, who were trying to invent mechanisms by which gravitational or electrical forces could act instantaneously at a distance. To solve this problem, Faraday's idea of a field of force was, according to Einstein, the single most important advance in physics since Newton (cf. Balibar [1992]). But in fact the idea of a field of force is not a theory at all, it is just a new way of defining what is meant by force. It is a way of abandoning the problem being posed, rather than solving it. Einstein's abandoning the ether hypothesis is another example of how advances can be made by simply reformulating the questions one allows oneself to pose.

In the experiments to be described below, a first group relates to the notion that there is no picture-like internal representation of the outside world and that the world serves as an outside memory. These studies concern the problem of the apparent stability of the visual world despite eye movements, the filling-in of the blind spot and other (supposed) visual defects, and "change blindness": the fact that large changes in a visual scene sometimes go unnoticed. The second group of studies is more related to the idea that visual experience only occurs when there is the potential for action. These studies concern sensorimotor adaptation, sensory substitution, and synesthesia-related effects.

5.2 The extraretinal signal

At least since Helmholtz toward the end of the last century a classic problem in vision has been to understand why the perturbations caused by eye movements (shift and smear on the retina) do not interfere with our perception of a stable visual world (cf. reviews of Grüsser [1986]; Shebilske [1977]; Matin [1972]; [1986]; MacKay [1973]; Bridgeman [1995]). A large portion of the experimental literature on the subject has assumed the existence of an internal representation, like a panoramic internal screen, into which successive snapshots of the visual world are inserted so as to create a fused global patchwork of the whole visual environment. The appropriate location to insert each successive snapshot is assumed to be determined by an "extraretinal signal", that is, a signal reflecting the direction the eyes are pointed at every moment. In total darkness some sort of extraretinal information is certainly available, as can easily be ascertained by noting that the after-image of a strong light source seems to move when the eyes move (Mack & Bachant [1969]). Much debate has occurred concerning the question of whether the extraretinal signal is of efferent or afferent origin, and a convincing estimation of the role of the two components has been made by Bridgeman & Stark [1991]. Irrespective of its origin, however, the data concur in showing that if the extraretinal signal exists, it is very inaccurate. Measurements from different sources (cf., e.g., compilations in Matin [1972]; [1986]) show that the signal must incorrectly be signaling that the eye starts to move as much as 200 ms before it actually does. The signal also incorrectly estimates the time and position where the eye lands, becoming accurate only about 1 second after the eye has reached its final position[18]. In any case, as admitted by Matin [1986], it is clear that the extraretinal information is too inaccurate, and also too sluggish, given the frequency of eye movements, to be used under normal viewing conditions to accurately place successive snapshots into a global fused internal image.

These results are not surprising when considered from the point of view of the theory of vision presented here. From this viewpoint there is no need to postulate a mechanism that re-positions the retinal image after eye saccades so that the world appears stationary, because what is meant by "stationary" is precisely one particular kind of sensory change that occurs when the eye moves across an object. Having the feeling of seeing a stationary object consists in the knowledge that if you were to move your eye slightly leftwards, the object would shift one way on your retina, but if you were to move your eye rightwards, the object would shift the other way. The knowledge of all such potential movements and their results constitutes the perception of stationarity. If on actually moving the eyes there were no corresponding retinal motion, the percept would not be of stationarity. From this point of view, there is no need to construct a stationary internal "image" of an object in order to see it stationary. If there is such a thing as an internal signal in the brain that signals the eye's instantaneous position, then its purpose could not be to construct such an internal image (for there would be no one to look at it).
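The contingency that defines "stationary" can be illustrated with a minimal sketch. It assumes a simplified one-dimensional geometry in which retinal position is object direction minus eye direction (in degrees); the function names and the numerical values are hypothetical and chosen only for illustration. Under this assumption, an object counts as stationary exactly when every eye shift is accompanied by an equal and opposite retinal shift, and no stored internal image is consulted.

```python
def retinal_position(object_deg, eye_deg):
    """Retinal eccentricity under a simplified 1-D geometry (an assumption)."""
    return object_deg - eye_deg

def shifts(values):
    """Successive differences of a sequence of positions."""
    return [b - a for a, b in zip(values, values[1:])]

def obeys_stationarity_law(eye_shifts, retinal_shifts, tol=1e-9):
    """True if each eye shift produces an equal and opposite retinal shift."""
    return all(abs(de + dr) <= tol for de, dr in zip(eye_shifts, retinal_shifts))

# Hypothetical eye directions over four fixations, and two hypothetical objects.
eye = [0.0, -2.0, 3.0, 5.0]
still_object = [10.0, 10.0, 10.0, 10.0]     # does not move in the world
drifting_object = [10.0, 11.0, 12.0, 13.0]  # moves 1 deg between fixations

eye_shifts = shifts(eye)
still_shifts = shifts([retinal_position(o, e) for o, e in zip(still_object, eye)])
drift_shifts = shifts([retinal_position(o, e) for o, e in zip(drifting_object, eye)])

print(obeys_stationarity_law(eye_shifts, still_shifts))  # law holds: stationary
print(obeys_stationarity_law(eye_shifts, drift_shifts))  # law violated: not stationary
```

The check operates only on pairs of motor changes and sensory changes, which is the sense in which stationarity is here treated as a law of sensorimotor contingency rather than as a property of a reconstructed image.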

The question nevertheless arises of how the brain is able to accurately judge whether an object is stationary, or to control visuomanual coordination. If there is no way for retinal and extraretinal information to be combined to yield the true spatial coordinates of an object, how can the motion of an object ever be accurately ascertained, or how can an object be located with respect to the body and grasped? A possible answer may be that, whereas there is no extraretinal signal, there is nevertheless extraretinal information about the eye's location or velocity in the orbit. This information could be present in distributed form, and confounded with information about retinal stimulation. Such a distributed representation that mixes sensory and motor information (both of a static kind -- position -- and of a dynamic kind -- velocity, acceleration) could provide the knowledge about sensorimotor contingencies required in the present theory. It could be used to perform accurate localization, but would not require the existence of a metric-preserving representation of the eye's position, or a picture-like internal image of objects on the retina or in space. Perhaps the multisensory neurons observed in parietal cortex, whose responses may be modulated by imminent eye movements, are compatible with this idea (Colby, Duhamel, & Goldberg [1996]; Duhamel, Colby, & Goldberg [1992]; see also Zipser & Anderson [1988]). Also of interest with respect to these ideas is a model of visual localization despite eye movements that has been constructed by Pouget & Sejnowski [1997]. The model uses basis functions to code nonlinear mixtures of retinal and eye position. Linear combinations of these basis functions can provide pure retinal position, pure eye position, or head-centered coordinates of a target, despite the fact that no coherent internal map of the visual field has been constructed.
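The basis-function idea can be made concrete with a toy sketch. It is not the Pouget & Sejnowski model itself: the unit count, the Gaussian tuning in both variables, the tuning width, and all names below are assumptions made for illustration. Each of 25 hypothetical units responds to a nonlinear mixture of retinal position r and eye position e, and three different sets of linear weights, fitted on a small grid of sample points, read out r, e, or the head-centered direction r + e from the same population.

```python
import math

# Hypothetical toy values; none of these come from the target article.
CENTERS = [-20.0, -10.0, 0.0, 10.0, 20.0]  # preferred positions (deg)
SIGMA = 10.0                                # tuning width (deg)

def gauss(x, c):
    return math.exp(-((x - c) ** 2) / (2 * SIGMA ** 2))

def basis(r, e):
    """Responses of 25 units, each tuned jointly to retinal and eye position."""
    return [gauss(r, cr) * gauss(e, ce) for cr in CENTERS for ce in CENTERS]

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def readout(w, r, e):
    """Linear combination of the unit responses."""
    return sum(wi * bi for wi, bi in zip(w, basis(r, e)))

# Fit three linear readouts on a 5 x 5 grid of (retinal, eye) sample points.
samples = [(r, e) for r in CENTERS for e in CENTERS]
A = [basis(r, e) for r, e in samples]
w_retinal = solve(A, [r for r, e in samples])
w_eye = solve(A, [e for r, e in samples])
w_head = solve(A, [r + e for r, e in samples])

worst = max(abs(readout(w_head, r, e) - (r + e)) for r, e in samples)
print("largest head-centered error on the sample grid (deg):", worst)
```

Because the readouts are fitted exactly at the 25 sample points, this sketch says nothing about accuracy between those points; it only shows that one population of mixed retinal and eye-position units can support several coordinate readouts without any unit coding a picture-like map.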

5.3 Trans-saccadic fusion

Over recent decades a new research topic has arisen in regard to the question of visual stability, in which researchers, instead of measuring the extraretinal signal itself, are questioning the notion that underlies it, namely the notion of an internal screen in which successive snapshots are accumulated. The experimental methodology of this work consists in displaying stimuli which temporally straddle the eye saccade, and attempting to see if observers see a fused image -- this would be predicted if an internal screen exists. Excellent reviews of this work (Irwin [1991]; [1992]) conclude that trans-saccadic fusion of this kind does not exist, or at least is restricted to a very small zone, namely the zone corresponding to the target which the saccade is aiming for. Another kind of experiment consists in making large changes in high quality, full color pictures of natural scenes in such a way that the changes occur during an eye saccade (McConkie & Currie [1996]). Even though the changes can occupy a considerable fraction of the field of view (e.g. cars appear or disappear in street scenes, swimming suits worn by foreground bathers change color, etc.), they are often not noticed -- also contradicting the idea of a pictorial-type internal representation of the visual world. Again the conclusion appears to be that if there is an internal screen, it is not this internal screen which is providing us with the sensation of a stable, panoramic, visual world (Irwin & Andrews [1996]; Irwin & Gordon [1998]).

This conclusion is consistent with the theory presented here, where the problem of visual stability is a non-problem. Seeing does not require compensating for the effects produced by eye shifts in order to ensure accurate accumulation of partial views into a composite patchwork projected on some internal screen. There is no need to re-create another world inside the head in order for it to be seen. Instead, as suggested in Section 4, the outside world acts as an "external memory" store, where information is available for probing by means of eye movements and shifts of attention (O'Regan [1992])[19].

5.4 Saccadic suppression

Another issue which has preoccupied scientists concerns the question of why we are not aware of the smear caused by saccades. An enormous literature on the topic has been reviewed by Matin [1974] (cf. also Li & Matin [1997]): it appears that both at that time and still today (e.g. Burr, Morrone, & Ross [1994]; Uchikawa & Sato [1995]; Li & Matin [1990]; Ridder & Tomlinson [1997]) many researchers believe that it is necessary to postulate some kind of suppression mechanism that inhibits transmission of sensory information to awareness during saccades, so that the rather drastic saccadic smear is not seen.

The empirical evidence showing diminished sensitivity to flashes during saccades cannot be denied, and the origin of this effect has been estimated by Li & Matin [1997] to be 20% due to the retinal smearing and masking caused by the image displacement (there may also be mechanical effects, as suggested by Richards [1969]), and 80% due to central inhibitory mechanisms (some portion of this may be due to spatial uncertainty caused by the new eye position, cf. Greenhouse & Cohn [1991]).

The important point however is that whatever inhibitory effects are occurring during saccades, these certainly do not constitute a suppression mechanism designed to prevent perception of the saccadic smear. If they did, then why would we not perceive a dimming of the world during saccades? Would we have to postulate a further un-dimming mechanism to compensate for the dimming? The notion of saccadic suppression probably constitutes another instance of the homunculus error, and is no less naive than postulating the need for a mechanism to right the upside-down retinal image so that the world appears right-side up. As explained in the theory presented above, there is no need to postulate mechanisms that compensate for the smear that is created by eye saccades, because this smear is part of what it is to see. If the retinal receptors did not signal a global smear during saccades, then the brain would have to assume that the observer was not seeing, and that he or she was perhaps hallucinating or dreaming.

5.5 Filling in the blind spot and perceptual completion

Another classic problem in vision which has recently been revived and generated heated debate (e.g. Ramachandran [1992]; Ramachandran & Gregory [1991]; Ramachandran [1995] versus Durgin, Tripathy, & Levi [1995]) is the problem of why we do not generally notice the 5-7 degree blind spot centered at about 17 degrees eccentricity in the temporal visual field of each eye, corresponding to the blind location on the retina where the optic nerve pierces through the eyeball.

Related problems involve understanding the apparent filling in of brightness or color that occurs in phenomena such as the Craik-O'Brien-Cornsweet effect and neon color spreading; the apparent generation of illusory contours as in the Kanizsa triangle; and other phenomena of modal or amodal completion (cf. reviews of Kingdom & Moulden [1992]; Pessoa et al. [1998]).

Taking the case of the blind spot, from the point of view of the present theory, and in agreement with analyses of a number of theoreticians (Todorovic [1987]; Kingdom & Moulden [1992]; Pessoa et al. [1998]) there is no need for there to be any filling in mechanism (O'Regan [1992]). On the contrary, the blind spot can be used in order to see: if retinal sensation were not to change dramatically when an object falls into the blind spot, then the brain would have to conclude that the object was not being seen, but was being hallucinated. Suppose you explore your face with your hand: you can put your hand in such a way that your nose falls between two fingers. This does not give you the haptic impression of having no more nose. On the contrary, being able to put the nose between two fingers gives information about the size and nature of a nose. It is part of haptically perceiving the nose.

Monitoring the way the sensory stimulation from the retina changes when the eye moves to displace an object in the vicinity of the blind spot, is, for the brain, another way of gaining information about the object.

One can argue however that even though there may be no need for filling in processes, such filling in processes may nevertheless actually exist. In support of this, Pessoa et al. [1998], though critical of some neurophysiological and behavioral studies purporting to be evidence for filling in, concluded that several studies do point to the existence of precisely the kind of mechanisms which would be required for a filling in process. For example, Paradiso & Nakayama [1991], by using a masking paradigm, were able to measure the temporal dynamics of the phenomenal filling in of the inside of a bright disk. De Weerd, Gattass, Desimone, & Ungerleider [1995] found cells in extrastriate cortex whose responses correlate well with the time it takes holes in textures presented in peripheral vision to perceptually fill in.

Just as was the case for the problem of the extraretinal signal or of saccadic suppression, the theory being advocated here does not deny the existence of neural mechanisms that underlie the perceptual phenomena that each of us observes. There can be no doubt that something is going on in the brain which is in relation to the fact that observers have no experience of a blind spot, and which makes Kanizsa triangles have illusory contours. The question is: Is whatever is going on, actually serving to create an internal copy of the outside world, which has the metric properties of a picture, and which has to be completed in order for observers to have the phenomenology of a perfect scene? In the case of Paradiso & Nakayama's data, for example, there can be no denying that there must be retinal or cortical processes that involve some kind of dynamic spreading activation and inhibition, and that these processes underlie the percept that observers have in their paradigm and possibly also when a disk is presented under normal conditions. But even though these processes act like filling in processes, this does not mean that they are actually used by the brain to fill in an internal metric picture of the world. They may just be providing information to the brain about the nature of the stimulation, but without this information being used to create a picture-like representation of the world.

In other words, our objection is not to the mechanisms themselves, whose existence we would not deny, but to the characterization of these mechanisms as involving "filling in". Consider this caricature: Spatio-temporal integration in the low level visual system is a mechanism which explains much phenomenology (e.g. why fast flickering lights appear continuous and very closely spaced dots look like lines). But surely no one would want to claim that the purpose of spatiotemporal integration is to "fill in" the temporal gaps in what would otherwise look like a stroboscopic world, or to make dotted lines look continuous. Spatiotemporal integration is a mechanism used in our visual systems to sample the environment, but its purpose is not to compensate for gaps in what would otherwise be a granular, pixel-like internal picture.

5.6 Other retinal non-homogeneities and the perception of color

A striking characteristic of the human visual system is its non-homogeneity. Spatial resolution is not constant across the retina, but falls off steadily: even the central foveal area is not a region of constant acuity, since at its edge (i.e. at an eccentricity of about 1 degree), position acuity has already dropped to half its value at the fovea's center (Levi, Klein, & Aitsebaomo [1985]; Yap, Levi, & Klein [1989]). This drastic fall-off continues out into peripheral vision, only slowing down at around 15 degrees of eccentricity.

In addition to this non-homogeneity in spatial sampling, the retina also suffers from a non-homogeneity in the way it processes color: whereas in the macular region, the presence of three photoreceptor cone classes permits color discrimination, in peripheral retina the cones become very sparse (Anderson, Mullen, & Hess [1991]; Marcos, Navarro, & Artal [1996]; Coletta & Williams [1987]). The lack of the ability to accurately locate colors can easily be demonstrated by attempting to report the order of the colors of four or five previously unseen colored pencils when these are brought in from peripheral vision to a position just a few degrees to the side of one's fixation point.

A further, surprising non-homogeneity derives from the macular pigment, a yellowish jelly that covers the macula, that absorbs up to 50% of the light in the short wavelength range (Bone, Landrum, & Cains [1992]), thereby profoundly altering color sensitivity in central vision.

Despite these non-homogeneities, the perception of spatial detail and color does not subjectively appear non-uniform to us: most people are completely unaware of how poor their acuity and their color perception are in peripheral vision. Analogously to the filling-in mechanism that is sometimes assumed to fill in the blind spot, one might be tempted to postulate some kind of compensation mechanism that would account for the perceived uniformity of the visual field. However, from the point of view of the present theory of visual experience, such compensation is unnecessary. This will be illustrated in relation to color perception below.

5.7 "Red" is knowing the structure of the changes that "red" causes

Under the present view of what seeing is, the visual experience of a red color patch depends on the structure of the changes in sensory input that occur when you move your eyes around relative to the patch, or when you move the patch around relative to yourself. For example, suppose you are looking directly at the red patch. Because of absorption by the macular pigment, the stimulation received by the color sensitive retinal cones will have less energy in the short wavelengths when you look directly at the red patch, and more when you look away from the patch. Furthermore, since there is a difference in the distribution and the density of the different color-sensitive cones in central vs. peripheral vision, with cone density dropping off considerably in the periphery, there will be a characteristic change in the relative stimulation coming from rods and cones that arises when your eyes move off the red patch. What determines the perceived color of the patch is the set of such changes that occur as you move your eyes over it[20].

A relevant example arises from the perception of color in dichromats. When carefully tested in controlled conditions of illumination, dichromats exhibit deficiencies in their ability to distinguish colors, generally along the red-green dimension, which can be accounted for by assuming that they lack a particular type of cone, generally either the long or medium wavelength type. Curiously however, in real-life situations, dichromats are often quite good at making red-green distinctions. As suggested by Jameson & Hurvich [1978] (cf. also Lillo, Davies, Collado, Ponte, & Vitini [1998]) this is undoubtedly because they can make use of additional cues deriving from what they know about objects and what they can sense concerning ambient lighting. Thus for example, when a surface is moved so that it reflects more yellowish sunlight and less bluish light from the sky, the particular way the spectrum of the reflected light changes disambiguates the surface's color, and allows that color to be ascertained correctly even when the observer is a dichromat.

Though it is not surprising to find observers using all sorts of available cues to help them in their color discriminations, this kind of finding can be taken to support a much more far-reaching, fundamental hypothesis, put forward by Broackes [1992]. This is that the color of a surface is not so much related to the spectrum of the reflected light, but rather, to the way the surface potentially changes the light when the surface is moved with respect to the observer or the light sources.

It must be stressed that more is being said here than was said by Jameson & Hurvich, who merely noted that information is available that allows dichromats to make judgments similar to trichromats. Broackes' idea is that the colors of surfaces are exactly the laws governing the way the surface changes the reflected light[21]. At least as far as reflectivity of surfaces is concerned, the same laws apply to dichromats and trichromats, so to a certain extent they have the same kinds of color perception: the difference is that dichromats have fewer clues to go by in many situations. Thus Broackes, who has color vision deficiencies[22] himself, claims that he has different experiences for red and green, as do people with normal color vision. His only problem is that sometimes, when lighting conditions are special, he can see certain dark red things as dark green, just as sometimes, in shadow, people with normal vision are convinced a garment is dark blue when in fact it is black, or vice versa. Of course there will be a component of the sensorimotor contingencies, namely those determined by the observer's own visual apparatus, which, to the extent that dichromats lack one of the three color channels, are different in the case of dichromats as compared to trichromats, so colors cannot be completely identical for them.

Broackes' theory of color is strongly related to the theory of visual perception that we have presented here. The difference between Broackes' views and ours is that Broackes is attempting to characterize the nature of color in terms of laws of sensorimotor contingency, whereas we have taken the bolder step of actually identifying color experience with the exercise of these laws, or, more precisely, with activity carried out in accord with the laws and based on knowledge of the laws.

5.8 Eye-position contingent perception

A surprising prediction from this idea, that the sensation of red comes from the structure of changes that is caused by red, is the following armchair experiment. Using a device to measure eye movements connected to a computer, it should be possible to arrange stimulation on a display screen so that whenever an observer looks directly at a patch of color it appears red, but whenever the observer's eye looks away from the patch, its color changes to green. The rather counterintuitive prediction from this is that after training in this situation, the observer should come to have the impression that green patches in peripheral vision and red patches in central vision are the same color.
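The display rule for this armchair experiment can be sketched in a few lines. The sketch is only an illustration of the contingency: the 1-degree criterion for "looking directly at" the patch, the function name, and the simulated gaze samples are all assumptions, and a real setup would take gaze samples from an eye tracker and would have to deal with latency and calibration error.

```python
import math

FOVEAL_RADIUS_DEG = 1.0  # hypothetical criterion for "looking directly at" the patch

def patch_color(gaze, patch_center, radius_deg=FOVEAL_RADIUS_DEG):
    """Color to display for the patch, given the current gaze sample (deg)."""
    dx = gaze[0] - patch_center[0]
    dy = gaze[1] - patch_center[1]
    return "red" if math.hypot(dx, dy) <= radius_deg else "green"

# Simulated gaze trace in degrees of visual angle (hypothetical values).
patch = (0.0, 0.0)
trace = [(8.0, 0.0), (3.0, 1.0), (0.5, 0.2), (0.0, 0.0), (4.0, -2.0)]
colors = [patch_color(g, patch) for g in trace]
print(colors)
```

The point of the sketch is that the color shown depends only on the relation between gaze and patch, so the observer is exposed to a fixed, learnable law linking eye movements to changes in the color signal.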

Whereas exactly this kind of experiment has not yet been done, a variety of related manipulations were performed by McCollough [1965b] and by Kohler [1951][23]. For example, Kohler had observers wear spectacles in which one half of the visual field was tinted with blue, and the other half tinted with yellow. This is similar to the proposed armchair experiment in the sense that perceived color will be different depending on which way the observer moves the eyes. Results of the experiment seem to show that after adaptation, observers apparently came to see colors "normally". Similar phenomena were observed with half-prisms, in which the top and bottom portion of the visual field were shifted by several degrees with respect to each other. Observers ultimately adapted, so that manual localization of objects in the upper and lower visual fields was accurate.

Of particular interest in these studies would have been to know whether observers perceived the world as continuous despite the discontinuity imposed by the colored glasses or prisms. However, it is difficult to rigorously evaluate the reports, as they were only described informally by Kohler. Since then, though a large literature has developed over the last decades concerning many forms of perceptual adaptation, not very much work seems to have been done to investigate the effects of modifications like those imposed by the two-color glasses or the half-prisms, which produce strong discontinuities in the visual field.

Nevertheless partial insight into such situations may be obtained by considering people who wear spectacles with bifocal lenses[24]: here a discontinuity exists in the visual field between the upper and lower part of the glasses. Depending on where an observer directs the eyes, the size and focus of objects will be different, because of the different power of the two parts of the lens. The question is then, does the world appear discontinuous to wearers of bifocals? The answer is that the world does not appear discontinuous, any more than the world appears "dirty" to someone who has not wiped his spectacle lenses clean. This is not to say that the observer cannot become aware of the discontinuity or the dirt on the lenses by attending to the appropriate aspect of the stimulation, just as it is possible to become aware of the blind spot in each eye by positioning a stimulus appropriately. But under normal circumstances the wearer of bifocals takes no notice of the discontinuity. Furthermore, even though image magnification as seen through the different parts of the lens is different, thereby modifying perception of distance, manual reaching for objects seen through the different parts of the lenses adapts and becomes accurate, as does the vestibulo-ocular reflex. Gauthier & Robinson [1975] and Gauthier [1976] have for example shown that wearers of normal spectacles with strong corrections, as well as scuba divers, come to possess a bistable state of adaptation, whereby their distance perception and reaching can instantaneously switch from one to the other state, as they take their spectacles on and off, or look through their underwater goggles (see also Welch, Bridgeman, Anand, & Browman [1993] for a similar effect with prisms). In fact an observer can be tricked into inappropriately switching adaptation state by surreptitiously removing the lenses from his or her eyeglasses, so that he or she incorrectly expects magnification to change when the eyeglasses are put on (Gauthier, personal communication).

5.9 Inversion of the visual world

Relevant to the theory of visual experience being proposed here are the classic experiments performed by Stratton [1897], Kohler [1951], and some less often cited replications by Taylor [1962] and by Dolezal [1982] and Kottenhoff [1961], in which an observer wears an optical apparatus which inverts the retinal image so that the world appears upside-down and/or left-right inverted (cf. reviews by, e.g., Harris [1965]; [1980]). Although at first totally incapacitated, observers adapt after a few days and are able to move around. Ultimately (after about two weeks of wearing the apparatus) they come to feel that their new visual world is "normal" again[25].

What is interesting about these experiments is that during the course of adaptation, perception of the world is subject to a sort of fragmentation, and to a dependence on context and task. For example, Kohler [1951] reports that visual context allows something that is seen upside-down to be righted (e.g. a candle flips when it is lit because flames must go up, a cup flips when coffee is poured into it, because coffee must pour downwards). Ambiguities and inconsistencies abound: Dolezal reports sometimes being unable to prevent both his hands from moving when he tries to move only one. Kohler reports cases where two adjacent heads, one upright, the other inverted, were both perceived as upright. Observer Grill, after 18 days of wearing reversing spectacles, stands on the sidewalk and correctly sees vehicles driving on the "right", and hears the noise of the car motor coming from the correct direction. On the other hand, Grill nevertheless reports that the license plate numbers appear to be in mirror writing. Other observations are that a "3" is seen as in mirror writing, even though its open and closed sides are correctly localized as being on the left and right respectively. The bicycle bell seems on the unusual side, even though the observer can turn the handle bars in the correct direction. Taylor [1962] has performed a study similar to Kohler's, except that instead of wearing the inverting spectacles continuously, his subject wore them only for a limited period each day. Under these conditions the subject rapidly obtains a bistable form of adaptation, adapted to both wearing and not wearing the spectacles. A point stressed by Taylor, in support of his behaviorist theory[26], is that adaptation is specific to the particular body parts (arms, legs, torso) or activities (standing on both feet, on one foot, on the toes, riding a bicycle) that the subject has had training with, and that there is little "interpenetration" from one such sensorimotor system to another.

A theory of vision in which there is a picture-like internal representation of the outside world would not easily account for the fragmentation of visual perception described in these experiments: for example it would be hard to explain the case of the license plate, where one aspect of a scene appears oriented accurately, and yet another aspect, sharing the same retinal location, appears inverted. On the other hand, the present theory, in which vision is knowledge of sensorimotor transformations, and the ability to act, readily provides an explanation: reading alphabetic characters involves a subspecies of behavior connected with reading, judging laterality involves another, independent, subspecies of behavior, namely reaching. An observer adapting to an inverted world will in the course of adaptation only be able to progressively probe subsets of the sensorimotor contingencies that characterize his or her new visual world, and so inconsistencies and contradictions may easily arise between "islands" of visuo-motor behavior[27].

Particularly interesting are cases of double vision when only one eye is open, i.e. not explicable by diplopia. For example, Kohler's observer Grill saw two points of light when only one was presented slightly to the right of the median line (the second point was seen weaker, on the left, symmetrical to the original point). Similar observations of symmetrical "phantoms" were noticed by Stratton [1897], and can be compared to cases of monocular diplopia reported in strabismus (Ramachandran, Cobb, & Levi [1994b]; Ramachandran, Cobb, & Levi [1994a]; Rozenblium Iu & Korniushina [1991]). Taylor [1962] says of his subject wearing left-right inverting spectacles:

"Another of the training procedures he adopted was to walk round and round a chair or table, constantly touching it with his body, and frequently changing direction so as to bring both sides into action. It was during an exercise of this kind, on the eighth day of the experiment, that he had his first experience of perceiving an object in its true position. But it was a very strange experience, in that he perceived the chair as being both on the side where it was in contact with his body and on the opposite side. And by this he meant not just that he knew that the chair he saw on his left was actually on his right. He had that knowledge from the beginning of the experiment. The experience was more like the simultaneous perception of an object and its mirror image, although in this case the chair on the right was rather ghost-like." (p. 201-202)

Presumably what happens in these experiments is that, because the spatial location or orientation of an object with respect to the body can be attributed either with respect to the pre- or the post-adapted frame of reference, during the course of adaptation it can sometimes be seen as being in both. Furthermore, orientation and localization of objects in the field of view can be defined with respect to multiple referents, and within different tasks, and each task may have adapted independently, thereby giving rise to incoherent visual impressions.

The impression we have of seeing a coherent world thus arises through the knitting together of a number of separate sensory and sensory-motor components, making use of visual, vestibular, tactile and proprioceptive information, and in which different behaviors (e.g. reading, grasping, bicycle riding) constitute components that adapt independently, but that each contribute to the experience of seeing. Conclusions of this kind have also been reached in a wealth of research on sensorimotor control, where it is shown that a gesture such as reaching for an object is composed of a number of sub-components (e.g. ballistic extension of the arm, fine control of the final approach and finger grasping, etc.), each of which may obey independent spatial and temporal constraints, and each of which may be controlled by different cerebral subsystems, which adapt separately to perturbations like changes in muscle proprioception, or in vestibular and visual information (for reviews of these results, see Jeannerod [1997]; Rossetti, Koga, & Mano [1993][28]).

5.10 Change blindness experiments

The idea that the world constitutes an outside memory, and that we only see what we are currently attending to, was the impetus for a number of surprising experiments performed recently on "change blindness"[29] (Rensink et al. [1997]; [2000]; O'Regan, Deubel, Clark, & Rensink [2000]; O'Regan, Rensink, & Clark [1999]). In these experiments, observers are shown displays of natural scenes, and asked to detect cyclically repeated changes, such as a large object shifting, changing color, or appearing and disappearing. Under normal circumstances a change of this type would create a transient signal in the visual system that would be detected by low-level visual mechanisms. This transient would exogenously attract attention to the location of the change, and the change would therefore be immediately seen.

However in the change blindness experiments, conditions were arranged such that the transient that would normally occur was prevented from playing its attention-grabbing role. This could be done in several ways. One method consisted in superimposing a very brief global flicker over the whole visual field at the moment of the change. This global flicker served to swamp the local transient caused by the change, preventing attention from being attracted to it. A similar purpose could be achieved by making the change coincide with an eye saccade, an eye blink, or a film cut in a film sequence (for reviews, see Simons & Levin [1997][30]). In all these cases a brief global disturbance swamped the local transient and prevented it attracting attention to the location of the change. Another method used to prevent the local transient from operating in the normal fashion was to create a small number of additional, extraneous transients distributed over the picture, somewhat like mudsplashes on a car windscreen (cf. O'Regan et al. [1999]). These local transients acted as decoys and made it likely that attention would be attracted to an incorrect location instead of going to the true change location.

The results of the experiments showed that in many cases observers have great difficulty seeing changes, even though the changes are very large, and occur in full view -- they are perfectly visible to someone who knows what they are. Such results are surprising if one espouses the view that we should "see" everything that we are looking at: It is very troubling to be shown a picture where a change is occurring repetitively and in full view, without being able to see the change. The experience is quite contradictory with one's subjective impression of richness, of "seeing everything" in the visual field. However, the results are completely coherent with the view of seeing which is being defended here.

Another aspect of these experiments which relates to the present theory is a result which was observed in an experiment in which observers' eye movements were measured as they performed the task (O'Regan et al. [2000]). It was found that in many cases, observers could be looking directly at the change at the moment the change occurred, and still not see it. Again, under the usual view that one should see what one is looking at, this is surprising. But under the view that what one sees is the aspect of the scene which one is currently "visually manipulating", then it is quite reasonable to observe that only a subset of scene elements that share a particular scene location should at a given moment be perceived.

A striking result of a similar nature had been observed by Haines [1991] and Fisher, Haines, & Price [1980], who had professional pilots land an aircraft in a flight simulator under conditions of poor visibility, and using a head-up display (or "HUD") -- that is, a display which superimposed flight guidance and control information on the windshield. On various occasions during the pilots' landing approaches, they were presented with unexpected "critical" information in the form of a large jet airplane located directly ahead of them on the runway. Although the jet airplane was perfectly visible despite the head-up display (see Figure 3), two of the eight experienced commercial pilots simply did not see the obstacle on the two occasions they were confronted with it, and simply landed their own aircraft through the obstacle, presumably because of the extreme improbability of such an occurrence, and because the pilots were concentrating on the head-up display or the landing maneuver. On later being confronted with a video of what had happened, they were incredulous[31].

Fig. 3. Simulator pilot's forward visual scene at an altitude of 72 feet and 131 knots with runway obstruction clearly visible. From Haines [1991]

Other results showing that people can be looking directly at something and not see it had previously been obtained by Neisser & Becklen [1975], who used a situation which was a visual analogue of the "cocktail party" situation, where party-goers are able to attend to one of many superimposed voices. In their visual analogue, Neisser & Becklen visually superimposed two independent film sequences, and demonstrated that observers were able to single out and follow one of the sequences, while being oblivious of the other. Simons & Chabris [1999] have recently replicated and extended these effects.

Finally Mack & Rock [1998] and Mack, Tang, Tuma, Kahn, & Rock [1992] have done a number of experiments using their paradigm of "inattentional blindness". In this, subjects will be engaged in an attention-intensive task such as determining which arm of a cross is longer. After a number of trials, an unexpected, perfectly visible, additional stimulus will appear near the cross. The authors observe that on many occasions this extraneous stimulus is simply not noticed[32].

5.11 Inattentional amnesia

Related to the idea that the world serves as an outside memory are the intriguing experiments of Wolfe [1997], [1999], and Wolfe, Klempen, & Dahlen [1999] which they interpret in terms of what they call "inattentional amnesia".

Wolfe et al. [1999] use a standard visual search paradigm in which a subject must search for a target symbol among a number of distractor symbols. The authors estimate the efficiency of the search in milliseconds per item searched. However, instead of using a new display of distractors on each trial as is usually done, the authors use exactly the same visual display over a number of repetitions, but each time change the target that the subject is looking for. Since subjects are looking at the same display, which remains continuously visible on the screen for anything from 5 to 350 repetitions, depending on the experiment, one might have expected that an internal representation of the display would have time to build up, allowing search rate to improve over repetitions. However this is not what is found: Over a number of experiments using different kinds of stimuli, Wolfe et al. [1999] find no evidence of improvement in search rate. It seems that no internal representation of the display is being built up over repetitions. In fact, search rate is as bad after many repeated searches as in the normal visual search conditions when the display changes at every trial: in other words, it is as though the subjects think they are searching through a brand new display at each trial, even though it is exactly the same display as before. Furthermore, an experiment done where the display is memorized and not visually presented at all, actually shows faster search speeds than when the display is present.
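The efficiency measure mentioned above, milliseconds per item searched, is conventionally read off as the slope of the line relating mean response time to the number of items in the display. The following is a minimal sketch of that computation, not a reconstruction of Wolfe et al.'s analysis: the function name is hypothetical, and the numbers in the usage note are invented purely for illustration.

```python
def search_slope_ms_per_item(set_sizes, mean_rts_ms):
    """Least-squares slope of mean response time (ms) against display set size.

    A larger slope means each extra item in the display costs more time,
    i.e. the search is less efficient.
    """
    if len(set_sizes) != len(mean_rts_ms) or len(set_sizes) < 2:
        raise ValueError("need at least two matching (set size, RT) pairs")
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    # Sum of squared deviations of set size, and of cross-products with RT.
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_rts_ms))
    if sxx == 0:
        raise ValueError("set sizes must not all be equal")
    return sxy / sxx


# HYPOTHETICAL numbers, for illustration only (not data from any experiment).
early_slope = search_slope_ms_per_item([4, 8, 12], [520.0, 640.0, 760.0])
late_slope = search_slope_ms_per_item([4, 8, 12], [515.0, 635.0, 755.0])
```

With these invented values both slopes come out at 30 ms per item. A pattern of this kind, in which the slope after many repetitions is no shallower than at the start, is what "no improvement in search rate" would look like in such an analysis.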

The results of these experiments are surprising under the view that what we see consists of an internal, more or less picture-like, representation of the visual world. However they are exactly what would be expected under the present view, according to which "seeing" consists, not of having a "picture" in the mind, but of having seeking-out-routines that allow information to be obtained from the environment. Thus, observers generally do not bother to recreate within their minds a "re-"presentation of the outside world, because the outside world itself can serve as a memory for immediate probing. Indeed, the last result showing faster performance in the pure memory search shows that the very presence of a visual stimulus may actually obligatorily cause observers to make use of the world in the "outside memory" mode, even though it is less efficient than using "normal" memory.

This way of interpreting the results is also in broad agreement with Wolfe's point of view (Wolfe [1997]; [1999]) -- Wolfe also refers to the notion of "outside memory". However Wolfe lays additional emphasis on the role of attention in his experiments: Following the approach of Kahneman, Treisman, & Gibbs [1992] adopted by many workers in the attention literature, Wolfe believes that before attention is brought to bear on a particular region of the visual field, the elementary features (such as line segments, color patches, texture elements) analyzed automatically by low-level modules in the visual system constitute a sort of "primeval soup" or undifferentiated visual "stuff". Only once attention is applied to a particular spatial location, can the features be bound together so that an object (or recognizable visual entity) is perceived at that location. Wolfe's interesting proposition is now that when visual attention subsequently moves on to another location, the previously bound-together visual entities disaggregate again and fall back into the "primeval soup": the previously perceived entity is no longer seen. This idea prompts Wolfe to use the term "inattentional amnesia", to emphasize the fact that after attention has moved on, nothing is left to see.

The status of the notion of attention in this explanation, and its relation to the theory presented here, is not entirely clear. One possibility would be to assume that what Wolfe means by "attention" is nothing other than visual awareness. In that case the result of the experiment could be summarized by saying "once your awareness has moved off a part of the scene, you are no longer aware of it"... which is tautological. Presumably therefore what Wolfe means by attention is something independent of awareness: there would be forms of attention without awareness and forms of awareness without attention. It is clear that further thought is needed to clarify these questions.

Independently of the framework within which one places oneself, it remains an interesting question to ask: What does the primeval soup "look like"? In other words, what does the visual field look like when the observer is not attending to anything in particular in it? Our preference would be to take the strict sense of attention in which attention = awareness, and to say that without attending to something (i.e. without being aware of anything), by definition the visual field cannot look like anything at all. Only when the observer attends to something will he or she be aware of seeing it. Note that what the observer attends to can be something as basic as overall brightness or color, or something like the variability in these ("colorfulness"?, "texturedness"?), or some attribute like "verticality" or "blobbiness". If such features constitute the "primeval soup", then, like normal targets in the search task, the primeval soup would also only be "seen" if it was being attended to.

5.12 Informal examples

While the examples given in the preceding sections are striking experimental demonstrations of the fact that you do not always see where you look, several more informal demonstrations also speak to the issue.

Proofreading is notoriously difficult: when you look at words, you are processing words, not the letters that compose them. If there is an extra, incorrect letter in a word, it will have been processed by your low-level vision modules, but it will not have been "seen". Thus, for example, you will probably not have noticed that the "a"s in the last sentences were of a different shape than elsewhere[33]. Nonetheless on several occasions you were undoubtedly looking directly at them. It may take you a while to realize that the sign below (Fig. 4) does not say: The illusion of "seeing".[34] You may be furious to find confirmation of years of the scientific study of reading showing that in this sentence there are in fact more "f"s than you think (count them!).[35]

Fig. 4. Ceci n'est pas: The illusion of "seeing".

The phenomena of figure-ground competition (see Figure 5) and of ambiguous figures are also striking examples of how you do not see everything that you could see: when looking at such stimuli, you only see one of the possible configurations, even though more than one may be simultaneously available at the same location in your visual field.

Fig. 5. Figure-ground competition.

It sometimes occurs that as you walk in the street you look directly at someone without seeing them. Only when the person gesticulates or manifests their irritation at not being recognized, do you become aware of who they are. While driving, it sometimes happens that you realize that you have been looking for a while at the brake lights of the car ahead of you without pressing on the brake.

5.13 Remote tactile sensing

An immediate consequence of the notion that experience derives not from sensation itself, but from the rules that govern action-related changes in sensory input, is the idea that visual experience should be obtainable via channels other than vision, provided that the brain extracts the same invariants from the structure of the sensorimotor contingencies.

A number of devices have been developed to allow people with deficits in one sensory modality to use another modality to gain information. In the domain of vision, two main classes of such sensory substitution devices have been constructed: echolocation devices and tactile visual substitution devices.

Echolocation devices provide auditory signals which depend on the direction, distance, size, and surface texture of nearby objects, but they provide no detailed shape information. Nevertheless such devices have been extensively studied as prostheses for the blind, both in neonates (Sampaio & Dufier [1988]; Sampaio [1989]; Bower [1977]) and in adults (Ifukube, Sasaki, & Peng [1991]). It is clear that while such devices obviously cannot provide visual experience, they nevertheless provide users with the clear impression of things being "out in front of them".

Particularly interesting is the work being done by Lenay [1997], using an extreme simplification of the echolocation device, in which a blind or blindfolded person has a single photoelectric sensor attached to his or her forefinger, and can scan a simple environment (e.g. consisting of several isolated light sources) by pointing. Every time the photosensor points directly at a light source, the subject hears a beep or feels a vibration. Depending on whether the finger is moved laterally, or in an arc, the subject establishes different types of sensorimotor contingencies: lateral movement allows information about direction to be obtained, movement in an arc centered on the object gives information about depth. Note several interesting facts. First, users of such a device rapidly say that they do not notice vibrations on their skin or hear sounds, rather they "sense" the presence of objects outside of them. Note also that at a given moment during exploration of the environment, subjects may be receiving no beep or vibration whatever, and yet "feel" the presence of an object before them. In other words the experience of perception derives from the potential to obtain changes in sensation, not from the sensations themselves. Note also that the exact nature or body location of the stimulation (beep or vibration) has no bearing on perception of the stimulus -- the vibration can be applied on the finger or anywhere else on the body. This again shows that what is important is the sensorimotor invariance structure of the changes in sensation, not the sensation itself.
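The lawful relation between pointing posture and stimulation that this device sets up can be made concrete in a toy model. The following is a minimal sketch under strong simplifying assumptions that are ours and not Lenay's: a flat two-dimensional world, a single point source, and a sensor that responds whenever it points at the source to within a small angular tolerance. The function names and the tolerance value are hypothetical.

```python
import math


def beeps(sensor_xy, pointing_angle, source_xy, tolerance=0.02):
    """Return True when the sensor, pointing along pointing_angle (radians),
    is aimed at the source to within `tolerance` radians (assumed value)."""
    dx = source_xy[0] - sensor_xy[0]
    dy = source_xy[1] - sensor_xy[1]
    bearing = math.atan2(dy, dx)
    # Signed angular difference wrapped into (-pi, pi].
    diff = bearing - pointing_angle
    error = math.atan2(math.sin(diff), math.cos(diff))
    return abs(error) <= tolerance


def intersect_rays(p1, a1, p2, a2):
    """Locate the point where two pointing lines cross.

    p1, p2 are sensor positions; a1, a2 are the pointing angles at which
    a beep was obtained from each position.
    """
    d1 = (math.cos(a1), math.sin(a1))
    d2 = (math.cos(a2), math.sin(a2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("pointing directions are parallel; no depth information")
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using 2D cross products.
    t1 = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

In this toy model a single beep fixes only a direction. The source's distance is carried by how the beep-producing pointing angle changes as the sensor is displaced, which is one way of illustrating the claim that what matters is the structure of the changes in sensation rather than any single sensation.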

Lenay's very simple setup provides a concrete example of what is meant by the laws of sensorimotor contingency. Suppose that the photosensor were mounted on the forearm of an articulated arm, with the arm making an angle $\alpha$ with the torso, and the forearm making an angle $\beta$ with the arm, as shown in Figure 6. Then we can define the sensorimotor manifold as the two-dimensional space $\alpha \in [0, \pi/2]$ and $\beta \in \,]3\pi/2 - \alpha,\ 2\pi[$. Consider the situation where we are obtaining information about depth by making movement in an arc. If a luminous source at distance $L$ is being "fixated", the angles $\alpha$ and $\beta$ will lie on orbits in the sensorimotor manifold defined by the relation shown in the lower part of the figure. In reality of course the angles $\alpha$ and $\beta$ will be nonlinear functions of high-dimensional neural population vectors corresponding to arm and forearm muscle parameters. But the laws of contingency will be the same.

The arm (with the forearm) has a length of 1. The distance from the target, $L$ (0S), can then be obtained by a trigonometrical relation, according to the following formula: (1) $L = \sin\alpha - \cos\alpha \, \tan(\alpha + \beta)$, where $\alpha \in [0, \pi/2]$ and $\beta \in \,]3\pi/2 - \alpha,\ 2\pi[$

Curve representing angle $\beta$ in relation to angle $\alpha$ (both expressed in radians) for the following values of $L = 0, 1, \ldots, 7$. $\alpha$ varies from 0 to $\pi/2$. According to (1) one can determine $\beta$ for any given $L$ and $\alpha$: $\beta = 2\pi - \alpha + \arctan\left(\frac{\sin\alpha - L}{\cos\alpha}\right)$

Figure 6. Figure from Lenay, Canu, & Villon [1997] showing the sensorimotor contingency orbits for a simple photocell mounted on an arm and a forearm, in the case where the photocell is continuously fixating a luminous source at distance L.
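The orbits described in the figure can be checked numerically from formula (1) and its inversion for the forearm angle. The following is a minimal sketch: the function names are hypothetical, and the two-argument arctangent is our substitution for the plain arctangent of the caption (the two agree whenever the cosine of the arm angle is positive, and the substitution avoids a division by zero at the end of the range).

```python
import math


def beta_on_orbit(alpha, L):
    """Forearm angle that keeps a source at distance L fixated,
    for arm angle alpha in [0, pi/2], following the inversion of (1)."""
    # atan2(y, x) equals atan(y / x) when x > 0, which holds for alpha < pi/2.
    return 2 * math.pi - alpha + math.atan2(math.sin(alpha) - L, math.cos(alpha))


def distance_from_angles(alpha, beta):
    """Formula (1): source distance recovered from the two joint angles."""
    return math.sin(alpha) - math.cos(alpha) * math.tan(alpha + beta)


def orbit(L, steps=5):
    """Sample (alpha, beta) pairs along the orbit for distance L.

    alpha is sampled in [0, pi/2), leaving out the endpoint where
    cos(alpha) vanishes and formula (1) becomes ill-conditioned.
    """
    alphas = [i * (math.pi / 2) / steps for i in range(steps)]
    return [(a, beta_on_orbit(a, L)) for a in alphas]
```

Each distance picks out its own curve in the plane of the two angles, and feeding any sampled pair back into `distance_from_angles` returns the distance that generated it. In that sense the distance of the source is specified by which orbit the joint angles follow during fixation, not by any momentary sensor reading.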

On further reflection it is apparent that the simple device studied by Lenay is an electronic variant of the blind person's cane. Blind persons using a cane do not sense the cane, but the environment outside of them that they are exploring by means of the cane. It has been said that the tactile sensations provided by the cane are somehow "relocated" or "projected" onto the environment. The cane itself is forgotten or ignored. But this way of describing experience with the cane, though in a way correct, is misleading, since it suggests that sensations themselves originally possessed a location which had to be relocated. The present theory shows that in themselves, sensations are situated nowhere. The location of a sensation (and for that matter any perceived aspect including its moment of occurrence) is an abstraction constructed in order to account for the invariance structure of the available sensorimotor contingencies.

Note that experiences similar to those of the blind person with the cane occur every day even for sighted persons: Car drivers "feel" the wheels on the road, and extend the sense of their bodies to include the whole car, allowing them to negotiate into parking spaces with only centimeters to spare. A particularly poignant example of having one's perceived body extend outside of the boundary formed by the skin was given to the first author by a friend who is a talented viola player. Spending most of the day with the viola under his chin, on one occasion he went into the kitchen to drink some hot tea, and some drops fell on the viola. He said he was surprised not to have felt the hot drops on the instrument: it felt anesthetized. Another everyday example of remote tactile sensing occurs when you write on a piece of paper with a pen: you feel the paper at the end of the pen: it is rough, it is smooth, it is soft. You locate the contact at the end of the pen, not on your fingers where the force is actually felt (this example is given by James [1890/1950]).

One might consider these examples as surprising at first sight. But then we ask: should it not also be considered surprising that fingertip sensations are felt on the fingertips, since after all, it is presumably in the brain where the sensations are registered? Why would one not tend to think that one should be able to walk through a door no wider than one's brain, since body sensations presumably arrive in the brain? Indeed, given that visual sensation impregnates the retina, why does one not feel the outside world as situated on one's retina, instead of outside one? These obviously ridiculous extensions of the "relocation" idea discussed above make one realize that actually the perceived location of a sensation logically cannot be determined by where the nerves come from or where they go to. Perceived location is, like other aspects of sensation, an abstraction that the brain has deduced from the structure of the sensorimotor contingencies that govern the sensation[36].

Some very interesting experiments of Tastevin [1937] are related to these points. Tastevin had shown that the sensed identity or position of a limb can be transferred to another limb or to a plaster model of the limb. Thus, for example, when an experimenter feigns to touch a subject's forefinger with one prong of a compass, but actually touches the middle finger with the other prong, the subject feels the touch on the forefinger. Sensation has thus been relocated from the middle finger to the forefinger. Whole body parts can be relocalized by this means. A recent experiment along very similar lines was described by Botvinick & Cohen [1998] (and also extended by Ramachandran & Blakeslee [1998]). These authors used a life-size rubber model of a left arm placed before a subject whose real left arm was hidden by a screen. Using two small brushes, the experimenters synchronously stroked corresponding positions of the rubber and real arm. After ten minutes, subjects came to feel that the rubber arm was their own.

All these phenomena show how labile the perceived location of a stimulation can be, and how it depends on correlation with information from other modalities (in this case vision). Even neural representations of body parts are known to be labile, as has been shown by Iriki, Tanaka, & Iwamura [1996] whose macaque monkeys' bimodal visual somatosensory receptive fields moved from their hands to the ends of a rake they used as a tool. However a facile interpretation of such phenomena in terms of "neural plasticity" of cortical maps would be misleading, since such an interpretation would implicitly assume that perceived location of a stimulus is directly related to activity in cortical maps -- an idea we reject.

5.14 Tactile visual sensory substitution

Tactile visual substitution systems (TVSS) use an array of vibratory or electrical cutaneous stimulators to represent the luminance distribution captured by a TV camera on some skin area such as the back, the abdomen, the forehead or the fingertip. For technical reasons and because of the restrictions on tactile acuity, TVSS devices have up to now suffered from very poor spatial resolution, generally having stimulator arrays of not more than 20 x 20 stimulators at the very best. They have also been too bulky, too expensive, and too sensitive to light level variations to be of practical use to the blind (Easton [1992]; Bach-y-Rita [1983]). Notwithstanding these problems however, as concerns the question of visual experience, a number of highly interesting points have been made about the experiences of individuals who have used these devices (Guarniero [1977]; [1974]; Apkarian [1983]).

Figure 7. A blind subject with a "Tactile Visual Substitution system". A TV camera (mounted on spectacle frames) sends signals through electronic circuitry (displayed in right hand) to an array of small vibrators (left hand) which is strapped against the subject's skin. The pattern of tactile stimulation corresponds roughly to a greatly enlarged visual image. (Photograph courtesy of P. Bach-y-Rita). From Morgan [1977].

A first point concerns the importance of the observer's being able to manipulate the TV camera himself or herself (Bach-y-Rita [1972]; [1984]; Sampaio [1995]).

In the earliest trials with the TVSS device, blind subjects generally unsuccessfully attempted to identify objects that were placed in front of the camera, which was fixed. It was only when the observer was allowed to actively manipulate the camera that identification became possible and observers came to "see" objects as being externally localized (White, Saunders, Scadden, Bach-y-Rita, & Collins [1970]). This important point constitutes an empirical verification