Mental Imagery: In search of a theory*
Zenon W. Pylyshyn
Rutgers Center for Cognitive Science
New Brunswick, NJ
ABSTRACT
It is generally accepted that there is something special about reasoning that uses mental images. The question of how it is special, however, has never been satisfactorily spelled out, despite over thirty years of research in the post-behaviorist tradition. This article considers some of the general motivation for the assumption that entertaining mental images involves inspecting a picture-like object. It sets out a distinction between phenomena attributable to the nature of mind, to what is called the cognitive architecture, and ones that are attributable to tacit knowledge used to simulate what would happen in a visual situation. With this distinction in mind the paper then considers in detail the widely held assumption that in some important sense images are spatially displayed or are depictive, and that examining images uses the same mechanisms that are deployed in visual perception. I argue that the assumption of the spatial or depictive nature of images is only explanatory if taken literally, as a claim about how images are physically instantiated in the brain, and that the literal view fails for a number of empirical reasons – e.g., because of the cognitive penetrability of the phenomena cited in its favor. Similarly, while it is arguably the case that imagery and vision involve some of the same mechanisms, this tells us very little about the nature of mental imagery and does not support claims about the pictorial nature of mental images. Finally I consider whether recent neuroscience evidence clarifies the debate over the nature of mental images. I claim that when such questions as whether images are depictive or spatial are formulated more clearly, the evidence does not provide support for the picture-theory over a symbol structure theory of mental imagery. 
Even if all the empirical claims turned out to be true, the view that many people take them to support, that mental images are literally spatial, remains incompatible with what is known about how images function in thought. We are then left with the provisional counterintuitive conclusion that the available evidence does not support rejection of what I call the “null hypothesis”; viz., that reasoning with mental images involves the same form of representation and the same processes as reasoning in general, except that the content or subject matter of thoughts experienced as images includes information about how things would look.
Table of Contents
1 Why is there a problem about mental imagery?
1.1 The pull of subjective experience
1.2 The imagery debate: What was it about?
2 What is special about image-based reasoning?
3 Why images exhibit certain properties: Cognitive architecture or tacit knowledge?
3.1 What knowledge is relevant to the tacit knowledge explanation?
3.2 Methodological Note: cognitive penetrability as a litmus
4 Problem-solving by “mental simulation”: Some examples
4.2 The “size” of mental images
5.1 Depiction and mandatory spatial properties of representations
5.2 Real versus functional space
5.3 Projected mental images: Inheriting spatial properties from real space
5.4 Visuomotor interaction with images
6 Are images “seen” by the visual system?
6.1 The experience of seeing and of imagining
6.2 Interference between imaging and visual perception
6.3 Visual illusions induced by superimposing mental images
6.4 Imagined versus perceived motion
6.5 Extracting novel information from images: Visual (re)perception or inference?
7 Can evidence from neuroscience settle the question?
7.1 Searching for the “mind’s eye” and the “image” in the brain
7.2 What would it mean if all the neuroscience claims turned out to be true?
7.3 Is the ‘mind’s eye’ just like a real eye?
7.4 What has recent neuroscience evidence done for the “imagery debate”?
7.5 Is the “picture theorist” a straw man?
8 Conclusion: What is special about mental imagery?
Cognitive science is rife with ideas that offend our intuitions. It is arguable that nowhere is the pull of the subjective stronger than in the study of perception and mental imagery. It is not easy for us to take seriously the proposal that the visual system creates something like symbol structures in our brain since it seems intuitively obvious that what we have in our mind when we look out onto the world, as well as when we close our eyes and imagine a scene, is something that looks like the scene, and hence whatever it is that we have in our heads must be much more like a picture than a description. Though we may know that this cannot be literally the case, that it would do no good to have an inner copy of the world, this reasoning appears to be powerless to dissuade us from our intuitions. Indeed, the way we describe how it feels to imagine something shows the extent of the illusion; we say that we seem to be looking at something with our “mind’s eye”. This familiar way of speaking reifies an observer, an act of visual perception, and a thing being perceived. All three parts of this equation have now taken their place in one of the most developed theories of mental imagery (Kosslyn, 1994), which refers to a “mind’s eye” and a “visual system” that examines a “mental image” located in a “visual buffer”. Dan Dennett has referred to this view picturesquely as the “Cartesian Theater” view of the mind (Dennett, 1991) and I will refer to it as the “picture theory” of mental imagery.
There has been a tradition of analyzing this illusion in the case of visual perception, going back to Descartes and Berkeley (it also appears in the 17th century debate between Arnauld and Malebranche – see Slezak, 2000), and revived in modern times by Gibson (1966), as well as by computationalists like Marr (1982). More recently O'Regan (1992) and O'Regan & Noë (2001) have argued against the intuitive picture-theory of vision on both empirical and theoretical grounds. Despite the widespread questioning of the intuitive picture view in visual perception, this view remains very nearly universal in the study of mental imagery (with such notable exceptions as Dennett, 1991; Rey, 1981; Slezak, 1995; see also the critical remarks by Fodor, 1975; Hinton, 1979; Thomas, 1999, and others). Why should this be so? Why do we find it so difficult to accept that when we “examine our mental image” we are not in fact examining an inner state, but rather are contemplating what the inner state is about – i.e., some possible state of the visible world – and therefore that this experience tells us nothing about the nature and form of the representation? Philosophers have referred to this displacement of the object of thought from the (possible) world to a mental state as the “intentional fallacy” and it has much of cognitive science in its grip still.
What I try to do in this paper is show not only that we are deeply deceived by our subjective experience of mental imagery, but also that the evidence we have accumulated to support what I call the “picture theory” of mental imagery is equally compatible with a much more parsimonious view, namely that most of the phenomena in question (but not all – see below) are due to the fact that the task of “imaging” invites people to simulate what they believe would happen if they were looking at the actual situation being visualized. I will argue that the alternative picture theory, or depiction-theory, trades heavily on a systematic ambiguity between the assumption of a literal picture and the much weaker assumption that visual properties are somehow encoded. I will also argue that recent evidence from neuroscience (particularly the evidence of neural imaging) brings us no closer to a plausible picture theory than we were before this evidence was available.
There has been a great deal of discussion in the past 30 years that has come to be referred to as “the imagery debate.” Many people even believe that the debate has, at least in general outline, been put to rest because we now have hard evidence from neuroscience showing what (and where) images are (see, e.g., Kosslyn, 1994; and the brief review in Pylyshyn, 1994a). But if one looks closer at the “debate” one finds that what people think the debate is about is very far from univocal. For example, some people think that the argument that has been settled is whether images, whatever their nature, are fundamentally different from the form of representation involved in other kinds of reasoning, whether there are two different systems of mental codes. For others it is the question of whether images have certain particular properties – e.g. whether they are spatial, or depictive, or analogue. Others feel that the question that has been settled is whether imagery “involves” the visual system. I will argue that none of these claims has been sufficiently well posed to admit of a solution. In this paper I will concentrate primarily on a particular class of theory of mental imagery, which I refer to as “picture theories” and will consider other aspects of the “debate” only insofar as they bear on the alleged pictorial nature of images.
In this article I defend the provisional view, which I refer to as the “null hypothesis,” that at the relevant level of analysis – the level appropriate for explaining the results of many experiments on mental imagery – the process of imagistic reasoning involves the same mechanisms and the same forms of representation as are involved in general reasoning, though with different content or subject matter. This hypothesis claims that what is special about image-based thinking is that it is typically concerned with a certain sort of content or subject matter, such as optical, geometrical, or what we might call the appearance-properties of the things we are thinking about. If so, nothing is gained by attributing a special format or special mechanisms to mental imagery. While the validity of this null hypothesis remains an open empirical question, what is not open, I claim, is whether certain currently popular views can be sustained.
In the interest of full disclosure I should add that I don’t really believe that representations and processes underlying imagery are no different from those involved in other forms of reasoning. Nonetheless, I do think that nobody has yet articulated the specific way that images are different and that all candidates proposed to date are seriously flawed in a variety of ways that are interesting and revealing. Thus using the null hypothesis as a point of departure may allow us to focus more properly on the real differences between imagistic and other forms of reasoning.
Section 2 reviews some observations that have led many people to hold what I will call the “picture theory” of mental images (although a detailed discussion of what characterizes such a theory and what it assumes is postponed until section 5). Section 3 introduces a distinction that is central to our analysis. It distinguishes two reasons why imagery might manifest the properties that are observed in experiments. One reason is that these properties are intrinsic to the architecture of the mental imagery system – they arise because of the particular brain mechanisms deployed in imagery. The other reason is that the properties are extrinsic to the mechanisms employed – they arise because of what people tacitly believe about the situation being imagined, which they then use to simulate certain behaviors that would occur if they were to witness the corresponding situation in reality. This distinction is then applied to some typical experiments on mental imagery where I argue that such experiments tell us little about special dedicated imagery mechanisms. Since section 4 discusses some material that has been published elsewhere, readers who have followed the “imagery debate” may wish to skim this section. Section 5 discusses two widely held views about the nature of mental images (Kosslyn, 1994); that images are “depictive” and that they are laid out in a “functional space”. I claim that the preponderance of evidence argues against the inherent spatial nature of mental images. An exception is evidence from experiments in which subjects project their images onto a visual scene. In this case I claim (section 5.3) that the use of visual indexes and focal attention provides a satisfactory explanation for how spatial properties are inherited from the observed scene, without any need to posit spatial properties of images. 
In section 5.2 I argue that the notion of a functional space is devoid of any explanatory power, since such a “space” is unconstrained and can have whatever properties one wishes to attribute to it (unless the "space" in the model is assumed to be a simulation of a real spatial display, as in the CRT model described in Kosslyn, Pinker, Smith & Shwartz, 1979, in which case the underlying theory really is the literal picture theory). Section 6 discusses a claim that is assumed to be entailed by the depictive nature of images; namely, that information in an image is accessed through vision. Although there is evidence for some overlap between the mechanisms of imagery and those of vision, a close examination of this evidence shows that it does not support the assumption of a spatial display in either vision or imagery. Section 7 considers evidence from neuroscience, which many writers believe provides the strongest case for a picture theory. Here I argue that, notwithstanding the intrinsic interest of these findings, they do not support the existence of any sort of depictive display in mental imagery. Finally, section 8 closes with a brief discussion of where the “imagery debate” now stands and on the role of imagery in creative thinking.
Imagery seems to follow principles that are different from those of intellectual reasoning and certainly beyond any principles to which we have conscious intellectual access. Imagine a baseball being hit into the air and notice the trajectory it follows. Although few of us could calculate the shape of this trajectory none of us has any difficulty imagining the roughly-parabolic shape traced out by the ball in this thought experiment. Indeed, we can often predict with considerable accuracy where the ball will land (certainly a properly situated professional fielder can). It is very often the case that by visualizing a certain situation, we can predict the dynamics of physical processes that are beyond our ability to solve analytically. Is this because our imagery architecture inherently and automatically obeys the relevant laws of nature?
Opposing the intuition that one’s image unfolds according to some internal principle of natural harmony with the real world is the obvious fact that it is you alone who controls your image. Perhaps, as Humphrey (1951) once put it, viewing the image as being responsible for what happens in your imagining puts the cart before the horse. In the baseball example above, isn’t it equally plausible that the reason the imagined ball takes a particular path is that, under the right circumstances, you can recall having seen a ball inscribe such a path? Surely your image unfolds as it does because you, the image creator, made it do so. You can imagine things being pretty much any size, color or shape that you choose and you can imagine them moving any way you like. You can, if you wish, imagine a baseball sailing off into the sky or following some bizarre path, including getting from one place to another without going through intervening points, as easily as you can imagine it following a more typical trajectory. You can imagine all sorts of physically impossible things happening, and cartoon animators frequently do, to our amusement.
Some imagery theorists might be willing to concede that in imagining physical processes we must use our tacit knowledge of how things work, yet insist that the optical and geometrical properties of images are true intrinsic properties, despite the fact that the dynamic properties of images are very often cited in studies of mental images – properties such as mental rotation, mental scanning, or “representational momentum” discussed in sections 3.1 and 4. Nonetheless, the suggestion that the intrinsic properties of images are geometrical rather than dynamic makes sense both because spatial intuitions are among the most entrenched, and because there is evidence (Pylyshyn, 1999) that geometrical and optical-geometrical constraints are built into the early-vision system, as so-called “natural constraints”. While we can easily imagine the laws of physics being violated, it seems nearly impossible to imagine the axioms of geometry and geometrical optics being violated. Try imagining a four-dimensional block or how a cube looks when seen from all sides at once or what it would look like to travel through a non-Euclidean space. However, before concluding that these examples illustrate the intrinsic geometry of images, consider whether your inability to imagine these things might not be due to your not knowing, in a purely factual way, how these things might look (i.e., where edges, shadows and other contours would fall). The answer is by no means obvious. It has even been suggested (Goldenberg & Artner, 1991) that certain deficits in imagery ability resulting from brain damage are a consequence of a deficiency in the patient’s knowledge about the appearance of objects. At a minimum we are not entitled to conclude from such examples that images have the sort of inherent geometrical properties that we associate with pictures.
We also need to keep in mind that notwithstanding one’s intuitions, there is reason to be skeptical about what one’s subjective experience reveals about the form of a mental image. After all, when we look at an actual scene we have the unmistakable subjective impression that our perceptual representation corresponds to a detailed three-dimensional panoramic view, yet it has now been convincingly demonstrated that the information available to cognition from a single glance is extremely impoverished, sketchy and unstable and that very little is carried over across saccades (see, for example, Blackmore, Brelstaff, Nelson & Troscianko, 1995; Carlson-Radvansky, 1999; Carlson-Radvansky & Irwin, 1995; Intraub, 1981; Irwin, 1993; O'Regan, 1992; O'Regan & Noë, 2001; Rensink, 2000a; Rensink, 2000b; Rensink, O'Regan & Clark, 1997; Rensink, O'Regan & Clark, 2000; Simons, 1996). Indeed, there is now considerable evidence that we visually encode very little in a visual scene unless we explicitly attend to the items in question and that we do that only if our attention or our gaze is attracted to it (Henderson & Hollingworth, 1999), (although see O'Regan, Deubel, Clark & Rensink, 2000). There are remarkable demonstrations that when presented with alternating images, people find it extremely difficult to detect a difference between the two – even a salient difference in a central part of the image.[1] This so-called change blindness phenomenon (Simons & Levin, 1997) suggests that, despite our phenomenology, we are nowhere near having a detailed internal display since the vast majority of information in a visual scene goes unnoticed and unrecorded. It would thus be reasonable to expect that our subjective experience of mental imagery would be an equally poor guide to the form and content of the information in our underlying cognitive representation.
Nobody denies that the content and behavior of our mental images can be the result of what we intend our images to show, what we know about how things in the world look and work, and the way our cognitive or our imagery system constrains us. The important question about mental imagery is this: which properties and mechanisms are intrinsic to, or constitutive of, having and using mental images, and which arise because of what we believe, intend, or attribute to the situation we are imagining?
The distinction between effects attributable to the intrinsic nature of mental representations and mechanisms, and those attributable to more transitory states, such as people’s beliefs, utilities, habits, or interpretation of the task at hand, is central not only for understanding the nature of mental imagery, but for understanding mental processes in general. Explaining the former kind of phenomena requires that we appeal to what has been called the cognitive architecture (Fodor & Pylyshyn, 1988; Newell, 1990; Pylyshyn, 1980; Pylyshyn, 1984; Pylyshyn, 1991a; Pylyshyn, 1996) – one of the most important ideas in cognitive science. It refers to the set of properties of mind that are fixed with respect to certain kinds of influences. In particular, the cognitive architecture is, by definition, not directly altered by changes in knowledge, goals, utilities or any other representations (e.g., fears, hopes, fantasies, etc.). In other words, when you find out new things or when you draw inferences from what you know or when you decide something, your cognitive architecture does not change. Of course, if as a result of your state of beliefs and desires you decide to take drugs or to change your diet or even to repeat some act over and over, this can result in changes to your cognitive architecture, but such changes are not a direct result of the changes in your cognitive state. A detailed technical exposition of the distinction between effects attributable to knowledge or other cognitive states and those attributable to the nature of cognitive architecture is beyond the scope of this article (although this distinction is the subject of extensive discussion in Pylyshyn, 1984, Chapter 7). The following example (discussed at greater length in Pylyshyn, 1984) will have to do for present purposes.
Suppose we have a box of unknown construction, and we discover that it exhibits particular systematic behaviors. The box emits long and short pulses according to the following pattern: pairs of short pulses most often precede single short pulses, except when a pair of long-short pulses occurs first. What is special about this example is that it illustrates a case where the observed behavior, though completely regular when the box is in its “ecological niche,” is not due to the nature of the box (to how it is constructed) but to an entirely extrinsic reason. The reason this particular pattern of behavior occurs can only be understood if we know that the pulses are codes, and the pattern is due to a regularity in what they represent, in particular that the pulses represent English words spelled out in International Morse Code. The observed pattern does not reflect how the box is wired or its functional architecture; it is due entirely to a regularity in the way English words are spelled (the principle being that generally i comes before e except after c). Similarly, I have argued that in most of the core experiments on mental imagery – such as the mental scanning case described in section 4.1 – the pattern does not reveal the nature of the mental architecture involved in imagery, but reflects a principle that observers know governs the world being imagined. The reason that under certain conditions the behavior of both the code box and the cognitive system does not reveal properties of its intrinsic nature (of its architecture) is that both are capable of quite different regularities if the world they were representing behaved differently. They would not have to change their architecture in order to change their behavior. 
The latter observation, concerning the plasticity of non-architectural properties of thought, is the key to a methodology I have called “cognitive penetrability” for deciding whether tacit knowledge or cognitive architecture is responsible for some particular observed regularity (see section 3.2).
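The code-box example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names (`emit`, `count_orders`) and both word lists are hypothetical choices made here, not anything drawn from the original discussion. The "box" is a fixed letter-to-pulse table. The regularity in its output is counted separately, once for a few English words and once for the same words respelled against the i-before-e rule.

```python
# International Morse Code for the 26 letters: the fixed "architecture" of the box.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}


def emit(word):
    """Return the pulse groups the box emits for a word, separated by spaces.

    The mapping knows nothing about spelling conventions.
    """
    return " ".join(MORSE[ch] for ch in word.lower())


def count_orders(words):
    """Count adjacent pulse groups: (pair-of-shorts then single-short, the reverse)."""
    pair_then_single = single_then_pair = 0
    for word in words:
        groups = emit(word).split(" ")
        for first, second in zip(groups, groups[1:]):
            if (first, second) == (MORSE["i"], MORSE["e"]):
                pair_then_single += 1
            elif (first, second) == (MORSE["e"], MORSE["i"]):
                single_then_pair += 1
    return pair_then_single, single_then_pair


# Hypothetical sample of English words (assumption: chosen only for illustration).
ENGLISH_WORDS = ["field", "piece", "believe", "friend", "receive", "ceiling"]
# The same words respelled so that the spelling rule is reversed (an invented "language").
INVENTED_WORDS = ["feild", "peice", "beleive", "freind", "recieve", "cieling"]

if __name__ == "__main__":
    print(count_orders(ENGLISH_WORDS))   # counts for the English spellings
    print(count_orders(INVENTED_WORDS))  # counts for the reversed spellings
```

The table `MORSE` and the function `emit` are identical in both runs; only the words being represented differ, and the dominant pulse order flips with them. In this toy setting the regularity belongs to what is represented, not to the device, which is the sense in which the box would not have to change its architecture in order to change its behavior.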
In interpreting the results of imagery experiments, it is clearly important to distinguish between cognitive architecture and tacit knowledge as possible causes. Take the following example. You are asked what color you see if you look through a yellow filter superimposed on a blue filter. The way that many of us would go about solving this problem, if we did not know the answer as a memorized fact, is to imagine a yellow filter and a blue filter being superimposed; we generally use the “imagine” strategy when we want to solve a problem about how certain things look. What color do you see in your image when the two filters are overlapped? Now ask yourself why you see that color in your mind’s eye rather than some other color? Some people (e.g., Kosslyn, 1981) have argued that the color you see follows from a property of imagery, presumably some property of how colors are encoded and displayed in images. But since there can be no doubt that you can make the overlapping part of the filters be any color you wish, it can’t be that the image format or the architecture involved in representing colors is responsible. What else can it be? It seems clear in this case that the color you “see” depends on your tacit knowledge of the principles of color mixing or a recollection of how these particular colors combine (having seen something like them in the past). In fact, people who do not know about subtractive color mixing generally give the wrong answer: mixing yellow light with blue light produces white light, but superimposing yellow and blue filters allows only green light to pass through.
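The difference between the two mixing rules can be shown with a rough arithmetic sketch. Everything in it is a simplifying assumption made for illustration: light is reduced to three coarse bands (red, green, blue), the transmittance values are hypothetical, and the blue filter is assumed to pass some green, as broad-band filters typically do. Without that last assumption the stacked filters would pass nothing at all in this idealized model.

```python
# Light and filters as (red, green, blue) triples in a coarse three-band model.
YELLOW_LIGHT = (1.0, 1.0, 0.0)
BLUE_LIGHT = (0.0, 0.0, 1.0)
WHITE_LIGHT = (1.0, 1.0, 1.0)

# Hypothetical transmittances (fraction of each band that a filter lets through).
YELLOW_FILTER = (1.0, 1.0, 0.0)  # blocks blue
BLUE_FILTER = (0.0, 0.5, 1.0)    # blocks red; assumed to leak some green


def add_lights(a, b):
    """Additive mixing: intensities of two superimposed lights sum per band."""
    return tuple(min(1.0, x + y) for x, y in zip(a, b))


def pass_through(light, *filters):
    """Subtractive mixing: each filter scales every band by its transmittance."""
    out = light
    for transmittance in filters:
        out = tuple(x * t for x, t in zip(out, transmittance))
    return out


def dominant_band(light):
    """Name the band with the highest intensity."""
    names = ("red", "green", "blue")
    return names[max(range(3), key=lambda k: light[k])]


if __name__ == "__main__":
    print(add_lights(YELLOW_LIGHT, BLUE_LIGHT))                  # additive rule
    print(pass_through(WHITE_LIGHT, YELLOW_FILTER, BLUE_FILTER))  # subtractive rule
```

Both functions are equally easy to apply to an imagined pair of filters. Which color turns up in the overlapping region depends on which rule the imaginer knows to apply, so the outcome reflects tacit knowledge of color mixing and not any fixed property of how colors are represented.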