Preprint of: Hommel, B., Müsseler, J., Aschersleben, G. & Prinz, W. (2001) The Theory of Event Coding (TEC): A Framework for Perception and Action Planning

Behavioral and Brain Sciences 24 (4): XXX-XXX.





This is the unedited final draft of a BBS target article that has been accepted for publication (Copyright 2000: Cambridge University Press) and is currently being circulated for Open Peer Commentary.

This preprint is for inspection only, to help prospective commentators decide whether or not they wish to prepare a formal commentary.

Please do not prepare a commentary unless you have received a formal invitation indicating that it has been possible to include you in the final list of invited commentators.

For information on becoming a commentator on this or other BBS target articles, write to calls@bbsonline.org

For information about subscribing or purchasing offprints of the published version, with commentaries and authors' response, write to:

journals_subscriptions@cup.org (North America)
journals_subscriptions@cup.cam.ac.uk (All other countries).



The Theory of Event Coding (TEC):

A Framework for Perception and Action Planning

Bernhard Hommel, Jochen Müsseler, Gisa Aschersleben, Wolfgang Prinz

Max Planck Institute for Psychological Research, Munich,

Germany

Address correspondence to

Wolfgang Prinz
Max-Planck-Institut für Psychologische Forschung
Amalienstraße 33
D-80799 München
Federal Republic of Germany
Tel: +int(89)38602-256, Fax: +int(89)38602-290

Email: hommel@fsw.leidenuniv.nl
Email: muesseler@mpipf-muenchen.mpg.de
Email: aschersleben@mpipf-muenchen.mpg.de
Email: prinz@mpipf-muenchen.mpg.de

http://www.mpipf-muenchen.mpg.de/~prinz


Bernhard Hommel (born 1958 in Niederstotzingen, Germany) studied psychology and literature at the University of Bielefeld, where he also worked as a research assistant from 1987 to 1990 and completed his dissertation. He then moved to the Max Planck Institute for Psychological Research in Munich to work as a senior researcher. Since 1999 he has been Full Professor (Chair of General Psychology) at the University of Leiden, Netherlands. He has published empirical and theoretical work on human attention, planning and control of action, and the relationship between perception and action.
Jochen Müsseler is a senior scientist at the Max Planck Institute for Psychological Research (Department of Cognition and Action) in Munich. After graduating from the University of Bochum, he obtained his PhD at the University of Bielefeld before moving to Munich. His research focuses on the interface of perception and action, the perception of space and time, and attentional mechanisms.
Gisa Aschersleben studied psychology at the Universities of Bielefeld and Braunschweig, Germany, and completed her PhD at the University of Munich. She worked as a research assistant at the Technical University of Munich from 1987 to 1991. In 1991 she joined the Max Planck Institute for Psychological Research in Munich, where she holds the position of senior scientist. Since 2000 she has been head of the research group Infant Cognition and Action. Her main research interests are perception-action coupling, control of action, attention, intersensory integration, and the early development of action control.
Wolfgang Prinz studied Psychology, Philosophy and Zoology at the University of Muenster, Germany. He took his Ph.D. in 1970 at the Dept. of Philosophy, Pedagogics and Psychology of the Ruhr-University Bochum. He was a Full Professor of Psychology at the University of Bielefeld (1975-1990) and at the Ludwig-Maximilians-University Munich (1990-1998). Since 1990 he has been a Director at the Max Planck Institute for Psychological Research, Munich. He has published empirical and theoretical work on perception, action, consciousness and attention as well as on the history of psychology.


Short Abstract

We argue that traditional approaches to human information processing tend to deal with perception and action planning in isolation and, therefore, fail to give an adequate account of the perception-action interface. The theory we propose instead assumes that perceptual contents and action goals are cognitively represented by composite codes of their distal features, that is, perceived and to-be-produced events are coded within a common representational medium. Our main assumptions are well supported by available evidence from a wide variety of empirical domains and are likely to stimulate new questions and lines of research.


Long Abstract

Traditional approaches to human information processing tend to deal with perception and action planning in isolation, so that an adequate account of the perception-action interface is still missing. On the perceptual side, the dominant cognitive view largely underestimates, and thus fails to account for, the impact of action-related processes on both the processing of perceptual information and perceptual learning. On the action side, most available approaches conceive of action planning as a mere continuation of stimulus processing, thus failing to account for the goal-directedness of even the simplest reaction in an experimental task. We propose a new framework for a more adequate theoretical treatment of perception and action planning, a theory postulating that perceptual contents and action plans are coded in a common representational medium by feature codes with distal reference. Accordingly, perceived events (perceptions) and to-be-produced events (actions) are equally represented by integrated, task-tuned networks of feature codes--cognitive structures we call event codes. We give an overview of evidence from a wide variety of empirical domains, such as spatial stimulus-response compatibility, sensorimotor synchronization, or ideomotor action, showing that our main assumptions are well supported by the data.


Keywords: action planning, perception, perception-action interface, event coding, common coding, feature integration, binding.


1. INTRODUCTION AND OVERVIEW

We propose a new theoretical framework for the cognitive underpinnings of perception and action planning, the Theory of Event Coding (TEC). Basically the theory holds that cognitive representations of events (i.e., of any to-be-perceived or to-be-generated incident in the distal environment) subserve not only representational functions (e.g., for perception, imagery, memory, reasoning, etc.) but action-related functions as well (e.g., for action planning and initiation). According to TEC, the core structure of the functional architecture supporting perception and action planning is formed by a common representational domain for perceived events (perception) and intended or to-be-generated events (action).

In a nutshell, we believe that it makes sense to assume that the stimulus representations underlying perception and the action representations underlying action planning are coded and stored not separately, but together in a common representational medium. This implies that stimulus and response codes are not entities of a completely different kind, but only refer to, and thus represent, different events in a particular task and context. Thus, it would be misleading to speak of stimulus codes and response or action codes unless one wishes to refer to the roles played by a code or the event it represents (see section 3.2.4). Irrespective of this role, though, cognitive codes are always event codes--codes of perceived or (to-be-)produced events.

Though TEC is meant to provide a new framework for perception and action planning, its scope is limited in the following sense: As regards perception, its focus is on "late" cognitive products of perceptual processing that stand for, or represent, certain features of actual events in the environment. TEC does not consider the complex machinery of the "early" sensory processes that lead to them. Conversely, as regards action, the focus is on "early" cognitive antecedents of action that stand for, or represent, certain features of events that are to be generated in the environment (= actions). TEC does not consider the complex machinery of the "late" motor processes that subserve their realization (i.e., the control and coordination of movements). Thus, TEC is meant to provide a framework for understanding linkages between (late) perception and (early) action, or action planning. Accordingly, we do not claim that TEC exhaustively covers all kinds of interaction between perception and action. The same applies to the representationalist approach inherent in TEC. Though we do believe that the representationalist stance we adopt forms an appropriate metatheoretical framework for our theory, we do not want to imply, or suggest, that it is necessary or appropriate for understanding other kinds of interaction between perception and action.

As we will point out below, TEC differs from other, partly related approaches to perception and action planning. In contrast to the classical information-processing view, it does not see perception and action planning as different, functionally separable stages but as intimately related, sometimes even indistinguishable processes producing a wide variety of interactions. TEC shares this perspective with the ecological approach, from which it differs, however, in its emphasis on representational issues, which are by definition anathema to ecological approaches. Finally, TEC does not deal with the question of how consciousness and externally guided pointing and grasping movements are related, a topic that has attracted attention because of demonstrations that the two can be dissociated under specific conditions (for reviews, see Milner & Goodale, 1995; Rossetti & Revonsuo, 2000). Although these demonstrations are interesting and challenging, we do not see how they could provide a basis for the general theory of intentional action we are aiming at.

Our argument for TEC will take three major steps. In the first step, we review classical approaches to perception and action planning, with special emphasis on their mutual linkages. As regards perception, this review will show that cognitive approaches based on linear stage theory and its derivatives do not give an adequate account of the interface between perception and action planning. Unlike approaches in the domain of spatial orientation, they can neither accommodate the interaction between perception and action planning in a satisfactory way nor account for the impact of action-related knowledge on perception. As regards action and action control, our review will show that classical approaches are insufficient on both theoretical and empirical grounds as well. They neither provide an adequate role for action goals nor account for evidence attesting to the operation of similarity between perception and action planning. Based on these two reviews, we will conclude that a new framework is needed to address the cognitive underpinnings of the mutual interaction between perception and action planning.

Our review of shortcomings of classical theories will help us to put constraints on a new framework claiming to account for these issues in a more adequate way. As a result, TEC is presented in the second step. This theory proposes as its core contention that codes of perceived events and planned actions share a common representational domain, to the effect that perceptual codes and action codes may prime each other on the basis of their overlap in this domain. The structural view underlying this notion regards event codes as assemblies of feature codes, based on temporary integration in a given task context.

As we will show, TEC is not entirely new. It rather takes up elements from old and forgotten theories and (re)combines them in a novel way. This is particularly true of the ideomotor principle (cf. Greenwald, 1970) and of motor theories of cognition that were so broadly discussed a century ago (cf. Scheerer, 1984) and then fell into oblivion because the theoretical zeitgeist of the new century was input-centered throughout and did not much care about action per se. Precursor versions of TEC and treatments of some aspects of its functional logic have been published before (cf. Aschersleben & Prinz, 1995, 1997; Hommel, 1997, 1998a; Müsseler, 1995, 1999; Prinz, 1984, 1987, 1990, 1997a, 1997b; Prinz, Aschersleben, Hommel, & Vogt, 1995). Its present version has emerged from our collaboration on a number of experimental projects and our efforts to understand the major theoretical implications of their results.

In the third step, we will apply the framework of TEC to a number of pertinent experimental paradigms. As TEC is a loose framework of theoretical principles rather than a strict, formal theory, there is no obvious way to test its validity directly, that is, by deriving predictions from it and then confronting those predictions with data. Instead, an additional step of translation will often be required that boils the principles of the general theory down to task-specific models. As a consequence, the general theory can only be tested indirectly: it stands the test to the extent that task-specific models, which embody its functional logic, prove to be valid. Unlike specific models, which can be tested for empirical validity, the general theory behind them can only be evaluated in terms of the scope and variety of successful models derived from it, that is, in terms of its heuristic and explanatory power. Thus, TEC is meant to serve a function similar to that of the linear-stage approach à la Donders (1868/1969) and Sternberg (1969), which also does not aim at explaining particular effects or phenomena but rather provides a general framework and a theoretical language for developing more detailed and testable models and theories.

Most of the studies to which we will refer in the empirical section were originally inspired by TEC. At first glance, they seem to address a variety of disparate tasks like, for instance, sensorimotor synchronization, spatial S-R compatibility, identification and detection in backward masking, dual-task performance, ideomotor action, action perception, action imitation, etc. However, at a second, more theoretically inspired glance, they all address the same problem: mutual interactions between representations of perceived and produced events--and the structure of the functional architecture that takes care of generating, transforming, and storing these events.

We will conclude by pointing out some epistemological implications of our approach concerning the role of the interaction between perception and action planning for the construction of reality.

2. PERCEPTION AND ACTION PLANNING

2.1 Views on Perception

In this section we consider relationships between perception and action planning from the viewpoint of perception and perceptual theory. Perceptual views, though recognizing that perception and action planning may be highly interactive in many natural situations, tend to study perceptual processes in isolation, that is, more or less independently of processes of action planning and action control. This is true not only for the study of (presumably) early processes (like, e.g., figure-ground segregation, texture segmentation, or structural organization; cf. Beck, 1982; Hochberg, 1978; Metzger, 1975; Palmer, 1982; Pomerantz & Kubovy, 1981; Restle, 1982), but also for the study of (presumably) later processes in the chain of stimulus-induced operations that are responsible for identifying the stimulus or selecting appropriate responses. In the following sections we first examine how the linkage between perception and action is studied and conceptualized within the framework of information-processing approaches to perception (cf. Haber, 1969; Massaro, 1990; Posner, 1978; Sanders, 1980). These basically claim an information stream from perception to action with little contact between the two domains. We then survey views that stress the selective aspect of information processing: In order to perform goal-directed actions, certain aspects of environmental information must be selected while others are ignored or rejected. In this view, selection mechanisms take their way from action to perception. Finally, we consider perceptual mechanisms as part of a functional architecture whose outcome is adapted action. In this view, perceptual mechanisms are studied with respect to their contributions to the orientation of humans and other animals in space and time. In this framework, perception and action have thus always been studied together and in close correspondence.

2.1.1 Information-Processing Views: From Perception to Action

In his well-known paper "On the speed of mental processes," which appeared in 1868, the Dutch physiologist Frans Cornelis Donders established a methodological strategy, widely accepted today, for decomposing the stream of processing between stimulus presentation and response generation into a number of stages. This method, which according to Donders allows one to compute the time components required by the "organ of perception" and the "organ of the will" (for identifying the stimulus and selecting the response, respectively), is today considered one of the important early steps towards a natural-science approach to mental phenomena, a major breakthrough in the early history of psychology. Donders' paper regained importance a hundred years later, when the scientific community rediscovered its interest in the structure of the cognitive operations mediating between stimulus and response, an interest that originated in the 1950s and 1960s and continues to this day. Since then, research in this field has produced an enormous growth of knowledge about the factors that determine stimulus processing and response generation, and we have gained a much broader understanding of the nature of the stages involved.
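Donders' subtraction logic can be illustrated with a few lines of arithmetic. The sketch below assumes the classic a-, b-, and c-tasks (simple, choice, and go/no-go reactions), which are not described in the text above, and it uses made-up reaction times. It also inherits the method's key assumption of pure insertion: adding a stage is taken to leave the durations of all other stages unchanged.

```python
def donders_components(rt_a: float, rt_b: float, rt_c: float) -> dict:
    """Estimate stage durations (ms) by subtracting mean task RTs.

    a-task: simple reaction (one stimulus, one response)
    c-task: go/no-go (identify the stimulus, but only one response)
    b-task: choice reaction (identify the stimulus and choose a response)
    Assumes pure insertion: each added stage leaves the others unchanged.
    """
    return {
        "identification": rt_c - rt_a,      # "organ of perception"
        "response_selection": rt_b - rt_c,  # "organ of the will"
    }

# Hypothetical mean RTs in milliseconds, for illustration only.
components = donders_components(rt_a=220.0, rt_b=350.0, rt_c=280.0)
print(components)  # {'identification': 60.0, 'response_selection': 70.0}
```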

Linear Stage Models. Usually, stage-theoretical views encompass all cognitive processes located along the information stream traveling between receptors and effectors, that is, from early perceptual processing of stimuli to the final execution of responses. Applying linear-stage logic allows one to establish a sequence of hypothetical processing stages and to assess their functional properties; it does so on the basis of experiments that study how performance in speeded information-transmission tasks depends on manipulations of stimulus features and/or task demands (Massaro, 1990; Sanders, 1980; Sternberg, 1969).

However, though a large number of cognitive psychologists subscribe to the information-processing approach, only a few of them have taken an interest in the total information stream between stimuli and responses. On the contrary, subdisciplines like perception, memory, problem solving, motor behavior, and others have preserved their relative autonomy. As far as perception is concerned, this autonomy emerges from the understanding that perceptual processing can be studied without taking later operations into account. For instance, it is generally believed (and it actually follows from the logic inherent in the approach) that stimulus-related processing can be studied completely independently of response-related processing. The reason is that the information stream in perception is commonly considered to be data-driven, in the sense of a unidirectional flow of information from peripheral to central stages.

Perception, Memory, and Action. The interaction between information derived from the stimulus and information stored in memory is a topic of central importance in information-processing models. In fact, it is this feature that qualifies them as cognitive models of perception. Such interaction is inherent in the notion of stimulus identification. It is usually believed that the stimulus gets identified by way of matching a stimulus representation against a set of memory representations. Classical studies have addressed the mechanisms underlying this interaction, raising issues like parallel vs. serial, analytic vs. holistic, or exhaustive vs. self-terminating organization of matches in a number of tasks (Egeth, 1966; Hick, 1952; Neisser, 1963; Nickerson, 1972; Posner & Mitchell, 1967; Sternberg, 1967). Interestingly, these studies focus on processes rather than on contents and structure. They try to elucidate the nature of the operations underlying the interaction between stimulus and memory information, but they are much less explicit on the contents of the memory representations involved and the structural relations between them.

Such neglect of structure and content applies even more when it comes to the memory requirements of the response-related operations subsequent to stimulus identification. Though the classical models do not explicitly specify how responses are coded and stored, they do presume, at least by implication, that response codes reside entirely separate from stimulus codes--stimulus codes standing for environmental events and response codes for body movements. For instance, in a choice-reaction time experiment where participants press one of two keys (say, a left or a right key) in response to the color of a stimulus light (say, red or green), stimulus codes stand for colors and response codes for movements. Accordingly, a task like this can only be solved by means of a device that translates, as it were, colors into movements. The translation metaphor, which is widely used to characterize the nature of response selection (e.g., Massaro, 1990; Sanders, 1980; Welford, 1968), thus stresses the incommensurability between stimulus and response.
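The translation metaphor can be made concrete as a lookup table connecting two disjoint vocabularies. This is a toy sketch, not a model from the literature cited above: the code names (`press_left_key`, `press_right_key`) and the function `select_response` are hypothetical, and the color-to-key assignment is arbitrary, just as a task instruction would be.

```python
# Two separate code vocabularies, mirroring the claimed incommensurability.
STIMULUS_CODES = {"red", "green"}                        # environmental events
RESPONSE_CODES = {"press_left_key", "press_right_key"}  # body movements

# Arbitrary task instruction linking the two vocabularies (hypothetical).
TRANSLATION = {"red": "press_left_key", "green": "press_right_key"}

def select_response(stimulus_code: str) -> str:
    """Response selection as translation: nothing in the stimulus code
    itself points to the response; only the table connects them."""
    if stimulus_code not in STIMULUS_CODES:
        raise ValueError(f"unknown stimulus code: {stimulus_code!r}")
    return TRANSLATION[stimulus_code]

print(select_response("green"))         # press_right_key
print(STIMULUS_CODES & RESPONSE_CODES)  # set(): the vocabularies share nothing
```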

More Elaborate Flow-Charts. In the 1960s and 1970s the linear stage model was taken literally. Stages were arranged in a row, and a given stage would not begin to work before the preceding stage had done its job (Donders, 1868/1969; Sanders, 1980; Sternberg, 1969; Theios, 1975). For example, Sanders (1983) proposed a linear stage model with four stages: stimulus preprocessing, feature extraction, response choice, and motor adjustment. Obviously, the first two stages are concerned with perception, and the last two with response-related processing.
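Under strict seriality, predicted reaction time is simply the sum of the stage durations, so two experimental factors that each lengthen a different stage should have additive effects on RT; this is the logic exploited by Sternberg's (1969) additive-factors method. The sketch uses Sanders' (1983) four stages, but the durations and the factor effects are hypothetical values chosen for illustration.

```python
from typing import Optional

# Sanders' (1983) four stages, in processing order.
STAGES = ("stimulus preprocessing", "feature extraction",
          "response choice", "motor adjustment")

# Hypothetical baseline durations in milliseconds (not empirical values).
BASE_MS = {"stimulus preprocessing": 50, "feature extraction": 90,
           "response choice": 110, "motor adjustment": 70}

def predicted_rt(extra_ms: Optional[dict] = None) -> int:
    """Strict seriality: each stage starts only after the previous one
    has finished, so total RT is the sum of the stage durations."""
    extra_ms = extra_ms or {}
    return sum(BASE_MS[stage] + extra_ms.get(stage, 0) for stage in STAGES)

# Two hypothetical factors, each assumed to lengthen a different stage.
degraded = {"feature extraction": 40}   # e.g., a degraded stimulus
incompatible = {"response choice": 60}  # e.g., an incompatible S-R mapping

baseline = predicted_rt()
both = predicted_rt({**degraded, **incompatible})

# Interaction contrast: zero means the two effects are purely additive.
interaction = (both - predicted_rt(degraded)) - (predicted_rt(incompatible) - baseline)
print(baseline, both, interaction)  # 320 420 0
```

A nonzero interaction contrast in real data would, on this logic, suggest that the two factors influence at least one common stage.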

Later on, with growing neuropsychological and neurophysiological evidence, further stages and branches were added, and the assumption of strict linearity of the stage sequence was loosened or given up, particularly in the domain of perceptual stages. Stages were seen to overlap in time, and parallel processing was also considered, at least in certain parts of the information stream. For instance, in vision research a distinction between two separate pathways, or information streams, has been proposed on the basis of a number of neuropsychological and neurophysiological findings, and, as a consequence, the notion of a linear chain of processing stages has eventually been replaced by the notion of parallel and distributed processing (Desimone & Ungerleider, 1989; Milner & Goodale, 1995): After an initial common "low-level" feature analysis in the striate cortex (V1), two separate "higher-level" analyses are performed in parallel in the parietal and temporal lobes. The ventral pathway (the "what" pathway in the temporal lobe) is seen as crucial for cognitive performance like identifying objects (inferotemporal cortex via V4), whereas the dorsal pathway (the "how" or "where" pathway in the parietal lobe) is seen as crucial for orientation performance like locating objects (mediotemporal cortex). Milner and Goodale (1995) associated this pathway with the visual guidance of actions (but see, e.g., Merigan & Maunsell, 1993, and Rossetti & Revonsuo, 2000, for considerable criticism). Taken together, this results in an elaborate chart with two parallel streams of sequential stages that serve different perceptual functions and can be assigned to different anatomical structures in the brain.

The standard picture of perceptual information processing that has emerged over the last decades can be summarized as follows: When a stimulus is presented, a number of parallel and serial operations are triggered. Some "early" operations are driven exclusively by input information derived from the stimulus, while "later" ones are controlled by both stimulus information and memory information. The end product of perceptual processing, the percept, eventually emerges from combining the distributed outputs of the various operations involved. In any case, there is no action in perception: Perception comes first, and only then comes action (if it comes at all), and there is no direct way for the two to speak to each other.

2.1.2 Selection Views: From Action to Perception

It was certainly not by accident that, with the advent of the information-processing approach, work on attention had a fresh start (for historical reviews, see Van der Heijden, 1992, and Neumann, 1996). Attentional mechanisms account for the various limitations observed in human performance. Thus, they are assumed to enhance or to inhibit the activity in the various streams and at various stages in the flow of information. The details of attentional theories are assumed to depend on the details of the supposed cognitive architecture.

Capacity Limitations. Correspondingly, early theories of attentional mechanisms completely adopted the linear stage view of information flow. They made an effort to cover various processing limitations, as had become apparent in dual-task experiments. For instance, in the dichotic listening paradigm (Cherry, 1953) listeners who are asked to reproduce two dichotically presented messages can usually report only one of them. The filter theory offered by Broadbent (1958, 1971) to capture these and further findings relied on two major assumptions: First, that there is a basic capacity limitation which is due to limitation in the rate of information transmission at the perceptual stage, resulting in a central bottleneck in the flow of information, and, second, that access to the bottleneck is mediated by a filter-like selection mechanism that rejects certain stimulus features from further processing.
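Broadbent's two assumptions, a filter acting on physical stimulus features followed by a limited-capacity channel, can be sketched as a two-step pipeline. This is a deliberately crude illustration: the `Message` type, the use of the ear as the selecting feature, and the item-count capacity are assumptions, and an item count merely stands in for the limit on the rate of information transmission that Broadbent actually proposed.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Message:
    ear: str                # physical feature available to the filter
    words: Tuple[str, ...]  # content, analyzed only if the message passes

def filter_then_bottleneck(messages: List[Message], attended_ear: str,
                           capacity: int) -> List[str]:
    """Early selection: reject messages on a physical feature before any
    analysis of content, then pass at most `capacity` words through the
    single limited-capacity channel."""
    selected = [m for m in messages if m.ear == attended_ear]  # filter
    stream = [word for m in selected for word in m.words]
    return stream[:capacity]                                   # bottleneck

left = Message("left", ("the", "cat", "sat"))
right = Message("right", ("seven", "two", "nine"))
print(filter_then_bottleneck([left, right], attended_ear="left", capacity=5))
# ['the', 'cat', 'sat']
```

In this sketch the unattended message never reaches the content stage at all, which is the defining feature of an early-selection account.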

Later on, with more progress in experimental research on selective attention tasks, this view became progressively extended: Single-channel conceptions were contrasted with multiple-channel conceptions (e.g., Pashler, 1989; Welford, 1980), early-selection with late-selection mechanisms (Broadbent, 1958; Deutsch & Deutsch, 1963), capacity-demanding with capacity-free processes (Shiffrin & Schneider, 1977), and specific with unspecific capacity limitations (Kahneman, 1973; Navon & Gopher, 1979). Up to the early 1980s, however, one main doctrine remained remarkably constant: that capacity limitations are an inherent feature of the processing system and that selection mechanisms are required to handle these limitations. Capacity limitation was the central concept, and selection was considered its functional consequence. This doctrine had important implications for the study and the conceptualization of attentional mechanisms (Neumann, 1987, 1996). However, over the last decades this doctrine has been challenged by at least two alternatives: the selection-for-action view and the premotor view of attention.

The Selection-For-Action View. According to this view, selection came to the fore and lost its role of being just a (secondary) consequence of (primary) capacity limitations. One reason is that empirical research interests have shifted from dual-task paradigms to studies emphasizing selection, like in visual search (e.g., Treisman & Gelade, 1980) or spatial cueing (e.g., Posner, 1980). In the wake of this shift the functional relationship between capacity limitations and selection has become reversed and eventually replaced by an action-based approach: Selection was no longer seen to follow from system-inherent capacity limitations, but to cause such limitations instead--and to do so for the sake of goal-directed action (cf. Allport, 1987; Neumann, 1987, 1990; Van der Heijden, 1992).

The basic principle behind this view is that any integrated action requires the selection of certain aspects of environmental information (those that are action-relevant) and, at the same time, the ignoring, or rejection, of other aspects (those that are action-irrelevant). In this sense, studies on attention take their way from action planning to perception: Action planning requires selection, which in turn modulates perception. The requirements of action control force the system to limit its processing and thus make it virtually indistinguishable from an intrinsically capacity-limited system.

The selection-for-action view speaks to the issue of what gets selected and which criteria are involved. It is explicit on what information is picked up and selected, but it has not much to say about the representational structures underlying the mediation between action planning and perception and maintaining action-dependent selection criteria. Whatever the nature of these mediating structures may be, they must allow for fast and efficient interaction between stimulus-related and action-related processing.

The Premotor View of Attention. This view also suggests that selection neither results from nor requires an attentional control system separate from action-perception cycles. Rather, selective attention derives from an exogenous or endogenous activation of spatial cortical maps in which spatial information is transformed into movements. As a consequence of this activation, there is an increase in motor readiness and a facilitation of stimulus processing at locations toward which the motor program is directed (Rizzolatti & Craighero, 1998; Rizzolatti, Riggio, & Sheliga, 1994). The "premotor" connotation of the theory results from the assumption that only the preparation of the motor program (the process we call "action planning") is needed, not necessarily its execution. This allows the view to accommodate observations of covert attentional orienting without an overt action component, for example, when attention is shifted to a stimulus in the periphery while the eyes remain fixated.

Originally, the premotor view of attention was only applied to those cortical maps that code space for oculomotor behavior (Rizzolatti, Riggio, Dascola, & Umiltà, 1987). Similar ideas of a close link between oculomotor behavior and covert attention shifts have been discussed before, for example, in assuming that attention shifts precede each goal-directed saccade and its programming (Klein, 1980; Posner & Cohen, 1980; Posner & Cohen, 1984; Rayner, 1984). Nowadays, the premotor view is extended to any map that codes space, that is, to maps that control movements of the head, of the arm, or other body parts (Rizzolatti & Craighero, 1998; Rizzolatti et al., 1994). Nevertheless, the view addresses exclusively and is restricted to phenomena of spatial attention. It does not speak to how the requirements of action planning interact with the processing of other stimulus dimensions than space and it has not much to say about the representational structures underlying the mediation between action planning and perception.

2.1.3 Adaptation Views: Perception and Action

Another interesting perspective on the relationship between perception and action can be found in the vast literature on spatial and temporal orientation and adaptation (cf., e.g., Andersen, 1988; Colby, 1998; Howard, 1982; Howard & Templeton, 1966; Paillard, 1991; Redding & Wallace, 1997). This perspective refers to the notion that perception and action control make use of shared reference frames with respect to space and time. In particular, it posits shared frames for environmental objects and events and for the actor's body and his or her movements. These reference frames serve to specify the spatial and temporal relationships between environmental events and bodily actions and, thus, to coordinate one with respect to the other. Though this assumption appears to be a prerequisite for the successful functioning of sensorimotor systems, it is not often explicitly stated in the literature, perhaps because it is too evident (see, however, Brewer, 1993; Grossberg & Kuperstein, 1989; Howard, 1982; Redding & Wallace, 1997). If no such shared reference frame existed, one could not explain how animals can reach for a target at a perceived location or catch a flying ball at the time and place of expected collision (Lee, 1976).
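The catching example rests on Lee's (1976) proposal that time to collision can be read off the optic array as tau, the ratio of an approaching object's visual angle to its rate of expansion, without separate knowledge of distance or speed. The sketch below assumes a ball approaching head-on at constant speed and estimates tau with a finite difference; the ball size, distance, speed, and step size are hypothetical values, and distance and speed appear in the code only to simulate the optical input.

```python
import math

def visual_angle(size_m: float, distance_m: float) -> float:
    """Angle (radians) subtended by an object of the given size."""
    return 2.0 * math.atan(size_m / (2.0 * distance_m))

def tau_estimate(size_m: float, distance_m: float, speed_mps: float,
                 dt: float = 1e-3) -> float:
    """Time to contact as visual angle / rate of angular expansion.

    Only theta and d(theta)/dt enter the ratio; distance and speed are
    used here just to generate the optical quantities.
    """
    theta_now = visual_angle(size_m, distance_m)
    theta_next = visual_angle(size_m, distance_m - speed_mps * dt)
    expansion_rate = (theta_next - theta_now) / dt  # forward difference
    return theta_now / expansion_rate

# Hypothetical ball: 7 cm wide, 10 m away, approaching at 20 m/s.
tau = tau_estimate(size_m=0.07, distance_m=10.0, speed_mps=20.0)
print(round(tau, 2))  # 0.5, close to the true time to contact of 10 / 20 s
```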

The principle of shared reference frames has two faces: unity and diversity. The notion of a fundamental unity between perceptual space and action space is derived from functional considerations. Animals are spatially extended systems that move around in their physical environment. An efficient way to enable them to interact successfully with the environment is to furnish them with a representational system that represents the Where and the When of both environmental events and body-related events (actions) in a shared spatio-temporal reference frame. Such a shared representation is reliable in the sense that it allows represented actions and represented events to be coordinated (e.g., intended actions with perceived events). Moreover, in order to be valid, this representational system needs to contain veridical information in the sense that it recovers the spatial and temporal pattern of happenings in the distal environment (cf. Brewer, 1993; Prinz, 1992).

Recent literature, however, emphasizes diversity over unity at both the functional and the structural level (Jeannerod, 1983; Paillard, 1991; Redding & Wallace, 1997). For instance, in the functional analysis of visually guided movements, one has to distinguish, on the sensory side, between a number of independent spatial maps, or reference frames (e.g., eye-, head-, and trunk-related ones), and the transformations mediating between them. The same applies to the motor side, where movements can be specified in a number of reference frames, too (e.g., joint-, effector-, and trunk-related ones). Electrophysiological recordings have also stressed diversity from a structural point of view, suggesting multiple representations of spatial information in a number of maps at different cortical sites (e.g., Andersen, 1988; Colby, 1998; Ohlsson & Gettner, 1995). Some of these representations can be specified with respect to the body parts to which they are anchored (like eye, hand, arm) or with respect to the action patterns to which they contribute (like grasping, reaching). Others can even be specified in terms of body-independent, allocentric coordinates (such as object-relative movement directions).

Given such modularity of spatial information processing in both functional and structural terms, it is not surprising that performance dissociations can be observed in a number of experimental tasks (e.g., Aschersleben & Müsseler, 1999; Bridgeman, Kirch, & Sperling, 1981; Fehrer & Raab, 1962), particularly with respect to the distinction between the dorsal and the ventral stream of processing (cf. Desimone & Ungerleider, 1989; Milner & Goodale, 1995). We must not forget, however, that these rare examples of diversity and dissociation are exceptions to the normal case of unity and association and, thus, of integrated action (Milner & Goodale, 1995, ch. 7; Rossetti & Revonsuo, 2000). Unless we understand how the brain solves the problem of binding distributed representational activities (across maps and across units within maps), we have no way of understanding how unity and diversity can be combined and how one emerges from the other (Singer, 1994; Treisman, 1996).

Yet, despite the intricacies of the functional architecture and the complexity of the transformational computations involved, there can be no doubt that the system eventually uses corresponding frames of reference for perception and action planning. This functional unity works not only from perception to action--as evident from the high degree of precision with which animals fix the spatial and temporal details of their actions on the basis of the information provided by their perceptual systems--but from action to perception as well. As animals move, perceptual information cannot be interpreted in an unambiguous way without reference to action-related information. This, in turn, requires that these two pieces of information interact with each other within the same reference system. Such cross talk can occur at two levels, compensation and adaptation.

The notion of compensation refers to the fact that, in order to interpret any change in the spatial distribution of signals at their receptor surfaces, animals must have a way to compensate for their own body movements. In other words, the system has to take the animal’s body movements into account before it can use the sensory signal to recover the structure of the environmental layout (cf., e.g., Bridgeman, 1983; Epstein, 1973; Matin, 1982; Shebilske, 1977). For instance, according to the classical reafference model, this is achieved by a subtractive operation in which the effects of action (saccadic eye movements) are canceled from the perceptual signal (the retinal location signal). Clearly, a subtractive model like this implies at least commensurate (or even identical) representations on the perception and action side (von Holst & Mittelstaedt, 1950; MacKay & Mittelstaedt, 1974).
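A minimal one-dimensional sketch may make the subtractive logic concrete. It assumes, for illustration only, that positions are expressed in degrees of visual angle, that neither head nor body moves, and that a saccade of +x deg displaces the retinal image of a stationary object by -x deg; the function names are hypothetical and not part of any published model. The point is simply that the subtraction is defined only because the action-related signal (the efference copy) and the perceptual signal are expressed in the same units and frame.

```python
def predicted_retinal_shift(saccade_deg: float) -> float:
    """Image shift expected from the efference copy of a saccade.

    Simplifying assumption: a saccade of +x deg displaces the retinal image
    of a stationary object by -x deg (1-D, no head or body movement).
    """
    return -saccade_deg


def external_motion(retinal_before: float, retinal_after: float,
                    saccade_deg: float) -> float:
    """Residual image shift left after canceling the expected reafference.

    A residual of zero is attributed to a stationary world; any remainder
    is attributed to motion of the object itself.
    """
    observed_shift = retinal_after - retinal_before
    return observed_shift - predicted_retinal_shift(saccade_deg)


# Stationary object: a 10-deg saccade moves its image from +5 to -5 deg.
print(external_motion(5.0, -5.0, 10.0))   # 0.0
# Object that also moved 3 deg to the right during the saccade.
print(external_motion(5.0, -2.0, 10.0))   # 3.0
```

If the two signals were coded in incommensurate formats (say, retinal degrees versus raw motor-command units with no shared scale), the subtraction in `external_motion` would have no meaning; that is the sense in which the reafference model presupposes commensurate codes.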

The notion of adaptation refers to the flexibility of sensorimotor couplings and to the fact that perception can, within certain limits, be educated by action planning. For instance, studies of distorted vision (Kohler, 1964; Redding & Wallace, 1997; Stratton, 1897; Welch, 1978) have shown that such education can work either way: Perception may teach action and action may teach perception at the same time, again suggesting commensurate or identical representations on both sides (see also Koenderink, 1990; Van der Heijden, Müsseler, & Bridgeman, 1999a; Wolff, 1987, 1999).

In sum, we conclude that the notion of commensurate or even identical representations and shared reference frames for perception and action is widespread in the literature on spatial orientation. It appears to be a natural notion that requires no explicit defense, and this remains the case even in light of what is known about multiple representations of space in the brain.

2.1.4 Challenges for a Cognitive Theory of Perception

Although the functional autonomy of perceptual processes is implicitly or explicitly presupposed in most theoretical approaches to perception, it has been questioned from time to time. In fact, a number of studies have demonstrated that there is more interaction between perception and action planning than the standard framework would allow.

2.1.4.1 A Role for Action in Perceptual Processing

The literature contains several accounts that seem to indicate some impact of action-related processes on perception. As early as the mid-19th century, Lotze introduced a prominent motor theory of perception, the ‘theory of local signs’ (Lotze, 1852; for a historical overview see Scheerer, 1984). It basically claimed that space perception arises from a combination of two sources of information. Lotze assumed, first, a qualitative map of visual sensation and, second, a quantitative map of the eye-movement metrics necessary to fovealize an object. What is perceived was assumed to result from the qualitative visual sensation being ‘enriched’ by the quantitative map of oculomotor behavior (or vice versa; for modern versions of this idea, see Koenderink, 1990; Van der Heijden et al., 1999a; Wolff, 1999). These notions are used, for example, to account for phenomena of visual localization (e.g., Müsseler, Van der Heijden, Mahmud, Deubel, & Ertsey, 1999; Van der Heijden, Van der Geest, De Leeuw, Krikke, & Müsseler, 1999b).

Whereas in the early motor theories motor processes fulfill a representational function (i.e., they provide the system with a given metric, an assumption difficult to maintain; cf. Scheerer, 1984), modern views assign them an adaptive function. What is perceived is influenced by previous motor adjustments and, at the same time, is a precondition for future adjustments (see also below and, for an overview, Welch, 1986). However, nearly all motor theories deal exclusively with phenomena of space perception. The most prominent exception is the motor theory of speech perception (e.g., Liberman, 1982). We deliberately exclude this example, partly because the language domain exceeds our scope here, and partly because its empirical support seems weaker than the elegance of its theoretical principle suggests (cf. Jusczyk, 1986; Levelt, 1989, ch. 11).

Another field where interactions between perception and action can be studied is the perception of human action (Cutting & Proffitt, 1981; Freyd, 1987; Johansson, 1950, 1973; for an overview see Stränger & Hommel, 1996). Much of the evidence from this work can be summarized by concluding that the way people perceive other people’s actions appears to rely on specific representational structures that contain information that goes far beyond the information provided by the actual stimulus. For instance, when people are confronted with traces of handwritten letters, drawings, or body movements they are often capable of inferring the kinematics or even the dynamics of the movements by which these traces were generated (Babcock & Freyd, 1988; Freyd, 1987; Kandel, Orliaguet, & Boë, 1995; Runeson & Frykholm, 1981). These studies suggest that the visual perception of actions and their consequences may draw on action-related representational structures subserving both the generation and perception of action patterns.

Related evidence comes from studies showing that semantic judgments about actions are facilitated if preceded by performing hand gestures that match the to-be-performed action in certain respects (Klatzky, Pellegrino, McCloskey, & Doherty, 1989). A further example is provided by studies on apparent biological motion (Heptulla-Chatterjee, Freyd, & Shiffrar, 1996; Shiffrar & Freyd, 1990). These studies demonstrate that when apparent motion is induced by a stimulus pattern in which the human body forms a constituent part, the resulting motion does not always follow the principle of the shortest path (as it usually does with physical objects). Instead, as if to avoid anatomically impossible movements, the motion takes longer paths and detours.

No less dramatic is the implicit impact of action on perception demonstrated in a series of elegant studies by Viviani and his colleagues. As earlier studies had shown, the velocity of drawing movements depends on the radius of curvature of the trajectory (Viviani & Terzuolo, 1982). Interestingly, the same lawful relationship also seems to be effective in perception (Viviani & Stucchi, 1989, 1992; Viviani, Baud-Bovy, & Redolfi, 1997). For instance, a moving dot appears to move at uniform velocity if (and only if) its actual velocity follows the law governing movement production (similar effects can also be found with linear motion; see, e.g., Mashhour, 1964; Rachlin, 1966; Runeson, 1974). This suggests that production-related knowledge is implicitly involved in perceptual processing--at least as far as the perception of actions and action effects is concerned.
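The text does not spell out the form of this relationship. A frequently cited formulation from this line of work is the so-called two-thirds power law, under which tangential velocity V varies with radius of curvature R as V = K R^(1/3) (equivalently, angular velocity varies with curvature raised to the power 2/3). Assuming that form, the sketch below shows what a "lawful" dot trajectory looks like on an ellipse: the dot slows down where the path curves sharply and speeds up where it is flatter, and this non-uniform profile coincides exactly with simple harmonic motion along the ellipse. Function names, the ellipse, and the gain K are illustrative assumptions.

```python
import math


def radius_of_curvature(a: float, b: float, t: float) -> float:
    """Radius of curvature of the ellipse x = a cos t, y = b sin t at parameter t."""
    s = (a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2
    return s ** 1.5 / (a * b)


def power_law_speed(radius: float, k: float, beta: float = 1 / 3) -> float:
    """Tangential speed V = k * R**beta (beta = 1/3: the two-thirds power law)."""
    return k * radius ** beta


def harmonic_speed(a: float, b: float, t: float, omega: float = 1.0) -> float:
    """Speed of a dot moving as x = a cos(omega t), y = b sin(omega t)."""
    return omega * math.hypot(a * math.sin(t), b * math.cos(t))


a, b = 2.0, 1.0
k = (a * b) ** (1 / 3)  # gain that makes the two speed profiles coincide (omega = 1)
for t in (0.0, math.pi / 4, math.pi / 2):
    r = radius_of_curvature(a, b, t)
    print(f"t={t:.2f}  R={r:.3f}  lawful V={power_law_speed(r, k):.3f}  "
          f"harmonic V={harmonic_speed(a, b, t):.3f}")
```

Under this assumption, a dot following the lawful profile on this ellipse changes its physical speed by a factor of two between the sharply curved ends and the flatter sides; according to the studies cited above, it is this kind of non-uniform motion that observers tend to perceive as uniform.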

Very recently, support for shared mechanisms for action perception and action control has also been provided by neurophysiological studies (for an overview, see Decety & Grèzes, 1999). For instance, Rizzolatti and his group describe "mirror neurons" in the premotor cortex of the monkey. These neurons are active both when the monkey performs a given action and when it observes a similar action performed by the experimenter; typically, the observed action must involve an interaction between the agent and an object (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992). Similar findings were obtained in PET studies with human participants in which execution, observation, and imagination of grasping movements were compared (Rizzolatti, Fadiga, Gallese, & Fogassi, 1996; Grafton, Arbib, Fadiga, & Rizzolatti, 1996), suggesting a shared representational basis for perception and action planning. The same conclusion is suggested by a study in which transcranial magnetic stimulation was applied during action execution and action observation (Fadiga, Fogassi, Pavesi, & Rizzolatti, 1995).

2.1.4.2 A Role for Action in Perceptual Learning

For obvious reasons, the study of perceptual learning has seen more discussion of the interaction between action and perception. Learning is usually believed to result from actions and their outcomes, and therefore the study of perceptual learning requires examining the long-term consequences of action on perceptual processing (for overviews see Gibson, 1969; Hall, 1991). While most of the literature covers issues of spatial orientation, some of it is also devoted to the emergence and improvement of perceptual discrimination in the domain of patterns, objects, and events. However, there appears to be more theory than solid evidence.

One of the classical theoretical positions has become known as the differentiation theory of perceptual learning (Gibson, 1969; Gibson & Gibson, 1955). According to this theory, perceptual learning is the art of picking up, or extracting, those invariants of stimulation that are suited to directly specify certain body movements or actions. From its beginning this theoretical program was anticognitivist (or, historically speaking, anti-Helmholtzian) because it relies on the extraction of action-relevant information (presumably contained in the distribution of stimulus energy) rather than on computational operations generating internal codes and action programs. The theory had initially been developed to account for the psychophysics of spatial orientation (Gibson, 1950) and was later broadened to account for the ecology of perception in general (Gibson, 1979; cf. Turvey, 1977; Fowler & Turvey, 1982). Central to the theory is the notion of affordances, which, roughly speaking, stand for specific combinations of objects and events as taken with reference to their functions for the animal--typically with reference to actions of certain kinds.

Unlike differentiation theory, which holds that actions act back on the early perceptual mechanisms of feature extraction, the theory of associative enrichment posits that perceptual learning occurs at a much later stage. The basic assumption is that when a given stimulus is frequently coupled with a given response, the information derived from that stimulus becomes associatively enriched with response-produced cues, which then help to discriminate this stimulus from other ones coupled with other responses (acquired distinctiveness of cues; cf. Miller & Dollard, 1941; Postman, 1955). Such response-produced cues could come, for instance, from immediate proprioceptive byproducts of the response as well as from its more remote effects.

The ecological approach has been quite successful in accommodating a number of findings in perceptual learning and development (cf. Gibson, 1969; Thelen & Smith, 1994, ch. 8/9)--more so, perhaps, than the enrichment approach, which has seen only partial support from experimental evidence (cf. Arnoult, 1957; Cantor, 1965; Hall, 1991). Still, the issue appears to be unsettled. One reason may be that differentiation theory is broader and less specific than enrichment theory and can therefore easily account for a variety of facts (and, if necessary, be adapted to accommodate new findings). In fact, it is not easy to imagine how it could be falsified at all. Moreover, the truth might lie in a combination of the two theories. Far from excluding each other on logical grounds, they share the common belief that action plays an important role in teaching perception in a specific way. So far, however, neither has spelled out the details of this teaching procedure in a satisfactory way.

2.2 Views on Action Planning

In the past, theorizing about action has followed two separate lines of thought, shaped by two different conceptual frameworks, namely the sensorimotor view and the ideomotor view. Theories from the sensorimotor approach tend to view actions as responses to stimuli, that is, as following from external causes. Conversely, theories from the ideomotor approach tend to view actions as following from internal causes like goals or goal-related cognitions. In this section we argue that a comprehensive account of the cognitive foundations of action requires combining the two views in a new way.

2.2.1 Sensorimotor Views: From Stimuli to Responses

In the sensorimotor view of action, actions are regarded as re-actions, that is, as responses triggered by stimuli. Strict versions of the approach (like classic behaviorism) claim that such reduction to stimulus conditions is a necessary and at the same time sufficient condition for a full account of action. More liberal versions may also consider a role for additional factors that cannot be traced back to the actual stimulus situation.

Historically, the sensorimotor line of thought has been the mainstream in action theory for decades, if not centuries. This is true of both physiology and psychology. One of the early influential sources to which it can be traced is Descartes’ analysis of the relationship between perception and action. According to Descartes, actions must be understood to be the result of the perception of events (Descartes, 1664). This doctrine laid the groundwork for the notion of the sensorimotor arc, which has deeply influenced psychological theorizing on action ever since. This influence can be seen in a number of independent historical and theoretical contexts, for example, the foundation of reaction time measurement (e.g., Donders, 1868/1969), the formation of the behaviorist program, or, more recently, the development of the linear stage theory of human performance (e.g., Massaro, 1990; Sanders, 1980, 1998; Theios, 1975). The fact that sensorimotor theories of action have so far been clearly more successful than ideomotor theories may be rooted in the difference between the explanatory strategies of the two views and their methodological implications. Unlike the ideomotor view, which explains actions in terms of mental causes, the sensorimotor view refers to physical causes. As a consequence, the sensorimotor framework offers a tool for both the study and the explanation of action in physical terms. This dual advantage may explain part of its historical success.

Theories of action address two major systematic problems, learning and performance. Learning theories from the sensorimotor domain tend to adopt a structural stance, explaining how associations between stimuli and responses (or their internal representations) are created and how their strength depends on factors like contiguity, reinforcement, and practice. Further, they address the issue of how these associations interact in networks that link large sets of responses to large sets of stimuli. Conversely, theories of performance tend to adopt a functional stance, modeling the chain of processing stages and operations by which stimulus information is translated into motor commands, to the effect that responses are selected on the basis of the information available.

These theories refer to experimental task contexts with well-defined sets of stimuli and responses (such as instrumental conditioning, paired-associate learning, or disjunctive reaction tasks). In these tasks stimuli and responses are individuated on the basis of the task, including the instructions. Task and instructions specify (i) which stimuli can occur, (ii) which responses can be selected, and (iii) which rules govern the mapping of responses to stimuli. The core assumption shared by all brands of sensorimotor theories of action may be called the stimulus trigger hypothesis. It holds that, at least in such well-structured task contexts, the presentation of the stimulus is both a necessary and sufficient condition for triggering the appropriate response. Accordingly, any action is a re-action: it comes into being as a causal consequence of stimulation.

2.2.2 Ideomotor Views: From Goals to Movements

In contrast to sensorimotor views of action, ideomotor views stress the role of internal (volitional) causes of action and at the same time disregard the role of external (sensory) causes. Correspondingly, the evidence ideomotor theories are grounded on does not come from stimulus-controlled reaction tasks but rather from more open situations where individuals pursue certain goals and, from time to time, perform certain actions in an attempt to approach or achieve them. In this view, actions are considered creations of the will--events that come into being because people pursue goals and entertain intentions to realize them.

Historically, ideomotor views of action entered the scientific discussion much later than sensorimotor views. This is obviously due to the fact that they deal with latent, internal rather than manifest, external causes of action. Moreover, the role of these causes appears to be different. Unlike a stimulus that may be regarded as a direct cause for the responses to follow, a goal appears to be a causal determinant that, at first glance, seems to work backward in time. Lotze (1852), Münsterberg (1888), and James (1890) were the first to solve the puzzle, at least in principle. Their solution was based on a distinction between the goal state itself (as achieved through the action--and, thus, following it in time) and its cognitive representation (as formed before the action--and, thus, potentially involved in its causation).

Though this move solved the puzzle of backward causation, it did not help much in bringing goals to the forefront of action theory and assigning to goals (or their representations) the same status and dignity that everybody (and every theory from the sensorimotor domain) would find natural to assign to stimuli (or their representations). Again, what remains is a difference in methodological status: Stimuli are directly observable entities, and therefore stimulus information is easy to record and/or manipulate. Goal representations, however, are unobservable entities that cannot be recorded and manipulated that way. If there is no solution to this problem, the study of the role of goals must, for principled reasons, be much more delicate than the study of the role of stimuli for action. As a consequence, it is not surprising that we do not possess a comprehensive conceptual framework for understanding the role of goals in action.

This is not to say that goals have been completely ignored in action research. First, in the motor control literature, a substantial body of knowledge has been accumulated on target-related performance such as goal-directed hand movements (cf. Jeannerod, 1988, 1990) and saccadic eye movements (cf. Carpenter, 1988; Kowler, 1990). Here, targets may be viewed as goals that are specified in terms of spatial and temporal motor coordinates. Yet, in most of these paradigms the targets (or some aspects thereof) can be perceived in the same way as the triggering stimuli. Therefore this literature is only of limited use for ideomotor theories of action whose main point is to clarify the role of internal goals.

Second, the study of goal selection and goal implementation has always been one of the key issues in theories of motivation and volition (e.g., Ach, 1905; Bargh, 1996; Gollwitzer, 1996; Lewin, 1926; Mischel, 1996). These theories have much to say about the internal dynamics of processes that precede the action. Yet, most of them are virtually silent when it comes to the details of the operations by which goal representations contribute to action planning proper.

Third, more general theories are available on the nature of the mechanisms for goal-directed control of cognition and action. For instance, as far as cognition is concerned, production-rule-based computational models like ACT-R (Anderson, 1993) or EPIC (Meyer & Kieras, 1997) assign a central role to goals and goal representations in their architecture. Conversely, as far as action is concerned, there has been a long-standing debate on the role of goal representations in learning and performance, that is, on how people learn to build up representations of future goals from consequences of previous actions and how these goal representations later become involved in action control (e.g., Ach, 1905; Greenwald, 1970; Hull, 1952; Konorski, 1967; Miller, Galanter, & Pribram, 1960). Still, none of these approaches has provided a sufficiently detailed framework for the mechanisms underlying the formation and operation of goal representations in action control--sufficient in the sense that it can be spelled out in terms of the cognitive structures and mechanisms involved.

In summary, it seems that goals have not lost much of the elusive character they used to have in the times of Lotze, Münsterberg, and James--at least as far as their role for action control is concerned. Ideomotor theories of action are much less developed and elaborated than sensorimotor theories with respect to both performance and learning.

In the ideomotor domain, theories of learning tend to adopt a structural stance, too, explaining the formation of linkages between body movements and their consequences. In order to lay the ground for the explanation of voluntary action, these linkages must meet two requirements (cf. Prinz, 1997a). First, following James’ (1890) analysis of ideomotor action, these linkages need to include both resident and remote effects. Resident effects refer to the mandatory reafferent consequences that go along with performing certain body movements. The perception of resident effects is therefore based on body movements, for example, an arm movement required to throw a ball. Remote effects refer to the various environmental consequences that may follow from performing certain movements. The perception of these effects is based on environmental events, like, for example, a ball bouncing against a wall. Second, though the learning refers to linkages between movements and their effects, the result of this learning needs to be organized in a way that allows the linkages to be used the other way round, that is, to go from intended effects to movements suited to realize them.1

This is where performance theory of voluntary action comes into play. If one takes for granted that the links between movements and effects can be used either way, a simple conceptual framework for the functional logic of voluntary action presents itself. This framework suggests that actions may be triggered and controlled by goal representations--that is, representations of events the system "knows" (on the basis of previous learning) to be produced by particular movements.
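The two requirements described above (learning movement-effect linkages that include resident and remote effects, and using those linkages in the reverse direction) can be illustrated with a toy associative store. This is a hypothetical sketch, not a model proposed in the ideomotor literature: associations are recorded as co-occurrence counts whenever a movement is performed, but they are indexed by effect, so that an intended effect can later retrieve the movement most often observed to produce it. The class name, method names, and the movement and effect labels are invented for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional


class IdeomotorStore:
    """Toy movement-effect memory that is learned forward and queried backward."""

    def __init__(self) -> None:
        # effect -> movement -> number of times the movement produced the effect
        self._links = defaultdict(lambda: defaultdict(int))

    def learn(self, movement: str, effects: Iterable[str]) -> None:
        """Register the resident and remote effects observed after a movement."""
        for effect in effects:
            self._links[effect][movement] += 1

    def movement_for(self, intended_effect: str) -> Optional[str]:
        """Go from an intended effect to the movement most often linked to it."""
        candidates = self._links.get(intended_effect)  # .get does not create entries
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


store = IdeomotorStore()
# Resident effect (felt arm extension) and remote effect (ball hits the wall).
store.learn("throw_arm", ["arm_extension_felt", "ball_hits_wall"])
store.learn("press_finger", ["key_down", "light_on"])
print(store.movement_for("light_on"))    # press_finger
print(store.movement_for("door_open"))   # None
```

Indexing by effect rather than by movement is the design choice that matters in this sketch: it lets a record acquired in the movement-to-effect direction serve the effect-to-movement direction that voluntary action requires.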

The core assumption shared by various brands of ideomotor theory may be called the "goal trigger hypothesis". It holds that goal representations that are functional anticipations of action effects play a crucial role in action control (Greenwald, 1970, 1972; James, 1890; Lotze, 1852; Prinz, 1987, in press). For instance, Lotze speaks of "Vorstellungen des Gewollten" (image of the intended events; Lotze, 1852, p. 301) and James of the "bare idea of a movement’s sensible effects" which serves the function of a "sufficient mental cue" to the movement itself (James, 1890, p. 522). Given this crucial role for goal representations it is natural that, according to ideomotor theory, the proper way to individuate actions is neither in terms of stimuli nor responses but in terms of goals and goal representations.2

2.2.3 Challenges for a Cognitive Theory of Action Planning

2.2.3.1 Combining the Two Views

Obviously, a full account of the cognitive foundations of action control requires combining the two lines of thought. Action depends on both external and internal causes, and this needs to be reflected in a combined theoretical framework.

(Most) Reactions are Voluntary Actions, too. Consider, as an example, a disjunctive reaction task where, on each trial, the participant selects one out of several reactions in response to the presentation of one of several stimuli. As was indicated above, performance in such tasks is usually analyzed as if response selection were more or less fully determined by the stimuli presented. This, however, is obviously not the case: The presentation of the stimulus is necessary, but it is certainly not a sufficient condition for the response to occur. Nothing will happen upon stimulus presentation until the participant has been instructed to respond in a particular way and he or she is willing to do so. In other words, in order for a response to occur, two conditions must be fulfilled: There must be an appropriate stimulus and an appropriate intention to respond to that stimulus in a particular way (Hommel, 2000a, 2000b). This may be trivial to state--and everybody knows it--but theories in the sensorimotor domain have never really recognized this fact. It is not the stimulus that presses the response key in the reaction task. What is rather required as underlying causal structure is a disposition to press a particular key under particular stimulus conditions.

Voluntary Actions are Reactions, too. Consider, on the other hand, voluntary actions like opening a door or going to the movies. For these examples, too, it is trivial to state that the action is not only dependent on internal causes (goals) but on external causes as well. Whatever the goals in these examples may be, they do not by themselves specify the details of the actions suited to realize them. All of these details need to be specified by taking the actual pattern of a number of external factors (stimuli) into account. Motivational theory has to some extent given recognition to this fact--for instance in the claim that, in order to perform voluntary actions, internal intentions need to interact with external opportunities--to the effect that both are required to realize the action (Lewin, 1926). Again, it is obviously not the goal itself that opens the door but rather a disposition to realize a certain goal under certain circumstances and to fix the details of the action in response to the details of the circumstances.

Meeting the Challenge. In summary, then, in order to account for stimulus-triggered response performance, we need to develop a novel conceptual framework that treats reactions as goal-directed actions. Actions and reactions need to be regarded as segments of body movements that are individuated on the basis of goals--for instance, as in throwing a ball in order to hit a target or in pressing a key in order to switch the light on. In other words, actions are, by definition, structures that link movements to goals--and vice versa (Prinz, 1997b). Of course, it may not always be easy to conceptually distinguish movements and goals. The distinction is easy to apply to cases where the goal lies beyond the movement and is clearly distinct from it, as in the two examples just mentioned. It is less obvious how it can be applied to cases where the goal appears to reside in the movement itself, as in pressing a response key in a laboratory task. Still, there remains a difference: there is a movement first (finger and key going downward) and a goal state resulting from that movement second (the key being pressed down).

This view opens an interesting perspective not only on goals themselves, but also on the interaction between goals and stimuli as internal versus external causes of actions. With this view, goals and stimuli are commensurate, as they both refer to environmental events: events that are going on and being perceived in the case of stimuli, and events that are planned and yet to be effected in the case of goals. As we will discuss in the Section "The Theory of Event Coding," one of the central assumptions of TEC will indeed refer to the notion of a common representational domain for stimuli and goals, or perceived and intended events, respectively.

2.2.3.2 Action Induction

One of the major empirical challenges that a theory of the cognitive foundations of action control has to meet comes from action induction. By this term we refer to a variety of observations suggesting much closer functional links between perception and action planning than the standard frameworks provide--links that appear to be based on inherent similarity between perception and action planning rather than on acquired arbitrary connections.

Imitative Action. There is a large variety of imitative actions, ranging from seemingly simple imitations of elementary orofacial gestures in newborns and infants (e.g., Meltzoff & Moore, 1977) to instances of observational learning of habits, attitudes, or even traits in adults (Bandura, 1977; Bandura & Walters, 1963). Correspondingly, a number of functional distinctions have been proposed in the literature (Scheerer, 1984). Still, one feature is common to all varieties of imitative action, namely, that the imitator’s action resembles the model’s action in one respect or another. In the context of social learning, this resemblance may refer to the action’s eventual outcome (such as wearing the same clothes as the model), whereas in the context of skill acquisition it may refer to the kinematics of the movement pattern itself. Theories of imitation have treated the issue of resemblance in two different ways (Prinz, 1987). One way is to consider it a by-product emerging from the operation of mechanisms that are based on contiguity and reinforcement rather than similarity (e.g., Gewirtz & Stingle, 1968). The other way is to consider similarity the functional basis of imitation--functional in the sense that imitative action occurs by virtue of similarity between perception and action (e.g., Meltzoff & Moore, 1977; Piaget, 1946). In any case, the theoretical challenge posed by the ubiquitous occurrence of imitative action remains unresolved. How is it possible that the mere perception of certain actions in another person can give rise to similar actions in the perceiver? How can similarity between what s/he perceives and what s/he performs become effective?

Sympathetic Action. When a person watches a scene in which he or she is deeply involved, sympathetic actions may sometimes be observed in this person--sympathetic in the sense that they are clearly related to the events in the scene and appear to be induced by them. The term ideomotor action has sometimes also been used to denote a particular class of induced actions. Yet, as this term has also been used in a much broader sense (Carpenter, 1874; James, 1890), we suggest the term sympathetic action in order to avoid confusion (Prinz, 1987). Sympathetic action is clearly different from imitative action. First, to borrow a distinction from Katz (1960), sympathetic action tends to be synkinetic (an immediate, on-line accompaniment of perception) rather than echokinetic (a delayed, off-line follower). Second, sympathetic movements often occur without, and sometimes even against, the spectator's will. Third, despite their involuntary character, they appear to depend strongly on his/her intentional involvement in the scene being watched, suggesting the seeming paradox of an involuntary action that is still under intentional control. Until recently, the details of the relationship between the events in the scene and the sympathetic movements accompanying them had not been systematically studied (see, however, Knuf, Aschersleben, & Prinz, 2000). Whatever this relationship may be, sympathetic action also suggests closer links between perception, action, and (tacit) intention than the classical frameworks assume.

Synchronous Action. When exposed to rhythmic sounds, many listeners find it easy to dance or carry out other rhythmic movements in accordance with the sound pattern, and some even find it hard to suppress such synkinetic accompaniment of the sound pattern they are exposed to. As Fraisse (1980) has pointed out, the action-inducing power of such auditory-motor synchronization is particularly strong, suggesting a special and privileged link between perception and action. In movement synchronization, the action-inducing power of perception is restricted to the temporal domain. The perceptual event itself does not specify which movements to perform and which limbs to use. However, once the listener has made these choices, the timing of the movements is captured by the structure of the sound pattern--provided that the pattern is regular enough to allow its temporal structure to be anticipated. Synchronized action is related to both imitative and sympathetic action. With imitative action, it shares the feature that the action’s structure is modeled after the structure of the perceptual event. With sympathetic action, it shares the features of synkinetic accompaniment and spontaneous, involuntary occurrence. Again, the resemblance between perception and action suggests that one induces the other by virtue of similarity.

Compatible Action. Spatially compatible action can be considered the spatial counterpart of synchronized action. Effects of spatial compatibility between stimuli and responses are easy to demonstrate experimentally. A simple example is a choice reaction task with two stimuli and two responses: on each trial, a stimulus light is flashed to the left or to the right of a fixation mark, and one of two response keys, either the left or the right one, is operated in response to the stimulus. Such a setup allows for two tasks that differ in the assignment of responses to stimuli. When the assignment is spatially compatible, stimuli and responses share a common spatial feature (both left or both right), whereas with the incompatible assignment they always exhibit two different spatial features (right-left or left-right). As a large number of experimental studies have shown, response performance with compatible assignments is clearly superior to performance with incompatible assignments, both in response times and in error rates (cf. Fitts & Seeger, 1953). Though it may not be impossible to account for the effect in terms of practice and contiguity, it is more natural to suggest that responses can be pre-specified by stimuli on the basis of shared features (e.g., Greenwald, 1970; Kornblum, Hasbroucq, & Osman, 1990). This is not far from concluding that perception induces action by virtue of similarity, and from raising the question of how such induction may occur.
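The idea that a stimulus pre-specifies the response sharing its spatial feature can be illustrated with a toy sketch in Python. It is not Kornblum et al.'s dimensional-overlap model or any other published model: the key names, the LEFT/RIGHT labels, and the functions are hypothetical, and the sketch only classifies trials rather than predicting response times or error rates.

```python
# Toy sketch (hypothetical names): a stimulus pre-activates every response
# that shares its spatial feature; the instructed mapping decides whether
# that pre-activation helps or hurts.

RESPONSES = {"left_key": "LEFT", "right_key": "RIGHT"}   # key -> spatial feature
COMPATIBLE = {"LEFT": "left_key", "RIGHT": "right_key"}   # stimulus side -> instructed key
INCOMPATIBLE = {"LEFT": "right_key", "RIGHT": "left_key"}


def primed_responses(stimulus_side, responses):
    """Return every response whose spatial feature matches the stimulus side."""
    return {key for key, side in responses.items() if side == stimulus_side}


def classify_trial(stimulus_side, mapping, responses):
    """Label a trial by how feature overlap relates to the instructed response."""
    required = mapping[stimulus_side]
    primed = primed_responses(stimulus_side, responses)
    if not primed:
        return "neutral"        # no response shares a feature with the stimulus
    if required in primed:
        return "facilitated"    # overlap pre-specifies the correct key
    return "conflict"           # overlap pre-specifies the wrong key


for name, mapping in (("compatible", COMPATIBLE), ("incompatible", INCOMPATIBLE)):
    print(name, [classify_trial(side, mapping, RESPONSES) for side in ("LEFT", "RIGHT")])
```

Under the compatible mapping both stimulus positions come out as "facilitated"; under the incompatible mapping both come out as "conflict". The "neutral" case is included only to make explicit that, on this toy reading, the effect depends on feature overlap rather than on the mapping alone.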

2.3 Views on Perception and Action Planning

Obviously, then, in order to achieve a deeper understanding of the mechanisms of perception and action planning, we need to come up with a more integrated approach, one that recognizes the intimate relationships between perception, cognition, and action planning. In other words, we need to argue against what MacKay, Allport, Prinz, and Scheerer (1987) have called the separate-and-unequal approach to perception and action, a view that has been challenged several times (e.g., Decety & Grèzes, 1999; Gallese et al., 1996; MacKay, 1987; MacKay et al., 1987; Neisser, 1985; Turvey, 1977; Viviani & Stucchi, 1992; von Hofsten, 1985) but still dominates the theoretical discussion.

On the one hand, theories of perception need to meet the challenge posed by various forms of interaction between perception and action planning in processing and learning. On the other hand, theories of action planning need to meet the challenge of providing roles both for similarity between perception and action planning and for the operation of goals in action control. We believe that the time has come to meet these challenges. Recent evidence from a number of fields has supported strong linkages between perception and action, for example, from brain imaging (Decety, in press; Jeannerod, 1997, 1999; Passingham, 1993), single-cell recording (Di Pellegrino, Klatzky, & McCloskey, 1992; Rizzolatti et al., 1996; Perrett et al., 1989), executive functions (Allport, Styles, & Hsieh, 1994; Monsell & Driver, in press), voluntary action (Bargh & Barndollar, 1996; Gollwitzer, 1996; Hershberger, 1989), imitation (Meltzoff, 1995; Meltzoff & Prinz, in press; Nadel & Butterworth, 1999; Prinz, in press), and conditioning (Rescorla, 1991). Before we set out our own ideas on these matters (Section 3), we will briefly sketch two major views on related issues.

In a way, interactions between perception and action are at the heart of the ecological approach to perception and action, as advanced by Gibson (1977, 1979; cf. Michaels & Carello, 1981; Reed, 1982, 1996). According to this approach, the notion of affordances plays a particular role in mediating between perception and action. Affordances specify aspects of the environment with reference to the animal’s body and its action capabilities, such as the "climbability" of a staircase or the "sittability" of a chair. A further claim is that the invariants that inform the animal about the action potential of its environment are inherent in the stimulus array. As a consequence, there is no need for any elaborate processing of the stimulus information. Instead, the animal’s option is to take or leave the invitations inherent in the actual affordances. When it decides to take one of them, action gets tuned to perception, and the details of the upcoming action are directly specified by the information picked up from the environment (cf., e.g., Reed, 1993).

Ecological approaches have introduced important novel perspectives to the study of perception and action. One is that they take the animal’s body and its actions into account as a reference both for detecting affordances on the perceptual side and for using them to form coordinated movement patterns on the action side. At the same time, they stress the importance of action for perception, that is, the instrumental role of movements in picking up complex invariants of stimulation. Another important novel perspective is that they adopt a realist and, thus, anti-computational stance, favoring direct (Gibsonian) detection of information over indirect (Helmholtzian) processing.

Cognitive views on perception and action differ from ecological views with respect to both scope and explanatory principles. As regards scope, cognitive approaches focus on the contributions of memory systems and stored knowledge to perception and action. Unlike ecological approaches, which emphasize the role of affordances through which action gets directly tuned to perception, cognitive approaches consider the role of instructions and intentions for the formation and implementation of task-specific cognitive dispositions, or task sets. What they try to explain is action planning, that is, the individual’s ability to select, prepare, and initiate arbitrary voluntary actions in response to arbitrary environmental events, and to do so on the basis of rules that may even change from moment to moment.

As regards explanatory principles, cognitive approaches differ from ecological approaches in two basic ways. First, instead of individuals’ physical bodies and their action capabilities, they consider their knowledge structures as the central reference for both perceptual analysis and action planning. Second, and related to this, they adopt a constructivist and, thus, representational stance, emphasizing information processing rather than detection.

We hold that a full theory of perception and action will eventually have to address the relationships both between affordances and movements and between perceptual events and goal-directed actions. TEC is not meant to offer this full theory. Instead, we focus on core issues inherent in cognitive views, that is, the representation of events and the planning of voluntary actions. For these issues, we offer a framework that views perception and action as being as closely coupled as the ecological approach has claimed movements and affordances to be.

3. THE THEORY OF EVENT CODING

We argue that a new conceptual framework is needed for a better understanding of the relationship between perception and action planning, and we believe that TEC offers this framework. TEC is based on the central notion that perception, attention, intention, and action share, or operate on, a common representational domain, a notion we will specify and discuss in this section. In constructing TEC, we have drawn on many ideas from other theoreticians, especially, of course, those emphasizing the intimate relationship between perception and action planning. For instance, we share the general perspective of Dewey (1896) and Gibson (1979) that perception and action are functionally linked and that only their coordination allows for adaptive behavior. We further adopt the notion, put forward by Greenwald (1970), James (1890), and Lotze (1852), that action control is anticipatory, that is, controlled by representations of intended action effects. And we also follow Allport (1987) and Singer (1994) in assuming that representations of perceptual contents and action plans are content-specific composites of codes, presumably stored in a distributed fashion.

In our view, TEC forms the core of what one may call a (Meta-)Theory of Perception and Action Planning, that is, a framework that would allow a fully integrated view of a substantial part of human information processing. In this section, we will describe the theory as developed so far. In particular, we will give an annotated list of the central assumptions underlying and making up the theory, and describe the basic anatomy of our major theoretical construct: the event code.

3.1 Event Coding: Basic Principles and Assumptions

It is fair to say that the relationship between stimulus and response representations has not been a major concern of information-processing approaches to human cognition. With very few exceptions, the basic idea underlying most approaches is that stimulus codes are formed in some "perceptual" domain and response codes in some "response" or "motor" domain, and that without "voluntary," "intentional," or "controlled" translation processes there is not much contact between them (see, e.g., Rosenbloom & Newell, 1987; Sanders, 1990; Teichner & Krebs, 1974; Theios, 1975). Only recently have a number of so-called dual-route models been suggested that allow both for "voluntary" translation of stimulus into response codes and for some kind of direct, automatic, stimulus-induced activation of response codes (e.g., De Jong, Liang, & Lauber, 1994; Kornblum et al., 1990; for an overview, see Hommel, 2000a). However, even in these models the relationship between stimulus and response representations is not described in any detail. In fact, the assumptions are usually restricted to postulating that some kind of interaction exists (typically asymmetric effects of stimulus on response processes, not vice versa), without explaining why this is so and how it works. As we will show, TEC provides a promising starting point for developing such explanations. But let us begin by discussing the basic principles and assumptions underlying our framework.

3.1.1 Common Coding of Perceptual Content and Action Goals

In contrast to previous approaches to human information processing, we do not share the seemingly obvious (though usually implicit) assumption that perceiving a stimulus object and planning a voluntary action are distinct processes operating on completely different codes. We claim that perceiving and action planning are functionally equivalent, inasmuch as they are merely alternative ways of doing the same thing: internally representing external events (or, more precisely, interactions between these events and the perceiver/actor).

There are obvious objections to our claim: Isn’t perceiving a rather passive mode of merely registering the properties and things of our environment, which our actions seek to change actively? In our view, this characterization overlooks two points. First, perceiving the world is a process of actively acquiring information about the perceiver-environment relationship, including all sorts of movements of the eyes, hands, feet, and body, particular allocations of attention and other cognitive resources, and so forth. Second, action would actually run blind without being perceptually informed about its bodily and environmental preconditions, its progress, and its consequences (Dewey, 1896; Gibson, 1979). That is, the process of perceiving both presupposes and affords active behavior, and performing an action both relies on and produces perceptual information. In that sense, perceptual or stimulus codes and action or response codes all represent both the result of, and the stimulus for, a particular sensorimotor coordination. If so, there is no theoretical reason to draw a conceptual distinction between anticipating a perceptual event and planning an action, or between actually perceiving an event and carrying out an action plan.

Moreover, as already touched on above, voluntary actions can be seen to come into being by anticipating their distal effects (cf. James, 1890; Lotze, 1852). This also implies that perceived events and action-generated distal events are coded and stored together in one common representational domain (Prinz, 1990). Above all, action-generated effects include body-related afferent information, that is, anticipated proprioceptive feedback, but they can also contain visual information about the anticipated position of the arm during and/or after a movement. In the motor control literature, this view of representing actions in terms of their action goals is widespread (Jeannerod, 1999; Rizzolatti, Fogassi, & Gallese, 1997; Viviani & Stucchi, 1992). However, TEC is much more radical in that it assumes that action effects can refer to any kind of response- or action-contingent event (see also Hoffmann, 1993; Hommel, 1997; Meltzoff, Kuhl, & Moore, 1991). In other words, switching on a light will produce not only body-related tactile and kinesthetic feedback at the hand, but also afferent visual feedback from the light emitted by the bulb, which represents an action effect as well. Accordingly, TEC can potentially account for a much wider range of phenomena, as we will see below.

3.1.2 Feature-Based Coding of Perceived and Produced Events

In theories of perception, attention, and memory, it has become common practice to think of stimuli as being represented by composites of feature codes (Allport, 1987, 1993). To a considerable degree, this is owed to the concentration of research on the visual modality, which in turn is motivated by recent progress in understanding the neural basis of visual perception. As we know by now, visual information is projected to several areas throughout the occipital, temporal, and parietal lobes, partially following anatomically distinct pathways (e.g., Cowey, 1985; DeYoe & Van Essen, 1988), and there is no indication that all information belonging to a stimulus or object converges onto some common "grandmother cell." Instead, different stimulus features coded in different cortical areas seem to be integrated by coordinating the codes representing them. This may be done by modulating and synchronizing the temporal behavior of neural feature codes (for overviews, see Singer, 1994; Treisman, 1996), but for TEC any other neurophysiological integration mechanism would do just as well.

If we combine the common assumption that stimulus representations are composites of feature codes with our claim that representations of perceptual and action events are of the same kind, an obvious conclusion follows: Action plans should also be made of temporary composites of action-feature codes. The assumption of a composite action representation is not entirely new. In some sense, its seed was already sown in the theories of Adams (1971) and Schmidt (1975) on motor learning and in Turvey’s (1977) considerations on action control. These authors claimed that the representation of a particular action is not a single, indivisible whole, and not as low-level and muscle-related as Keele’s (1968) earlier definition of a motor program might have suggested. Instead, it comprises at least two different parts or structures, such as the perceptual trace and the memory trace in Adams’ closed-loop model, or the parameters and invariants in Schmidt’s schema theory. Later approaches, such as that of Rosenbaum (1980), further developed the idea that response planning involves the specification of action features, and a number of authors have pointed out that these features are coded in distinct functional systems located in different areas of the human brain (Allport, 1993; Jeannerod, 1997; Keele, Cohen, & Ivry, 1990). That is, actions seem to be represented in a way that is at least very similar to how visual objects are represented. If so, the principles underlying the organization of perceptual and action-related information should be comparable and, in fact, it has been suggested that elements of action plans may be recruited and temporarily bound by the same kinds of mechanisms as elements of object representations in perception (Murthy & Fetz, 1992; Singer, 1994; Stoet & Hommel, 1999; see section on Activation and Integration of Feature Codes).

3.1.3 Distal Coding of Event Features

We claim that the cognitive codes that represent perceptual objects are identical to those representing action plans because both kinds of code refer to external, that is, distal events. Importantly, this logic only works if we assume that the respective distal codes represent distal attributes of the perceived event (Heider, 1926/1959, 1930/1959; Brunswik, 1944) and/or produced event, but not proximal effects on the sensory surface or muscular innervation patterns (cf. Prinz, 1992)3. Consider, for instance, a person reaching for a cup of coffee standing in front of her. One of many possible ways to analyze this situation would be to conceive of the cup as the stimulus and of the reaching movement as a suitable response. Clearly, a successful response requires that several features of stimulus and action plan match: The intended traveling distance of the hand should be identical to the perceived hand-cup distance, the intended grip should reflect the perceived size of the cup, and the spatial movement goal should be identical to the cup’s perceived location. According to our considerations, such a task is easy because stimulus and to-be-performed response share a large number of features. As action planning mainly consists of specifying and integrating the codes representing the intended action features, and as these codes are already activated in the course of perceiving the stimulus, there is not much more to be done (see below). Note, however, that distance, size, and location of stimulus and response match only with regard to a distal description of the environmental layout, not in terms of the particular neural codes or activation patterns by which it is represented. In fact, there is no way in which the sensory code representing a particular spatial distance could be similar to the muscular innervation pattern driving the hand over the same distance. This suggests that a match or mismatch between stimulus- and action-related codes can only be assumed at a more abstract, distal level of coding, and it is this level our approach refers to.

Distal coding of stimulus objects and action plans has several obvious advantages. First of all, it allows perception and action planning to abstract from domain- and modality-specific (e.g., visual, kinesthetic, or muscular) coding characteristics and to refer instead to an event’s informational content (for an elaboration on this theme see Prinz, 1992). An important implication of the distal-coding notion concerns the number of feature codes available in the coding system and, thus, the grain of possible discriminations in perception and action planning. While the inventory of proximal codes may be of a fixed size, as with feature detectors in the visual modality, the resolution of distal coding is virtually unlimited. For instance, even though the contribution of the auditory modality to frequency perception (i.e., its proximal resolution) is limited, listeners may be able to improve their judgments by learning to consider other, nonauditory information, such as vibration cues or pain-receptor responses. Similar strategies can help to increase the accuracy and resolution of motor responses, such as when artificial feedback is used to fine-tune a manual response or to gain control over hitherto autonomous functions. Therefore, our claim of feature-based coding should not be taken to mean that the number of feature codes--and, thus, the number of possible discriminations--is given and fixed from birth. Even if this may be so for sensory and motor codes, there is no reason to believe that it is also true for the distal codes TEC is dealing with.

3.2 Anatomy and Characteristics of Event Codes

3.2.1 Codes of Event Features

TEC’s core concept is the event code, which in turn consists of the codes that represent the distal features of an event (or, as a shorthand notation, feature codes). Feature codes are not specific to a particular stimulus or response; they both register sensory input from various sensory systems and modulate the activities of various motor systems. As feature codes refer to distal event features, they rely on proximal information, but they are not necessarily subject to the same limitations as a given proximal code. For instance, many features of environmental events are available through more than one sensory modality, so that the limitations of one modality can be compensated for by considering information from another, and it is feature codes that integrate this information.
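One standard way to model how a single distal estimate could combine proximal information from several modalities is reliability-weighted (inverse-variance) averaging of independent Gaussian estimates. TEC does not commit to this or to any specific integration rule; the Python sketch below is a swapped-in illustration, and the `Cue` class, the azimuth values, and the variances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One proximal estimate of a distal feature (hypothetical example data)."""
    name: str
    estimate: float   # e.g., perceived azimuth of a sound source, in degrees
    variance: float   # noise of this channel; smaller means more reliable


def integrate(cues):
    """Inverse-variance weighted average of independent Gaussian cues.

    Returns the fused estimate and its variance; the fused variance is never
    larger than the variance of the most reliable single cue.
    """
    if not cues:
        raise ValueError("need at least one cue")
    weights = [1.0 / c.variance for c in cues]
    total = sum(weights)
    fused = sum(w * c.estimate for w, c in zip(weights, cues)) / total
    return fused, 1.0 / total


# Hypothetical tone location: a noisy auditory cue and a sharper visual cue.
cues = [Cue("auditory", -20.0, 16.0), Cue("visual", -10.0, 4.0)]
location, var = integrate(cues)
print(f"fused location {location:.1f} deg, variance {var:.2f}")
```

With these made-up numbers the fused estimate is -12.0 degrees with a variance of 3.2: it is pulled toward the more reliable visual cue and is less variable than either cue alone. This is one possible reading of the claim that a distal feature code need not inherit the limitations of any single proximal channel.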

By integrating information from multiple sources, including memory, distal feature codes can become more complex than proximal codes, which are restricted to a particular sensory channel and reflect the characteristics of particular feature-detection receptors. Therefore, we assume that the dimensions distal feature codes refer to need not always be as simple as color or shape, but can also be as complex as "sit-on-ableness", to take one of the standard "affordances" of Gibsonian approaches. Even time and change might be represented by feature codes, so that events such as a leftward motion can be coded.

Also important, feature codes are not simply given but evolve and change through the perceiver/actor’s experience. A color will not, or not only and always, be coded as RED, say; rather, the perceiver/actor will (at least be able to) learn to distinguish several shades of red, such as CRIMSON, ORANGE, and ROSY-RED. As a consequence, a formerly single feature code will become differentiated into a larger number of codes. Likewise, a particular action might not, or not only and always, be coded as LEFT; one will be able to learn to distinguish, say, LEFT-OF-BODY, LEFT-OF-RIGHT-INDEX-FINGER, and LEFTWARD. In other words, discrimination learning will lead to a continuous change in the ability of a perceiver/actor to represent his or her interactions with the environment. However, we claim that these changes will take place at the level of distally defined feature codes.

A highly simplified illustration of our view is given in Figure 1, where two feature codes (f1 and f2) receive input from two sensory systems (say, the visual and the auditory system) and affect the performance of two motor systems (say, hand-movement and speech control).

Figure 1. Feature coding according to TEC. In the example, sensory information coming from two different sensory systems (s1, s2, s3, and s4, s5, s6, respectively) converges onto two abstract feature codes (f1 and f2) in a common-coding system, which in turn spread their activation to codes belonging to two different motor systems (m1, m2, m3, and m4, m5, m6, respectively). Sensory and motor codes refer to proximal information; feature codes in the common-coding system refer to distal information.

 

Let us assume that, from a perceptual point of view, f1 represents the fact that a particular stimulus tone appears on the left side of the perceiver/actor’s body, while f2 represents, say, the high pitch of this tone. Although some spatial information from the auditory system (coded by s4) is used for coding tone location, visual information (coded by s1 and s2) about the (apparent) auditory source may also be considered. This is a theoretically nonessential attempt of ours to model the well-known phenomenon of visual dominance in spatial perception (e.g., Posner, Nissen, & Klein, 1976). The perception of pitch is mainly driven by auditory information (coded by s5 and s6), but it may be facilitated by some visual information (coded by s3), such as cues indicating the presence of a violin.

Let us now consider how an action plan is made. Planning an action does not involve specifying every single muscle activity in advance but is restricted to constraining and modulating sensorimotor coordination so as to achieve the intended class of movements (Greene, 1982, 1988; Turvey, 1977). In other words, action control deals with the intended outcome of an action, not with the particulars of the movement or the sensorimotor interplay producing that outcome. According to TEC, an action plan is made of several feature codes, with each code modulating a particular aspect of sensorimotor coordination. To take our example, f1 might control the "leftness" of actions by influencing the activity of motor codes in the hand system (say, m1 and m2) to drive the hand leftwards, or towards a left-hand target object. At the same time, the very same code may also bias other motor systems to produce "left" events, such as the speech system (m4) to say "left"4 or the eye system (not shown) to drive the eyes leftwards, thus enabling and supporting multi-effector coordination. Usually, specifying a single feature of an intended action does not suffice, so action plans will include a number of feature codes. In our example, the code f2 might bias actions to produce "high" outcomes by activating motor code m3 in the hand system to drive the hand upwards (e.g., resulting in a hand movement to an upper-left target location) and by affecting parameters m5 and m6 in the speech system to produce a word in a high pitch (e.g., resulting in the word "left" being uttered in a high voice).

In the proposed common-coding system several kinds of interactions between perceptual and action-related processes are expected, especially if the features of perceived and to-be-produced events overlap. It can be seen from Figure 1 that perceiving an object possessing particular features (e.g., a high-pitch tone) will prime those actions that produce the same features (e.g., speaking in a high voice), and vice versa. Of course, interactions are also expected between processes dealing with different, but feature-overlapping, perceptual events (e.g., a tone and a light on the left side) or actions (e.g., moving the hand and the foot to a left-side target). Even negative effects between different perceptions and different actions or between perception and action are possible, as we will discuss in the next section. Importantly though, these interactions are not due to the characteristics of particular sensory or motor codes, or to some direct interplay between them--the only thing that matters is whether or not they are mediated by the same feature code in the common-coding system. In other words, perceptual and action-planning processes only interact if the codes they operate on refer to the same (kind of) feature of a distal event.
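The wiring of Figure 1, as described in the text, can be written down as a toy spreading-activation sketch in Python. The unit weights, the additive activation rule, and all function names are assumptions made for illustration; TEC does not specify them. The sketch shows sensory codes feeding feature codes, feature codes biasing motor codes, and interaction between two events arising only through a shared feature code.

```python
# Hypothetical wiring read off the Figure 1 example; every link has weight 1.0.
SENSORY_TO_FEATURE = {
    "s1": ["f1"], "s2": ["f1"], "s3": ["f2"],   # visual system
    "s4": ["f1"], "s5": ["f2"], "s6": ["f2"],   # auditory system
}
FEATURE_TO_MOTOR = {
    "f1": ["m1", "m2", "m4"],   # LEFT: hand leftwards (m1, m2), say "left" (m4)
    "f2": ["m3", "m5", "m6"],   # HIGH: hand upwards (m3), high voice (m5, m6)
}


def perceive(active_sensory):
    """Spread activation from proximal sensory codes to distal feature codes."""
    features = {}
    for s in active_sensory:
        for f in SENSORY_TO_FEATURE.get(s, []):
            features[f] = features.get(f, 0.0) + 1.0
    return features


def prime_motor(features):
    """Spread feature-code activation on to the motor codes each one modulates."""
    motor = {}
    for f, act in features.items():
        for m in FEATURE_TO_MOTOR.get(f, []):
            motor[m] = motor.get(m, 0.0) + act
    return motor


def mediating_codes(sensory_a, sensory_b):
    """Feature codes through which two perceptual events could interact."""
    return set(perceive(sensory_a)) & set(perceive(sensory_b))


heard = perceive(["s5", "s6"])        # a high-pitched tone
print(heard)                          # {'f2': 2.0}
print(prime_motor(heard))             # {'m3': 2.0, 'm5': 2.0, 'm6': 2.0}
print(mediating_codes(["s4"], ["s1"]))  # tone and light on the left: {'f1'}
```

Hearing the high tone pre-activates the upward hand code and the high-voice speech codes but leaves the "left" codes untouched, and a left-side tone and a left-side light meet only in f1. In this toy version, as in the text, whether two processes interact depends solely on whether they pass through the same feature code, not on which sensory or motor codes they involve.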

3.2.2 Activation and Integration of Feature Codes

Each event code consists of several feature codes representing the attributes of the perceived or planned event. For instance, perceiving a cherry will result in the activation of those feature codes that represent the attributes RED, ROUND, and SMALL, among many others. We have already pointed out that merely activating a particular feature code will prime all those events it shares features with, so that registering the cherry will facilitate perceiving other red, round, and small objects, or performing actions directed towards, manipulating, or producing events possessing these features. This logic works either way, so that selecting the features of a to-be-planned action will facilitate both the perception and the production of other events the planned action shares features with.

However, the mere activation of feature codes does not yet make an event code. What if our cherry comes with an apple, which would also be round, but neither red nor that small? Registering the apple should activate the feature codes GREEN, ROUND, and BIG (relatively speaking), so that five feature codes would now be active. How would the system be able to tell, for instance, that the BIG code belongs to the GREEN, not the RED code? Obviously, some kind of integration mechanism is required that binds those feature codes together that have been activated by the same event. For the visual domain, several authors have suggested that feature binding is achieved by temporally coupling or synchronizing the activation of feature codes (for overviews see Abeles, 1991; Singer, 1994; Treisman, 1996). Whatever the details of this mechanism may be, an increasing number of studies provides substantial evidence that feature binding in perception does occur. For example, Kahneman, Treisman, and Gibbs (1992) have shown that repeating a particular stimulus is only of advantage if its relative location is also repeated, which strongly suggests that form and location codes of a stimulus object are bound together. Other studies found very similar effects with other stimuli, features, and tasks (Gordon & Irwin, 1996; Henderson, 1994; Henderson & Anes, 1994; Hommel, 1998b), which indicates that feature binding is a rather general phenomenon.

Binding problems are not restricted to the perceptual domain. Assume, for instance, that a person is planning to move his/her right hand upwards to pick an apple from a tree. At the same time, the person plans a downward movement with his/her left hand to catch the apple should it fall. To simplify matters, let us assume that only four discriminative codes are involved in specifying these actions: a LEFT and a RIGHT code, and an UPWARD and a DOWNWARD code. If action planning consisted of just activating these codes, much confusion would arise in this situation. It would be impossible to tell whether the left or the right hand needs to be moved upwards or downwards, that is, whether the LEFT (or RIGHT) code goes with the UPWARD or the DOWNWARD code. This is the same binding problem as discussed for the representation of perceptual objects, and it has been suggested that the mechanisms solving it are also the same (Murthy & Fetz, 1992; Stoet & Hommel, 1999). That is, action plans are not just bundles of activated feature codes, but integrated wholes.
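The planning example can be stated as a small sketch. It assumes, hypothetically, that a plan is represented either as one pooled set of activated codes or as a collection of bound sets, one per movement. The names `plan_a`, `plan_b`, `activation_only`, and `integrated` are illustrative only. The sketch shows that two different plans become indistinguishable once binding information is discarded.

```python
# Toy sketch of the binding problem for action plans (hypothetical representation).
# Plan A: right hand UPWARD (pick the apple), left hand DOWNWARD (catch it).
# Plan B: right hand DOWNWARD, left hand UPWARD. Only the binding differs.
plan_a = [frozenset({"RIGHT", "UPWARD"}), frozenset({"LEFT", "DOWNWARD"})]
plan_b = [frozenset({"RIGHT", "DOWNWARD"}), frozenset({"LEFT", "UPWARD"})]


def activation_only(plan):
    """Pool all feature codes of a plan, discarding which codes belong together."""
    return frozenset().union(*plan)


def integrated(plan):
    """Keep each movement's feature codes bound into its own unit."""
    return frozenset(plan)


# With activation alone, the two plans are indistinguishable...
print(activation_only(plan_a) == activation_only(plan_b))  # True
# ...whereas bound codes keep them apart.
print(integrated(plan_a) == integrated(plan_b))            # False
```

The same construction applies to the perceptual case: a small red cherry with a big green apple, and a small green object with a big red one, activate the same five feature codes.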

Figure 2. Integration of feature codes into event representations. Feature codes that are activated by external stimulation or internal processes are bound into separate, coherent event structures. In the illustrated example, each of the two represented events includes two unique features (f1 and f2, and f4 and f5, respectively), but the two events (and, thus, their representations) overlap with respect to one feature (f3).

 

The distinction between activation and integration has profound implications for predicting and understanding interactions between event codes. Figure 2 shows the representations of two events: Event 1 comprises the feature codes f1, f2, and f3, and Event 2 comprises f3, f4, and f5. Note that the two representations overlap in f3, which invites interaction between them. Now assume that Event 1 is perceived or planned by first registering or selecting (i.e., activating) the corresponding feature codes and then integrating them. As long as the feature codes are merely activated, the representation of Event 2 will also be (partially) activated, which primes or refreshes that event code. However, as soon as the features belonging to Event 1 get integrated (indicated by the solid-line ellipse in Figure 2), the two event representations no longer support but interfere with each other. As f3 is now associated or synchronized with Event 1, it is no longer available for representing other events. The integrated code of Event 1 will therefore (partially) suppress or hamper the coding of Event 2 (indicated by the broken-line ellipse).
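The sign reversal described above can be illustrated with a deliberately crude scoring rule. The rule in the hypothetical function `effect_on` adds +1 for each shared code that is merely active and subtracts 1 for each shared code already bound into another event code. This is a simplification chosen only to show the direction of the effect; TEC itself does not specify these magnitudes.

```python
# Toy two-phase sketch (hypothetical scoring; only the sign is meaningful).
EVENT_1 = {"f1", "f2", "f3"}
EVENT_2 = {"f3", "f4", "f5"}


def effect_on(target, active, bound_elsewhere):
    """Net effect of the current coding state on a target event code.

    Each shared feature code that is active but not yet bound primes the
    target (+1). Each shared feature code already bound into another event
    code is unavailable and hampers the target (-1).
    """
    facilitation = len(target & (active - bound_elsewhere))
    interference = len(target & bound_elsewhere)
    return facilitation - interference


# Phase 1: Event 1's feature codes are activated but not yet integrated.
print(effect_on(EVENT_2, active=EVENT_1, bound_elsewhere=set()))    # 1
# Phase 2: Event 1's feature codes are bound together, so f3 is occupied.
print(effect_on(EVENT_2, active=EVENT_1, bound_elsewhere=EVENT_1))  # -1
```

An event sharing no feature with Event 1 scores 0 in both phases. This reflects the claim that interactions arise only through overlapping feature codes.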

This scenario not only accounts for both facilitation and interference between event-coding processes; it also predicts specific time courses of these phenomena. Regarding perceptual coding, TEC assumes that the first phase of processing a stimulus consists of the parallel activation of all feature codes related to the stimulus's features. If one or more of the activated codes are, or already have been, used to form another event code, performance relying on this event code is facilitated. This is true for perception (i.e., if the respective event code refers to a stimulus event) and for action (i.e., if the code refers to an intended action outcome). Regarding action planning, the first step will also consist of activating the feature codes of the intended action features. A possible exception are highly overlearned actions, which may be stored in an already integrated format. Again, this will prime event codes with overlapping features, whether these are used in perception or in action planning.

During the second phase, activated feature codes will be integrated, so that they are no longer (easily) available for concurrent coding processes. As a consequence, facilitation of processes operating on feature-overlapping events turns into interference. In perception, integration is likely to be associated with attentional processing (Treisman, 1988) and, therefore, will depend on whether the respective stimulus is attended by the perceiver (see next section). In action planning, integration is likely to be associated with specific preparation of the particular action, not with the general intention to perform the action on some occasion (Stoet & Hommel, 1999).

3.2.3 Attentional and Intentional Modulation of Event Coding

Perceptual and action-planning processes are selective. For instance, if you look at your watch to see what time it is, you might be aware of the watch’s orientation relative to you. You might also be aware of the relative location of, and the angle between, the hands, because these features are relevant for extracting the correct time. However, you might not notice (and possibly not remember on a later occasion) the color of the hands or the material the watch strap is made of, although these features may well have been very salient when you bought the watch. In short, the situational context and the current intentions of the perceiver-actor are likely to affect the processing and coding of perceived events. The same is true for produced events. If you are reaching out for a cup of coffee, say, the success of the action critically depends on whether and how the fingers grip the cup’s handle. Other action features are much less relevant, such as the hand’s exact angle relative to the cup or the parameters of its trajectory. These features would likely become centrally important, however, if the cup were replaced by, say, a snake. Again, contexts and intentions are crucial in defining which features of an action are relevant and which are not. Moreover, in order to coordinate perceptual and action-planning processes, selectivity on the perception side needs to be tuned to action requirements (selection for action, Allport, 1987) and vice versa.

In TEC, event coding in perception and action is highly (although not completely) dependent on the perceiver/actor’s current aims and goals. In particular, we assume that event coding is tailored to the situational demands by setting and changing the relative weights of feature codes. These weights influence the degree to which the codes contribute to the resulting event code. If a particular feature is relevant for a given task, whether it codes a stimulus or a response, its code will be primed in advance, which has several consequences.

First, the feature code’s basic activation level will be raised above the standard resting level.

Second, if the code then gets activated during the processing of a stimulus event or an action plan, its activation level will be higher than that of codes corresponding to task-irrelevant features. For instance, if the form of a stimulus is task-relevant and its color is not, both form and color codes will receive some degree of activation when the stimulus is presented. However, as the form codes are primed due to task relevance, their net activation will be higher than that of the color codes.

Third, the higher activation of task-relevant codes entails that these codes will play a dominant part in feature binding: the higher the activation of a code, the more strongly (and/or the more likely) it will become integrated into the resulting event code.

Consequently, feature weighting affects both activation and integration. A highly weighted feature code reaches a higher activation level than less highly weighted codes. During the activation phase, it will therefore more strongly facilitate the coding of events possessing the respective feature. However, once integration sets in, this feature will be more prominently represented in the emerging event code. The coding of other events will then be hampered more if they overlap with it in this feature than if they overlap in other, less strongly weighted features.
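A minimal numerical sketch of the three consequences of feature weighting follows. The assumptions are ours, not TEC's:

- Priming adds the weight to a resting level of zero.
- Stimulus input adds a constant.
- A code's share in the emerging event code is proportional to its activation.

The function names (`prime`, `activate`, `binding_strength`), the code labels, and the weight of 0.5 are hypothetical.

```python
# Toy sketch of feature weighting (hypothetical parameters throughout).
RESTING_LEVEL = 0.0


def prime(weights):
    """Task-relevant codes start above the resting level by their weight."""
    return {code: RESTING_LEVEL + w for code, w in weights.items()}


def activate(baseline, stimulus_codes, input_strength=1.0):
    """Stimulus input adds to each stimulated code's (possibly primed) baseline."""
    return {code: baseline.get(code, RESTING_LEVEL) + input_strength
            for code in stimulus_codes}


def binding_strength(activation):
    """Assumed share of each code in the emerging event code (proportional to activation)."""
    total = sum(activation.values())
    return {code: a / total for code, a in activation.items()}


# Form (SQUARE) is task-relevant; color (RED) is not.
baseline = prime({"SQUARE": 0.5})
act = activate(baseline, {"SQUARE", "RED"})
strength = binding_strength(act)

print(act["SQUARE"], act["RED"])                                # 1.5 1.0
print(round(strength["SQUARE"], 2), round(strength["RED"], 2))  # 0.6 0.4
```

Under these assumptions, the primed form code both starts higher and ends up with the larger share of the binding. That is what makes overlap in a highly weighted feature more costly once integration has set in.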

The feature-weighting principle implies that representations of objects and actions may well include information about both relevant and irrelevant features of the represented entities. Yet the former will dominate the latter and, thus, will in a certain sense define or characterize the whole entity. We assume that this is so no matter what kind of entity is represented; that is, features will be weighted according to their task relevance in perception as well as in action planning. With reference to perception, feature weighting may be called an attentional process. It selectively prepares the cognitive system for the differential processing of relevant (i.e., to-be-attended) and irrelevant (i.e., to-be-ignored) features of an anticipated perceptual event. With reference to action planning, however, the same kind of feature weighting might rather be called an intentional process, because it reflects the perceiver/actor’s intention to bring about a selected aspect of the to-be-produced event. In other words, feature weighting always implies preparation for and anticipation of forthcoming events, whether these are to be perceived or to be produced.

At first sight, the feature-weighting principle might seem to imply that the impact of attentional/intentional processes is restricted to single values on feature dimensions. In fact, however, how the principle is applied is likely to vary with the task.

First, there are situations where only or mainly a single feature matters, such as when seeking a star in the sky or raising a finger. Under these conditions, increasing the weight of a single value on a feature dimension (e.g., BRIGHT or UP) will suffice to solve the task.

Second, there are situations that require discriminative responses to the dimensional value of a stimulus, such as when being confronted with a traffic light or in a binary-choice experiment. Under these conditions, increasing the weights for a whole feature dimension allows a first selection of signal stimuli from noise, and of valid from invalid responses (Ward, 1982). There is indeed evidence that defining the task-relevant stimulus- and response-feature dimensions is an important part of preparing and implementing a task set (Meiran, in press).

Third, given that features can vary in complexity, increasing the weights for a particular feature or feature dimension also implies a selective preparation for a particular level and grain size of event coding. For instance, when confronted with another person, one can choose to attend to, and act towards, the whole person, his/her face, eye, or pupil. These attentional attitudes are likely to be expressed by weighting codes of event features that differ in complexity. Similarly, in a left-right choice-reaction task, say, actors are likely to specify and code their responses in terms of the categorical feature codes LEFT and RIGHT. In a positioning task, by contrast, action planning will rely on feature codes of a much finer grain size. That is, perceivers/actors will switch between more abstract and more detailed representational or coding levels, whichever is more suitable for performing the task.

3.2.4 Roles of Event Codes

Although we deny any fundamental difference between the event codes that underlie perception and those that are functional in action planning, we admit that it still makes sense to distinguish between stimulus codes and response codes. Consider, for instance, a movement of the right index finger. This movement can have internal causes, such as when the actor intentionally lifts the finger in response to a signal. It can also have external causes, such as when a motor mechanically lifts the finger to signal a response to be made with another effector. Even though the efferent contributions to these two events will differ grossly, the (re-)afferent information will be more or less identical; thus, their codes will largely overlap. Nevertheless, the representations of the afferent information serve different purposes in the two examples. In the first, active situation, the codes referring to the perceived (self-controlled) finger lift represent the intended action goal and can therefore be considered action or response codes. In contrast, in the passive situation the very same codes serve to represent the (experimenter-controlled) stimulus and can therefore legitimately be considered stimulus codes. However, we must not forget that only the role of the represented event decides whether a code is a stimulus or a response code. The features or characteristics of this event, or of the codes representing it, do not. That is, what one regards as stimulus and response depends on who controls the respective event (the experimenter or the participant), not so much on the nature of its representation. All this comes down to the conclusion that the role of an event in a given context should not be confused with the type of its cognitive code. Not only can different representations play equivalent roles; the same representation can also play rather different roles.
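The distinction between the role of an event and the type of its code can be expressed as a data-structure choice: the role is not stored in the code but computed from context. In the sketch below, `EventCode`, `role`, and the `controlled_by` argument are hypothetical names, and the feature labels are illustrative. The point is only that one and the same code object yields different roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCode:
    """An event code is just a set of feature codes; it carries no role label."""
    features: frozenset


def role(code: EventCode, controlled_by: str) -> str:
    """Derive the role from who controls the event, not from the code itself.

    The `code` argument is deliberately not inspected.
    """
    return "response code" if controlled_by == "participant" else "stimulus code"


# Active and passive finger lifts yield (nearly) the same reafferent features,
# so the same code object stands for both.
finger_lift = EventCode(frozenset({"RIGHT", "INDEX_FINGER", "UPWARD"}))

print(role(finger_lift, controlled_by="participant"))   # response code
print(role(finger_lift, controlled_by="experimenter"))  # stimulus code
```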

3.2.5 Hierarchical Coding

For the sake of simplicity, we have used the term "event" as if it referred to an easily discriminable, well-defined snapshot of the world or a single, discrete movement. Intuitively, it seems justified and not overly problematic to call a light flash or a single keypress an event that is presumably represented by a single, coherent cognitive structure. But what about a series of light flashes or keypresses, or a whole stimulus-response pair? Couldn’t they still count as one event? And what about a movie, a holiday trip, or a scientific career?

Obviously, it is rather difficult to define an event in a way that is both strict and meaningful. However, this definitional problem exists not only for the theorist but also for the cognitive system. True, designers of good psychological experiments will usually try hard to make it very clear to their participants what counts as a relevant stimulus and a legal response. Outside the lab, however, it is much less clear how to properly individuate perceptual events and identify their structure. Assume, for instance, that you are watching a soccer game or a birthday party. Such events consist of what Barker (1963) has called a "stream of behavior." This stream not only could be, but actually is, segmented into tens, hundreds, or thousands of meaningful units, depending on the observer’s interest, expertise, or attentional capabilities (e.g., Cohen & Ebbesen, 1979; Massad, Hubbard, & Newtson, 1979; for an overview, see Stränger & Hommel, 1996).

The cognitive system can thus define events internally by selecting a particular temporal interval, a particular aspect, and a particular grain size of the physical event. Given this flexibility, it would make little sense to come up with an a priori definition, at least at this point of theoretical development. Therefore, we will stick to our loose usage of the term "event" and leave it to empirical investigation and domain-specific theorizing to lay the groundwork for a stricter definition. Very likely, those efforts will end up with a much more complex picture than the one sketched in Figure 1. Thus, although our examples keep things as simple as possible, this is not to deny the formation of higher-order event codes in the cognitive system. Such higher-order codes may refer to several levels of an event, and they may themselves become integrated into even higher-order representations of whole tasks, and so forth. This paper and most of our own research focus on the lowermost representational level: the relationship and interplay between codes of simple events and codes of their features. Even so, this level only provides the basis for presumably much more complex representations of perceived and produced events.
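The nesting of event codes described here can be pictured as a recursive data structure. This is only an illustrative sketch. The `Event` class, its fields, and the example feature labels are hypothetical. TEC deliberately leaves open how higher-order event codes are actually formed and how many levels there are.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """Hypothetical event code: own feature codes plus lower-order event codes."""
    name: str
    features: Set[str] = field(default_factory=set)
    parts: List["Event"] = field(default_factory=list)

    def all_features(self) -> Set[str]:
        """Collect feature codes from this event and all lower-order events."""
        result = set(self.features)
        for part in self.parts:
            result |= part.all_features()
        return result

    def depth(self) -> int:
        """Number of representational levels from this event down to its features."""
        return 1 + max((part.depth() for part in self.parts), default=0)


flash = Event("light flash", {"LIGHT", "LEFT"})
press = Event("keypress", {"LEFT", "INDEX_FINGER"})
trial = Event("stimulus-response pair", parts=[flash, press])
task = Event("task", parts=[trial])

print(sorted(trial.all_features()))  # ['INDEX_FINGER', 'LEFT', 'LIGHT']
print(task.depth())                  # 3
```

The lowermost level in this sketch (`flash`, `press`) corresponds to the simple events on which the present paper concentrates. The higher levels only indicate where more complex representations would sit.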

4. EMPIRICAL EVIDENCE

In this section we will mainly review behavioral data from a number of labs, including our own, covering domains as different as visual attention, action planning, and sensorimotor performance. These studies served different functions in the history and development of TEC. Some motivated the conception, inclusion, or addition of theoretical assumptions; some were motivated by TEC and carried out to test its implications; and some coevolved, so to speak, with the theory. Taken together, we think, these behavioral findings are in good agreement with our central assumptions and thus provide ample support for TEC as developed so far. Yet, before going into the behavioral details, we briefly mention some recent neuroanatomical and neurophysiological evidence suggesting the existence of brain modules shared by perception and action planning. Obviously, TEC is a purely cognitive approach that is not bound to, and does not rely on, particular brain mechanisms. Nor do we wish to claim that the available neuroscientific evidence necessarily requires an approach such as ours. Nevertheless, we do find it important to point out that our ideas fit nicely into the current neuroscientific picture.

One example of how perception and action planning may be interfaced in the brain is provided by the "visual-and-motor neurons" found in the monkey’s parietal cortex and the "mirror neurons" located in the premotor cortex, areas that are commonly associated with action planning (Jeannerod, 1997; Passingham, 1993). Visual-and-motor neurons are active when a monkey manipulates a specific object and/or while that object is merely fixated (e.g., Sakata, Taira, Murata, & Mine, 1995; Taira, Mine, Geor