To be published in Behavioral and Brain Sciences (in press)
© Cambridge University Press 2002



Below is the unedited, uncorrected final draft of a BBS target article that has been accepted for publication. This preprint has been prepared for potential commentators who wish to nominate themselves for formal commentary invitation. Please DO NOT write a commentary until you receive a formal invitation. If you are invited to submit a commentary, a copyedited, corrected version of this paper will be posted.


 

 

The E-Z Reader Model of Eye Movement Control in Reading:

Comparisons to Other Models

 

 

 

 

Erik D. Reichle

University of Pittsburgh

 

Keith Rayner and Alexander Pollatsek

University of Massachusetts, Amherst

 

 

 

 

 

 

 

 

 

 

 

Send correspondence to:

Keith Rayner

Department of Psychology

University of Massachusetts

Amherst, MA 01003

413-545-2175

e-mail: rayner@psych.umass.edu

 

 

Running Head: Reading Models


Short Abstract

            The E-Z Reader model (Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 1999) provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where the eyes move during reading.  Thus, in contrast to other models reviewed in this article, E-Z Reader can simultaneously account for many of the known effects of linguistic, visual, and oculomotor factors on eye movement control during reading.  Furthermore, the core principles of the model have been generalized to other task domains (e.g., equation solving, visual search), and are broadly consistent with what is known about the architecture of the neural systems that support reading.


Abstract

             The E-Z Reader model (Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 1999) provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where the eyes move during reading.  In this article, we first review what is known about eye movements during reading.  Then we provide an updated version of the model (E-Z Reader 7) and describe how it accounts for basic findings about eye movement control in reading.  We then review several alternative models of eye movement control in reading, discussing both their core assumptions and their theoretical scope.  On the basis of this discussion, we conclude that E-Z Reader provides the most comprehensive account of eye movement control during reading.  Finally, we provide a brief overview of what is known about the neural systems that support the various components of reading, and suggest how the cognitive constructs of our model might map onto this neural architecture.

 

 

Key Words: Attention, Eye-Movement Control, E-Z Reader, Fixations, Lexical Access, Models, Reading, Saccades


1. Introduction

            Reading is a complex skill that involves the orchestration of many different stages of information processing.  As the eyes move across the printed page, the visual features of the text are converted into orthographic and phonological patterns, which are then used to guide further language processing so that the content of the text can be understood.  In this target article, we will compare different models that try to account for how eye movements are controlled in reading.  We will not review all of the models that have been proposed to explain various aspects of reading.  Instead, we will only discuss those models that have attempted to explain the interface between vision and low-level aspects of language processing; that is, models that specify some combination of the following components of reading: eye movement control, visuospatial attention, and/or the visual processing of words [Note 1].  Not surprisingly, we will argue that the model that we implemented, E-Z Reader [Note 2] (Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 1999), does a better job of accounting for a wide range of data than do its competitors.  However, we will also point out some shortcomings of the model.

            The remainder of this article will be organized into five major sections.  First, we will briefly review some important findings regarding eye movements in reading; within this section we will describe some findings that we believe a model of eye movement control should be able to accommodate.  Second, we will provide an overview of the E-Z Reader model, including an updating of the model (E-Z Reader 7).  Third, we will provide an overview of other models of eye movement control in reading (including discussions of the pros and cons of the models compared to E-Z Reader).  Fourth, we will discuss future directions and ways that we intend to extend the E-Z Reader model.  In this section, we will also discuss a possible mapping between model components and neurophysiological mechanisms.  Finally, we will provide some concluding comments.

2.0 Eye Movements in Reading

            Any discussion of models of eye movement control must begin with a brief overview of eye movements during reading.  In this section, we will describe what is known about eye movements during reading as background material.  The following topics will be discussed: (1) saccades and fixations, (2) visual acuity, (3) saccade latency, (4) the acquisition of information during eye fixations, (5) perceptual span, (6) parafoveal preview effects, (7) regressions, (8) eye movement control (where to fixate next and when to move the eyes), and (9) measures of processing time.  It is not our intention to provide a complete and comprehensive review of each of these topics as our primary purpose in this article is to compare different models of eye movement control in reading.  The interested reader is invited to consult Rayner (1998) for a more complete review of each of the nine topics discussed in this section.

            2.1 Saccades and fixations.  Contrary to our subjective impression, the eyes do not move smoothly across the printed page during reading.  Instead, the eyes make short and rapid movements, called saccades (Erdmann & Dodge, 1898; Huey, 1908), that typically move them forward about 6-9 character spaces, although there is considerable variability (Rayner, 1978, 1998).  Since the distribution of saccade sizes, measured in number of character spaces, is largely independent of visual angle when the number of character spaces is held constant (Morrison & Rayner, 1981; O’Regan, 1983), virtually all studies of reading use number of character spaces as the appropriate metric.  Saccades take 20-50 ms to complete, depending upon the length of the movement, and virtually no visual information is extracted during eye movements (Ishida & Ikeda, 1989; Wolverton & Zola, 1983).  Between saccades, the eyes remain stationary for brief periods of time (typically 200-250 ms) called fixations (Erdmann & Dodge, 1898; Huey, 1908).  Because visual information is only extracted from the printed page during fixations, reading is similar to a slide show in which short segments of text are displayed for approximately a quarter of a second.  It is important to note that there is considerable variability in both saccade length and fixation duration.  Some saccades only move the eyes a single character, whereas others are as large as 15-20 characters (although such long saccades typically follow regressions and place the eyes beyond the point from which the regression was initiated).  Likewise, some fixations are shorter than 100 ms and others are longer than 400 ms (Rayner, 1978, 1998).  Much of this variability apparently is related to the ease or difficulty involved in processing the currently fixated text.

            2.2 Visual acuity.  One of the reasons that the eyes are constantly moving in reading is that there are severe limits to how much visual information can be processed during a fixation.  Visual acuity is maximal in the center of the retina and rapidly decreases towards the periphery, and fine visual discriminations can only be made within the fovea, or central 2° of vision.  As a result, the visual features that make up individual letters can only be encoded from a very narrow window of vision.  The practical significance of this is that it is necessary to fixate most words so that they can be identified.  Indeed, there is considerable evidence that a word becomes increasingly difficult to identify as the angular disparity between the fovea and the retinal image of a word increases (Rayner & Bertera, 1979; Rayner & Morrison, 1981).  Explaining how the reader deals with this limited acuity is one constraint on any model of eye movements.

            2.3 Saccade latency.  A second kind of constraint on any model of reading stems from the “race” between the processes identifying words and the need to plan a saccade early enough in a fixation so that reading can carry on at about 300 words per minute.  On the one hand, experiments in which subjects move their eyes to visual targets indicate that the saccadic latency, or the time needed to plan and execute a saccade, is approximately 180-250 ms (Becker & Jürgens, 1979; Rayner, Slowiaczek, Clifton, & Bertera, 1983).  This suggests that the decision to make a saccade is often made within the first 100 ms of a fixation.  However, this is seemingly at odds with the intuitively appealing idea that word recognition is a major contributor to driving eye movements during reading because most estimates indicate that lexical access requires 100-300 ms to complete (Sereno, Rayner, & Posner, 1998; Rayner & Pollatsek, 1989; Schilling, Rayner, & Chumbley, 1998).  It is thus not immediately obvious how the identification of one word can be the signal to begin planning a saccade to the next.  Indeed, early theories of eye movements in reading (Bouma & deVoogd, 1974; Kolers, 1976) posited that word identification was too slow to be the engine driving eye movements.
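The timing constraint described above can be made concrete with a little arithmetic. The sketch below uses only the ranges quoted in this article (typical fixations of 200-250 ms and saccadic latencies of 180-250 ms). The function names and the simple subtraction are illustrative assumptions, not part of any model discussed here.

```python
# Ranges quoted in the text (milliseconds).
FIXATION_RANGE_MS = (200, 250)   # typical fixation duration
LATENCY_RANGE_MS = (180, 250)    # time to plan and execute a saccade


def latest_decision_time(fixation_ms, latency_ms):
    """Latest time (ms from fixation onset) at which saccade planning can
    start if the saccade is to begin when the fixation ends.

    A negative value means planning would have to start before the
    fixation began.
    """
    return fixation_ms - latency_ms


def decision_window():
    """Return (earliest, latest) decision times over the quoted ranges."""
    shortest_fix, longest_fix = FIXATION_RANGE_MS
    shortest_lat, longest_lat = LATENCY_RANGE_MS
    earliest = latest_decision_time(shortest_fix, longest_lat)  # 200 - 250
    latest = latest_decision_time(longest_fix, shortest_lat)    # 250 - 180
    return earliest, latest
```

Under these assumptions `decision_window()` returns `(-50, 70)`: even in the most favorable case the decision must be made within the first 70 ms of the fixation, which is consistent with the statement that it often falls within the first 100 ms.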

            2.4 The acquisition of information during reading.  During saccades, vision is suppressed so that the information needed for reading is acquired only during fixations (Ishida & Ikeda, 1989; Wolverton & Zola, 1983).  Furthermore, reading proceeds quite smoothly if text is available for processing for only the first 50-60 ms of a fixation prior to the onset of a masking pattern (Ishida & Ikeda, 1989; Rayner, Inhoff, Morrison, Slowiaczek, & Bertera, 1981).  This does not mean that words are identified within 50 ms, but rather that the information that is needed for reading gets into the processing system within 50-60 ms.

            2.5 Perceptual span.  One solution to the quandary over how word identification can be a signal to move the eyes is that words can be partially processed in the parafovea, or region of the retina that extends 5° on either side of the fovea.  McConkie and Rayner (1975) demonstrated the importance of parafoveal processing using an eye-contingent display change technique, called the moving-window paradigm, which is illustrated in Figure 1.  In this paradigm, the letters outside of a “window” spanning a given number of character spaces are distorted in some way (e.g., replaced with X’s).  By varying the size of the window and making its location contingent upon where the reader is looking, it is possible to determine the perceptual span, or region from which useful visual information can be encoded.  With alphabetic text (like English), readers can progress at a more-or-less normal rate when the window extends 14-15 character spaces to the right (McConkie & Rayner, 1975; Rayner, 1986; Rayner & Bertera, 1979; Rayner, Well, Pollatsek, & Bertera, 1982; DenBuurman, Boersma, & Gerrissen, 1981) and 3-4 character spaces to the left of the fixation point (McConkie & Rayner, 1976; Rayner, Well, & Pollatsek, 1980).  However, word encoding probably does not extend more than 7-8 characters to the right of fixation (Rayner et al., 1982; McConkie & Zola, 1984; Underwood & McConkie, 1985); beyond this distance, only low-spatial frequency information about letter shape (e.g., descenders vs. ascenders) and word length is extracted from the page.  The left-right asymmetry reflects covert attention and is language specific; with Hebrew text (which is read from right to left), the perceptual span extends asymmetrically to the left of fixation (Pollatsek, Bolozky, Well, & Rayner, 1981).

 

 

 

 

 

 

Figure 1.  The moving-window paradigm.  Panel A shows the positions of three successive fixations (indicated by the asterisks) in a normal line of text.  Panels B and C illustrate how a “window” of normal text is displayed contingent upon where the eyes are currently looking.  Panel B shows a two-word moving window; that is, both the fixated word and the word to the right of fixation are displayed normally, and all of the letters in the remaining words are replaced by Xs.  In Panel C, the window extends four character spaces to the left of fixation and 14 character spaces to the right of fixation.
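The display manipulation in Panel C can be sketched in a few lines of code. This is a minimal illustration, assuming that positions are counted in character spaces from the start of the line, that only letters are masked, and that spaces and punctuation are left intact (one of several variants used in practice). The function name `moving_window` and its parameters are hypothetical and do not come from any eye-tracking package.

```python
def moving_window(text, fixation, left=4, right=14, mask="X"):
    """Return the line as it would be displayed for one fixation.

    Letters outside the window [fixation - left, fixation + right] are
    replaced by `mask`; everything inside the window, and every
    non-letter character, is shown normally.
    """
    start = fixation - left
    end = fixation + right
    shown = []
    for position, char in enumerate(text):
        inside = start <= position <= end
        if inside or not char.isalpha():
            shown.append(char)
        else:
            shown.append(mask)
    return "".join(shown)


# A very small window (1 left, 2 right) around the "q" in "quick":
example = moving_window("the quick brown fox", fixation=4, left=1, right=2)
# example == "XXX quiXX XXXXX XXX"
```

In an actual experiment the function would be called again on every fixation with the new eye position, so that the window moves with the eyes. The defaults of 4 and 14 character spaces mirror Panel C.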

 

            Four other points about the perceptual span are relevant.  First, the perceptual span does not extend below the line that is currently being read (Inhoff & Briihl, 1991; Inhoff & Topolski, 1992; Pollatsek, Raney, LaGasse, & Rayner, 1993); readers focus their attention on the line that they are currently reading.  Second, studies using various eye-contingent display change techniques have revealed that the size of the span is fairly constant for readers of similar alphabetic orthographies (such as English, French, and Dutch; see Rayner, 1998 for further details).  Third, characteristics of the writing system influence not only the asymmetry of the span, but also the overall size of the perceptual span.  Thus, the span is smaller for Hebrew than English (Pollatsek et al., 1981) since Hebrew is a more densely packed language than English.  And, it is much smaller for writing systems like Japanese (Ikeda & Saida, 1978; Osaka, 1992) and Chinese (Inhoff & Liu, 1998) that have ideographic components and hence are even more densely packed than Hebrew.  Fourth, the perceptual span is not hardwired, but rather seems to be attention-based.  The fact that there is an asymmetry due to the direction of the writing system is consistent with the span being attention based.  In fact, Pollatsek et al. (1981) found that the perceptual span of Israeli readers who were bilingual in Hebrew and English had opposite asymmetries when reading the two languages.  Furthermore, Rayner (1986) found that the span was smaller for beginning readers than skilled readers and that the span got smaller when children with four years of reading experience were given text that was too difficult for them.  Analogous to this finding, Henderson and Ferreira (1990; see also Inhoff, Pollatsek, Posner, & Rayner, 1988; Kennison & Clifton, 1995; Schroyens, Vitu, Brysbaert, & d’Ydewalle, 1999) found that the span got smaller when the fixated word was difficult to process.  Finally, Balota, Pollatsek, and Rayner (1985) found that readers obtained more information to the right of fixation when the upcoming word was highly predictable from the preceding text.

            2.6 Parafoveal preview effects.  Consistent with the findings of the last section, it has been demonstrated that orthographic (Rayner, 1975; Balota et al., 1985; Binder, Pollatsek, & Rayner, 1999) and phonological (Pollatsek, Lesch, Morris, & Rayner, 1992) processing of a word can begin prior to the word being fixated.  These results indicate that, during normal reading, the parafoveal preview of a word can reduce the duration of the subsequent fixation on the word, which is one measure of the time needed for identification (Schilling et al., 1998).  Surprisingly, neither semantic (Altarriba, Kambe, Pollatsek, & Rayner, 2001; Rayner, Balota, & Pollatsek, 1986) nor morphological (Kambe, 2002; Lima, 1987; Lima & Inhoff, 1985) information extracted from the parafovea appears to be of any benefit when the word is later fixated [Note 3].  Furthermore, parafoveal preview benefit is not due to retention of visual featural information, as the case of all the letters can change from fixation to fixation with virtually no disruption to the reading process (McConkie & Zola, 1979; Rayner, McConkie, & Zola, 1980).  Instead, the source of the preview benefit seems to be abstract letter codes and phonological codes (see Rayner, 1998, for a review).  However, parafoveal information can produce word skipping (i.e., the word is not fixated) because words that can be identified in the parafovea do not have to be fixated and can therefore be skipped.  Many experiments (Balota et al., 1985; Binder et al., 1999; Ehrlich & Rayner, 1981; Rayner, Binder, Ashby, & Pollatsek, 2001; Rayner & Well, 1996; Schustack, Ehrlich, & Rayner, 1987) have demonstrated that predictable words are skipped more than unpredictable words and that short function words (like “the”) are skipped more than content words (O’Regan, 1979, 1980; Gautier, O’Regan, & LeGargasson, 2000).  When words are skipped, there is some evidence suggesting that the durations of the fixations preceding and following the skip are inflated (Pollatsek, Rayner, & Balota, 1986; Reichle et al., 1998) [Note 4].

            2.7 Regressions.  One indicator of the inherent difficulty of reading (even for skilled readers) is that 10-15% of the saccades move the eyes back to previous parts of the text.  These backward movements, called regressions, are thought to result both from problems with linguistic processing and oculomotor error.  The hypothesis that regressions can be caused by difficulties in linguistic processing is perhaps most clearly supported by the finding that regressions can be induced with structurally difficult “garden path” sentences; because such sentences often lead to incorrect syntactic analyses, readers often make regressions back to the point of difficulty and then re-interpret the sentence (Frazier & Rayner, 1982).  The idea that regressions are sometimes due to simple motor error is supported by the finding that, when the eyes fixate near the end of a word, they often move back a few character spaces (O’Regan, 1990).  This presumably happens because the eyes overshot their intended target (near the middle of the word) and a second fixation location affords a better place from which to see the word.  This interpretation is consistent with the finding that identification is most rapid if a word is fixated just to the left of its center, on the optimal viewing position (Clark & O’Regan, 1999; O’Regan, Lévy-Schoen, Pynte, & Brugaillère, 1984; O’Regan, 1990, 1992).

            2.8 Eye movement control.  Numerous studies have attempted to determine the characteristics of the mechanisms that control eye movements during reading.  There are two different activities that must be explained: (1) What determines where the reader decides to look next? and (2) What determines when the reader moves his/her eyes (either forward or backward in the text)?  Although there is not total consensus on these issues, there is some evidence to suggest that decisions about where to fixate next and when to move the eyes are made somewhat independently (Rayner & McConkie, 1976; Rayner & Pollatsek, 1981).  The earliest unambiguous demonstration that the duration of the current fixation and the length of the next saccade are computed on-line was provided by Rayner and Pollatsek (1981).  They varied physical aspects of the text randomly from fixation to fixation and found that the behavior of the eyes mirrored what was seen on a fixation.  In their first experiment, they used the moving window paradigm described above and varied the size of the window randomly from fixation to fixation and found that saccade length varied accordingly.  Thus, if the window on the current fixation was small, the eyes only moved a few characters, while if it was large, the eyes moved further.  In their second experiment, they delayed the onset of text in the fovea via a mask that appeared at the beginning of a fixation (with the time the mask was on varying randomly from fixation to fixation) and found that fixation durations were adjusted accordingly.  In addition, the manipulations affected saccade length and fixation duration independently; in the first experiment, saccade length was affected, but fixation duration was not, while in the second experiment, fixation duration was affected, but saccade length was not.  
Thus, while the decisions about where to fixate next and when to move the eyes may sometimes overlap (see Rayner, Kambe, & Duffy, 2000), there is reason to believe the two decisions are made somewhat independently.

            2.8.A Where to fixate next.  Decisions about where to fixate next seem to be determined largely by low-level visual cues in the text, such as word length and the spaces between words.  Five types of results are consistent with this claim.  First, saccade length is influenced by the length of the fixated word and the word to the right of fixation (Blanchard, Pollatsek, & Rayner, 1989; O’Regan, 1979, 1980; Rayner, 1979; Rayner & Morris, 1992).  Second, when readers do not have information about where the spaces are between upcoming words, saccade length decreases and reading is slowed considerably (McConkie & Rayner, 1975; Morris, Rayner, & Pollatsek, 1990; Pollatsek & Rayner, 1982; Rayner, Fischer, & Pollatsek, 1998).  Third, although there is some variability in where the eyes land on a word, readers tend to make their first fixation about halfway between the beginning and the middle of the word (Rayner, 1979; McConkie, Kerr, Reddix, & Zola, 1988; McConkie, Kerr, Reddix, Zola, & Jacobs, 1989; McConkie, Zola, Grimes, Kerr, Bryant, & Wolff, 1991; Vitu, 1991).  Recently, Deutsch and Rayner (1999) demonstrated that the typical landing position in Hebrew words is likewise between the beginning (i.e., right-most end) and middle of a word.  Rayner (1979) originally labeled this prototypical location the preferred viewing location.  This position where the eyes typically land in a word is different from the optimal viewing location, which is the location in the word at which recognition time is minimized.  According to O’Regan and Lévy-Schoen (1987), the optimal viewing position is a bit to the right of the preferred viewing location, closer to the center of the word.  Fourth, while contextual constraint influences skipping, in that highly predictable words are skipped more than unpredictable words (Balota et al., 1985; Ehrlich & Rayner, 1981), contextual constraint has little influence on where the eyes land in a word (Rayner et al., 2001) [Note 5].  Finally, the landing position on a word is modulated by the launch site (McConkie et al., 1988; Radach & Kempe, 1993; Radach & McConkie, 1998; Rayner, Sereno, & Raney, 1996) because the landing position varies as a function of the distance from the prior fixation.  As the launch site moves further from the target word, the distribution of landing positions shifts to the left and becomes more variable (see Fig. 2).

 

 

 

 

 

 

 

 

 

 

Figure 2.  Landing site distribution as a function of the saccade length between the launch site (word n-1) and intended saccade target (word n).  In all three panels, the launch site and target words are depicted by rectangles, with character spaces represented by numbers (as per convention, the space to the left of word n is denoted by a zero).  The landing site distributions are approximately Gaussian in shape.  Although the distributions are centered near the middle of the saccade targets, the oculomotor system is biased towards making saccades approximately seven character spaces in length.  This bias results in a systematic range error; that is, the eyes tend to overshoot close targets and undershoot more distant targets.  For example, in the middle panel, the intended saccade target is five character spaces from the launch site, so that (on average) the eyes overshoot their intended target, thereby causing the landing site distribution to shift towards the end of word n.  In the bottom panel, the opposite happens: The eyes undershoot their target, causing the landing site distribution to shift towards the beginning of word n.
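The systematic range error described in the Figure 2 caption can be illustrated with a small simulation. This is only a sketch: the seven-character preferred saccade length and the Gaussian shape come from the caption, whereas the linear form of the bias, the slope `BIAS_SLOPE`, and the two variability parameters are assumed values chosen purely for illustration and are not estimates from data.

```python
import random

PREFERRED_LENGTH = 7.0  # from the caption: bias toward ~7-character saccades
BIAS_SLOPE = 0.5        # ASSUMPTION: share of the deviation that becomes error
BASE_SD = 1.0           # ASSUMPTION: landing-site SD for a zero-length saccade
SD_PER_CHAR = 0.1       # ASSUMPTION: extra SD per character of intended length


def mean_landing_error(intended_length):
    """Systematic range error in character spaces.

    Positive values are overshoots (close targets); negative values are
    undershoots (distant targets).
    """
    return BIAS_SLOPE * (PREFERRED_LENGTH - intended_length)


def sample_landing_site(intended_length, rng):
    """Draw one landing site, as a distance from the launch site."""
    mean = intended_length + mean_landing_error(intended_length)
    sd = BASE_SD + SD_PER_CHAR * intended_length
    return rng.gauss(mean, sd)


rng = random.Random(0)  # seeded so the sketch is reproducible
near = [sample_landing_site(5, rng) for _ in range(20000)]
far = [sample_landing_site(10, rng) for _ in range(20000)]
```

With these assumed parameters, a target five character spaces away is overshot by one character on average and a target ten character spaces away is undershot by one and a half, which reproduces the qualitative pattern in the middle and bottom panels. The growth of the standard deviation with saccade length mirrors the observation that landing positions become more variable as the launch site moves further away.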

  

            2.8.B When to move the eyes.  The ease or difficulty associated with processing a word primarily influences when the eyes move.  Although a case can be made that low-level non-linguistic factors can also influence the decision about when to move the eyes, the bulk of the evidence suggests that linguistic properties of words are the major determiner of when to move.   A very robust finding is that readers look longer at low-frequency words than at high-frequency words (Altarriba, Kroll, Sholl, & Rayner, 1996; Henderson & Ferreira, 1990, 1993; Hyönä & Olson, 1995; Inhoff & Rayner, 1986; Just & Carpenter, 1980; Kennison & Clifton, 1995; Lavigne, Vitu, & d’Ydewalle, 2000; Raney & Rayner, 1995; Rayner, 1977; Rayner & Duffy, 1986; Rayner & Fischer, 1996; Rayner & Raney, 1996; Rayner et al., 1996; Rayner et al., 1998; Sereno, 1992; Vitu, 1991; Vitu, McConkie, Kerr, & O’Regan, 2001).  There are three additional points with respect to this finding that are relevant.  First, there is a spillover effect associated with fixating a low-frequency word; that is, fixation time on the next word is inflated (Rayner & Duffy, 1986).  Second, although the duration of the first fixation on a word is influenced by the frequency of that word, the duration of the prior fixation is not (Carpenter & Just, 1980; Henderson & Ferreira, 1993; Rayner et al., 1998).  Third, high-frequency words are skipped more than low-frequency words, particularly when they are short and the reader is fixated close to the beginning of the word (O’Regan, 1979; Rayner et al., 1996).

            A second important finding is that there is a predictability effect on fixation time in addition to a frequency effect.  Words that are highly predictable from the preceding context are fixated for less time than are words that are not so constrained (Altarriba et al., 1996; Balota et al., 1985; Binder et al., 1999; Ehrlich & Rayner, 1981; Inhoff, 1984; Lavigne et al., 2000; Rayner et al., 2001; Rayner & Well, 1996; Schustack et al., 1987; Zola, 1984).  Generally, the strongest predictability effects are not as large as the strongest frequency effects.  Also, as we noted above, predictability has a strong effect on word skipping: Words that are highly predictable from the prior context are skipped more than words that are not so constrained.

            2.9 Measures of processing time.  To investigate the components of reading, researchers typically have subjects read sentences or passages of text while an eye tracker interfaced with a computer records the locations and durations of individual fixations.  Because an average college-level reader can read approximately 300 words per minute (Rayner & Pollatsek, 1989), this technique produces a staggering amount of data.  Accordingly, the data are usually reduced to word-based measures, which are across-subject averages that reflect how often and for how long individual words are fixated.  A number of word-based measures are standard (Inhoff & Radach, 1998; Liversedge & Findlay, 2000; Rayner, 1998; Rayner, Sereno, Morris, Schmauder, & Clifton, 1989; Starr & Rayner, 2001).  The first is gaze duration, which is defined as the sum of all fixations on a word, excluding any fixations after the eyes have left the word (i.e., including only refixations before the eyes move on to another word).  Gaze duration is usually averaged only over words that are not skipped during the initial encounter (or first pass) through that region of text.  Two other common measures are first-fixation duration and single-fixation duration.  The former is the duration of the first fixation on a word (again conditional on the word being fixated during the first pass through the text), while the latter is the average fixation duration on words that are fixated exactly once during the first pass.  These indices are typically reported along with indices of how often a word was fixated: The probability of a word being skipped, fixated once, and fixated more than once before moving to another word.  Often, the total time (the sum of all fixations on the word, including regressions back to the word) is also reported.
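The definitions above can be made concrete with a short sketch that derives the word-based measures from one reader's fixation sequence on one sentence. It assumes that fixations arrive as (word index, duration in ms) pairs in temporal order, and that a word counts as skipped on the first pass if it is not fixated before some later word is. The function and field names are hypothetical. Published values are averages of such per-trial measures across trials and subjects.

```python
def word_measures(fixations, n_words):
    """Compute per-word measures from a temporally ordered fixation list.

    fixations: list of (word_index, duration_ms) tuples.
    Returns one dict per word with first-fixation duration, gaze duration,
    single-fixation duration, total time, and a first-pass skip flag.
    """
    measures = [
        {"first_fixation": None, "gaze": None, "single_fixation": None,
         "total": 0, "skipped": True}
        for _ in range(n_words)
    ]
    furthest = -1  # right-most word fixated so far
    i = 0
    while i < len(fixations):
        word = fixations[i][0]
        # Collect the run of consecutive fixations on this word.
        run = []
        while i < len(fixations) and fixations[i][0] == word:
            run.append(fixations[i][1])
            i += 1
        entry = measures[word]
        entry["total"] += sum(run)  # total time includes later returns
        if word > furthest:
            # First pass: no word further along has been fixated yet.
            entry["skipped"] = False
            entry["first_fixation"] = run[0]
            entry["gaze"] = sum(run)
            if len(run) == 1:
                entry["single_fixation"] = run[0]
            furthest = word
    return measures


# Word 1 is refixated, word 2 is skipped and then regressed to,
# and word 1 later receives a regression.
trial = [(0, 220), (1, 250), (1, 180), (3, 200), (2, 240), (1, 150), (4, 210)]
result = word_measures(trial, n_words=5)
# result[1]: first_fixation 250, gaze 430, single_fixation None, total 580
# result[2]: skipped on the first pass, total 240
```

Keeping the first-pass test (`word > furthest`) separate from the running total is what distinguishes gaze duration from total time: later regressions add to the total but never to the first-pass measures.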

            The word-based measures provide a complete record of where and when fixations occurred.  These two aspects (where vs. when) also provide a useful framework for organizing a discussion of reading models because much of the controversy surrounding reading concerns the determinants of where and how long the eyes remain fixated.  The models that have been developed to explain eye movement control form a continuum, extending from models in which eye movements are determined primarily by oculomotor factors (oculomotor models) to those in which eye movements are guided by some form of cognitive control (processing models).  Prior to comparing different models, we will discuss our model, E-Z Reader (Pollatsek, Rayner, Fischer, & Reichle, 1999; Rayner, Reichle, & Pollatsek, 1998, 2000; Reichle et al., 1998, 1999; Reichle & Rayner, 2001) in some detail.  We will also provide an updated version of the model (E-Z Reader 7).

3.0 E-Z Reader

            E-Z Reader is a processing model, and extends the earlier work of Morrison (1984).  Morrison drew much of the inspiration for his model from the work of Becker and Jürgens (1979) and McConkie (1979).  McConkie (1979) suggested that, during reading, visual attention progresses across a line of text until the limitations of the visual system make it difficult to extract further lexical information; once this point of difficulty has been established, attention shifts and an eye movement is programmed and subsequently initiated, sending the eyes to the problematic location.  Although elegantly simple, the model was soon discarded due to problems in defining and explaining what the point of difficulty was, how it might be computed, and whether it could be computed soon enough to be of any use in skilled reading (Rayner & Pollatsek, 1989).

            The limitations inherent in McConkie’s (1979) early model of eye movement control led Morrison (1984) to propose a model in which the movement of the eyes was a function of successful processing.  According to Morrison, the identification of word n (i.e., the word that is currently being fixated) causes the attention “spotlight” (Posner, 1980) to move to word n+1, which in turn causes the oculomotor system to begin programming a saccade to word n+1.  If the program finishes before word n+1 is identified, then the saccade will be executed and the eyes will move to word n+1.  However, if word n+1 is identified before the program finishes, the saccade to word n+1 may be canceled.  Cancellation can occur some of the time when attention shifts to word n+2 while word n is fixated.  In this case, the oculomotor system begins programming a saccade to word n+2, which overrides the program to move the eyes to word n+1 if the new program interrupts the old program soon enough.  Thus, according to Morrison, attention moves serially, from word to word, whereas saccades can be programmed in parallel.

            Morrison’s (1984) assumption about the parallel programming of saccades followed Becker and Jürgens’ (1979) demonstration that saccadic programming is completed in two stages: An initial, labile stage that is subject to cancellation, and an ensuing, non-labile stage in which the program cannot be canceled.  Their results suggested that if the oculomotor system begins programming a saccade while another saccadic program is in its labile stage of development, then the first program is aborted.  However, if the second program is initiated while the first saccadic program is in its non-labile stage, then both saccades will be executed, which typically results in a very short fixation between the two saccades.
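The two-stage logic can be expressed as a simple decision rule. The sketch below is an illustration only: the stage durations are assumed round numbers rather than estimates from Becker and Jürgens (1979), the function name is hypothetical, and the time taken by the saccade itself is ignored.

```python
LABILE_MS = 150      # ASSUMPTION: duration of the cancelable (labile) stage
NON_LABILE_MS = 50   # ASSUMPTION: duration of the non-cancelable stage


def resolve_programs(first_start, second_start,
                     labile_ms=LABILE_MS, non_labile_ms=NON_LABILE_MS):
    """Decide which of two saccade programs are executed, and when.

    Times are in ms. Returns a list of (program, execution_time) pairs.
    If the second program starts while the first is still labile, the
    first is canceled; otherwise both saccades are executed.
    """
    program_ms = labile_ms + non_labile_ms
    if second_start - first_start < labile_ms:
        # First program is still in its labile stage: it is aborted.
        return [("second", second_start + program_ms)]
    # First program has reached its non-labile stage (or has finished).
    return [("first", first_start + program_ms),
            ("second", second_start + program_ms)]


canceled = resolve_programs(first_start=0, second_start=100)
# canceled == [("second", 300)]
both = resolve_programs(first_start=0, second_start=170)
# both == [("first", 200), ("second", 370)]
```

In the second case the two saccades are separated by only the gap between the onsets of the two programs (170 ms here), which is less than the time needed to program a saccade from scratch. This is why the intervening fixation is shorter than usual in this simplification.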

            With these simple assumptions, Morrison (1984) was able to provide an elegant account of both frequency effects and parafoveal preview effects: Because short frequent words are more easily identified in the parafovea than long infrequent words, the former tend to be fixated for less time (and skipped more often) than the latter.  Despite its successes, however, Morrison’s model cannot explain refixations because the strictly serial attention shifts mean that each word is either fixated exactly once or is skipped.  More fundamentally, because Morrison’s model posits both that processing of words is strictly serial and that attention shifting is time-locked to word identification, the model is unable to handle some simple and robust phenomena in reading.  The first, as we noted above, is that one often gets “spillover” effects due to word frequency (e.g., Rayner & Duffy, 1986).  That is, a lower frequency word (wordn) often not only receives longer fixations itself, but also lengthens gaze durations and/or first fixations on the succeeding word (wordn+1).  According to Morrison’s model, this shouldn’t happen because attention doesn’t shift until wordn has been processed.  Because parafoveal processing on wordn+1 begins after this attention shift, the amount of information extracted from wordn+1 before it is fixated will only be a function of how long it takes to program and execute the saccade, and will not vary as a function of the frequency of wordn.  As a result, Morrison’s model predicts no delayed effects of word frequency (or any other delayed effects of word processing difficulty).  A related phenomenon (Henderson & Ferreira, 1990; Kennison & Clifton, 1995) is that the benefit gained through parafoveal preview decreases as foveal processing becomes more difficult (e.g., because the fixated word is lower frequency).  By essentially the same argument as above, Morrison’s model predicts that this shouldn’t happen because parafoveal preview time is only a function of the latency of moving the eyes after covert attention has shifted.

            There are at least three ways to circumvent the limitations of Morrison’s (1984) model.  The first is to add the assumption that if word identification is not completed by a processing deadline, attention does not shift to the next word, but instead remains on the current word, resulting in a refixation (Henderson & Ferreira, 1990; Sereno, 1992).  This leads to the prediction (which has not been supported; Rayner et al., 1996; Schilling et al., 1998) that the first of two fixations should be longer than single fixations because the former reflect cases in which the processing deadline must have been reached.  The second solution is to simply assume that difficulties with higher-order linguistic processing somehow cause the eyes to remain on the current word (Pollatsek & Rayner, 1990; Rayner & Pollatsek, 1989).  Unfortunately, how this happens has not been well specified.  Finally, a third way to avoid the shortcomings of Morrison’s proposal is to assume that word identification is completed in two stages.  This last approach is instantiated by E-Z Reader, which is discussed next.

            3.1 Overview of the E-Z Reader model.  E-Z Reader, like other processing models, makes the basic assumption that on-going cognitive (i.e., linguistic) processing influences eye movements during reading.  Because the model was not intended to be a deep explanation of language processing, it does not account for the many effects of higher-level linguistic processing on eye movements (for reviews, see Rayner, 1998; Rayner & Sereno, 1994; Rayner et al., 1989).  Although this is clearly a limitation, it should also be noted that many of these effects typically occur when the reader is having difficulty understanding the text that is being read, such as when a reader makes a regression to re-interpret a syntactically ambiguous “garden path” sentence (Frazier & Rayner, 1982).  The model can therefore be viewed as the “default” reading process.  That is, we view the process of identifying words to be the forward “driving engine” in reading, as the process of knitting the words into larger units of syntax or meaning would be too slow (whether successful or not) to be a signal to decide how and when to move the eyes forward for skilled readers.  Thus, we posit that higher-order processes intervene in eye movement control only when “something is wrong” and either send a signal to stop moving forward or a signal to execute a regression.  Hence, we view E-Z Reader as an explanation of what happens during reading when higher-level linguistic processing is running smoothly and doesn’t intervene.  One implication of this is that the model currently does not explain inter-word regressions.

            Like its immediate predecessors (see Reichle et al., 1998, 1999), E-Z Reader 7 consists of a small number of perceptual-motor and cognitive processes that determine when and where the eyes move during reading.  Figure 3 is a schematic diagram showing the flow of control among these processes.  As is evident in the figure, the central assumptions of the model are that: (1) a stage of word identification is the signal to move the eyes; and (2) attention is allocated from one word to the next in a strictly serial fashion.  Notice, however, that both visual encoding limitations and oculomotor constraints also play central roles in the moment-by-moment control of eye movements during reading.  In the discussion that follows, we will describe the specific assumptions of our model and how they are related to four major cognitive and perceptual-motor systems: Visual processing, word identification, attention, and oculomotor control.

Figure 3.  A schematic diagram of E-Z Reader 7.  Visual features on the printed page are projected from the retina to an early stage of visual processing, which then proceeds at a rate that is modulated by visual acuity limitations.  The low-spatial frequency information (e.g., word boundaries) is used by the oculomotor system to select the targets of upcoming saccades.  High-spatial frequency information is passed on to the word identification system, which, through attentional selection, allows individual words to be identified.  The first stage of lexical processing (L1) signals the oculomotor system to begin programming a saccade to the next word.  The completion of the second stage of word identification (L2) causes attention to shift to the next word.  Saccadic programming is thus decoupled from the shifts of attention.  Saccadic programming is completed in two stages: The first, labile stage (M1) can be canceled by the initiation of subsequent programs; the second, non-labile stage (M2) is not subject to cancellation.  Saccades are executed immediately after the non-labile stage of saccadic programming has been completed.  Black lines represent the flow of visual information, with the dashed line representing the low-spatial frequency information that is used by the oculomotor system to select the target locations of upcoming saccades.  The gray lines represent signals that are propagated among the various components of the model (e.g., the signal to shift attention).

3.1.A (Early) visual processing.  Visual features from the printed page are projected from the retina to the visual cortex so that the objects on the page (i.e., the individual words) can be identified.  The earliest stages of visual processing are thought to be pre-attentive in that the features that make up individual words are not fully integrated into perceptual wholes (Lamme & Roelfsema, 2000; Wolfe & Bennett, 1996).  This processing is not instantaneous, with neural transmission from retina to brain taking approximately 90 ms to complete.

In our model, the preceding ideas are formalized by including an early visual processing stage, which, though pre-attentive, is subject to visual acuity constraints (see Figure 3).  The duration of this early visual processing stage, t(V), is a free parameter that corresponds to the base time needed for neural transmission to propagate from the retina to those cortical and subcortical areas that mediate early visual processing.  To keep this assumption psychologically plausible, the value of t(V) was set equal to 90 ms.  However, because the rate of this early stage of processing is modulated by visual acuity, the rate at which a word is encoded is inversely related to both its length and its mean distance from the point of fixation.  More specifically, during each fixation, the amount of early visual processing (in ms) that is completed on each word in the visual field is determined by:

(1)  $\text{visual processing} = t / \varepsilon^{\sum_i |\text{letter}_i - \text{fixation}| / N}$

In Equation 1, t is the duration of the fixation (in ms), N is the number of letters in the word being processed, the sum runs over the letters i of that word, and ε (= 1.08) is a free parameter6 that modulates the effects of the spatial disparity between each word’s letters and the fixation location (i.e., the center of the fovea).  Thus, the time needed to encode a word increases as the distance between its center and the fovea increases.  Moreover, the time needed to encode a word also increases with its length because the individual letters of long words will (on average) be further away from the point of fixation than will the individual letters of short words7.  One interesting implication of this equation is that the early visual processing of a word will be most rapid if the word is fixated near its center because a fixation on a word’s center will minimize the mean spatial deviations between the fixation and each of the word’s letters.  This property is also consistent with evidence that word identification is most rapid if the word is fixated near its center (or optimal viewing position; O’Regan, 1990, 1992; O’Regan & Lévy-Schoen, 1987; Vitu, O’Regan, & Mittau, 1990) and provides one explanation for why the eyes are seemingly directed towards this location during reading (see Shillcock et al., 2000).  It also allows the model to account for length effects (i.e., the finding that long words take longer to identify than short words; Just & Carpenter, 1980).
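
The behavior of Equation 1 can be illustrated with a minimal sketch.  It assumes that letter positions and the fixation location are expressed in character spaces; the function names and the seven-letter example word are hypothetical and are not taken from the model's implementation.

```python
EPSILON = 1.08  # free parameter from Equation 1


def acuity_factor(letter_positions, fixation):
    """Return EPSILON raised to the mean absolute letter-to-fixation distance."""
    n = len(letter_positions)
    mean_distance = sum(abs(p - fixation) for p in letter_positions) / n
    return EPSILON ** mean_distance


def visual_processing(t, letter_positions, fixation):
    """Early visual processing (ms) completed on a word during a fixation of t ms."""
    return t / acuity_factor(letter_positions, fixation)


# Hypothetical seven-letter word occupying character positions 1..7.
word = list(range(1, 8))
centered = visual_processing(100, word, fixation=4)    # fixation on the middle letter
off_center = visual_processing(100, word, fixation=1)  # fixation on the first letter
```

Under these assumptions, more processing is completed per fixation when the word is fixated at its center than when it is fixated at its first letter, which is the optimal-viewing-position property described above.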

Early visual processing is important for two other reasons.  First, it is necessary to obtain the word-boundary information that is needed to program saccades to upcoming words.  This is denoted in Figure 3 by the dashed arrow that extends from early visual processing to the labile stage of saccadic programming.  This arrow represents the flow of low-spatial frequency information that is acquired in the visual periphery (e.g., word boundaries, the presence/absence of ascenders and descenders, etc.).  The oculomotor system uses this information to program saccades to upcoming words.  Second, early visual processing provides the information that is subsequently used by higher-level visual areas to focus the attention “spotlight” and identify individual words.  Word identification (which is discussed in the next section) must therefore wait until the early visual encoding of that word has been completed.

3.1.B Word identification.  The process of identifying a word begins as soon as attention is focused on that word.  This identification process is then completed in two stages, reflecting early and late stages of lexical processing.  The first stage corresponds to being at (or at least close to) the identification of the orthographic form of the word.  We assume that this is not full lexical access, as the phonological and semantic forms of the word are not yet fully activated.  We labeled this process the “familiarity check” (i.e., f) in earlier versions of the model, but in E-Z Reader 7 it is simply referred to as the first stage of lexical access (i.e., L1). 

The second stage of word identification involves the identification of a word’s phonological and/or semantic forms so as to enable additional linguistic processing.  This second stage, therefore, more-or-less corresponds to what is typically thought to be “lexical access.”  In prior versions of our model, this stage of word identification was called the “completion of lexical access” (i.e., lc).  To avoid confusion, however, we will simply refer to this process as the second stage of lexical access (i.e., L2) in E-Z Reader 7.

The distinction between early and late stages of lexical processing has precedent in the literature; indeed, our distinction was partly motivated by the activation-verification model of lexical access (Paap et al., 1982).  The two models are broadly consistent if one conceptualizes the first stage of lexical access as a “quick and dirty” assessment of whether or not word identification is imminent, and the second stage as being the actual act of identification.  As indicated in Figure 3, this distinction is also important because the two stages of lexical processing play unique functional roles: The completion of the first stage of lexical access causes the oculomotor system to begin programming the next saccade, while the completion of the second stage causes the “spotlight” of attention to shift to the next word.  Thus, in E-Z Reader, saccadic programming is decoupled from the shifting of attention.

As with earlier versions of our model, the time (in ms) required to complete the first stage of lexical access on a word, t(L1), is a linear function of the natural logarithm of the word’s normative frequency of occurrence in printed text and its predictability within a given sentence context.  The mathematical statement of this relationship is given by Equation 2: 

(2)  $t(L_1) = [\beta_1 - \beta_2 \ln(\text{frequency})] \, (1 - \theta \cdot \text{predictability})$

            In Equation 2, β1 and β2 (= 228 and 10 ms, respectively) are free parameters that control how a word’s normative frequency (number of occurrences per million, as tabulated by Francis & Kučera, 1982) affects lexical processing time.  This time is also modulated by the right-hand term, in which the free parameter θ (= 0.5) controls the degree to which a word’s predictability in a specific sentence context (as estimated using cloze task probabilities) attenuates the lexical processing time8.  In all of the simulations reported below, the actual times needed to complete the first stage of lexical processing were found by sampling from gamma distributions having means equal to t(L1) and standard deviations equal to 0.18 of their means.

The completion of the first stage of lexical processing of a word has two immediate consequences in the model: (1) it cues the oculomotor system to begin programming a saccade to the next word (how the oculomotor system does this is discussed in detail below); (2) it initiates further processing of the word.  Because all (or at least most) of the orthographic coding has been completed in L1, the time required to complete the second stage of lexical processing, L2, is more influenced by a word’s predictability.  This distinction is reflected in Equation 3:

(3)  $t(L_2) = \Delta \, [\beta_1 - \beta_2 \ln(\text{frequency})] \, (1 - \text{predictability})$

            As in Equation 2, the free parameters β1 and β2 control the degree to which a word’s frequency of occurrence affects the time necessary to process the word, but this quantity is attenuated by the free parameter Δ (= 0.5).  Note that, in contrast to L1, a word’s predictability fully affects L2; that is, words that can be predicted with complete certainty within a given sentence context will require no time in this second stage [i.e., if predictability = 1, then t(L2) = 0 ms].  Such cases reflect the situation when top-down information has already fully activated the semantic and phonological codes given reasonable corroborating input from orthography.  As was the case with the first stage of lexical processing, the actual process durations were sampled from gamma distributions.
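
The sampling step can be sketched as follows.  This is only an illustration, assuming that a gamma distribution with a given mean and a standard deviation of 0.18 times that mean is parameterized by shape = (mean / SD)^2 and scale = mean / shape; the function name and the fixed random seed are hypothetical.

```python
import random

SD_TO_MEAN = 0.18  # standard deviation as a fraction of the mean (from the text)


def sample_duration(mean_ms, rng=random):
    """Draw one process duration (ms) from a gamma distribution with the given mean."""
    if mean_ms <= 0:
        return 0.0  # e.g., t(L2) for a completely predictable word
    shape = 1.0 / SD_TO_MEAN ** 2      # (mean / SD) squared
    scale = mean_ms * SD_TO_MEAN ** 2  # mean / shape
    return rng.gammavariate(shape, scale)


# Example: durations for a process whose mean is 228 ms.
rng = random.Random(1)
samples = [sample_duration(228.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
sample_sd = (sum((s - sample_mean) ** 2 for s in samples) / len(samples)) ** 0.5
```

With this parameterization the shape is the same for every process (about 30.9), so the sampled durations are always positive and only mildly skewed around their means.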

Finally, it should be mentioned that—by adding the early visual processing stage to E-Z Reader 7—the minimal time needed to identify words in the model is very plausible.  Given the parameter values reported above, for example, the mean time needed to identify the word “the” (the most frequent word in English text) when it is centrally fixated and in a completely predictable context is 148 ms, while the time needed to identify the lowest frequency words in completely unpredictable contexts is 432 ms.  In contrast, E-Z Reader 6 predicted minimal and maximal mean word identification times of 16 and 278 ms, respectively.  E-Z Reader 7 thus predicts word identification latencies that are much more in line with the best available estimates: 150-300 ms (Rayner & Pollatsek, 1989).  
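
These two bounds can be checked with a short calculation.  The sketch below assumes that the frequency term in Equations 2 and 3 is subtracted from the intercept, that the acuity factor for a centrally fixated word is treated as 1, and that “the” occurs about 69,971 times per million; that count is our assumption for illustration and is not given in the text.

```python
import math

T_V = 90.0      # early visual processing stage (ms)
BETA_1 = 228.0  # intercept of the frequency function (ms)
BETA_2 = 10.0   # slope per unit of ln(frequency) (ms)
THETA = 0.5     # weight of predictability in the first stage
DELTA = 0.5     # scaling of the second stage


def t_l1(frequency, predictability):
    """Mean duration (ms) of the first stage of lexical access (Equation 2)."""
    return (BETA_1 - BETA_2 * math.log(frequency)) * (1 - THETA * predictability)


def t_l2(frequency, predictability):
    """Mean duration (ms) of the second stage of lexical access (Equation 3)."""
    return DELTA * (BETA_1 - BETA_2 * math.log(frequency)) * (1 - predictability)


def mean_identification_time(frequency, predictability):
    """Sum of the early visual stage and both lexical stages (acuity ignored)."""
    return T_V + t_l1(frequency, predictability) + t_l2(frequency, predictability)


THE_FREQUENCY = 69971  # assumed occurrences per million for "the"
fastest = mean_identification_time(THE_FREQUENCY, 1.0)  # fully predictable "the"
slowest = mean_identification_time(1, 0.0)              # frequency 1, unpredictable
```

Under these assumptions the two values round to the 148 ms and 432 ms reported above.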

3.1.C Attention.  A central, and perhaps the most contentious, assumption of E-Z Reader is that covert shifts of attention occur serially, from one word to the next, as each word is identified in turn and then integrated into the discourse representation.  By “attention,” though, we do not mean spatial orientation; instead, we refer to the process of integrating features that allows individual words to be identified.  The separation between these two types of attention has considerable precedent in the literature (LaBerge, 1990).  For example, Treisman (1969) distinguished between input selection, or spatial orientation, and analyzer selection, or feature integration.  This distinction is important because spatial orientation shifts towards the targets of upcoming saccades (Hoffman & Subramaniam, 1995; however, see Stelmach, Campsall, & Herdman, 1997), which in E-Z Reader occur whenever the oculomotor system uses the low-spatial frequency information provided by the visual processing stage to program a saccade (see the dashed line in Fig. 3).  These shifts in spatial orientation, however, are decoupled from the shifts in attention (i.e., analyzer selection) that precede lexical processing.

Attention is allocated serially during reading because readers need to keep word order straight (Pollatsek & Rayner, 1999).  By shifting the focus of attention from one word to the next, readers identify and process each word in its correct order.  Although the results of several recent experiments (Kennedy, 1998, 2000; Kennedy, Ducrot, & Pynte, 2002; Inhoff, Starr, & Shindler, 2000; Starr & Inhoff, 2002) suggest that properties of two words (particularly visual/orthographic properties) can sometimes be encoded in parallel, we suspect that this does not usually occur in normal reading (see Rayner, White, Kambe, Miller, & Liversedge, 2002, for an extended discussion of these issues).  The reason for this is that much of the information that is conveyed by language (both written and spoken) is heavily dependent upon word order.

Furthermore, by decoupling eye movements from attention, our model can also explain aspects of eye movement control that Morrison’s (1984) model could not.  For example, E-Z Reader can explain why parafoveal preview benefit decreases as foveal processing difficulty increases (Henderson & Ferreira, 1990; Kennison & Clifton, 1995).  If the eyes are on wordn, parafoveal processing of wordn+1 begins, not with completion of the first stage of lexical processing of wordn, but after the completion of the second stage.  Because parafoveal processing of wordn+1 ends (by definition) with the onset of the saccade to wordn+1, more time will remain for parafoveal processing of wordn+1 when wordn is easy to process (e.g., high-frequency).  This is depicted in Figure 4: The time required to complete L1 and L2 on wordn increases as its normative frequency decreases (see Equations 2 and 3).  Because the saccadic latency is not modulated by word frequency, a saccade will (on average) occur 240 ms (i.e., the mean saccadic latency) after the completion of L1.  This means that, with everything else being equal, the amount of time available to process wordn+1 in the parafovea will increase as the amount of time needed to process wordn decreases.
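
The argument can be made concrete with a minimal sketch that uses mean durations only.  It assumes that the frequency term in Equation 3 is subtracted from the intercept, and it ignores both the early visual processing of wordn+1 and the trial-to-trial variability of each process; the function names are hypothetical.

```python
import math

BETA_1, BETA_2, DELTA = 228.0, 10.0, 0.5
SACCADIC_LATENCY = 240.0  # mean t(M1) + t(M2), in ms


def t_l2(frequency, predictability=0.0):
    """Mean duration (ms) of the second stage of lexical access (Equation 3)."""
    return DELTA * (BETA_1 - BETA_2 * math.log(frequency)) * (1 - predictability)


def mean_preview_time(frequency_of_word_n, predictability=0.0):
    """Mean time (ms) between the attention shift to word n+1 and saccade onset.

    Both intervals are measured from the completion of L1 on word n.
    """
    return max(0.0, SACCADIC_LATENCY - t_l2(frequency_of_word_n, predictability))


low_frequency_preview = mean_preview_time(1)      # word n occurs once per million
high_frequency_preview = mean_preview_time(1000)  # word n occurs 1,000 times per million
```

In this simplified calculation, a wordn with a frequency of 1 per million leaves 126 ms of preview, whereas a wordn with a frequency of 1,000 per million leaves roughly 160 ms.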

Figure 4.  A diagram showing how parafoveal preview benefit is modulated by normative word frequency.  The bottom line represents the time required to complete the first stage of lexical processing, t(L1), as a function of the natural logarithm of wordn’s token frequency.  The middle line represents the time required to complete the second stage of lexical processing, t(L2), on wordn.  Finally, the top line represents the saccadic latency, or time required to initiate a saccade from wordn to wordn+1.  On average, the saccadic latency requires a constant t(M1) + t(M2) ms to complete (starting from the point in time when the first stage of lexical processing on wordn has been completed).  In E-Z Reader, parafoveal preview begins as soon as wordn has been identified and attention has shifted to wordn+1.  The parafoveal preview is therefore limited to the duration of the interval (depicted by the shaded area in the figure) between t(L2) and t(M1) + t(M2).  Notice that, because the relative disparity between t(L1) and t(L2) increases as the frequency of wordn decreases, the duration of the parafoveal preview decreases with the frequency of wordn.

In the model, the serial-allocation-of-attention assumption is instantiated as follows: The completion of the second stage of lexical processing on wordn causes attention to shift to wordn+1, at which point the first stage of lexical processing begins on wordn+1 when pre-processing of wordn+1 is complete9.  The identification of one word thus causes the focus of attention to shift so that the word-identification system can begin identifying the next word (see Fig. 3).

3.1.D Oculomotor control.  Saccadic programming in E-Z Reader is completed in two stages: An early, labile stage (M1) that is subject to cancellation by subsequent programs, and a later, non-labile stage (M2) that is not subject to cancellation.  This assumption was motivated by demonstrations that a saccade to a first target can be cancelled by the presentation of a second to-be-fixated target if the second target is presented prior to approximately 230 ms after the first; after this time, both targets are typically fixated in sequence (Becker & Jürgens, 1979).  A considerable amount of subsequent research has supported this distinction between labile and non-labile stages of saccadic programming (Leff, Scott, Rothwell, & Wise, 2001; McPeek, Skavenski, & Nakayama, 2000; Mokler & Fischer, 1999; Vergilino & Beauvillain, 2000).

During the first (labile) stage of saccadic programming, the eye movement system is simply engaged (or made ready) so that it can begin programming an eye movement.  The system then computes the distance between the current fixation location and the location of the saccade target (i.e., the intended saccade length).  Thus, although the target location is represented in terms of spatial coordinates, the saccadic program is represented in terms of a distance metric.  This is necessary because the distance that is specified by the saccadic program must ultimately be converted into the appropriate amount of force that has to be exerted (by the extraocular muscles) to execute the actual movement.  The labile stage of programming therefore consists of two sub-stages: (1) general system preparation, followed by (2) a location-to-distance transformation, in which the spatial location of the upcoming saccade target is converted into the necessary saccade length.  In E-Z Reader, the time needed to complete the labile programming stage is a random deviate that is sampled from a gamma distribution having a mean equal to a free parameter, t(M1), with each of the two aforementioned sub-stages subsuming half of this time.

An important part of our model is that, when a saccade program is in the labile stage, it is subject to cancellation by a subsequent saccadic program.  If the second program is initiated during the system preparation sub-stage of the first program, then whatever amount of preparation has been done to ready the oculomotor system will also be applicable to the second program, so that it will be completed more rapidly than it otherwise would be.  If, however, the second program is initiated somewhat later, during the first program’s location-to-distance transformation sub-stage, then whatever processing has been done to specify the distance of the first saccade will not apply to the second because the target locations (and hence distances) of the two saccades are different.  This means that the second program will always require a minimal amount of time to finish: the time necessary to convert the spatial location of the saccade target into the intended saccade length.

During the second (non-labile) stage of programming, the command to move the eyes a particular direction and distance is communicated to the motor system.  At this point, an intended saccade is obligatory, and cannot be cancelled or modified by subsequent programs.  As with the labile stage of programming, the time needed to complete the non-labile stage of programming is sampled from a gamma distribution, with the mean of this distribution being equal to a free parameter, t(M2).  Upon completing the non-labile stage of programming, the saccade is executed immediately.

In E-Z Reader 7, the mean times needed to complete the labile, t(M1), and non-labile, t(M2), stages of saccadic programming were set equal to 187 and 53 ms, respectively.  To keep the model as simple as possible, the saccade durations were set equal to a fixed value: t(S) = 25 ms10.  Our saccadic-programming parameter values are consistent with estimates from simple saccade latency tasks (Becker & Jürgens, 1979; McPeek et al., 2000; Rayner et al., 1983).  It should be noted, however, that these values are in fact estimates of the minimal time required to initiate a saccade, often to pre-specified targets; in the context of reading text, therefore, the average saccadic latency may be slightly longer in duration than would be suggested by these previous estimates.
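
The cancellation rules described above can be summarized in a short sketch.  It is a deterministic reading of those rules that uses mean stage durations rather than sampled ones, and it assumes that preparation already completed by the first program is subtracted from the labile time of the second; the function name and return format are hypothetical.

```python
T_M1 = 187.0  # mean duration of the labile stage (ms)
T_M2 = 53.0   # mean duration of the non-labile stage (ms)


def second_program(elapsed_ms, t_m1=T_M1):
    """Outcome when a second saccadic program starts elapsed_ms after the first.

    Returns (first_cancelled, labile_time_needed_by_second_program).
    """
    if elapsed_ms >= t_m1:
        # The first program is already non-labile: it cannot be cancelled,
        # so both saccades are executed and the second starts from scratch.
        return False, t_m1
    preparation = t_m1 / 2.0  # duration of the system-preparation sub-stage
    # Only completed preparation carries over; work on the first saccade's
    # location-to-distance transformation does not apply to the new target.
    carried_over = min(elapsed_ms, preparation)
    return True, t_m1 - carried_over


early = second_program(50.0)   # during system preparation
late = second_program(150.0)   # during the location-to-distance transformation
after = second_program(200.0)  # first program already non-labile
```

In this reading, the second program never needs less than half of t(M1), which corresponds to the minimal time required for the location-to-distance transformation.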

Let’s examine these assumptions using five key situations in reading.  The first situation (that is shown schematically in Figure 5A) is the simplest: Wordn is fixated, an eye movement is programmed to wordn+1, and no subsequent eye movement command is made while this program is in its labile stage.  The program therefore enters its non-labile stage, and an eye movement is made to wordn+1.

Now consider a second situation (Fig. 5B): Wordn is fixated, a program to fixate wordn+1 is initiated, but while the oculomotor system is being readied, a second program (to move the eyes to wordn+2) is initiated.  In this case, the program to fixate wordn+1 is cancelled, and the saccade leaving wordn wi