The E-Z Reader Model of Eye Movement Control in Reading:
Comparisons to Other Models
Erik D. Reichle
University of Pittsburgh
Keith Rayner and Alexander Pollatsek
University of Massachusetts, Amherst
Send correspondence to:
Keith Rayner
Department of Psychology
University of Massachusetts
Amherst, MA 01003
413-545-2175
e-mail: rayner@psych.umass.edu
Running Head: Reading Models
Short Abstract
The E-Z Reader model (Reichle, Pollatsek, Fisher, &
Rayner, 1998; Reichle, Rayner, & Pollatsek, 1999) provides a theoretical
framework for understanding how word identification, visual processing,
attention, and oculomotor control jointly determine when and where the eyes
move during reading. Thus, in contrast
to other models reviewed in this article, E-Z Reader can simultaneously account
for many of the known effects of linguistic, visual, and oculomotor factors on
eye movement control during reading. Furthermore,
the core principles of the model have been generalized to other task domains
(e.g., equation solving, visual search), and are broadly consistent with what
is known about the architecture of the neural systems that support reading.
Abstract
The E-Z Reader
model (Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, &
Pollatsek, 1999) provides a theoretical framework for understanding how word
identification, visual processing, attention, and oculomotor control jointly
determine when and where the eyes move during reading. In this article, we first review what is
known about eye movements during reading.
Then we provide an updated version of the model (E-Z Reader 7) and
describe how it accounts for basic findings about eye movement control in
reading. We then review several
alternative models of eye movement control in reading, discussing both their
core assumptions and their theoretical scope.
On the basis of this discussion, we conclude that E-Z Reader provides
the most comprehensive account of eye movement control during reading. Finally, we provide a brief overview of what
is known about the neural systems that support the various components of
reading, and suggest how the cognitive constructs of our model might map onto
this neural architecture.
Key Words: Attention,
Eye-Movement Control, E-Z Reader, Fixations, Lexical Access, Models, Reading,
Saccades
1. Introduction
Reading is a complex skill that involves the
orchestration of many different stages of information processing. As the eyes move across the printed page, the
visual features of the text are converted into orthographic and phonological
patterns, which are then used to guide further language processing so that the
content of the text can be understood.
In this target article, we will compare different models that try to
account for how eye movements are controlled in reading. We will not review all of the models that
have been proposed to explain various aspects of reading. Instead, we will only discuss those models
that have attempted to explain the interface between vision and low-level
aspects of language processing; that is, models that specify some combination
of the following components of reading: Eye movement control, visuospatial
attention, and/or the visual processing of words1. Not surprisingly, we will argue that the
model that we implemented, E-Z Reader2 (Reichle, Pollatsek,
Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 1999), does a
better job of accounting for a wide range of data than do its
competitors. However, we will also
point out some shortcomings of the model.
The remainder of this article will be organized into five
major sections. First, we will briefly
review some important findings regarding eye movements in reading; within this
section we will describe some findings that we believe a model of eye movement
control should be able to accommodate.
Second, we will provide an overview of the E-Z Reader model, including
an updating of the model (E-Z Reader 7).
Third, we will provide an overview of other models of eye movement
control in reading (including discussions of the pros and cons of the models
compared to E-Z Reader). Fourth, we
will discuss future directions and ways that we intend to extend the E-Z Reader
model. In this section, we will also
discuss a possible mapping between model components and neurophysiological
mechanisms. Finally, we will provide
some concluding comments.
2.0 Eye Movements in Reading
Any discussion of models of eye movement control must
begin with a brief overview of eye movements during reading. In this section, we will describe what is
known about eye movements during reading as background material. The following topics will be discussed: (1)
saccades and fixations, (2) visual acuity, (3) saccade latency, (4) the
acquisition of information during eye fixations, (5) perceptual span, (6)
parafoveal preview effects, (7) regressions, (8) eye movement control (where to
fixate next and when to move the eyes), and (9) measures of processing time. It is not our intention to provide a
complete and comprehensive review of each of these topics as our primary
purpose in this article is to compare different models of eye movement control
in reading. The interested reader is
invited to consult Rayner (1998) for a more complete review of each of the nine
topics discussed in this section.
2.1 Saccades and fixations. Contrary to our subjective impression, the
eyes do not move smoothly across the printed page during reading. Instead, the eyes make short and rapid
movements, called saccades (Erdmann & Dodge, 1898; Huey, 1908), that typically
move them forward about 6-9 character spaces, although there is
considerable variability (Rayner, 1978, 1998).
Because the distribution of saccade sizes, measured in number of character
spaces, is largely independent of visual angle when the number of character
spaces is held constant (Morrison & Rayner, 1981; O’Regan, 1983), virtually
all studies of reading use number of character spaces as the appropriate
metric. Saccades take 20-50 ms to
complete depending upon the length of the movement and virtually no visual
information is extracted during eye movements (Ishida & Ikeda, 1989;
Wolverton & Zola, 1983). Between saccades,
the eyes remain stationary for brief periods of time (typically 200-250 ms)
called fixations (Erdmann & Dodge, 1898;
Huey, 1908). Because visual
information is only extracted from the printed page during fixations, reading
is similar to a slide show in which short segments of text are displayed for
approximately a quarter of a second. It
is important to note that there is considerable variability in both saccade
length and fixation duration. Some
saccades only move the eyes a single character, whereas others are as large as
15-20 characters (although such long saccades typically follow regressions and
place the eyes beyond the place from where the regression was initiated). Likewise, some fixations are shorter than
100 ms and others are longer than 400 ms (Rayner, 1978, 1998). Much of this variability apparently is
related to the ease or difficulty involved in processing the currently fixated
text.
2.2 Visual acuity. One of the reasons that the eyes are constantly moving in reading
is that there are severe limits to how much visual information can be processed
during a fixation. Visual acuity is
maximal in the center of the retina and rapidly decreases towards the periphery,
and fine visual discriminations can only be made within the fovea, or central
2° of vision. As a result, the visual features that make
up individual letters can only be encoded from a very narrow window of
vision. The practical significance of
this is that it is necessary to fixate most words so that they can be
identified. Indeed, there is
considerable evidence that a word becomes increasingly difficult to identify as
the angular disparity between the fovea and the retinal image of a word
increases (Rayner & Bertera, 1979; Rayner & Morrison, 1981). Explaining how the reader deals with this
limited acuity is one constraint on any model of eye movements.
2.3 Saccade latency. A second kind of constraint on any model of reading stems from
the “race” between the processes identifying words and the need to plan a
saccade early enough in a fixation so that reading can carry on at about 300
words per minute. On the one hand,
experiments in which subjects move their eyes to visual targets indicate that
the saccadic latency, or the time needed to plan and execute a saccade,
is approximately 180-250 ms (Becker & Jürgens, 1979; Rayner, Slowiaczek,
Clifton, & Bertera, 1983). This
suggests that the decision to make a saccade is often made within the first 100
ms of a fixation. However, this is
seemingly at odds with the intuitively appealing idea that word recognition is
a major contributor to driving eye movements during reading because most
estimates indicate that lexical access requires 100-300 ms to complete (Sereno,
Rayner, & Posner, 1998; Rayner & Pollatsek, 1989; Schilling, Rayner,
& Chumbley, 1998). It is thus not
immediately obvious how the identification of one word can be the signal to
begin planning a saccade to the next.
Indeed, early theories of eye movements in reading (Bouma & deVoogd,
1974; Kolers, 1976) posited that word identification was too slow to be the
engine driving eye movements.
2.4 The acquisition of information during reading. During saccades, vision is suppressed so
that the information needed for reading is acquired only during fixations
(Ishida & Ikeda, 1989; Wolverton & Zola, 1983). Furthermore, reading proceeds quite smoothly
if text is available for processing for only the first 50-60 ms of a fixation
prior to the onset of a masking pattern (Ishida & Ikeda, 1989; Rayner,
Inhoff, Morrison, Slowiaczek, & Bertera, 1981). This does not mean that words are identified within 50 ms, but
rather that the information that is needed for reading gets into the processing
system within 50-60 ms.
2.5 Perceptual span. One solution to the quandary over how word identification can be
a signal to move the eyes is that words can be partially processed in the parafovea,
or region of the retina that extends 5° on either side of the
fovea. McConkie and Rayner (1975)
demonstrated the importance of parafoveal processing using an eye-contingent
display change technique, called the moving-window paradigm, which
is illustrated in Figure 1. In this
paradigm, the letters outside of a “window” spanning a given number of
character spaces are distorted in some way (e.g., replaced with X’s). By varying the size of the window and making
its location contingent upon where the reader is looking, it is possible to
determine the perceptual span, or region from which useful visual
information can be encoded. With
alphabetic text (like English), readers can progress at a more-or-less normal
rate when the window extends 14-15 character spaces to the right (McConkie
& Rayner, 1975; Rayner, 1986; Rayner & Bertera, 1979; Rayner, Well,
Pollatsek, & Bertera, 1982; DenBuurman, Boersma, & Gerrissen, 1981) and
3-4 character spaces to the left of the fixation point (McConkie & Rayner,
1976; Rayner, Well, & Pollatsek, 1980).
However, word encoding probably does not extend more than 7-8 characters
to the right of fixation (Rayner et al., 1982; McConkie & Zola, 1984;
Underwood & McConkie, 1985); beyond this distance, only low-spatial
frequency information about letter shape (e.g., descenders vs. ascenders) and
word length is extracted from the page.
The left-right asymmetry reflects covert attention and is language
specific; with Hebrew text (which is read from right to left), the perceptual
span extends asymmetrically to the left of fixation (Pollatsek, Bolozky, Well,
& Rayner, 1981).

Figure 1. The moving-window paradigm. Panel A shows the positions of three
successive fixations (indicated by the asterisks) in a normal line of
text. Panels B and C illustrate how a
“window” of normal text is displayed contingent upon where the eyes are
currently looking. Panel B shows a two-word
moving window; that is, both the fixated word and the word to the right of
fixation are displayed normally, and all of the letters in the remaining words
are replaced by Xs. In Panel C, the
window extends four character spaces to the left of fixation and 14 character
spaces to the right of fixation.
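The display manipulation illustrated in Figure 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the software used in the original experiments: the function names (moving_window, two_word_window), the zero-based character index used for the fixation position, and the decision to leave the spaces between words intact are all assumptions made for the example.

```python
def moving_window(line, fixation, left=4, right=14, mask="X"):
    """Mask every non-space character outside a character-based window.

    `fixation` is a 0-based character index (an assumption of this sketch).
    Spaces are kept so that word-length information remains visible.
    """
    lo, hi = fixation - left, fixation + right
    return "".join(
        ch if (lo <= i <= hi or ch == " ") else mask
        for i, ch in enumerate(line)
    )


def two_word_window(line, fixation, mask="X"):
    """Show the fixated word and the word to its right; mask the rest."""
    # Collect word boundaries as (start, end) indices, end exclusive.
    spans, start = [], None
    for i, ch in enumerate(line + " "):
        if ch != " " and start is None:
            start = i
        elif ch == " " and start is not None:
            spans.append((start, i))
            start = None
    # The fixated word is the first word ending after the fixation point;
    # a fixation on a space is assigned to the following word.
    current = next(k for k, (s, e) in enumerate(spans) if fixation < e)
    visible = spans[current:current + 2]
    return "".join(
        ch if (ch == " " or any(s <= i < e for s, e in visible)) else mask
        for i, ch in enumerate(line)
    )


line = "the quick brown fox jumps"
print(two_word_window(line, fixation=4))
print(moving_window(line, fixation=10, left=4, right=3))
```

In an actual experiment the display would be redrawn on every fixation, using the eye tracker's current estimate of gaze position as the fixation argument.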
Four other points about the perceptual span are
relevant. First, the perceptual span
does not extend below the line that is currently being read (Inhoff &
Briihl, 1991; Inhoff & Topolski, 1992; Pollatsek, Raney, LaGasse, &
Rayner, 1993); readers focus their attention on the line that they are
currently reading. Second, studies
using various eye-contingent display change techniques have revealed that the
size of the span is fairly constant for readers of similar alphabetic
orthographies (such as English, French, and Dutch; see Rayner, 1998 for further
details). Third, characteristics of the
writing system influence not only the asymmetry of the span, but also the
overall size of the perceptual span. Thus,
the span is smaller for Hebrew than English (Pollatsek et al., 1981) since
Hebrew is a more densely packed language than English. And, it is much smaller for writing systems
like Japanese (Ikeda & Saida, 1978; Osaka, 1992) and Chinese (Inhoff &
Liu, 1998) that have ideographic components and hence are even more densely
packed than Hebrew. Fourth, the
perceptual span is not hardwired, but rather seems to be attention-based. The fact that there is an asymmetry due to
the direction of the writing system is consistent with the span being attention-based.
In fact, Pollatsek et al. (1981)
found that the perceptual span of Israeli readers who were bilingual in Hebrew
and English had opposite asymmetries when reading the two languages. Furthermore, Rayner (1986) found that the
span was smaller for beginning readers than skilled readers and that the span
got smaller when children with four years of reading experience were given text
that was too difficult for them. Analogous
to this finding, Henderson and Ferreira (1990; see also Inhoff, Pollatsek,
Posner, & Rayner, 1988; Kennison & Clifton, 1995; Schroyens, Vitu,
Brysbaert, & d’Ydewalle, 1999) found that the span got smaller when the
fixated word was difficult to process.
Finally, Balota, Pollatsek, and Rayner (1985) found that readers
obtained more information to the right of fixation when the upcoming word was
highly predictable from the preceding text.
2.6 Parafoveal preview effects. Consistent with the findings of the last
section, it has been demonstrated that orthographic (Rayner, 1975; Balota et
al., 1985; Binder, Pollatsek, & Rayner, 1999) and phonological (Pollatsek,
Lesch, Morris, & Rayner, 1992) processing of a word can begin prior to the
word being fixated. These results
indicate that, during normal reading, the parafoveal preview of a word
can reduce the duration of the subsequent fixation on the word, which is one
measure of the time needed for identification (Schilling et al., 1998). Surprisingly, neither semantic (Altarriba,
Kambe, Pollatsek, & Rayner, 2001; Rayner, Balota, & Pollatsek, 1986)
nor morphological (Kambe, 2002; Lima, 1987; Lima & Inhoff, 1985)
information extracted from the parafovea appears to be of any benefit when the
word is later fixated3.
Furthermore, parafoveal preview benefit is not due to retention of
visual featural information, because the case of all the letters can change from
fixation to fixation with virtually no disruption to the reading process
(McConkie & Zola, 1979; Rayner, McConkie, & Zola, 1980). Instead, the preview benefit
seems to be due to abstract letter codes and phonological codes (see Rayner,
1998, for a review). However,
parafoveal information can produce word skipping (i.e., the word is not fixated)
because words that can be identified in the parafovea do not have to be fixated
and can therefore be skipped. Many
experiments (Balota et al., 1985; Binder et al., 1999; Ehrlich & Rayner,
1981; Rayner, Binder, Ashby, & Pollatsek, 2001; Rayner & Well, 1996;
Schustack, Ehrlich, & Rayner, 1987) have demonstrated that predictable
words are skipped more than unpredictable words and that short function words
(like “the”) are skipped more than content words (O’Regan, 1979, 1980; Gautier,
O’Regan, & LeGargasson, 2000). When
words are skipped, there is some evidence suggesting that the durations of the
fixations preceding and following the skip are inflated (Pollatsek, Rayner,
& Balota, 1986; Reichle et al., 1998)4.
2.7 Regressions.
One indicator of the inherent difficulty of reading (even for skilled
readers) is that 10-15% of the saccades move the eyes back to previous parts of
the text. These backward movements,
called regressions, are thought to result both from problems with
linguistic processing and oculomotor error.
The hypothesis that regressions can be caused by difficulties in
linguistic processing is perhaps most clearly supported by the finding that
regressions can be induced with structurally difficult “garden path” sentences;
because such sentences often lead to incorrect syntactic analyses, readers often
make regressions back to the point of difficulty and then re-interpret the
sentence (Frazier & Rayner, 1982).
The idea that regressions are sometimes due to simple motor error is
supported by the finding that, when the eyes fixate near the end of a word,
they often move back a few character spaces (O’Regan, 1990). This presumably happens because the eyes
overshot their intended target (near the middle of the word) and a second
fixation location affords a better place from which to see the word. This interpretation is consistent with the
finding that identification is most rapid if a word is fixated just to the left
of its center, on the optimal viewing position (Clark & O’Regan,
1999; O’Regan, Lévy-Schoen, Pynte, & Brugaillère,
1984; O’Regan, 1990, 1992).
2.8 Eye movement control. Numerous studies have attempted to determine
the characteristics of the mechanisms that control eye movements during
reading. There are two different activities
that must be explained: (1) What determines where the reader decides to look
next? and (2) What determines when the reader moves his/her eyes (either
forward or backward in the text)?
Although there is not total consensus on these issues, there is some
evidence to suggest that decisions about where to fixate next and when to move
the eyes are made somewhat independently (Rayner & McConkie, 1976; Rayner
& Pollatsek, 1981). The earliest
unambiguous demonstration that the duration of the current fixation and the
length of the next saccade are computed on-line was provided by Rayner and
Pollatsek (1981). They varied physical
aspects of the text randomly from fixation to fixation and found that the
behavior of the eyes mirrored what was seen on a fixation. In their first experiment, they used the
moving window paradigm described above and varied the size of the window
randomly from fixation to fixation and found that saccade length varied
accordingly. Thus, if the window on the
current fixation was small, the eyes only moved a few characters, while if it
was large, the eyes moved further. In
their second experiment, they delayed the onset of text in the fovea via a mask
that appeared at the beginning of a fixation (with the time the mask was on varying
randomly from fixation to fixation) and found that fixation durations were
adjusted accordingly. In addition, the
manipulations affected saccade length and fixation duration independently; in
the first experiment, saccade length was affected, but fixation duration was
not, while in the second experiment, fixation duration was affected, but
saccade length was not. Thus, while the
decisions about where to fixate next and when to move the eyes may sometimes
overlap (see Rayner, Kambe, & Duffy, 2000), there is reason to believe the
two decisions are made somewhat independently.
2.8.A Where to fixate next. Decisions about where to fixate next seem to
be determined largely by low-level visual cues in the text, such as word length
and the spaces between words. Five
types of results are consistent with this claim. First, saccade length is influenced by the length of the fixated
word and the word to the right of fixation (Blanchard, Pollatsek, & Rayner,
1989; O’Regan, 1979, 1980; Rayner, 1979; Rayner & Morris, 1992). Second, when readers do not have information
about where the spaces are between upcoming words, saccade length decreases and
reading is slowed considerably (McConkie & Rayner, 1975; Morris, Rayner, &
Pollatsek, 1990; Pollatsek & Rayner, 1982; Rayner, Fischer, &
Pollatsek, 1998). Third, although there
is some variability in where the eyes land on a word, readers tend to make
their first fixation about halfway between the beginning and the middle of the
word (Rayner, 1979; McConkie, Kerr, Reddix, & Zola, 1988; McConkie, Kerr,
Reddix, Zola, & Jacobs, 1989; McConkie, Zola, Grimes, Kerr, Bryant, &
Wolff, 1991; Vitu, 1991). Recently,
Deutsch and Rayner (1999) demonstrated that the typical landing position in
Hebrew words is likewise between the beginning (i.e., right-most end) and
middle of a word. Rayner (1979)
originally labeled this prototypical location the preferred viewing location. This position where the eyes typically land
in a word is different from the optimal viewing position, which is the location
in the word at which recognition time is minimized. According to O’Regan and Lévy-Schoen (1987), the optimal viewing
position is a bit to the right of the preferred viewing location, closer to the
center of the word. Fourth, while
contextual constraint influences skipping, in that highly predictable words are
skipped more than unpredictable words (Balota et al., 1985; Ehrlich &
Rayner, 1981), contextual constraint has little influence on where the eyes
land in a word (Rayner et al., 2001)5. Finally, the landing position on a word is modulated by the launch
site (McConkie et al., 1988; Radach & Kempe, 1993; Radach &
McConkie, 1998; Rayner, Sereno, & Raney, 1996) because the landing position
varies as a function of the distance from the prior fixation. As the launch site moves further from the
target word, the distribution of landing positions shifts to the left and
becomes more variable (see Fig. 2).

Figure 2. Landing site distribution as a function of the
saccade length between the launch site (word_{n-1}) and intended
saccade target (word_n). In
all three panels, the launch site and target words are depicted by rectangles,
with character spaces represented by numbers (as per convention, the space to
the left of word_n is denoted by a zero). The landing site distributions are approximately Gaussian in
shape. Although the distributions are
centered near the middle of the saccade targets, the oculomotor system is
biased towards making saccades approximately seven character spaces in
length. This bias results in a
systematic range error; that is, the eyes tend to overshoot close targets and
undershoot more distant targets. For example,
in the middle panel, the intended saccade target is five character spaces from
the launch site, so that (on average) the eyes overshoot their intended target,
thereby causing the landing site distribution to shift towards the end of the target word. In the bottom panel, the opposite happens:
The eyes undershoot their target, causing the landing site distribution to
shift towards the beginning of the target word.
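The systematic range error described in the caption can be illustrated with a small simulation. This is a sketch under stated assumptions, not a fitted model: the linear form of the bias, the bias weight of 0.5, and the standard-deviation parameters (sd_base, sd_slope) are hypothetical values chosen only to reproduce the qualitative pattern. Only the preferred saccade length of about seven character spaces, the Gaussian shape, and the increase in variability with launch distance come from the text.

```python
import random


def mean_landing_error(distance, preferred=7.0, bias=0.5):
    """Mean landing error in character spaces (positive = overshoot).

    Targets closer than `preferred` are overshot and more distant targets
    are undershot. The linear form and `bias` value are assumptions.
    """
    return bias * (preferred - distance)


def sample_landing_error(distance, rng, preferred=7.0, bias=0.5,
                         sd_base=1.0, sd_slope=0.1):
    """Draw one landing error from a Gaussian around the systematic error.

    The spread grows with launch distance (hypothetical parameters).
    """
    sd = sd_base + sd_slope * distance
    return rng.gauss(mean_landing_error(distance, preferred, bias), sd)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
for distance in (5, 7, 10):
    draws = [sample_landing_error(distance, rng) for _ in range(10_000)]
    print(distance, round(sum(draws) / len(draws), 2))
```

With these illustrative parameters, a five-character saccade has a positive mean error (overshoot), a seven-character saccade is unbiased, and a ten-character saccade has a negative mean error (undershoot).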
2.8.B When to move the eyes. The ease or difficulty associated with
processing a word primarily influences when the eyes move. Although a case can be made that low-level non-linguistic
factors can also influence the decision about when to move the eyes, the bulk
of the evidence suggests that linguistic properties of words are the major
determiner of when to move. A very
robust finding is that readers look longer at low-frequency words than at
high-frequency words (Altarriba, Kroll, Sholl, & Rayner, 1996; Henderson
& Ferreira, 1990, 1993; Hyönä & Olson, 1995; Inhoff & Rayner, 1986;
Just & Carpenter, 1980; Kennison & Clifton, 1995; Lavigne, Vitu, &
d’Ydewalle, 2000; Raney & Rayner, 1995; Rayner, 1977; Rayner & Duffy,
1986; Rayner & Fischer, 1996; Rayner & Raney, 1996; Rayner et al.,
1996; Rayner et al., 1998; Sereno, 1992; Vitu, 1991; Vitu, McConkie, Kerr,
& O’Regan, 2001). There are three
additional points with respect to this finding that are relevant. First, there is a spillover effect
associated with fixating a low-frequency word; that is, fixation time on the
next word is inflated (Rayner & Duffy, 1986). Second, although the duration of the first fixation on a word is
influenced by the frequency of that word, the duration of the prior fixation is
not (Carpenter & Just, 1980; Henderson & Ferreira, 1993; Rayner et al.,
1998). Third, high-frequency words are
skipped more than low-frequency words, particularly when they are short and the
reader is fixated close to the beginning of the word (O’Regan, 1979; Rayner et
al., 1996).
A second important finding is that there is a
predictability effect on fixation time in addition to a frequency effect. Words that are highly predictable from the
preceding context are fixated for less time than are words that are not so
constrained (Altarriba et al., 1996; Balota et al., 1985; Binder et al., 1999;
Ehrlich & Rayner, 1981; Inhoff, 1984; Lavigne et al., 2000; Rayner et al.,
2001; Rayner & Well, 1996; Schustack et al., 1987; Zola, 1984). Generally, the strongest
predictability effects are not as large as the strongest frequency
effects. Also, as we noted above,
predictability has a strong effect on word skipping: Words that are highly
predictable from the prior context are skipped more than words that are not so
constrained.
2.9 Measures of processing time. To investigate the components of reading,
researchers typically have subjects read sentences or passages of text while an
eye tracker interfaced with a computer records the locations and durations of
individual fixations. Because an
average college-level reader can read approximately 300 words per minute
(Rayner & Pollatsek, 1989), this technique produces a staggering amount of
data. Accordingly, the data are usually
reduced to word-based measures, which are across-subject averages that
reflect how often and for how long individual words are fixated. A number of word-based measures are standard
(Inhoff & Radach, 1998; Liversedge & Findlay, 2000; Rayner, 1998;
Rayner, Sereno, Morris, Schmauder, & Clifton, 1989; Starr & Rayner,
2001). The first is gaze duration,
which is defined as the sum of all fixations on a word, excluding any fixations
after the eyes have left the word (i.e., including only refixations
before the eyes move on to another word).
Gaze duration is usually averaged only over words that are not skipped
during the initial encounter (or first pass) through that region of
text. Two other common measures are first-fixation
duration and single-fixation duration. The former is the duration of the first fixation on a word (again
conditional on the word being fixated during the first pass through the text),
while the latter is the average fixation duration on words that are fixated
exactly once during the first pass.
These indices are typically reported along with indices of how often a
word was fixated: The probability of a word being skipped, fixated once, and
fixated more than once before moving to another word. Often, the total time (the sum of all fixations on the
word, including regressions back to the word) is also reported.
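The definitions above can be made concrete with a short sketch that derives the word-based measures from one reader's fixation sequence. The input format (a list of (word index, duration in ms) pairs in temporal order), the function name word_measures, and the dictionary keys are assumptions made for this illustration. In practice, the resulting per-trial values would then be averaged across subjects.

```python
def word_measures(fixations, n_words):
    """Compute first-pass and total-time measures for each word.

    `fixations` is a list of (word_index, duration_ms) pairs in the order
    in which the fixations occurred (a hypothetical input format).
    """
    measures = {
        w: {"skipped": True, "first_fixation": None, "gaze": None,
            "single_fixation": None, "total": 0}
        for w in range(n_words)
    }
    furthest = -1  # right-most word fixated so far
    i = 0
    while i < len(fixations):
        word = fixations[i][0]
        # Gather the run of consecutive fixations on the same word.
        j = i
        while j < len(fixations) and fixations[j][0] == word:
            j += 1
        run = [duration for _, duration in fixations[i:j]]
        m = measures[word]
        m["total"] += sum(run)  # total time includes later re-reading
        if word > furthest:
            # First pass: no word further to the right has been fixated yet.
            m["skipped"] = False
            m["first_fixation"] = run[0]
            m["gaze"] = sum(run)
            if len(run) == 1:
                m["single_fixation"] = run[0]
        furthest = max(furthest, word)
        i = j
    return measures


# Word 1 is refixated, word 2 is skipped and then regressed to.
trial = [(0, 210), (1, 250), (1, 180), (3, 230), (2, 200), (3, 190), (4, 220)]
for word, m in word_measures(trial, n_words=5).items():
    print(word, m)
```

In this example, word 2 counts as skipped on the first pass even though it is later fixated during a regression, so it contributes to total time but not to gaze duration or first-fixation duration.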
The word-based measures provide a complete record of
where and when fixations occurred.
These two aspects (where vs. when) also provide a useful framework for
organizing a discussion of reading models because much of the controversy
surrounding reading concerns the determinants of where and how long the eyes
remain fixated. The models that have
been developed to explain eye movement control form a continuum, extending from
models in which eye movements are determined primarily by oculomotor factors (oculomotor
models) to those in which eye movements are guided by some form of
cognitive control (processing models).
Prior to comparing different models, we will discuss our model, E-Z
Reader (Pollatsek, Rayner, Fischer, & Reichle, 1999; Rayner, Reichle, &
Pollatsek, 1998, 2000; Reichle et al., 1998, 1999; Reichle & Rayner, 2001)
in some detail. We will also provide an
updated version of the model (E-Z Reader 7).
3.0 E-Z Reader
E-Z Reader is a processing model, and extends the earlier
work of Morrison (1984). Morrison drew
much of the inspiration for his model from the work of Becker and Jürgens
(1979) and McConkie (1979). McConkie (1979)
suggested that, during reading, visual attention progressed across a line of
text until the limitations of the visual system made it difficult to extract
further lexical information; once this point of difficulty has been
established, attention shifts and an eye movement is programmed and
subsequently initiated, sending the eyes to the problematic location. Although elegantly simple, the model was
soon discarded due to problems in defining and explaining what the point of
difficulty was, how it might be computed, and whether it could be computed soon
enough to be of any use in skilled reading (Rayner & Pollatsek, 1989).
The limitations inherent in McConkie’s (1979) early model
of eye movement control led Morrison (1984) to propose a model in which the
movement of the eyes was a function of successful processing. According to Morrison, the identification
of word_n (i.e., the word that is currently being fixated) causes the
attention “spotlight” (Posner, 1980) to move to word_{n+1}, which in
turn causes the oculomotor system to begin programming a saccade to word_{n+1}. If the program finishes before word_{n+1}
is identified, then the saccade will be executed and the eyes will move to word_{n+1}. However, if word_{n+1} is identified
before the program finishes, the saccade to word_{n+1} may be
canceled. Cancellation can occur some
of the time when attention shifts to word_{n+2} while word_n
is fixated. In this case, the
oculomotor system begins programming a saccade to word_{n+2}, which
overrides the program to move the eyes to word_{n+1} if the new program
interrupts the old program soon enough.
Thus, according to Morrison, attention moves serially, from word to
word, whereas saccades can be programmed in parallel.
Morrison’s (1984) assumption about the parallel
programming of saccades followed Becker and Jürgens’ (1979) demonstration that
saccadic programming is completed in two stages: An initial, labile stage that
is subject to cancellation, and an ensuing, non-labile stage in which the program
cannot be canceled. Their results
suggested that if the oculomotor system begins programming a saccade while
another saccadic program is in its labile stage of development, then the first
program is aborted. However, if the
second program is initiated while the first saccadic program is in its
non-labile stage, then both saccades will be executed, which typically results
in a very short fixation between the two saccades.
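The two-stage logic can be summarized in a short sketch. The function name executed_saccades, the representation of each program by its start time, and the 150 ms duration of the labile stage are assumptions introduced for illustration; the source specifies only that a program can be canceled while it is labile and cannot be canceled afterwards.

```python
def executed_saccades(first_start_ms, second_start_ms, labile_ms=150.0):
    """Return which of two successive saccade programs are executed.

    A second program that starts while the first is still in its labile
    stage cancels the first. Otherwise both programs run to completion.
    The labile-stage duration is a hypothetical value.
    """
    if second_start_ms < first_start_ms:
        raise ValueError("the second program cannot start before the first")
    if second_start_ms - first_start_ms < labile_ms:
        return ["second"]  # first program aborted during its labile stage
    return ["first", "second"]  # first program was already non-labile


print(executed_saccades(0, 100))
print(executed_saccades(0, 160))
```

In Morrison's account, the first program targets the next word and the second targets the word after it, so the first outcome corresponds to skipping the next word and the second to fixating it only briefly before moving on.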
With these simple assumptions, Morrison (1984) was able
to provide an elegant account of both frequency effects and parafoveal preview
effects: Because short frequent words are more easily identified in the
parafovea than long infrequent words, the former tend to be fixated for less
time (and skipped more often) than the latter.
Despite its successes, however, Morrison’s model cannot explain
refixations because the strictly serial attention shifts mean that each word is
either fixated exactly once or is skipped.
More fundamentally, because Morrison’s model posits both that processing
of words is strictly serial and that attention shifting is time-locked to word
identification, the model is unable to handle some simple and robust phenomena
in reading. The first, as we noted
above, is that one often gets “spillover” effects due to word frequency (e.g.,
Rayner & Duffy, 1986). That is,
lower-frequency words often not only lengthen fixations on the word itself (wordn),
but also lengthen gaze durations and/or first-fixation durations on the
succeeding word (wordn+1).
According to Morrison’s model, this shouldn’t happen because attention
doesn’t shift until wordn has been processed. Because parafoveal processing on wordn+1
begins after this attention shift, the amount of information extracted
from wordn+1 before it is fixated will only be a function of how
long it takes to program and execute the saccade, and will not vary as a
function of the frequency of wordn.
As a result, Morrison’s model predicts no delayed effects of word
frequency (or any other delayed effects of word processing difficulty). A related phenomenon (Henderson &
Ferreira, 1990; Kennison & Clifton, 1995) is that the benefit gained
through parafoveal preview decreases as foveal processing becomes more
difficult (e.g., because the fixated word is lower frequency). By essentially the same argument as above,
Morrison’s model predicts that this shouldn’t happen because parafoveal preview
time is only a function of the latency of moving the eyes after covert
attention has shifted.
There are at least three ways to circumvent the
limitations of Morrison’s (1984) model.
The first is to add the assumption that if word identification is not
completed by a processing deadline, attention does not shift to the next word,
but instead remains on the current word, resulting in a refixation (Henderson
& Ferreira, 1990; Sereno, 1992).
This leads to the prediction (which has not been supported; Rayner et
al., 1996; Schilling et al., 1998) that the first of two fixations should be
longer than single fixations because the former reflect cases in which the
processing deadline must have been reached.
The second solution is to simply assume that difficulties with
higher-order linguistic processing somehow cause the eyes to remain on the
current word (Pollatsek & Rayner, 1990; Rayner & Pollatsek, 1989). Unfortunately, how this happens has not been
well specified. Finally, a third way to
avoid the shortcomings of Morrison’s proposal is to assume that word identification
is completed in two stages. This last
approach is instantiated by E-Z Reader, which is discussed next.
3.1 Overview of the E-Z Reader model. E-Z Reader, like other processing models,
makes the basic assumption that on-going cognitive (i.e., linguistic)
processing influences eye movements during reading. Because the model was not intended to be a deep explanation of
language processing, it does not account for the many effects of higher-level
linguistic processing on eye movements (for reviews, see Rayner, 1998; Rayner
& Sereno, 1994; Rayner et al., 1989).
Although this is clearly a limitation, it should also be noted that many
of these effects typically occur when the reader is having difficulty
understanding the text that is being read, such as when a reader makes a
regression to re-interpret a syntactically ambiguous “garden path” sentence
(Frazier & Rayner, 1982). The model
can therefore be viewed as the “default” reading process. That is, we view the process of identifying
words to be the forward “driving engine” in reading, as the process of knitting
the words into larger units of syntax or meaning would be too slow (whether
successful or not) to be a signal to decide how and when to move the eyes
forward for skilled readers. Thus, we
posit that higher-order processes intervene in eye movement control only when
“something is wrong” and either send a signal to stop moving forward or a
signal to execute a regression. Hence,
we view E-Z Reader as an explanation of what happens during reading when
higher-level linguistic processing is running smoothly and doesn’t intervene. One implication of this is that the model
currently does not explain inter-word regressions.
Like its immediate predecessors (see Reichle et al.,
1998, 1999), E-Z Reader 7 consists of a small number of perceptual-motor and
cognitive processes that determine when and where the eyes move during
reading. Figure 3 is a schematic
diagram showing the flow of control among these processes. As is evident in the figure, the central
assumptions of the model are that: (1) a stage of word identification is the signal
to move the eyes; and (2) attention is allocated from one word to the next in a
strictly serial fashion. Notice,
however, that both visual encoding limitations and oculomotor constraints also
play central roles in the moment-by-moment control of eye movements during
reading. In the discussion that
follows, we will describe the specific assumptions of our model and how they
are related to four major cognitive and perceptual-motor systems: Visual
processing, word identification, attention, and oculomotor control.

Figure 3. A schematic diagram of E-Z Reader 7. Visual features on the printed page are
projected from the retina to an early stage of visual processing, which then proceeds
at a rate that is modulated by visual acuity limitations. The low-spatial frequency information (e.g.,
word boundaries) is used by the oculomotor system to select the targets of
upcoming saccades. High-spatial frequency
information is passed on to the word identification system, which, through
attentional selection, identifies individual words. The first stage
of lexical processing (L1) signals the oculomotor system to
begin programming a saccade to the next word.
The completion of the second stage of word identification (L2)
causes attention to shift to the next word.
Saccadic programming is thus decoupled from the shifts of
attention. Saccadic programming is
completed in two stages: The first, labile stage (M1) can be
canceled by the initiation of subsequent programs; the second, non-labile stage
(M2) is not subject to cancellation. Saccades are executed immediately after the
non-labile stage of saccadic programming has been completed. Black lines represent the flow of visual
information, with the dashed line representing the low-spatial frequency
information that is used by the oculomotor system to select the target
locations of upcoming saccades. The
gray lines represent signals that are propagated among the various components
of the model (e.g., the signal to shift attention).
3.1.A. (Early) visual
processing. Visual features from
the printed page are projected from the retina to the visual cortex so that the
objects on the page (i.e., the individual words) can be identified. The earliest stages of visual processing are
thought to be pre-attentive in that the features that make up individual words
are not fully integrated into perceptual wholes (Lamme & Roelfsema, 2000;
Wolfe & Bennett, 1996). This processing
is not instantaneous, with neural transmission from retina to brain taking
approximately 90 ms to complete.
In our model, the preceding ideas are formalized by including an early processing stage in the visual system, which, though pre-attentive, is subject to visual acuity constraints (see Figure 3). The duration of this early visual processing stage, t(V), is a free parameter that corresponds to the base time needed for neural transmission to propagate from the retina to those cortical and subcortical areas that mediate early visual processing. To keep this assumption psychologically plausible, the value of t(V) was set equal to 90 ms. However, because the rate of this early stage of processing is modulated by visual acuity, the rate at which a word is encoded decreases with both its length and the mean distance of its letters from the point of fixation. More specifically, during each fixation, the amount of early visual processing (in ms) that is completed on each word in the visual field is determined by:
(1) $\text{visual processing} = t \,/\, \varepsilon^{\sum_i |\text{letter}_i - \text{fixation}| / N}$
In Equation 1, t is the duration of the fixation (in ms), N is the number of letters in the word being processed, and ε (= 1.08) is a free parameter6 that modulates the effects of the spatial disparity between each of the word’s letters and the fixation location (i.e., the center of the fovea). Thus, the time needed to encode a word increases as the distance between its center and the fovea increases. Moreover, the time needed to encode a word also increases with its length because the individual letters of long words will (on average) be farther away from the point of fixation than will the individual letters of short words7. One interesting implication of this equation is that the early visual processing of a word will be most rapid if the word is fixated near its center because a fixation on a word’s center will minimize the mean spatial deviations between the fixation and each of the word’s letters. This property is also consistent with evidence that word identification is most rapid if the word is fixated near its center (or optimal viewing position; O’Regan, 1990, 1992; O’Regan & Lévy-Schoen, 1987; Vitu, O’Regan, & Mittau, 1990) and provides one explanation for why the eyes are seemingly directed towards this location during reading (see Shillcock et al., 2000). It also allows the model to account for length effects (i.e., the finding that long words take longer to identify than short words; Just & Carpenter, 1980).
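Equation 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' simulation code: the function name, the constant name ACUITY_BASE, and the use of integer character positions for letters and fixations are assumptions made for the example; only the value 1.08 comes from the text.

```python
ACUITY_BASE = 1.08  # free parameter of Equation 1 (value from the text)


def early_visual_processing(fixation_ms, letter_positions, fixation_position):
    """Amount of early visual processing (ms) completed on one word (Equation 1).

    letter_positions: character positions of the word's letters (assumed units).
    fixation_position: character position of the current fixation.
    """
    n_letters = len(letter_positions)
    # Mean absolute distance between the fixation and each letter of the word.
    mean_distance = sum(abs(p - fixation_position) for p in letter_positions) / n_letters
    return fixation_ms / (ACUITY_BASE ** mean_distance)


# Hypothetical 7-letter word occupying character positions 10..16,
# viewed for 100 ms from three different fixation locations.
word = list(range(10, 17))
centered = early_visual_processing(100.0, word, 13)        # fixating the middle letter
at_first_letter = early_visual_processing(100.0, word, 10)  # fixating the first letter
parafoveal = early_visual_processing(100.0, word, 4)        # fixating six characters to the left
```

Under these assumptions, the centered fixation yields the most processing per unit time, which is the optimal-viewing-position property described above.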
Early visual processing is important for two other reasons. First, it is necessary to obtain the word-boundary information that is needed to program saccades to upcoming words. This is denoted in Figure 3 by the dashed arrow that extends from early visual processing to the labile stage of saccadic programming. This arrow represents the flow of low-spatial frequency information that is acquired in the visual periphery (e.g., word boundaries, the presence/absence of ascenders and descenders, etc.). The oculomotor system uses this information to program saccades to upcoming words. Second, early visual processing provides the information that is subsequently used by higher-level visual areas to focus the attention “spotlight” and identify individual words. Word identification (which is discussed in the next section) must therefore wait until the early visual encoding of that word has been completed.
3.1.B Word identification. The process of identifying a word begins as soon as attention is focused on that word. This identification process is then completed in two stages, reflecting early and late stages of lexical processing. The first stage corresponds to being at (or at least close to) the identification of the orthographic form of the word. We assume that this is not full lexical access, as the phonological and semantic forms of the word are not yet fully activated. We labeled this process the “familiarity check” (i.e., f) in earlier versions of the model, but in E-Z Reader 7 it is simply referred to as the first stage of lexical access (i.e., L1).
The second stage of word identification involves the identification of a word’s phonological and/or semantic forms so as to enable additional linguistic processing. This second stage, therefore, more-or-less corresponds to what is typically thought to be “lexical access.” In prior versions of our model, this stage of word identification was called the “completion of lexical access” (i.e., lc). To avoid confusion, however, we will simply refer to this process as the second stage of lexical access (i.e., L2) in E-Z Reader 7.
The distinction between early and late stages of lexical processing has precedent in the literature; indeed, our distinction was partly motivated by the activation-verification model of lexical access (Paap et al., 1982). The two models are broadly consistent if one conceptualizes the first stage of lexical access as a “quick and dirty” assessment of whether or not word identification is imminent, and the second stage as being the actual act of identification. As indicated in Figure 3, this distinction is also important because the two stages of lexical processing play unique functional roles: The completion of the first stage of lexical access causes the oculomotor system to begin programming the next saccade, while the completion of the second stage causes the “spotlight” of attention to shift to the next word. Thus, in E-Z Reader, saccadic programming is de-coupled from the shifting of attention.
As with earlier versions of our model, the time (in ms) required to complete the first stage of lexical access on a word, t(L1), is a function of the natural logarithm of the word’s normative frequency of occurrence in printed text and of its predictability within a given sentence context. The mathematical statement of this relationship is given by Equation 2:
(2) $t(L_1) = [\beta_1 - \beta_2 \ln(\text{frequency})]\,(1 - \theta \cdot \text{predictability})$
In Equation 2, β1 and β2 (= 228 and 10 ms,
respectively) are free parameters that control how a word’s normative frequency
(number of occurrences per million, as tabulated by Francis & Kučera,
1982) affects lexical processing time.
This time is also modulated by the right-hand term, in which the free
parameter θ (= 0.5) limits the degree
to which a word’s predictability in a specific sentence context (as estimated
using cloze task probabilities) reduces the lexical processing time8. In all of the simulations reported below, the
actual times needed to complete the first stage of lexical processing were found
by sampling from gamma distributions having means equal to t(L1)
and standard deviations equal to 0.18 of their means.
The completion of the first
stage of lexical processing of a word has two immediate consequences in the
model: (1) it cues the oculomotor system to begin programming a saccade to the
next word (the details of how the oculomotor system does this are discussed
below); and (2) it initiates further processing of the word. Because all (or at least most) of the
orthographic coding has been completed in L1, the time
required to complete the second stage of lexical processing, L2,
is more influenced by a word’s predictability.
This distinction is reflected in Equation 3:
(3) $t(L_2) = \Delta\,[\beta_1 - \beta_2 \ln(\text{frequency})]\,(1 - \text{predictability})$
As in Equation 2, the free parameters β1 and β2 control the degree to which a word’s frequency of occurrence affects the time necessary to process the word, but this quantity is attenuated by the free parameter Δ (= 0.5). Note that, in contrast to L1, a word’s predictability fully affects L2; that is, words that can be predicted with complete certainty within a given sentence context will require no time in this second stage [i.e., if predictability = 1, then t(L2) = 0 ms]. Such cases reflect the situation when top-down information has already fully activated the semantic and phonological codes given reasonable corroborating input from orthography. As was the case with the first stage of lexical processing, the actual process durations were sampled from gamma distributions.
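Equations 2 and 3, together with the gamma sampling of stage durations, can be sketched as follows. This is an illustrative sketch, not the authors' simulation code. The constant names (INTERCEPT_MS, SLOPE_MS, PREDICTABILITY_WEIGHT, L2_SCALE) are our own labels for the four free parameters (228, 10, 0.5, and 0.5), and we assume that the second stage uses the same standard-deviation-to-mean ratio (0.18) as the first, which the text does not state explicitly.

```python
import math
import random

INTERCEPT_MS = 228.0          # first frequency parameter (ms)
SLOPE_MS = 10.0               # second frequency parameter (ms per log unit)
PREDICTABILITY_WEIGHT = 0.5   # weight on predictability in Equation 2
L2_SCALE = 0.5                # scaling parameter in Equation 3
SD_TO_MEAN = 0.18             # ratio given for L1; assumed here for L2 as well


def mean_l1(freq_per_million, predictability):
    """Mean duration (ms) of the first stage of lexical access (Equation 2)."""
    base = INTERCEPT_MS - SLOPE_MS * math.log(freq_per_million)
    return base * (1.0 - PREDICTABILITY_WEIGHT * predictability)


def mean_l2(freq_per_million, predictability):
    """Mean duration (ms) of the second stage of lexical access (Equation 3)."""
    base = INTERCEPT_MS - SLOPE_MS * math.log(freq_per_million)
    return L2_SCALE * base * (1.0 - predictability)


def sample_duration(mean_ms, rng):
    """Draw one duration from a gamma distribution with the given mean and
    a standard deviation equal to SD_TO_MEAN times that mean."""
    if mean_ms <= 0.0:
        return 0.0  # e.g., t(L2) for a fully predictable word
    shape = 1.0 / SD_TO_MEAN ** 2   # sd / mean = 1 / sqrt(shape)
    scale = mean_ms / shape         # mean = shape * scale
    return rng.gammavariate(shape, scale)


rng = random.Random(0)
samples = [sample_duration(mean_l1(1.0, 0.0), rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

The gamma shape parameter follows from the fixed ratio of standard deviation to mean, so only the scale changes from word to word.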
Finally, it should be mentioned that—by adding the early visual processing stage to E-Z Reader 7—the minimal time needed to identify words in the model is very plausible. Given the parameter values reported above, for example, the mean time needed to identify the word “the” (the most frequent word in English text) when it is centrally fixated and in a completely predictable context is 148 ms, while the time needed to identify the lowest frequency words in completely unpredictable contexts is 432 ms. In contrast, E-Z Reader 6 predicted minimal and maximal mean word identification times of 16 and 278 ms, respectively. E-Z Reader 7 thus predicts word identification latencies that are much more in line with the best available estimates: 150-300 ms (Rayner & Pollatsek, 1989).
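The two E-Z Reader 7 values can be reproduced by hand from Equations 2 and 3. The check below rests on assumptions that the text does not state: that mean identification time is the sum t(V) + t(L1) + t(L2) with t(V) fixed at 90 ms (i.e., ignoring any acuity adjustment), that “the” occurs roughly 69,971 times per million, and that the lowest-frequency words occur once per million.

```latex
\begin{align*}
\text{``the'' (predictability = 1):}\quad
  & 90 + (228 - 10 \ln 69971)(1 - 0.5 \cdot 1) + 0 \\
  & \approx 90 + (228 - 111.6)(0.5) \approx 90 + 58.2 \approx 148 \text{ ms} \\[4pt]
\text{lowest frequency (predictability = 0):}\quad
  & 90 + (228 - 10 \ln 1) + 0.5\,(228 - 10 \ln 1) \\
  & = 90 + 228 + 114 = 432 \text{ ms}
\end{align*}
```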
3.1.C Attention. A central, and perhaps the most contentious, assumption of E-Z Reader is that covert shifts of attention occur serially, from one word to the next, as each word is identified in turn and then integrated into the discourse representation. By “attention,” though, we do not mean spatial orientation; instead, we refer to the process of integrating features that allows individual words to be identified. The separation between these two types of attention has considerable precedent in the literature (LaBerge, 1990). For example, Treisman (1969) distinguished between input selection, or spatial orientation, and analyzer selection, or feature integration. This distinction is important because spatial orientation shifts towards the targets of upcoming saccades (Hoffman & Subramaniam, 1995; however, see Stelmach, Campsall, & Herdman, 1997), which in E-Z Reader occur whenever the oculomotor system uses the low-spatial frequency information provided by the visual processing stage to program a saccade (see the dashed line in Fig. 3). These shifts in spatial orientation, however, are decoupled from the shifts in attention (i.e., analyzer selection) that precede lexical processing.
Attention is allocated serially during reading because readers need to keep word order straight (Pollatsek & Rayner, 1999). By shifting the focus of attention from one word to the next, readers identify and process each word in its correct order. Although the results of several recent experiments (Kennedy, 1998, 2000; Kennedy, Ducrot, & Pynte, 2002; Inhoff, Starr, & Shindler, 2000; Starr & Inhoff, 2002) suggest that properties of two words (particularly visual/orthographic properties) can sometimes be encoded in parallel, we suspect that this does not usually occur in normal reading (see Rayner, White, Kambe, Miller, & Liversedge, 2002, for an extended discussion of these issues). The reason for this is that much of the information that is conveyed by language (both written and spoken) is heavily dependent upon word order.
Furthermore, by decoupling eye movements from attention, our model can also explain aspects of eye movement control that Morrison’s (1984) model could not. For example, E-Z Reader can explain why parafoveal preview benefit decreases as foveal processing difficulty increases (Henderson & Ferreira, 1990; Kennison & Clifton, 1995). If the eyes are on wordn, parafoveal processing of wordn+1 begins not with the completion of the first stage of lexical processing of wordn, but after the completion of the second stage. Because parafoveal processing of wordn+1 ends (by definition) with the onset of the saccade to wordn+1, more time will remain for parafoveal processing of wordn+1 when wordn is easy to process (e.g., high-frequency). This is depicted in Figure 4: The time required to complete L1 and L2 on wordn increases as its normative frequency decreases (see Equations 2 and 3). Because the saccadic latency is not modulated by word frequency, a saccade will (on average) occur 240 ms (i.e., the mean saccadic latency) after the completion of L1. This means that, with everything else being equal, the amount of time available to process wordn+1 in the parafovea will increase as the amount of time needed to process wordn decreases.

Figure 4. A diagram showing how parafoveal preview
benefit is modulated by normative word frequency. The bottom line represents the time required to complete the
first stage of lexical processing, t(L1), as a
function of the natural logarithm of wordn’s token frequency. The middle line represents the time required
to complete the second stage of lexical processing, t(L2),
on wordn. Finally, the top
line represents the saccadic latency, or time required to initiate a saccade
from wordn to wordn+1.
On average, the saccadic latency requires a constant t(M1)
+ t(M2) ms to complete (starting from the point in
time when the first stage of lexical processing on wordn has been
completed). In E-Z Reader, parafoveal
preview begins as soon as wordn has been identified and attention
has shifted to wordn+1. The
parafoveal preview is therefore limited to the duration of the interval
(depicted by the shaded area in the figure) between t(L2)
and t(M1) + t(M2). Notice that, because the relative disparity
between t(L1) and t(L2)
increases as the frequency of wordn decreases, the duration of the parafoveal
preview decreases with the frequency of wordn.
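The relationship shown in Figure 4 can be illustrated with a short sketch. It assumes mean (not sampled) durations, a mean saccadic latency of 240 ms measured from the end of the first lexical stage, and no time spent on early visual pre-processing of wordn+1. The function and constant names are hypothetical.

```python
import math

MEAN_SACCADIC_LATENCY_MS = 240.0  # mean latency from the end of L1 (from the text)


def mean_l2(freq_per_million, predictability=0.0):
    """Mean duration (ms) of the second lexical stage, per Equation 3."""
    return 0.5 * (228.0 - 10.0 * math.log(freq_per_million)) * (1.0 - predictability)


def mean_preview_ms(freq_per_million, predictability=0.0):
    """Mean parafoveal preview time (ms) available for wordn+1.

    Preview starts when L2 on wordn finishes (attention shifts) and ends when
    the saccade begins, i.e., one saccadic latency after L1 on wordn finishes.
    """
    return max(0.0, MEAN_SACCADIC_LATENCY_MS - mean_l2(freq_per_million, predictability))


# Preview time for wordn+1 as the frequency (per million) of wordn varies.
preview_by_frequency = {f: mean_preview_ms(f) for f in (1, 10, 100, 1000, 10000)}
```

Under these assumptions the preview interval grows by a constant amount for each tenfold increase in the frequency of wordn, because t(L2) is linear in the logarithm of frequency.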
In the model, the serial-allocation-of-attention assumption is instantiated as follows: The completion of the second stage of lexical processing on wordn causes attention to shift to wordn+1, at which point the first stage of lexical processing begins on wordn+1 when pre-processing of wordn+1 is complete9. The identification of one word thus causes the focus of attention to shift so that the word-identification system can begin identifying the next word (see Fig. 3).
3.1.D Oculomotor control. Saccadic programming in E-Z Reader is
completed in two stages: An early, labile stage (M1) that is
subject to cancellation by subsequent programs, and a later, non-labile stage (M2)
that is not subject to cancellation.
This assumption was motivated by demonstrations that a saccade to a
first target can be cancelled by the presentation of a second to-be-fixated
target if the second target is presented prior to approximately 230 ms after
the first; after this time, both targets are typically fixated in sequence
(Becker & Jürgens, 1979). A
considerable amount of subsequent research has supported this distinction
between labile and non-labile stages of saccadic programming (Leff, Scott,
Rothwell, & Wise, 2001; McPeek, Skavenski, & Nakayama, 2000; Mokler
& Fischer, 1999; Vergilino & Beauvillain, 2000).
During the first (labile) stage of saccadic
programming, the eye movement system is simply engaged (or made ready) so that
it can begin programming an eye movement.
The system then computes the distance between the current fixation
location and the location of the saccade target (i.e., the intended saccade
length). Thus, although the target
location is represented in terms of spatial coordinates, the saccadic program
is represented in terms of a distance metric.
This is necessary because the distance that is specified by the saccadic
program must ultimately be converted into the appropriate amount of force that
has to be exerted (by the extraocular muscles) to execute the actual
movement. The labile stage of
programming therefore consists of two sub-stages: (1) general system
preparation, followed by (2) a location-to-distance transformation, in which
the spatial location of the upcoming saccade target is converted into the
necessary saccade length. In E-Z
Reader, the time needed to complete the labile programming stage is a random
deviate that is sampled from a gamma distribution having a mean equal to a free
parameter, t(M1), with each of the two aforementioned
sub-stages subsuming half of this time.
An important part of our model is that,
when a saccade program is in the labile stage, it is subject to cancellation by
a subsequent saccadic program. If the
second program is initiated during the system preparation sub-stage of the
first program, then whatever amount of preparation has been done to ready the
oculomotor system will also be applicable to the second program, so that it
will be completed more rapidly than it otherwise would be. If, however, the second program is initiated
somewhat later, during the first program’s location-to-distance transformation
sub-stage, then whatever processing has been done to specify the distance of
the first saccade will not apply to the second because the target locations
(and hence distances) of the two saccades are different. This means that the second program will
always require a minimal amount of time to finish—the time necessary to convert
the spatial location of the saccade target into the intended saccade length.
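The time savings described above can be sketched as a simple rule. This is one reading of the text, not the authors' implementation: we assume that each sub-stage takes exactly half of the labile duration, that completed preparation carries over in full, and that the replacement program has the same labile duration as the one it cancels. The function names are hypothetical.

```python
def can_cancel(elapsed_ms, labile_ms):
    """A program can be cancelled only while it is still in its labile stage."""
    return 0.0 <= elapsed_ms < labile_ms


def remaining_labile_time(elapsed_ms, labile_ms):
    """Labile time (ms) still needed by a program that cancels an earlier one.

    elapsed_ms: how long the first program had been running when cancelled.
    labile_ms:  duration of the labile stage (assumed equal for both programs).
    """
    if not can_cancel(elapsed_ms, labile_ms):
        raise ValueError("first program is no longer in its labile stage")
    preparation_ms = labile_ms / 2.0
    # Only system preparation transfers; work on the location-to-distance
    # transformation is discarded because the saccade targets differ.
    savings_ms = min(elapsed_ms, preparation_ms)
    return labile_ms - savings_ms


early_cancel = remaining_labile_time(50.0, 187.0)   # cancelled during preparation
late_cancel = remaining_labile_time(150.0, 187.0)   # cancelled during transformation
```

In this sketch the second program never needs less than half of the labile duration, which corresponds to the minimal time for the location-to-distance transformation.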
During the second (non-labile) stage of
programming, the command to move the eyes a particular direction and distance
is communicated to the motor system. At
this point, an intended saccade is obligatory, and cannot be cancelled or
modified by subsequent programs. As
with the labile stage of programming, the time needed to complete the
non-labile stage of programming is sampled from a gamma distribution, with the
mean of this distribution being equal to a free parameter, t(M2). Upon completing the non-labile stage of
programming, the saccade is executed immediately.
In E-Z Reader 7, the mean times needed to
complete the labile, t(M1), and non-labile, t(M2),
stages of saccadic programming were set equal to 187 and 53 ms,
respectively. To keep the model as simple
as possible, the saccade durations were set equal to a fixed value: t(S)
= 25 ms10. Our saccadic-programming parameter values are consistent
with estimates from simple saccade latency tasks (Becker & Jürgens, 1979; McPeek
et al., 2000; Rayner et al., 1983). It
should be noted, however, that these values are in fact estimates of the minimal
time required to initiate a saccade, often to pre-specified targets; in
the context of reading text, therefore, the average saccadic latency may be
slightly longer in duration than would be suggested by these previous
estimates.
Let’s examine these assumptions using five
key situations in reading. The first
situation (shown schematically in Figure 5A) is the simplest: Wordn
is fixated, an eye movement is programmed to wordn+1, and no
subsequent eye movement command is made while this program is in its labile
stage. The program therefore enters its
non-labile stage, and an eye movement is made to wordn+1.
Now consider a second situation (Fig. 5B): Wordn is fixated, a program to fixate wordn+1 is initiated, but while the oculomotor system is being readied, a second program (to move the eyes to wordn+2) is initiated. In this case, the program to fixate wordn+1 is cancelled, and the saccade leaving wordn wi