Cowan, N. (2001) The Magical Number 4 in Short-term Memory: A Reconsideration of Mental Storage Capacity. Behavioral and Brain Sciences 24 (1): XXX-XXX.
This is the unedited final draft of a BBS target article that has been accepted for publication (Copyright 2000: Cambridge University Press) and is currently being circulated for Open Peer Commentary.
This preprint is for inspection only, to help prospective commentators decide whether or not they wish to prepare a formal commentary.
Please do not prepare a commentary unless you have received a formal invitation indicating that it has been possible to include you in the final list of invited commentators.
For information on becoming a commentator on this or other BBS target articles, write to bbs@soton.ac.uk
For information about subscribing or purchasing offprints of the published version, with commentaries and author's response, write to: journals_subscriptions@cup.org (North America) or journals_subscriptions@cup.cam.ac.uk (all other countries).
THE MAGICAL NUMBER 4 IN SHORT-TERM MEMORY:
A RECONSIDERATION OF MENTAL STORAGE CAPACITY
Nelson Cowan
Department of Psychology
University of Missouri
210 McAlester Hall
Columbia, MO 65211, USA
CowanN@missouri.edu
http://web.missouri.edu/~psycowan
Nelson Cowan (Ph.D. 1980, University of Wisconsin - Madison) is Middlebush Professor of the Social Sciences in the Department of Psychology at the University of Missouri - Columbia. He has authored one book (Cowan, N., 1995, Attention and memory: An integrated framework, Oxford University Press) and edited another (1997, The development of memory in childhood, Psychology Press), and has 100 other publications on working memory, its development, and its relation to attention. He is a former Associate Editor of the Journal of Experimental Psychology: Learning, Memory, and Cognition (1995-1999) and won the 1998 University of Missouri Chancellor Award for Research and Creative Activities.
Abstract
Miller (1956) summarized evidence that people can remember about 7 chunks in short-term memory (STM) tasks. However, that number was meant more as a rough estimate and a rhetorical device than as a real capacity limit. Others have since suggested that there is a more precise capacity limit, but that it is only 3 to 5 chunks. The present target article brings together a wide variety of data on capacity limits suggesting that the smaller capacity limit is real. Capacity limits will be useful in analyses of information processing only if the boundary conditions for observing them can be carefully described. Four basic conditions in which chunks can be identified and capacity limits can accordingly be observed are: (1) when information overload limits chunks to individual stimulus items, (2) when other steps are taken specifically to block the recoding of stimulus items into larger chunks, (3) in performance discontinuities caused by the capacity limit, and (4) in various indirect effects of the capacity limit. Under these conditions, rehearsal and long-term memory cannot be used to combine stimulus items into chunks of an unknown size; nor can storage mechanisms that are not capacity-limited, such as sensory memory, allow the capacity-limited storage mechanism to be refilled during recall. A single, central capacity limit averaging about 4 chunks is implicated along with other, non-capacity-limited sources. The pure STM capacity limit expressed in chunks is distinguished from compound STM limits obtained when the number of separately held chunks is unclear. Reasons why pure capacity estimates fall within a narrow range are discussed and a capacity limit for the focus of attention is proposed.
Keywords:
Attention, enumeration, information chunks, memory capacity, processing capacity, processing channels, serial recall, short-term memory, storage capacity, verbal recall, working memory capacity, working memory.
1. Introduction to the Problem of Mental Storage Capacity
One of the central contributions of cognitive psychology has been to explore limitations in the human capacity to store and process information. Although the distinction between a limited-capacity primary memory and an unlimited-capacity secondary memory was described by James (1890), Miller's (1956) theoretical review of a "magical number seven, plus or minus two" is probably the most seminal paper in the literature for investigations of limits in short-term memory (STM) storage capacity. It was, in fact, heralded as one of the most influential Psychological Review papers ever, in a 1994 centennial issue of the journal. Miller's reference to a magical number, however, was probably just a rhetorical device. A more central focus of his article was the ability to increase the effective storage capacity through the use of intelligent grouping or "chunking" of items. He ultimately suggested that the specific limit of 7 probably emerged as a coincidence.
Over 40 years later, we are still uncertain as to the nature of storage capacity limits. According to some current theories there is no limit in storage capacity per se, but a limit in the duration for which an item can remain active in STM without rehearsal (e.g., Baddeley, 1986; Richman, Staszewski, & Simon, 1995). This has led to a debate about whether the limitation is a "magic number or magic spell" (Schweickert & Boruff, 1986) or whether rehearsal really plays a role (Brown & Hulme, 1995). One possible resolution is that the focus of attention is capacity-limited whereas various supplementary storage mechanisms, which can persist temporarily without attention, are time-limited rather than capacity-limited (Cowan, 1988, 1995). Other investigators, however, have long questioned whether temporary storage concepts are necessary at all, suggesting that the rules of learning and memory could be identical in both the short and long term (Crowder, 1993; McGeoch, 1932; Melton, 1963; Nairne, 1992; Neath, 1998).
At present, the basis for believing that there is a time limit to STM is controversial and unsettled (Cowan, Saults, & Nugent, 1997; Cowan, Wood, Nugent, & Treisman, 1997; Crowder, 1993; Neath & Nairne, 1995; Service, 1998). The question is nearly intractable because any putative effect of the passage of time on memory for a particular stimulus could instead be explained by a combination of various types of proactive and retroactive interference from other stimuli. In any particular situation, what looks like decay could instead be displacement of items from a limited-capacity store over time. Given the apparent unresolvability of the decay issue, if the general question of whether there is a specialized STM mechanism is to be answered in the near future, the answer seems more likely to hinge on evidence for or against a chunk-based capacity limit.
The evidence regarding this capacity limit also has been controversial. According to one view (Wickens, 1984) there is not a single capacity limit, but several specialized capacity limits. Meyer and Kieras (1997) questioned the need for capacity limits to explain cognitive task performance; they instead proposed that performance scheduling concerns (and the need to carry out tasks in the required order) account for apparent capacity limits. The goal of this target article is to provide a coherent account of the evidence on storage capacity limits to date.
One reason why a resolution may be needed is that, as mentioned above, the theoretical manifesto announcing the existence of a capacity limit (Miller, 1956) did so with considerable ambivalence toward the hypothesis. Although Miller's ambivalence was, at the time, a sophisticated and cautious response to available evidence, a wealth of subsequent information suggests that there is a relatively constant limit in the number of items that can be stored in a wide variety of tasks, but that limit is only 3 to 5 items as the population average. Henderson (1972, p. 486) cited various studies on the recall of spatial locations or of items in those locations, conducted by Sperling (1960), Sanders (1968), Posner (1969), and Scarborough (1971), to make the point that there is a "new magic number 4 ± 1." Broadbent (1975) proposed a similar limit of 3 items on the basis of more varied sources of information including, for example, studies showing that people form clusters of no more than 3 or 4 items in recall. A similar limit in capacity was discussed, with various theoretical interpretations, by others such as Halford, Maybery, and Bain (1988), Halford, Wilson, and Phillips (1998), Luck and Vogel (1997), and Schneider and Detweiler (1987).
The capacity limit is open to considerable differences of opinion and interpretation. The basis of the controversy concerns the way in which empirical results should be mapped onto theoretical constructs. Those who believe in something like a 4-chunk limit acknowledge that it can be observed only in carefully constrained circumstances. In many other circumstances, processing strategies can increase the amount that can be recalled. The limit can presumably be predicted only after it is clear how to identify independent chunks of information. Thus, Broadbent (1975, p. 4) suggested that "The traditional seven arises ... from a particular opportunity provided in the memory span task for the retrieval of information from different forms of processing."
The evidence provides broad support for what can be interpreted as a capacity limit of substantially fewer than Miller's 7 ± 2 chunks: about 4 chunks on average. Against this 4-chunk thesis, I can delineate at least 7 commonly held opposing views: (1) There are capacity limits, but they are in line with Miller's 7 ± 2 (e.g., still taken at face value by Lisman & Idiart, 1995). (2) Short-term memory is limited by the amount of time that has elapsed rather than by the number of items that can be held simultaneously (e.g., Baddeley, 1986). (3) There is no special short-term memory faculty at all; all memory results obey the same rules of mutual interference, distinctiveness, etc. (e.g., Crowder, 1993). (4) There may be no capacity limits per se but only constraints such as scheduling conflicts in performance and strategies for dealing with them (e.g., Meyer & Kieras, 1997). (5) There are multiple, separate capacity limits for different types of material (e.g., Wickens, 1984). (6) There are separate capacity limits for storage versus processing (Daneman & Carpenter, 1980; Halford et al., 1998). (7) Capacity limits exist, but they are completely task-specific, with no way to extract a general estimate. (This may be the "default" view today.) Even among those who agree with the 4-chunk thesis, moreover, a remaining possible ground of contention concerns whether all of the various phenomena that I will discuss are legitimate examples of this capacity limit.
These seven competing views will be re-evaluated in Section 4 (4.3.1-4.3.7). The importance of identifying the chunk limit in capacity is not only to know what that limit is, but more fundamentally to know whether there is such a limit at all. Without evidence that a consistent limit exists, the concepts of chunking and capacity limits are themselves open to question.
1.1. Pure capacity-based and compound STM estimates. I will call the maximum number of chunks that can be recalled in a particular situation the memory storage capacity, and valid, empirically obtained estimates of this number of chunks will be called estimates of capacity-based STM. Although that chunk limit presumably always exists, it is sometimes not feasible to identify the chunks, inasmuch as long-term memory information can be used to create larger chunks out of smaller ones (Miller, 1956), and inasmuch as time- and interference-limited sources of information that are not strictly capacity-limited may be used along with capacity-limited storage to recall information. In various situations, the amounts that can be recalled when the chunks cannot be specified, or when the contribution of non-capacity-limited mechanisms cannot be assessed, will be termed compound STM estimates. These presumably play an important role in real-world tasks such as problem-solving and comprehension (Daneman & Merikle, 1996; Logie, Gilhooly, & Wynn, 1994; Toms, Morris, & Ward, 1993). However, the theoretical understanding of STM can come only from knowledge of the basic mechanisms contributing to the compound estimates, including the underlying capacity limit. The challenge is to find sound grounds upon which to identify the pure capacity-based limit as opposed to compound STM limits.
1.2. Specific conditions in which a pure storage capacity limit can be observed. It is proposed here that there are at least four ways in which pure capacity limits might be observed: (1) when there is an information overload that limits chunks to individual stimulus items, (2) when other steps are taken specifically to block the recoding of stimulus items into larger chunks, (3) when performance discontinuities caused by the capacity limit are examined, and (4) when various indirect effects of the capacity limit are examined. Multiple procedures fit under each of these headings. For each of these, the central assumption is that the procedure does not enable subjects to group items into higher-order chunks. Moreover, the items must be familiar units with no pre-existing associations that could lead to the encoding of multi-object groups, ensuring that each item is one chunk in memory. Such assumptions are strengthened by an observed consistency among results.
The first way to observe clearly limited-capacity storage is to overload the processing system at the time that the stimuli are presented, so that there is more information in auxiliary or time-limited stores than the subject can rehearse or encode before the time limit is up. This can be accomplished by presenting a large spatial array of stimuli (e.g., Sperling, 1960) or by directing attention away from the stimuli at the time of their presentation (Cowan, Nugent, Elliott, Ponomarev, & Saults, 1999). Such manipulations make it impossible during the presentation of stimuli to engage in rehearsal or form new chunks (by combining items and by using long-term memory information), so that the chunks to be transferred to the limited-capacity store at the time of the test cue are the original items presented.
The second way is with experimental conditions designed to limit the long-term memory and rehearsal processes. For example, using the same items over and over on each trial and requiring the recall of serial order limits subjects' ability to think of ways to memorize the stimuli (Cowan, 1995); and rehearsal can be blocked through the requirement that the subject repeat a single word over and over during the stimulus presentation (Baddeley, 1986).
The third way is to focus on abrupt changes or discontinuities in basic indices of performance (proportion correct and reaction time) as a function of the number of chunks in the stimulus. Performance on various tasks takes longer and is more error-prone when it involves a transfer of information from time-limited buffers, or from long-term memory, to the capacity-limited store than when it relies on the contents of capacity-limited storage directly. This results in markedly less accurate and/or slower performance when more than 5 items must be held than when fewer items must be held (e.g., in enumeration tasks such as that discussed by Mandler & Shebo, 1982).
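The discontinuity approach can be made concrete as a breakpoint search: fit one straight line to reaction times at small set sizes and another at large set sizes, and keep the split with the smallest total squared error. The Python sketch below is a minimal, hypothetical illustration. Its reaction times are synthetic, generated from a made-up two-regime rule with the break placed at 4 items, and are not data from Mandler and Shebo (1982) or any other study; the function names are illustrative only.

```python
def line_sse(xs, ys):
    """Sum of squared errors of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def find_breakpoint(set_sizes, rts):
    """Return the largest set size of the first regime in the best two-line fit.

    Each segment must contain at least two points so that a line can be fitted.
    """
    best_size, best_sse = None, float("inf")
    for i in range(2, len(set_sizes) - 1):
        sse = line_sse(set_sizes[:i], rts[:i]) + line_sse(set_sizes[i:], rts[i:])
        if sse < best_sse:
            best_size, best_sse = set_sizes[i - 1], sse
    return best_size


# Synthetic enumeration RTs (ms): shallow slope up to 4 items, then a jump
# and a steep slope. These values are invented for illustration only.
set_sizes = list(range(1, 9))
rts = [500 + 50 * n if n <= 4 else 1200 + 300 * (n - 5) for n in set_sizes]

print(find_breakpoint(set_sizes, rts))  # 4
```

With noisy data one would also want to check that the two-line fit is reliably better than a single line; that step is omitted from this sketch.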
Fourth, unlike the previous methods, which have involved an examination of the level of performance in the memory task, there also are indirect effects of the limit in capacity. For example, lists of items tend to be grouped by subjects into chunks of about 4 items for recall (Broadbent, 1975; Graesser & Mandler, 1978), and the semantic priming of one word by another word or learning of contingencies between the words appears to be much more potent if the prime and target are separated by about 3 or fewer words (e.g., McKone, 1995).
1.2.1. Other restrictions on the evidence. Although these four methods can prevent subjects from amalgamating stimuli into higher-order chunks, the resulting capacity estimates can be valid only if the items themselves reflect individual chunks, with strong intra-chunk associations and weak or (ideally) absent inter-chunk associations. For example, studies with nonsense words as stimuli must be excluded because, in the absence of pre-existing knowledge of the novel stimulus words, each word may be encoded as multiple phonemic or syllabic subunits with only weak associations between these subunits (resulting in an underestimate of capacity). As another example, sets of dots forming familiar or symmetrical patterns would be excluded for the opposite reason, that multiple dots could be perceived together as a larger object with non-negligible inter-dot associations, so that each dot would not be a separate chunk (resulting in an overestimate of capacity). It also is necessary to exclude procedures in which the central capacity's contents can be recalled and the capacity then re-used (e.g., if a visual array remains visible during recall) or, conversely, in which the information is not available long enough or clearly enough for the capacity to be filled even once (e.g., brief presentation with a mask). In Section 3, converging types of evidence will be offered as to the absence of inter-item chunking in particular experimental procedures (e.g., a fixed number of items correctly recalled regardless of the list or array size).
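The plateau criterion mentioned above can be stated compactly. If each item is a separate chunk and capacity is k chunks, an idealized prediction for a list or array of n items is min(n, k) items recalled, so scores stop growing once n exceeds k. The Python sketch below assumes that idealized rule (k = 4 is chosen only to match the average discussed in this article) and shows one simple way to read a capacity estimate off such scores: average the scores at set sizes where recall falls short of the set size, and check that those scores are roughly flat. The function names and the tolerance value are hypothetical.

```python
from statistics import mean


def ideal_recall(set_size, k):
    """Idealized k-slot prediction: recall everything up to k items, then plateau."""
    return min(set_size, k)


def plateau_capacity(scores, tolerance=0.5):
    """Estimate capacity from mean items recalled per set size.

    `scores` maps set size -> mean number of items recalled. Only set sizes
    where recall falls below the set size are used. Returns (estimate, is_flat);
    is_flat is False if those scores vary by more than `tolerance`, as they
    might if inter-item chunking or another aid were helping recall.
    """
    supra = [s for n, s in sorted(scores.items()) if s < n]
    if not supra:
        return None, False
    return mean(supra), max(supra) - min(supra) <= tolerance


scores = {n: ideal_recall(n, k=4) for n in (2, 3, 4, 6, 8, 10)}
estimate, flat = plateau_capacity(scores)
print(estimate, flat)
```

Scores that keep rising with set size would fail the flatness check, which is the pattern the text treats as a warning sign of chunking.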
Finally, it is necessary to exclude procedures in which the capacity limit must be shared between chunk storage and the storage of intermediate results of processing. One example of this is the "n-back task" in which each item in a continuous series must be compared with the item that occurred n items ago (e.g., Cohen et al., 1997; Poulton, 1954) or a related task in which the subject must listen to a series of digits and detect three odd digits in a row (Jacoby, Woloshyn, & Kelley, 1989). In these tasks, in order to identify a fixed set of the most recent n items in memory, the subject must continually update the target set in memory. This task requirement may impose a heavy additional storage demand. These demands can explain why such tasks remain difficult even with n = 3.
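The updating demand can be seen in a minimal Python sketch of n-back target detection. The deque stands in for the set of the n most recent items that the subject must maintain; it has to be revised on every trial, discarding the oldest item and adding the newest, in addition to holding the items themselves. This illustrates the task logic under that assumption and is not a model of how subjects actually perform it; the function name is hypothetical.

```python
from collections import deque


def n_back_hits(stream, n):
    """Return the positions of items that match the item presented n positions earlier."""
    recent = deque(maxlen=n)  # the n most recent items; revised on every trial
    hits = []
    for i, item in enumerate(stream):
        # recent[0] is the item presented exactly n positions before this one
        if len(recent) == n and recent[0] == item:
            hits.append(i)
        recent.append(item)  # with maxlen=n, the oldest item drops out automatically
    return hits


print(n_back_hits("ABCACB", n=2))  # [4]
```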
It may be instructive to consider a hypothetical version of the n-back task that would be taken to indicate the existence of a special capacity limit. Suppose that the subject's task were to indicate, as rapidly as possible, if a particular item had been included in the stimulus set previously. Some items would be repeated in the set but other, novel items also would be introduced. On positive trials, the mean reaction time should be much faster when the item had been presented within the most recent 3 or 4 items than when it was presented only earlier in the sequence. To my knowledge, such a study has not been conducted. However, in line with the expectation, probed recall experiments have resulted in shorter reaction times for the most recent few items (Corballis, 1967).
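The prediction for this hypothetical task can be simulated in a few lines of Python. The sketch assumes a store holding the k most recently presented items (k = 4 here, matching the article's average) and assigns an arbitrary fast reaction time to repeated items still in that store and an arbitrary slow one otherwise. The millisecond values and the function name are placeholders, not estimates from any study; the point is only that the model predicts a step in reaction time between lag k and lag k + 1.

```python
from collections import deque


def predicted_repeat_rts(stream, k=4, fast_ms=500.0, slow_ms=800.0):
    """Return (lag, predicted RT) for every repeated item in `stream`.

    Lag 1 means the item was also the immediately preceding one. Under the
    assumed k-slot store, an item repeated within the last k presentations is
    still held and gets the fast RT; older repetitions get the slow RT.
    """
    store = deque(maxlen=k)  # the k most recently presented items
    last_seen = {}           # item -> position of its most recent presentation
    results = []
    for i, item in enumerate(stream):
        if item in last_seen:  # a positive trial: the item occurred earlier
            rt = fast_ms if item in store else slow_ms
            results.append((i - last_seen[item], rt))
        store.append(item)
        last_seen[item] = i
    return results


print(predicted_repeat_rts("ABCAEFGHIA"))  # [(3, 500.0), (6, 800.0)]
```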
The present view is that a strong similarity in pure capacity limits (to about 4 chunks on average) can be identified across many test procedures meeting the above four criteria. The subcategories of methods and some key references are summarized in Table 1 in Section 3, where each area is described in more detail.
1.3. Definition of chunks. A chunk must be defined with respect to associations between concepts in long-term memory. I will define the term chunk as a collection of concepts that have strong associations to one another and much weaker associations to other chunks concurrently in use. (This definition is related to concepts discussed by Simon, 1974.) It is assumed that the number of chunks can be estimated only when inter-chunk associations are of no use in retrieval in the assigned task. To use a well-worn example inspired by Miller (1956), suppose one tries to recall the series of letters, "fbicbsibmirs." Letter triads within this sequence (FBI, CBS, IBM, and IRS) are well-known acronyms, and someone who notices that can use the information to assist recall. For someone who does notice, there are pre-existing associations between letters in a triad that can be used to assist recall of the 12-letter sequence. If we further assume that there are no pre-existing associations between the acronyms, then the four of them have to occupy limited-capacity storage separately to assist in recall. If that is the case, and if no other optional mnemonic strategies are involved, then successful recall of the 12-item sequence indicates that the pure capacity limit for the trial was at least 4 chunks. (In practice, within the above example there are likely to be associations between the acronyms. For example, FBI and IRS represent two U.S. government agencies, and CBS and IBM represent two large U.S. corporations. Such associations could assist recall. For the most accurate pure capacity-based limit, materials would have to be selected so as to eliminate such special associations between chunks.) Notice that the argument is not that long-term memory fails to be involved in capacity-based estimates. Long-term memory is inevitably involved in memory tasks. The argument is that the purest capacity estimates occur when long-term memory associations are as strong as possible within identified chunks and absent between those identified chunks.
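The chunk bookkeeping in the acronym example can be sketched in Python. A small set of known letter strings stands in for long-term memory knowledge, and a greedy longest-match rule (an assumption chosen for simplicity, not a claim about how people notice acronyms) segments the sequence; letters that match nothing remain single-letter chunks. The same 12 letters then count as 4 chunks for someone who knows the acronyms and 12 for someone who does not. The function name and lexicon are illustrative.

```python
def chunk_letters(letters, known_units):
    """Segment a letter string into chunks by greedy longest match against known units."""
    letters = letters.upper()
    longest = max((len(u) for u in known_units), default=1)
    chunks, i = [], 0
    while i < len(letters):
        # try the longest possible known unit first; a single letter always succeeds
        for size in range(min(longest, len(letters) - i), 0, -1):
            piece = letters[i:i + size]
            if size == 1 or piece in known_units:
                chunks.append(piece)
                i += size
                break
    return chunks


acronyms = {"FBI", "CBS", "IBM", "IRS"}
print(chunk_letters("fbicbsibmirs", acronyms))    # ['FBI', 'CBS', 'IBM', 'IRS']
print(len(chunk_letters("fbicbsibmirs", set())))  # 12
```

The sketch deliberately ignores the associations between acronyms noted in the parenthetical above, which is exactly the simplification that ideal materials would make true.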
If someone is given new material for immediate recall and can look at the material long enough before responding, new associations between the original chunks can be formed, resulting in larger chunks or, at least, conglomerates with nonzero associations between chunks. McLean and Gregg (1967, p. 455) described chunks in verbal recall as "groups of items recited together quickly," a helpful description because recall timing provides one good indication of chunking (see also Anderson & Matessa, 1997). McLean and Gregg (p. 456) also listed three ways in which chunks can be formed: "(a) Some stimuli may already form a unit with which S is familiar. (b) External punctuation of the stimuli may serve to create groupings of the individual elements. (c) The S may monitor his own performance and impose structure by selective attention, rehearsal, or other means."
The practical means to identify chunks directly is an important issue, but one that is more relevant to future empirical work than it is to the present theoretical review of already-conducted work, inasmuch as few researchers have attempted to measure chunks directly. Direct measures of chunks can include empirical findings of item-to-item associations that vary widely between adjacent items in a list, being high within a chunk and low between chunks; item-to-item response times that vary widely, being relatively short within a chunk and long between chunks; and subjective reports of grouping. For studies in which the main dependent measure is not overt recall, measures of chunking for a trial must follow the trial immediately if they cannot be derived from the main dependent measure itself. Schneider and Detweiler (1987, pp. 105-106) provide an excellent further discussion of how chunks can be identified through convergent measures.
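The response-time criterion can be illustrated with a minimal Python sketch that splits a recall protocol into chunks wherever the pause between successive responses exceeds a threshold. The responses, their onset times, and the 0.8 s threshold are hypothetical values chosen for the example; a real threshold would need independent justification, and the text above treats pauses as only one of several converging measures.

```python
def split_at_pauses(responses, times, threshold):
    """Group responses into chunks, starting a new chunk after any pause longer than threshold.

    times[i] is the onset (in seconds) of responses[i]; times must be ascending.
    """
    if not responses:
        return []
    chunks = [[responses[0]]]
    for prev_t, t, response in zip(times, times[1:], responses[1:]):
        if t - prev_t > threshold:
            chunks.append([response])  # long pause: treat as a between-chunk boundary
        else:
            chunks[-1].append(response)  # short gap: same chunk
    return chunks


recall = ["7", "3", "9", "5", "8", "2"]
onsets = [0.0, 0.3, 0.6, 1.8, 2.1, 2.4]  # hypothetical response onsets (s)
print(split_at_pauses(recall, onsets, threshold=0.8))  # [['7', '3', '9'], ['5', '8', '2']]
```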
For most of the research that will be summarized in Section 3 below, however, the researchers provided no direct evidence of chunking or its absence. The present assumption for these studies is that chunk size can be reasonably inferred from the presence of the task demands described above in Section 1.2, which should prevent inter-item chunking. The present thesis is that the great similarity of empirically based chunk limits derived using these guidelines, reviewed in Section 3, supports their validity because the guidelines yield a parsimonious, relatively uniform description of capacity limits of 3 to 5 chunks as the population average (with a maximum range of 2 to 6 chunks in individuals).
2. Theoretical Framework
The most important theoretical point here is the identification of conditions under which a capacity limit can be observed (see Section 1); reasons for this limit also are proposed. The theoretical model in this section provides a logical way to understand the empirical results presented in Section 3. A fuller analytic treatment, consideration of unresolved issues, and comparison with other approaches are provided in Section 4.
The basic assumptions of the present theoretical framework are (1) that the focus of attention is capacity-limited, (2) that the limit in this focus averages about four chunks in normal adult humans, (3) that no other mental faculties are capacity-limited, although some are limited by time and susceptibility to interference, and (4) that any information that is deliberately recalled, whether from a recent stimulus or from long-term memory, is restricted to this limit in the focus of attention. This last assumption depends on the related premise, from Baars (1988) and Cowan (1988, 1995), that only the information in the focus of attention is available to conscious awareness and report. The identification of the focus of attention as the locus of the capacity limit stems largely from a wide variety of research indicating that people cannot optimally perceive or recall multiple stimulus channels at the same time (e.g., Broadbent, 1958; Cowan, 1995), although most of that research does not provide estimates of the number of chunks from each channel that occupy the focus of attention at any moment. There is an additional notion that the focus of attention serves as a global workspace for cognition, as described for example by Cowan (1995, p. 203) as follows:
Attention clearly can be divided among channels, but under the assumption of the unity of conscious awareness, the perceived contents of the attended channels should be somehow integrated or combined. As a simple supporting example, if one is instructed to divide attention between visual and auditory channels, and one perceives the printed word "dog" and the spoken word "cat," there should be no difficulty in determining that the two words are semantically related; stimuli that can be consciously perceived simultaneously can be compared to one another, as awareness serves as a "global workspace" (Baars, 1988).
Cowan (1995) also suggested two other processing limits. Information in a temporarily heightened state of activation, yet not in the current focus of attention, was said to be time-limited. Also, the transfer of this activated information into the focus of attention was said to be rate-limited. Importantly, however, only the focus of attention was assumed to be capacity-limited. This assumption differs from approaches in which there are assumed to be multiple capacity limits (e.g., Wickens, 1984) or perhaps no capacity limit (Meyer & Kieras, 1997).
The assignment of the capacity limit to the focus of attention has parallels in previous work. Schneider and Detweiler (1987) proposed a model with multiple storage buffers (visual, auditory, speech, lexical, semantic, motor, mood, and context) and a central control module. They then suggested (p. 80) that the control module limited the memory that could be used: "50 semantic modules might exist, each specializing in a given class of words, e.g., for categories such as animals or vehicles. Nevertheless, if the controller can remember only the four most active buffers, the number of active semantic buffers would be effectively only four buffers, regardless of the total number of modules...Based on our interpretations of empirical literature, the number of active semantic buffers seems to be in the range of three to four elements."
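The logic of the quoted passage amounts to a top-k selection: however many modules exist, a controller that can track only the k most active ones yields at most k effective buffers. The Python sketch below assumes that reading. The module names, activation values, and the default limit of 4 (taken from the "three to four" range in the quotation) are illustrative, and nothing here reproduces Schneider and Detweiler's actual model.

```python
import heapq
import random


def tracked_buffers(activations, limit=4):
    """Return the names of the `limit` most active buffers, most active first."""
    return heapq.nlargest(limit, activations, key=activations.get)


small = {"animals": 0.9, "vehicles": 0.2, "tools": 0.7, "foods": 0.5, "colors": 0.8}
print(tracked_buffers(small))  # ['animals', 'colors', 'tools', 'foods']

# With 50 hypothetical semantic modules, the controller still tracks only four.
rng = random.Random(0)
many = {f"module_{i}": rng.random() for i in range(50)}
print(len(tracked_buffers(many)))  # 4
```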
The present analysis, based on Cowan (1988, 1995), basically agrees with Schneider and Detweiler, though with some differences in detail. First, it should be specified that the elements limited to four are chunks. (Schneider and Detweiler probably agreed with this, though it was unclear from what was written). Second, the justification for the particular modules selected by Schneider and Detweiler (or by others, such as Baddeley, 1986) is dubious. One can always provide examples of stimuli that do not fit neatly into the modules (e.g., spatial information conveyed through acoustic stimulation). Cowan (1988, 1995) preferred to leave open the taxonomy, partly because it is unknown and partly because there may in fact not be discrete, separate memory buffers. Instead, there could be the activation of multiple types of memory code for any particular stimulus, with myriad possible codes. The same general principles of activation and de-activation might apply across all types of code (e.g., the principle that interference with memory for an item comes from the activation of representations for other items with similar memory codes), making the identification of particular discrete buffers situation-specific and therefore arbitrary. Third, Cowan (1995) suggested that the focus of attention and its neural substrate differ subtly from the controller and its neural substrate, though they usually work together closely. In particular, for reasons beyond the scope of this target article, it would be expected that certain types of frontal lobe damage can impair the controller without much changing the capacity of the focus of attention, whereas certain types of parietal lobe damage would change characteristics of the focus of attention without much changing the controller (see Cowan, 1995). In the present analysis, it is assumed that the capacity limit occurs within the focus of attention, though the control mechanism is limited to the information provided by that focus.
In the next section, so as to keep the theoretical framework separate from the discussion of empirical evidence, I will continue to refer to evidence for a "capacity-limited STM" without reiterating that it is the focus of attention that presumably serves as the basis of this capacity limit. (Other, non-capacity-limited STM mechanisms that may be time-limited contribute to compound STM measures but not to capacity-limited STM.) Given the usual strong distinction between attention and memory (e.g., the absence of memory in the central executive mechanism as discussed by Baddeley, 1986), the suggested equivalence of the focus of attention and the capacity-limited portion of STM may take many readers some getting used to. Because the more neutral term "capacity-limited STM" is used, the conclusions about capacity limits could still hold even if it were found that the focus of attention is not, after all, the basis of the capacity limit.
A further understanding of the premise that the focus of attention is limited to about 4 chunks requires a discussion of working assumptions including memory retrieval, the role of long-term memory, memory activation, maintenance rehearsal, other mnemonic strategies, scene coherence, and hierarchical shifting of attention. These are discussed in the remainder of Section 2. In Section 3, categories of evidence will be explained in detail. Finally, in Section 4, on the basis of the evidence, the theoretical view will be developed and evaluated more extensively with particular attention to possible reasons for the capacity limits.
2.1. Memory retrieval. It is assumed here that explicit, deliberate memory retrieval within a psychological task (e.g., recall or recognition) requires that the retrieved chunk reside in the focus of attention at the time immediately preceding the response. The basis of this assumption is considerable evidence, beyond the scope of this article, that explicit memory in direct memory tasks such as recognition and recall requires attention to the stimuli at encoding and retrieval, a requirement that does not apply to implicit memory as expressed in indirect memory tasks such as priming and word fragment completion (for a review see Cowan, 1995). Therefore, any information that is deliberately recalled, whether it is information from a recent stimulus or from long-term memory, is subject to the capacity limit of the focus of attention. In most cases within a memory test, information must be recalled from both the stimulus and long-term memory in order for the appropriate units to be entered into the focus of attention. For example, if we attempt to repeat a sentence we do not repeat the acoustic waveform; we determine the known units that correspond to what was said and then attempt to produce those units, subject to the capacity limit.
A key question about retrieval in a particular circumstance is whether anything about the retrieval process makes it impossible to obtain a pure capacity-based STM estimate. A compound STM estimate can result instead if there is a source of information that is temporarily in a highly accessible state, yet outside of the focus of attention. This is particularly true when a subject's task involves the reporting of chunks one at a time, as in most recall tasks. In such a situation, if another mental source is available, the subject does not need to hold all of the to-be-reported information in the focus of attention at one time. In a trivial example, a compound, supplemented digit capacity limit can be observed if the subject is trained to use his or her fingers to hold some of the information during the task (Reisberg, Rappaport, & O'Shaughnessy, 1984). The same is true if there is some internal resource that can be used to supplement the focus of attention.
2.2. The role of long-term memory. Whereas some early notions of chunks may have conceived of them as existing purely in STM, the assumption here is that chunks are formed with the help of associations in long-term memory, although new long-term memory associations can be formed as new chunks are constructed. It appears that people build up data structures in long-term memory that allow a simple concept to evoke many associated facts or concepts in an organized manner (Ericsson & Kintsch, 1995). Therefore, chunks can be more than just a conglomeration of a few items from the stimulus. Gobet and Simon (1996, 1998) found that expert chess players differ from other chess players not in the number of chunks but in the size of these chunks. They consequently invoked the term "template" to refer to large patterns of information that an expert can retain as a single complex chunk, with reference to expert information in long-term memory (see also Richman et al., 1995).
The role of long-term memory is important to keep in mind in understanding the size of chunks. When chunks are formed in the stimulus field on the basis of long-term memory information, there should be no limit to the number of stimulus elements that can make up a chunk. However, if chunks are formed rapidly through new associations that did not exist before the stimuli were presented (another mechanism suggested by McLean & Gregg, 1967), then it is expected that the chunk size will be limited to about four items because all of the items (or old chunks) that will be grouped to form a new, larger chunk must be held in the focus of attention at the same time in order for the new intra-chunk associations to be formed (cf. Baars, 1988; Cowan, 1995). This assumption is meant to account for data on limitations in the number of items per group in recall (e.g., see Section 2.7). It theoretically should be possible to increase existing chunk sizes endlessly, little by little, because each old chunk occupies only 1 slot in the capacity-limited store regardless of its size.
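The chunking assumptions in this paragraph can be expressed as a small data structure. The following Python sketch is illustrative only: the `Chunk` class, `form_chunk`, `slots_used`, and the constant `FOCUS_CAPACITY = 4` are hypothetical names and values chosen for the example, not a model proposed in the text. It shows that a new grouping is limited to about four constituents per step, while repeated grouping lets a single slot stand for an ever larger number of stimulus elements.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical capacity of the focus of attention, in chunks (about 4 per the text).
FOCUS_CAPACITY = 4


@dataclass(frozen=True)
class Chunk:
    """A chunk: a single stimulus element (no parts) or a group of older chunks."""
    label: str
    parts: Tuple["Chunk", ...] = ()

    def size(self) -> int:
        """Number of elementary stimulus elements this chunk stands for."""
        if not self.parts:
            return 1
        return sum(part.size() for part in self.parts)


def form_chunk(label: str, focus: Sequence[Chunk]) -> Chunk:
    """Bind the chunks currently held in focus into one new, larger chunk.

    New intra-chunk associations require every constituent to be in the
    focus at the same time, so at most FOCUS_CAPACITY chunks can be
    grouped in a single step.
    """
    if len(focus) > FOCUS_CAPACITY:
        raise ValueError(f"cannot hold {len(focus)} chunks in focus at once")
    return Chunk(label, tuple(focus))


def slots_used(focus: Sequence[Chunk]) -> int:
    """Each chunk occupies one slot in the focus, regardless of its size."""
    return len(focus)


if __name__ == "__main__":
    letters = [Chunk(c) for c in "ABCDEFGHIJKLMNOP"]
    # Step 1: four groups of four letters each.
    groups = [form_chunk(f"group{i}", letters[4 * i: 4 * i + 4]) for i in range(4)]
    # Step 2: the four groups become one chunk standing for 16 letters.
    big = form_chunk("big", groups)
    print(big.size(), slots_used([big]))  # 16 1
```

Growing a chunk "little by little" here means repeating the grouping step; the sketch never lets more than four chunks be bound at once.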
2.3. Memory activation. It is assumed that there is some part of the long-term memory system that is not presently in the focus of attention but is temporarily more accessible to the focus than it ordinarily would be, and can easily be retrieved into that focus if it is needed for successful recall (Cowan, 1988, 1995). This accessible information supplements the pure capacity limit and therefore must be understood if we are to determine that pure capacity limit.
According to Baddeley (1986) and Cowan (1995), when information is activated (by presentation of that information or an associate of it) it stays activated automatically for a short period of time (e.g., 2 to 30 s), decaying from activation unless it is reactivated during that period through additional, related stimulus presentations or thought processes. In Baddeley's account, this temporary activation is in the form of the phonological buffer or the visuospatial sketch pad. As mentioned above, there is some question about the evidence for the existence of that activation-and-decay mechanism. Even if it does not exist, however, there is another route to temporary memory accessibility, described by Cowan et al. (1995) as "virtual short-term memory" and by Ericsson and Kintsch (1995), in more theoretical detail, as "long-term working memory." For the sake of simplicity, this process also will be referred to as activation. Essentially, an item can be tagged in long-term memory as relevant to the current context. For example, the names of fruits might be easier to retrieve from memory when one is standing in a grocery store than when one is standing in a clothing store because different schemas are relevant and different sets of concepts are tagged as relevant in memory. Analogously, if one is recalling a particular list of items, it might be that a certain item from the list is out of the focus of attention at a particular point but nevertheless is temporarily more accessible than it was before the list was presented. For example, if one is buying groceries based on a short list that was not written down, a fruit forgotten from the list might be retrieved with a process resembling the following stream of thought: "I recall that there were three fruits on the list and I already have gotten apples and bananas...what other fruit would I be likely to need?" The data structure in long-term memory then allows retrieval. 
One difference between this mechanism and the short-term decay and reactivation mechanism is that it is limited by contextual factors rather than by the passage of time.
If there is no such thing as time-based memory decay, the alternative assumption is that long-term working memory underlies phenomena that Baddeley (1986) attributed to the phonological buffer and visuospatial sketchpad. In the present article, the issue of whether short-term decay and reactivation exist will not be addressed. Instead, it is enough to establish that information can be made temporarily accessible (i.e., in present terms, active) by one means or another, and that this information is the main database for the focus of attention to draw upon.
2.4. Maintenance rehearsal. In maintenance rehearsal, one thinks of an item over and over and thereby keeps it accessible to the focus of attention (Baddeley, 1986; Cowan, 1995). One way in which this could occur, initially, is that the rehearsal could result in a recirculation of information into the focus of attention, reactivating the information each time. According to Baddeley (1986), the rehearsal loop soon becomes automatic enough so that there is no longer a need for attention. A subject in a digit recall study might, according to this notion, rehearse a sequence such as "2, 4, 3, 8, 5" while using the focus of attention to accomplish other portions of the task, provided that the rehearsal loop contains no more than could be articulated in about 2 s. In support of that notion of automatization, Guttentag (1984) used a secondary probe task to measure the allocation of attention and found that as children matured, less and less attention was devoted to rehearsal while it was ongoing.
It appears from many studies of serial recall with rehearsal-blocking or "articulatory suppression" tasks, in which a meaningless item or short phrase is repeated over and over, that rehearsal is helpful to recall (for a review, see Baddeley, 1986). Maintenance rehearsal could increase the observed memory limit as follows. An individual might recall an 8-item list by rehearsing, say, 5 of the items while holding the other 3 items in the focus of attention. Therefore, maintenance rehearsal must be prevented before pure capacity can be estimated accurately.
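The arithmetic of this example can be written out as a short sketch. It assumes, purely for illustration, that the rehearsal loop holds whatever can be articulated in about 2 s (the figure given in the preceding paragraph) and that items are articulated at a hypothetical rate of 2.5 per second; `compound_span` is a made-up helper, not a published model.

```python
import math


def compound_span(focus_items: int, articulation_rate: float, loop_seconds: float = 2.0) -> int:
    """Hypothetical compound STM estimate.

    focus_items: items held in the capacity-limited focus of attention.
    articulation_rate: items that can be articulated per second (assumed value).
    loop_seconds: duration of speech the rehearsal loop can sustain (about 2 s).
    """
    rehearsed = math.floor(articulation_rate * loop_seconds)  # whole items only
    return focus_items + rehearsed


print(compound_span(3, 2.5))  # 8: 5 rehearsed items plus 3 in the focus
print(compound_span(3, 0.0))  # 3: articulatory suppression removes the loop
```

Setting the articulation rate to zero mimics articulatory suppression: only the focus contributes, which is why rehearsal must be blocked before pure capacity can be estimated.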
2.5. Other mnemonic strategies. With the possible exception of maintenance rehearsal, other well-known mnemonic strategies presumably involve the use of long-term memory. In recoding, information is transformed in a way that can allow improved associations. For example, in remembering two lines of poetry that rhyme, an astute reader may articulate the words covertly so as to strengthen the temporary accessibility of a phonological or articulatory code in addition to whatever lexical code already was strong. This phonological code in turn allows the rhyme association to assist retrieval of activated information into the focus of attention. Another type of recoding is the gathering of items (i.e., chunks corresponding to stimuli as intended by the experimenter) into larger chunks than existed previously. This occurs when an individual becomes aware of the associations between items, such as the fact that the 12-letter string given above could be divided into four 3-letter acronyms. Elaborative rehearsal involves an active search for meaningful associations between items. For example, if the items "fish, brick" were presented consecutively, one might form an image of a dead fish on a brick, which could be retrieved as a single unit rather than two unconnected units. Recoding and elaborative rehearsal are not intended as mutually exclusive mechanisms, but slightly different emphases on how long-term memory information can be of assistance in a task in which memory is required. These, then, are some of the main mechanisms causing compound STM limits to be produced instead of pure capacity-based STM limits.
2.6. Scene coherence. The postulation of a capacity of about 4 chunks appears to be at odds with the earlier finding that one can comprehend only one stream of information at a time (Broadbent, 1958; Cherry, 1953) or the related, phenomenologically-based observation that one can concentrate on only one event at a time. A resolution of this paradox was suggested by Mandler (1985, p. 68) as follows:
The organized (and limited) nature of consciousness is illustrated by the fact that one is never conscious of some half dozen totally unrelated things. In the local park I may be conscious of four children playing hopscotch, or of children and parents interacting, or of some people playing chess; but a conscious content of a child, a chessplayer, a father, and a carriage is unlikely (unless of course they form their own meaningful scenario).
According to this concept, a coherent scene is formed in the focus of attention and that scene can have about 4 separate parts in awareness at any one moment. Although the parts are associated with a common higher-level node, they would be considered separate provided that there are no special associations between them that could make their recall mutually dependent. For example, four spices might be recalled from the spice category in a single retrieval (to the exclusion of other spices), but salt and pepper are directly associated and so they could only count as a single chunk in the focus of attention.
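The counting rule in this paragraph, that directly associated items count as a single chunk, can be sketched by treating associations as links and counting the resulting connected groups. This is an illustrative assumption rather than a procedure from the text: the function name `count_chunks` is hypothetical, and merging chains of associations transitively (A with B, B with C) is a simplification chosen for the example.

```python
from typing import Iterable, Tuple


def count_chunks(items: Iterable[str], associations: Iterable[Tuple[str, str]]) -> int:
    """Count chunks, treating directly associated items as a single chunk.

    Uses a union-find structure: each association merges two groups.
    """
    parent = {item: item for item in items}

    def find(x: str) -> str:
        # Follow parent links to the group's root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in associations:
        parent[find(a)] = find(b)

    return len({find(item) for item in parent})


spices = ["salt", "pepper", "cumin", "oregano"]
print(count_chunks(spices, []))                    # 4
print(count_chunks(spices, [("salt", "pepper")]))  # 3
```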
This assumption of a coherent scene has some interesting implications for memory experiments that may not yet have been conducted. Suppose that a subject is presented with a red light, a spoken word, a picture, and a tone in rapid succession. A combination of long-term memory and sensory memory would allow fairly easy recognition of any of these events, yet it is proposed that the events cannot easily be in the focus of attention at the same time. One possible consequence is that it should be very difficult to recall the serial order of these events because they were not connected into a coherent scene. They can be recalled only by a shifting of attention from the sensory memory or the newly formed long-term memory representation of one item to the memory representation of the next item, which does not result in a coherent scene and is not optimal for serial recall.
2.7. Hierarchical shifting of attention. Attentional focus on one coherent scene does not in itself explain how a complex sequence can be recalled. To understand that, one must take into account that the focus of attention can shift from one level of analysis to another. McLean and Gregg (1967, p. 459) described a hierarchical organization of memory in a serial recall task with long lists of consonants: "At the top level of the hierarchy are those cueing features that allow S to get from one chunk to another. At a lower level, within chunks, additional cues enable S to produce the integrated strings that become his overt verbal responses." An example of hierarchical organization was observed by Graesser and Mandler (1978) in a long-term recall task. The assumption underlying this research was that, like perceptual encoding, long-term recall requires a limited-capacity store to operate. It was expected according to this view that items would be recalled in bursts as the limited-capacity store (the focus of attention) was filled with information from long-term memory, recalled, and then filled and recalled again. Studies of the timing of recall have indeed found that retrieval from long-term memory (e.g., recall of all the fruits one can think of) occurs in bursts of about 5 or fewer items (see Broadbent, 1975; Mandler, 1975). Graesser and Mandler (1978, Study 2) had subjects name as many instances of a semantic category as possible in 6 min. They fit a mathematical function to the cumulative number of items recalled in order to identify plateaus in the response times. These plateaus indicated about 4 items per cluster. They also indicated, however, that there were longer inter-cluster intervals that defined superclusters. Presumably, the focus of attention shifted back and forth between the supercluster level (at which several subcategories of items are considered) and the cluster level (at which items of a certain subcategory are recalled). An example would be the recall from the fruit category as follows: "apple - banana - orange - pear (some common fruits); grapes - blueberries - strawberries (smaller common fruits); pineapple - mango (exotic fruits); watermelon - cantaloupe - honeydew (melons)." By shifting the focus to higher and lower levels of organization, it is possible to recall many things from a scene. I assume that the capacity limit applies only to items within a single level of analysis, reflecting simultaneous contents of the focus of attention.
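Segmenting a recall protocol into bursts can be illustrated with a simple pause-threshold rule. Note that this is a swapped-in technique: Graesser and Mandler (1978) fit a mathematical function to cumulative recall, whereas the sketch below simply starts a new cluster whenever the pause before a response exceeds a threshold. The timestamps and both thresholds are invented for illustration; applying a larger threshold groups clusters into superclusters.

```python
from typing import List


def split_into_clusters(times: List[float], gap_threshold: float) -> List[List[float]]:
    """Group response timestamps (in seconds) into bursts.

    A new cluster starts whenever the pause since the previous response
    exceeds gap_threshold.
    """
    if not times:
        return []
    clusters = [[times[0]]]
    for prev, curr in zip(times, times[1:]):
        if curr - prev > gap_threshold:
            clusters.append([curr])
        else:
            clusters[-1].append(curr)
    return clusters


# Hypothetical timestamps for the fruit example in the text.
times = [0.0, 0.8, 1.5, 2.1, 6.0, 6.7, 7.3, 15.0, 15.9, 22.0, 22.6, 23.1]
clusters = split_into_clusters(times, gap_threshold=2.0)
superclusters = split_into_clusters(times, gap_threshold=5.0)
print([len(c) for c in clusters])       # [4, 3, 2, 3]
print([len(c) for c in superclusters])  # [7, 2, 3]
```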
3. Empirical Evidence for the Capacity Limit
Table 1
Types of Evidence of a Capacity Limit of About 4 Items, With Selected Key References (Numbered According to the Relevant Section of the Article)
3.1. Imposing an Information Overload
3.1.1. Visual whole report of spatial arrays (Sperling, 1960)
3.1.2. Auditory whole report of spatiotemporal arrays (Darwin et al., 1972)
3.1.3. Whole report of unattended spoken lists (Cowan et al., in prep.)
3.2. Preventing Long-term Memory Recoding, Passive Storage, and Rehearsal
3.2.1. Short-term, serial verbal retention with articulatory suppression (see Table 3 references; also Pollack et al., 1959; Waugh & Norman, 1965)
3.2.2. Short-term retention of unrehearsable material (Glanzer & Razel, 1974; Jones et al., 1995; Simon, 1974; Zhang & Simon, 1985)
3.3. Examining Performance Discontinuities
3.3.1. Errorless performance in immediate recall (Broadbent, 1975)
3.3.2. Enumeration reaction time (Mandler & Shebo, 1982; Trick & Pylyshyn, 1993)
3.3.3. Multi-object tracking (Pylyshyn et al., 1994)
3.3.4. Proactive interference in immediate memory (Halford et al., 1988; Wickelgren, 1966)
3.4. Examining Indirect Effects of the Limits
3.4.1. Chunk size in immediate recall (Wickelgren, 1964; Ryan, 1969; Chase & Simon, 1974; Ericsson et al., 1980; Ericsson, 1985)
3.4.2. Cluster size in long-term recall (Broadbent, 1975; Graesser & Mandler, 1978)
3.4.3. Positional uncertainty in recall (Nairne, 1991)
3.4.4. Analysis of the recency effect in recall (Watkins, 1974)
3.4.5. Sequential effects in implicit learning and memory (Cleeremans & McClelland, 1991; McKone, 1995)
3.4.6. Influence of capacity on properties of visual search (Fisher, 1984)
3.4.7. Influence of capacity on mental addition reaction time (Logan, 1988; Logan & Klapp, 1991)
3.4.8. Mathematical modeling parameters (Kintsch & van Dijk, 1978; Raaijmakers & Shiffrin, 1981; Halford et al., 1998)
3.1. Capacity limits estimated with information overload. One way in which long-term recoding or rehearsal can be limited is through the use of stimuli that contain a large number of elements for a brief period of time, overwhelming the subject's ability to rehearse or recode before the array fades from the time-limited buffer stores. This has been accomplished in several ways.
3.1.1. Visual whole report of spatial arrays. One study (Sperling, 1960) will be explained in detail, as it was among the first to use the logic just described. It revealed evidence for both (a) a brief, pre-attentive, sensory memory of unlimited capacity and (b) a much more limited, post-attentive form of storage for categorical information. Sperling's research was conducted to explore the former but it also was informative about the latter, limited-capacity store. On every trial, an array of characters (e.g., 3 rows with 4 letters per row) was visually presented simultaneously, in a brief (usually 50-msec) flash. This was followed by a blank screen. It was assumed that subjects could not attend to so many items in such a brief time but that sensory memory outlasted the brief stimulus array, and that items could be recalled to the extent that the information could be extracted from that preattentive store. On partial report trials, a tone indicated which row of the array the subject should recall (in a written form), but on whole report trials the subject was to try to recall the entire array (also in written form). The ability to report items in the array depended on the delay of the partial report cue. When the cue occurred very shortly after the array, most or all of the 4 items in the cued row could be recalled, but that diminished as the cue delay increased, presumably because the sensory store decayed before the subject knew which sensory information to bring into the more limited, categorical store.
By the time the cue was 1 sec later than the array, it was of no value (i.e., performance reached an asymptotically low level). Subjects then could remember about 1.3 of the cued items from a row of 4. It can be calculated that at that point the number of items still remembered was 1.3 x 3 (the number of rows in the array) or about 4. That was also how many items subjects could recall on the average on trials in the "whole report" condition, in which no partial report cue was provided. The limit of 4 items was obtained in whole report across a large variety of arrays differing in the number, arrangement, and composition of elements. Thus, a reasonable hypothesis was that subjects could read about 4 items out of sensory memory according to a process in which the unlimited-capacity, fading sensory store is used quickly to transfer some items to a limited-capacity, categorical store (according to the present theoretical framework, the focus of attention).
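The partial-report arithmetic can be made explicit. The first function scales the mean cued-row score by the number of rows, as in the text; the second runs the logic in reverse under a simplifying assumption added here for illustration: the items already held when a late cue arrives were drawn at random from the whole array, so the cued row contributes only its share of a fixed capacity. Both function names are hypothetical.

```python
def items_available(cued_row_score: float, n_rows: int) -> float:
    """Estimate items available from the whole array via partial report.

    Any row is equally likely to be cued, so the mean score on the cued
    row is multiplied by the number of rows.
    """
    return cued_row_score * n_rows


def expected_cued_score(capacity: int, row_size: int, array_size: int) -> float:
    """Expected cued-row items if `capacity` items, chosen at random from the
    whole array, are all that remain when the cue arrives."""
    return capacity * row_size / array_size


print(round(items_available(1.3, 3), 2))        # 3.9
print(round(expected_cued_score(4, 4, 12), 2))  # 1.33
```

Under that assumption, a capacity of 4 predicts a late-cue score of about 1.33 items per row, close to the observed 1.3.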
One could illustrate the results of Sperling (1960) using Figure 1, which depicts the interaction of nested faculties in the task in a manner similar to Cowan (1988, 1995). Within that theoretical account, sensory memory is assumed to operate through the activation of features within long-term memory (an assumption that has been strengthened through electrophysiological studies of the role of reactivation in automatic sensory memory comparisons; see Cowan, Winkler, Teder, & Näätänen, 1993). The nesting relation implies that some, but not all, of sensory memory information also is in the focus of attention at a particular moment in this task. In either the whole report condition or the partial report condition, the limited capacity store (i.e., the focus of attention) can be filled with as many of the items from sensory memory as the limited capacity will allow; but in the partial report condition, most of these items come from the cued row in the array. Because the display is transient and contains a large amount of information, subjects have little chance to increase the amount recalled through mnemonic strategies such as maintenance or elaborative rehearsal. Part A of the figure represents whole report and shows that a subset of the items can be transferred from activated memory to the capacity-limited store. Part B of the figure, representing partial report, shows that the items transferred to the capacity-limited store are now confined to the cued items (filled circles), allowing a larger proportion of those items to be reported.
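The nested arrangement just described can be simulated in a few lines. This Monte Carlo sketch rests on simplifying assumptions not stated in the text: sensory memory is treated as fully intact when an early cue arrives and fully gone when a late cue arrives (no gradual decay), and the focus holds exactly `capacity` items. Default parameter values mirror Sperling's 3 x 4 arrays; the function name `simulate_report` is hypothetical.

```python
import random


def simulate_report(condition: str, capacity: int = 4, n_rows: int = 3,
                    row_size: int = 4, trials: int = 20_000, seed: int = 1) -> float:
    """Mean number of items reported per trial under a toy nested-store model.

    condition: "whole" (report everything), "early" (cue while sensory
    memory is intact), or "late" (cue after sensory memory has faded).
    """
    rng = random.Random(seed)
    n_items = n_rows * row_size
    total = 0
    for _ in range(trials):
        if condition == "whole":
            # Focus is filled from the whole array, up to its capacity.
            total += min(capacity, n_items)
            continue
        row = rng.randrange(n_rows)
        cued = list(range(row * row_size, (row + 1) * row_size))
        if condition == "early":
            # Sensory memory still holds the array: fill focus from the cued row.
            in_focus = rng.sample(cued, min(capacity, row_size))
        elif condition == "late":
            # Cue arrives too late: focus holds items chosen before the cue.
            in_focus = rng.sample(range(n_items), min(capacity, n_items))
        else:
            raise ValueError(f"unknown condition: {condition}")
        total += len(set(in_focus) & set(cued))
    return total / trials


for cond in ("whole", "early", "late"):
    print(cond, round(simulate_report(cond), 2))
```

Whole report and early-cue partial report both yield 4 items, while the late-cue mean comes out near 4 x 4/12, or about 1.33, matching the asymptotic pattern described above.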

A word is in order about the intent of this simple model shown in Figure 1. It is not meant to deny that there are important differences between more detailed structures such as the phonological store and the visuospatial sketch pad of Baddeley (1986). However, the model is meant to operate on a taxonomically inclusive level of analysis. It seems likely that there are other storage structures not included in Baddeley's model, such as memory for nonverbal sounds and for tactile stimuli. In principle, moreover, these separate structures could share important functional properties (e.g., incoming stimuli requiring a particular kind of coding interfere with the short-term retention of existing representations using similar coding) and could operate based on similar neural mechanisms of activation. Given that we do not know the taxonomy of short-term stores, they are represented together in the model according to their common principle, that they are activated portions of the long-term memory system. This activated memory includes both physical features and conceptual features. What is critical for the present purposes is that all of the storage structures making up activated memory are assumed not to have capacity limitations. Instead, they are assumed to be limited because of memory decay, interference from subsequent stimuli, and/or some other basis of temporary accessibility. Only the focus of attention is assumed to have a fixed capacity limit in chunks, and it is that capacity limit that is of primary concern here.
There are also a number of other theoretical suggestions that are consistent with the present approach but with different terminology and assumptions. For example, the approach appears compatible with a model proposed recently by Vogel, Luck, and Shapiro (1998). Their "conceptual short-term memory" would correspond to the activated portion of long-term memory in the model of Cowan (1988), whereas their "visual working memory" would correspond to the focus of attention. Potential differences between the approaches appear to be that what they call conceptual memory could, according to Cowan (1988), include some physical features; and what they call visual working memory would, according to Cowan (1988), prove to be one instance of the focus of attention, a central structure that represents conscious information from all modalities. The most critical similarity between the models for present purposes is that the capacity limit shows up in only one place (the visual working memory or focus of attention), not elsewhere in the model.
With these theoretical points in mind, we can return to a consideration of Sperling's study. The observed limit to about 4 items in whole report theoretically might be attributed to output interference. However, studies by Pashler (1988) and Luck and Vogel (1997), in which output interference was limited, militate against that interpretation. In one experiment conducted by Luck and Vogel, for example, subjects saw an array of 1 to 12 small colored squares for 100 msec and then, after a 900-msec blank interval, another array that was the same or differed in the color of one square. The subject was to indicate whether the array was the same or different. Thus, only one response per trial was required. Performance was nearly perfect for arrays of 1-3 squares, slightly worse with 4 squares, and much worse at larger array sizes. Very similar results were obtained in another experiment in which a cue was provided to indicate which square might have changed, reducing decision requirements. Some of their other experiments clarify the nature of the item limit. The 4-item limit was shown to apply to integrated objects, not features within objects. For example, when objects in an array of bars could differ on four dimensions (size, orientation, color, and presence or absence of a central gap), subjects could retain all four dimensions at once as easily as retaining any one. The performance function of proportion correct across increasing array size (i.e., increasing number of array items) was practically identical no matter how many stimulus attributes had to be attended at once. This suggested that the capacity limit should be expressed in terms of the number of integrated objects, not the number of features within objects. The objects serve as the chunks here.
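Because a change-detection trial yields only a same/different response, a capacity estimate has to be inferred from hit and false-alarm rates. One simple way to do this (an assumption added here, not an analysis reported in this paragraph) is a high-threshold model: if k of the N items are held, a change to a held item is always detected, and otherwise the subject guesses "different" with probability g. Then H = k/N + (1 - k/N)g and F = g, which gives k = N(H - F)/(1 - F), the form usually associated with Pashler (1988). The function name and the example rates are hypothetical.

```python
def capacity_from_change_detection(set_size: int, hit_rate: float,
                                   false_alarm_rate: float) -> float:
    """Estimate items held (k) under a simple high-threshold guessing model.

    hit_rate: P("different" | a change occurred)
    false_alarm_rate: P("different" | no change), taken as the guessing rate g.
    """
    if not 0.0 <= false_alarm_rate < 1.0:
        raise ValueError("false_alarm_rate must be in [0, 1)")
    return set_size * (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


# Hypothetical rates for an 8-item array.
print(round(capacity_from_change_detection(8, 0.6, 0.2), 2))  # 4.0
```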
Broadbent (1975) noted that the ability to recall items from an array grows with the visual field duration: "for the first fiftieth of a second or so the rate of increase in recall is extremely fast, and after that it becomes slower." He cites Sperling's (1967) argument that in the early period, items are read in parallel into some visual store; but that, after it fills up, additional items can be recalled only if some items are read (more slowly) into a different, perhaps articulatory store. Viewed in this way, the visual store would have a capacity of 3 to 5 items, given that the performance function rapidly increases for that number of items. However, the "visual store" could be a central capacity limit (assumed here to be the focus of attention) rather than visually specific as the terminology used by Sperling seems to imply.
A related question is what happens when access to the sensory memory image is limited. Henderson (1972) presented 3 x 3 arrays of consonants, each followed by a masking pattern after 100, 400, 1000, or 1250 msec. This was followed by recall of the array. Although the number of consonants reported in the correct position depended on the duration of the array, the range of numbers was quite similar to other studies, with means for phonologically dissimilar sets of consonants ranging from about 3 with 100-msec exposure times to about 5.5 with 1250-msec exposure times. This indicates that most of the transfer of information from sensory storage to a limited-capacity store occurs rather quickly.
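The claim that most transfer happens quickly can be checked with back-of-the-envelope rates from the two endpoint means quoted above (about 3 items at 100 msec and about 5.5 items at 1250 msec). The sketch assumes, for illustration, that nothing is reported at zero exposure; the intermediate exposure durations are omitted because their means are not given here. The helper name is hypothetical.

```python
from typing import List, Tuple


def transfer_rates(points: List[Tuple[float, float]]) -> List[float]:
    """Items gained per second over each successive exposure interval.

    points: (exposure_ms, mean_items_reported) pairs sorted by exposure.
    Assumes zero items at zero exposure.
    """
    rates = []
    prev_ms, prev_items = 0.0, 0.0
    for ms, items in points:
        rates.append((items - prev_items) * 1000.0 / (ms - prev_ms))
        prev_ms, prev_items = ms, items
    return rates


rates = transfer_rates([(100, 3.0), (1250, 5.5)])
print([round(r, 1) for r in rates])  # [30.0, 2.2]
```

Under these assumptions, items arrive more than ten times faster during the first 100 msec than over the following second or so.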
A similar limit may apply to "change blindness," in which a scene is changed substantially following a brief interruption and people often do not notice the change (e.g., Simons & Levin, 1998). Rensink, O'Regan, and Clark (1997) proposed that such failures may occur because people can monitor only a few key elements in a scene at one time.
3.1.2. Auditory whole report of spatiotemporal arrays. Darwin, Turvey, and Crowder (1972) carried out an experiment that was modeled after Sperling's (1960) work, but with stimuli presented in the auditory modality. On each trial, subjects received a spatiotemporal array of 9 spoken items (numbers and letters): three sequences of 3 items each, presented over headphones simultaneously at left, center, and right locations. The partial report cue was a visual mark indicating which spatial location to recall. The results were quite comparable to those of Sperling (1960). Once more, partial report performance declined across cue delays until it was equivalent to the whole report level of about 4 items, though in this auditory experiment the decline took about 4 sec rather than 1 sec as in vision, and the last item in each sequence was recalled better than the first two items. In both modalities, the whole report limit may reflect the limited capacity for storage of item labels in a consciously accessed form.
3.1.3. Whole report of ignored (unattended) spoken lists. In all of the partial report studies, the measure of short-term memory capacity depended upon the fact that there were too many simultaneously presented items for all of them to be processed at once, so that the limited-capacity mechanism was filled with items quickly. If items were presented slowly and one at a time, the subject would be able to use mnemonic processes such as rehearsal (e.g., Baddeley, 1986) to expand the number of items that could be held, and therefore would be able to exceed the constraints of the limited-capacity store. If a way could be found to limit these mnemonic processes, it could allow us to examine pure capacity in a test situation more similar to the one ordinarily used to examine STM (which presumably yields a compound STM estimate), namely immediate, serial verbal list recall.
Cowan et al. (1999) limited the processing of digits in a spoken list by having subjects ignore the items in the spoken list until after their presentation. Subjects played a computer game in which the name of a picture at the center of the screen was to be compared to the names of four surrounding pictures to indicate (with a mouse click) which one rhymed with the central picture. A new set of pictures then appeared. As this visual game was played repeatedly, subjects ignored lists of digits presented through headphones. Occasionally (just 16 times in a session), 1 sec after the onset of the last spoken word in a list, the rhyming game disappeared from the screen and a memory response screen appeared shortly after that, at which time the subject was to use the keypad to report the digits in the spoken list. Credit was given for each digit only if it appeared in the correct serial position. Relative to a prior memory span task result, lists were presented at span length and at lengths of span-1 (i.e., lists one item shorter than the longest list that was recalled in the span task), span-2, and span-3. A control condition in which subjects attended to the digits also was presented, before and after the ignored-speech session. In the attended-speech control condition, the number of digits recalled was higher than in the unattended condition, and it increased with list length. However, in the ignored-speech condition, the mean number of items recalled remained fixed at a lower level regardless of list length. The level was about 3.5 items in adults, and fewer in children. This pattern is reproduced in Figure 2. It is important that the number correct remained fixed across list lengths in the ignored-speech condition, just as the whole-report limit remained fixed across array sizes in Sperling (1960). It is this pattern that is crucial for the conclusion that there is a fixed capacity limit.
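The scoring rule and list-length design described above can be sketched as follows. The function names are hypothetical, and the assumption that unreported positions simply earn no credit is added here for illustration.

```python
from typing import List, Optional, Sequence


def serial_position_score(presented: Sequence[int],
                          reported: Sequence[Optional[int]]) -> int:
    """Credit a digit only if it is reported in its correct serial position.

    Omitted positions (None, or a response shorter than the list) earn no credit.
    """
    return sum(1 for p, r in zip(presented, reported) if p == r)


def list_lengths(span: int) -> List[int]:
    """List lengths used relative to an individual's span: span-3 through span."""
    return [span - 3, span - 2, span - 1, span]


print(serial_position_score([3, 8, 1, 6, 2], [3, 1, 8, 6, 2]))  # 3
print(list_lengths(7))                                          # [4, 5, 6, 7]
```

Strict positional scoring gives no credit for the transposed 8 and 1, so a correct item in the wrong place counts the same as an omission.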

It is important also to consider individual-subject data. Sperling's (1960) data appeared to show that his very few, highly trained individuals had capacity limits in the range of about 3.5 to 4.5. In the study of Cowan et al. (1999), results from 35 adults are available even though only the first 24 of these were used in the published study. Figure 3 shows each adult subject's mean number correct in the unattended speech task, as well as each subject's standard error and standard deviation across unattended speech trials. It is clear from this figure that individuals did not fit within a very narrow window of scores; their individual estimates of capacity ranged from as low as about 2 to as high as almost 6 in one participant. One might imagine that the higher estimates in some individuals were due to residual attention to the supposedly ignored spoken digits, but the results do not support that suggestion. For example, consider the subject shown in Figure 3 who had the best memory for ignored speech. If that subject attended to the spoken digits that were to be ignored, then the result should have been a positive slope of memory across the four list lengths, similar to the attended-speech condition shown in Figure 2. In fact, however, that subject's scores across 4 list lengths had a slope of -0.35. Across all of the adult subjects, the correlation between memory for ignored speech and slope of the ignored speech memory function was r = -.19, n.s. The slight tendency was thus for subjects with better recall to have less positive slopes than those with poorer recall. The slopes were quite close to zero (M = 0.05, SD = 0.32) and were distributed fairly symmetrically around 0. Another possible indication of attention to the supposedly ignored speech would be a tradeoff between memory and visual task performance during the ignored speech session. However, such a tradeoff did not occur. 
The correlation between memory for unattended speech and reaction times on the visual task was -.33, n.s., the tendency being for subjects with better memory for ignored speech also to display slightly shorter reaction times on the visual task. The same type of result was obtained for the relation between memory and visual task reaction times on a trial-by-trial basis within individuals. The mean within-subject correlation was -.08 (SD = .25), showing that the slight tendency was for a subject's trials that produced better memory to be accompanied by shorter mean reaction times on the preceding visual task. Thus, the memory capacity of up to 6 items in certain individuals as measured in this technique and the individual differences in capacity seem real, not due to attention-assisted encoding. Figure 3 shows that individuals' standard errors (rectangles) were relatively small, and that even the standard deviations (bars) of the best versus the worst rememberers did not overlap much.
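The diagnostic used above, the slope of the number of items recalled across list lengths, is an ordinary least-squares slope: a fixed capacity predicts a slope near zero, whereas attention-assisted encoding predicts a positive slope. The following is a minimal sketch; the per-length means are hypothetical illustrations chosen to mimic the two patterns, not data from Cowan et al. (1999).

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical subject with a span of 7: lists at span-3 through span.
list_lengths = [4, 5, 6, 7]
# Capacity-limited pattern: number correct stays near one level.
ignored = [3.5, 3.6, 3.4, 3.5]
# Attention-assisted pattern: number correct grows with list length.
attended = [4.0, 4.8, 5.5, 6.1]

print(round(ols_slope(list_lengths, ignored), 2))   # -0.02
print(round(ols_slope(list_lengths, attended), 2))  # 0.7
```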

The study of Cowan et al. (1999) is not the only one yielding individual difference information. For example, the data set reported as the first experiment of Luck and Vogel (1997), on visual storage capacity, yielded individual-subject estimates of storage capacity ranging from 2.2 to 4.7, and a graduate student who spent months on the capacity-estimation tasks developed a capacity of about 6 items (Steven Luck, personal communication, January 18, 1999). These estimates are quite similar to the ones shown in Figure 3 despite the great differences in procedures. Similar estimates can be obtained from the study of Henderson (1972), in which each consonant array was followed by a mask. For example, with a 400-msec field exposure duration (long enough to access sensory memory once, but probably not long enough for repeated access) and no supplementary load, the 6 subjects' mean number correct ranged from 3.0 to 5.1 items.
All of these results appear to require a modification of conclusions that could be drawn from the previous literature. In his ground-breaking review of memory span, Dempster (1981, p. 87) concluded that "there is little or no evidence of either individual or developmental differences in capacity." In the previous literature, only processing speeds were found to be related to span; however, none of the previous developmental investigations examined memory under conditions in which strategic processing during reception of the list was minimized so as to isolate capacity. When that is done, there do appear to be individual and developmental differences in capacity.
Figure 4 illustrates another intriguing point about Cowan et al. (1999). In this scatterplot of memory for unattended versus attended speech in individuals within each age group, the diagonal line represents the case in which memory was equal in the two tasks. What the plot shows is that memory was always better in the attended speech task, but that the amount of improvement in the attended speech task relative to the unattended speech task was independent of the level of performance on the unattended task. In the attended condition the means (and SDs) were: for 35 adults, 5.43 (0.78); for 26 fourth graders, 4.31 (0.88); and for 24 first graders, 3.48 (0.69). In the ignored speech condition the comparable means were: for the adults, 3.51 (0.94); for the fourth graders, 2.99 (0.86); and for the first graders, 2.34 (0.69). Notice that, across all groups, the ratio of mean attended to unattended numbers correct fell within a narrow range, between 1.4 and 1.6. This pattern suggests that attention at the time of reception of the list may add a process that is independent of the processes involved in memory for unattended speech. That process presumably is independent of the pure capacity limit and could reflect the use of attention to form larger chunks.
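The ratios cited above follow directly from the reported group means. A short check (only the six means come from the text; the variable names are illustrative):

```python
# Mean numbers correct reported above for Cowan et al. (1999).
means = {
    "adults": {"attended": 5.43, "ignored": 3.51},
    "fourth graders": {"attended": 4.31, "ignored": 2.99},
    "first graders": {"attended": 3.48, "ignored": 2.34},
}

# Ratio of attended to ignored-speech recall within each age group.
ratios = {group: m["attended"] / m["ignored"] for group, m in means.items()}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
# adults: 1.55, fourth graders: 1.44, first graders: 1.49
```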

It should be noted that the main scoring procedure used by Cowan et al. (1999) credited correct recall of a digit only if it appeared in the correct serial position. Cowan et al. also examined results of a scoring procedure in which credit was given for any correct digit, regardless of serial position. Such results cannot be compared across list lengths because the probability of guessing correctly increases dramatically with list length (given that each digit could occur only once per list). Nevertheless, it is noteworthy that recall at all ages was more like a constant proportion correct across list lengths in this free scoring, not a constant number correct as in the serial-position scoring. Adults and fourth-grade children were over 90% correct on lists of length 4 through 6, the lengths examined with this scoring procedure, and first-grade children were correct on 83%, 80%, and 83% of the lists at these three lengths. The item scoring raises the question of what it is that is held in a capacity-limited mechanism. It cannot simply be the items, as the free scoring does not show the limited-capacity pattern of a constant number correct across list lengths. Instead, it might be the mapping between the digits in memory and the serial positions in the list that has to be held in capacity-limited storage, whereas the digits themselves may be stored in activated memory (e.g., auditory sensory or phonological memory) and drawn from it into the focus of attention as needed.
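The two scoring procedures can diverge sharply on the same response, which is why they can show different patterns across list lengths. A minimal sketch, assuming each digit occurs at most once per list and that an omitted position is coded as None; the example list and response are hypothetical.

```python
def serial_score(presented, recalled):
    """Credit a digit only if it is reported in its correct serial position."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)

def free_score(presented, recalled):
    """Credit any presented digit reported anywhere (digits are unique per list)."""
    return len(set(presented) & {r for r in recalled if r is not None})

# Hypothetical 6-digit list; the response contains every digit, but
# two adjacent pairs are transposed.
presented = [3, 9, 1, 7, 4, 6]
recalled = [3, 1, 9, 7, 6, 4]

print(serial_score(presented, recalled))  # 2
print(free_score(presented, recalled))    # 6
```

Under serial scoring, the transpositions cost four items; under free scoring, they cost nothing.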
3.2. Capacity limits estimated by blocking long-term memory recoding, passive storage, and rehearsal. Verbal materials can be used under conditions that discourage recoding and rehearsal, or materials that are intrinsically difficult to recode, store, and rehearse can be used. These methods force subjects to rely primarily on capacity-limited storage of chunks that were learned outside the laboratory or, at least, before the experimental trial in question.
3.2.1. Short-term, serial verbal retention with articulatory suppression. The contribution of long-term memory can be minimized by drawing the stimuli from the same, small set on every trial and requiring the correct recall of serial order. Because the same items recur over and over, it is difficult to retain long-term associations that help in the retention of serial order of the items on a particular trial. That is, in fact, the nature of the stimuli in most immediate, serial recall experiments that have been conducted. Further, the contribution of rehearsal can be minimized by imposing articulatory suppression (Baddeley, 1986; Murray, 1968), a secondary task in which the subject repeats, whispers, or mouths a rote utterance over and over during the presentation of items (e.g., "the, the, the...") and sometimes throughout recall itself if a nonspeech recall mode is used.
Cowan, Wood et al. (1998) offered an account of what these variables do when used together. They proposed that when new words are presented on every trial in a serial recall task, the phonological portion of activated memory includes a phonological representation of the word sequence. However, when the same words are used over and over on every trial, all of the representations of items from the memory set become active in memory, so that the memory items in the current list cannot necessarily be distinguished from items used in previous trials. Rehearsal may allow a special representation of the to-be-recalled list to be constructed in active memory even under these circumstances in which a small set of items is used over and over. Cowan, Wood et al. offered these assumptions to account for why articulatory suppression has a much larger effect on performance for a small set of words than for large sets of words that are not repeated from trial to trial. A small set of words used over and over, along with articulatory suppression, may minimize the contribution of articulatory and passive phonological storage factors in recall. It is only under these conditions, for example, that the "word length effect," or advantage for lists composed of short words, is eliminated (LaPointe & Engle, 1990). Word length effects that remain even with articulatory suppression when a large set of items is used can be explained on the grounds that phonological representations of these items are generated from long-term memory (Besner, 1987) and remain active despite articulatory suppression. (An alternative interpretation of articulatory suppression effects would state that suppression works by taking up processing capacity rather than by blocking rehearsal. However, if that were true, suppression should impair performance even when a large set of words is used. Given that it does not, the alternative interpretation seems wrong.)
Before describing results of serial recall experiments with spoken stimuli and articulatory suppression, it is necessary to restrict the admissible serial recall data in a few other ways. Memory for multisyllabic words was excluded because these often might be retained as separate segments rather than integrated units (e.g., fire-man if morphemic segments are used; um-brel-la if syllabic segments are used). Memory for nonwords also was excluded because one might retain them in terms of separate phonemic or syllabic series even if they are monosyllabic. Only spoken words were included because articulatory suppression seems to interfere with the retrieval of the phonological representation of printed words, but not of spoken words. For example, articulatory suppression during the presentation of a list eliminates phonological similarity effects for printed words, but not for spoken words (Baddeley, Lewis, & Vallar, 1984). Finally, conditions with highly unusual stimulus parameters were eliminated. Unusually slow stimulus presentations (> 4 sec per word) were excluded because it might be possible to insert rehearsals despite the articulatory suppression, as were unusually fast presentations (< 0.5 sec per word) because of encoding difficulty; and grouped presentations were omitted because they encourage long-term recoding of the list.
Table 2 shows the results for all studies meeting these constraints. I was able to find 9 studies that included at least one experimental condition involving the immediate recall of spoken, monosyllabic words from a small set in the presence of articulatory suppression. Among these studies I was able to derive 17 independent estimates of memory storage. There appears to be a striking degree of convergence among the 17 estimates. All but one of the estimates fell within the range of 3-5 items, and most fell in the 3-4-item range. The only outlier was an estimate of 2.4 items from Longoni et al. (1993). That low estimate is difficult to understand because the stimulus conditions were almost identical to another experimental condition in Longoni et al. that yielded an estimate of 3.4 items.
Table 2
| Reference | Data Source | Method of Calculating Items in Storage | Est. |
|---|---|---|---|
| Murray (1968) | Figure 1, p. 682. Cued recall; auditory presentation with suppression; 6 letters | Add the proportions correct across probed serial positions and assume recall of the first, unprobed item in the list = .8. Thus, at List Length 6: .4 + .2 + .4 + .5 + .8 = 2.3; + .8 = 3.1. | 3.1 |
| same | 7 letters | same | 3.2 |
| same | 8 letters | same | 3.0 |
| same | 9 letters | same | 3.1 |
| Peterson & Johnson (1971) | Table 2, p. 349 (5 letters, serial recall; count during pres., low-similarity condit.). Data = proportion of lists recalled correctly. | 5 items, 45% of lists correct. High assumption is that on the other 55% of trials, subjects get 4 correct, for a mean of 5(.45) + 4(.55) = 4.45. A more moderate assumption is 5(.45) + 4(.28) + 3(.27) = 4.18. | 4.2 |
| Levy (1971) | Table 1, p. 126. Neutral artic. cond., cued recall, simultan. auditory presentation; 7 serial positions | 7 ser. posit. x .39 items/s.p. = 2.73 items correct. Add 0.8 items for the first, unprobed position = 3.53 items. | 3.5 |
| same | Table 2, p. 130; 9 serial positions | 9 ser. posit. x .34 = 3.06; add 0.8 for the first, unprobed position = 3.86 items. | 3.9 |
| Baddeley, Thomson, & Buchanan (1975) | Figure 6, p. 585. Serial recall, monosyllabic words, auditory presentation with suppression | 5 serial positions x .7 items/s.p. correct = 3.5 items. | 3.5 |
| Baddeley, Lewis, & Vallar (1984) | Table 1, p. 236. Serial recall, monosyllabic words, dissimilar items with suppression; fast presentation | 5 serial positions x .64 items/s.p. correct = 3.2 items in each case. | 3.2 |
| same | Slower presentation | same | 3.2 |
| Cowan, Cartwright, Winterowd, & Sherk (1987) | Table 1, p. 514. Monosyllabic words, serial recall, span procedure, dissimilar items with articulatory suppression. Artic. task: whisper alphabet | Span = estimate. (Omitted conditions in which the articulatory suppression task was presumably ineffective: whisper same letter once after each item, span = 4.81; whisper same letter continuously throughout study, span = 4.86.) | 4.0 |
| same | Task: whisper next letter on each trial | same | 4.0 |
| Longoni, Richardson, & Aiello (1993) | Table 3, p. 17. Serial recall, distinct items, suppression task = whisper "the." Presentation rate of 0.5 sec per item | 6 serial positions x .57 correct = 3.42. (Additional data from a very slow presentation rate of 5 sec per item were omitted because rehearsal was possible; for that condition, 6 serial positions x .78 correct = 4.68.) | 3.4 |
| same | Table 4, p. 19. Whisper "hiya" during presentation and recall | 6 serial positions x .40 correct = 2.40. It is not clear why such discrepant results were obtained in these 2 experiments. | 2.4 |
| Avons, Wright, & Pammer (1994) | Table 1, p. 215. Short words, immediate recall (all had suppression); serial recall condition | 5 serial positions x .69 items/s.p. correct = 3.45 items. | 3.5 |
| same | Probed recall condition (probed by the serial position of the item) | 5 serial positions x .72 items/s.p. correct = 3.60 items. | 3.6 |
| Hitch, Burgess, Towse, & Culpin (1996) | Figure 4, p. 125. Auditory presentation of items, suppression, recall in correct serial positions | Dependent measure = items correct = about 4.0. (Omitted results for grouped lists, about 6.0.) | 4.0 |
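The convergence claimed for Table 2 can be checked by tabulating the 17 estimates in its rightmost column. A short sketch (the variable names are illustrative; the values are those in the table):

```python
# Storage estimates (items) from the rightmost column of Table 2.
estimates = [
    3.1, 3.2, 3.0, 3.1,  # Murray (1968), list lengths 6-9
    4.2,                 # Peterson & Johnson (1971), moderate assumption
    3.5, 3.9,            # Levy (1971)
    3.5,                 # Baddeley, Thomson, & Buchanan (1975)
    3.2, 3.2,            # Baddeley, Lewis, & Vallar (1984)
    4.0, 4.0,            # Cowan et al. (1987)
    3.4, 2.4,            # Longoni et al. (1993)
    3.5, 3.6,            # Avons et al. (1994)
    4.0,                 # Hitch et al. (1996)
]

n = len(estimates)
mean = sum(estimates) / n
in_3_to_5 = [e for e in estimates if 3.0 <= e <= 5.0]
in_3_to_4 = [e for e in estimates if 3.0 <= e <= 4.0]
outliers = [e for e in estimates if not 3.0 <= e <= 5.0]

print(f"{n} estimates, mean = {mean:.2f}")  # 17 estimates, mean = 3.46
print(len(in_3_to_5), len(in_3_to_4), outliers)  # 16 15 [2.4]
```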
The methods of estimation are described briefly in Table 2. The most commonly applicable method was to take the proportion correct at each serial position (or, when necessary, an estimate of this proportion based on a figure) and add the proportions across serial positions to arrive at the number correct. In a probed recall experiment (e.g., Murray, 1968) there is an initial list item for which the procedure produces no memory estimate; based on past research on primacy effects, the available proportion at this first serial position always was estimated at 0.8. For some studies, alternative assumptions led to alternative estimates of storage. For example, in the study of Peterson and Johnson (1971), the dependent measure reported was the number of lists recalled correctly, and to estimate items recalled one must make assumptions about the number of errors within the lists recalled incorrectly. Estimates of capacity are given in the table under a "high" assumption that at least 4 items were recalled within each 5-item list, and under a more "moderate" assumption that erroneous lists contained 1 or 2 errors (i.e., 4 or 3 correct items) equally often. It is the more moderate estimate that appears in the rightmost column of the table. When the measure was memory span, the estimate was taken as the span in conditions in which the articulatory suppression task can be presumed to have been most effective (e.g., in Cowan et al., 1987).
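The estimation rules just described can be written as two small functions. This is a sketch of the arithmetic in Table 2 only; the function names are illustrative, and the uniform per-position rates (e.g., .39 for Levy, 1971) are the averages quoted in the table rather than full serial-position curves.

```python
FIRST_ITEM_RECALL = 0.8  # assumed recall of an unprobed first item (primacy)

def items_in_storage(proportions, unprobed_first=False):
    """Sum proportion correct across serial positions; optionally add the
    assumed 0.8 for a first item that the probe procedure never tests."""
    total = sum(proportions)
    return total + FIRST_ITEM_RECALL if unprobed_first else total

def items_from_list_accuracy(list_length, p_perfect, partial):
    """Expected items per list when only whole-list accuracy is reported.
    `partial` maps the number of items correct on an imperfect list to the
    assumed proportion of all lists with that outcome."""
    return list_length * p_perfect + sum(k * p for k, p in partial.items())

murray_6 = items_in_storage([.4, .2, .4, .5, .8], unprobed_first=True)
levy_7 = items_in_storage([.39] * 7, unprobed_first=True)
longoni_3 = items_in_storage([.57] * 6)
pj_high = items_from_list_accuracy(5, .45, {4: .55})
pj_moderate = items_from_list_accuracy(5, .45, {4: .28, 3: .27})

for label, value in [("Murray, 6 letters", murray_6), ("Levy, Table 1", levy_7),
                     ("Longoni, Table 3", longoni_3),
                     ("Peterson & Johnson, high", pj_high),
                     ("Peterson & Johnson, moderate", pj_moderate)]:
    print(f"{label}: {value:.2f}")
```

The Peterson and Johnson (1971) case shows how sensitive such estimates are to the assumed distribution of errors on imperfect lists.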
Waugh and Norman (1965) impeded rehearsal in a different way, through instructions to the subjects not to rehearse. In their experiment, each list contained 16 spoken digits and the last digit was accompanied by a tone. It was to serve as a probe, the same digit having occurred once before somewhere in the list. The subject was to respond with the digit that had followed the probe digit when it was presented earlier in the list. Results with an ordinary, 1-per-sec presentation rate (e.g., Waugh & Norman, 1965, p. 91) showed that performance levels were much higher with 3 or fewer items intervening between the target pair and the response (> .8) than they were with 4 or more intervening items (< .6). The transition between 3 and 4 intervening items was abrupt. Note that with 3 intervening items in this task, successful performance would require that the subject's memory extend back far enough to remember 4 items: the target pair and two intervening items. (The last intervening item was the probe, which did not have to be remembered.) Thus, this task leads to an estimate of 4 items in capacity-limited short-term storage. (Performance levels with a very fast, 4-per-sec presentation decreased rather more continuously as a function of the number of intervening items, which possibly could reflect the heavier contribution of a time-limited source of activation, such as sensory memory, that was the most vivid for more recent items and faded gradually across items.)
Another way to limit rehearsal is to use a "running memory span" procedure, in which a long list of items is presented and the subject is unaware of the point at which the test is to begin. Pollack, Johnson, and Knaff (1959) devised such a procedure. In their Experiment 1, lists of 25, 30, 35, and 40 digits were presented. When the list ended, the task was to write down as many of the most recent items as possible, making sure to write them in the correct serial positions with respect to the end of the list. Under these conditions, the list was too long and continuous for rehearsal to do any good, and the obtained mean span was 4.2 digits. (Theoretically, it might be possible for the subject continually to compute, say, what were the last 5 items; but there is no task demand that would encourage such difficult work even if it were feasible. The absence of on-line task requirements makes this task very different from the n-back tasks, which, as discussed earlier, do not meet the criteria for inclusion.)
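Scoring "in the correct serial positions with respect to the end of the list" means aligning the response with the tail of the list, last item with last item. A minimal sketch of that alignment, assuming omissions are coded as None; the short digit stream is hypothetical (the lists of Pollack et al. contained 25-40 digits).

```python
def running_span_score(stream, response):
    """Credit a reported item only if it matches the list item at the same
    distance from the end of the list (last with last, and so on)."""
    return sum(1 for s, r in zip(reversed(stream), reversed(response)) if s == r)

stream = [4, 7, 0, 3, 9, 5, 2, 8, 6, 1]  # hypothetical list ending in ...2, 8, 6, 1

print(running_span_score(stream, [2, 8, 6, 1]))     # 4: last four in order
print(running_span_score(stream, [5, 2, 6, 8, 1]))  # 3: 6 and 8 transposed
```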
It is possible to prevent rehearsal in yet another way, by requiring processing between items rather than during the presentation of items. Consider, for example, the working memory span task of Daneman and Carpenter (1980) in which the subject must read sentences and also retain the final word of each sentence. The reading should severely limit rehearsal of the target words. Daneman and Carpenter (1980, p. 455) reported a mean span of 3.15 words in this circumstance. It is at first puzzling to think that subjects could do this well, inasmuch as they might need some of the capacity-limited storage space for processing the sentences (unless storage and processing demands are totally separate as suggested by Daneman and Carpenter, 1980 and by Halford et al., 1998). Notice, however, that the word memory load does not reach 3 until after the third sentence has been processed. This might well leave some of the limited storage capacity available for sentence processing until the very end of the trial.
3.2.2. Short-term retention of unrehearsable material. A second way that time-limited stores can be eliminated from a measure of storage is with materials that, by their nature, cannot be rehearsed and thereby refreshed in active memory. It is unlikely that items that cannot be rehearsed lend themselves easily to long-term recoding, either. An analysis of one early study illustrates this point. Some verbal materials are too long to be rehearsed (Baddeley, 1986). Simon (1974) examined this in an informal study using himself as a subject, and tried to remember well-known expressions such as "four score and seven years ago," "To be or not to be, that is the question," and "All's fair in love and war." He concluded that "lists of three such phrases were all I could recall with reliability, although I could sometimes retain four." Of course, the number of words and syllables contained in these phrases was much larger. Elsewhere in the article, for example, it was noted that 7 one-syllable words could be recalled. The present theoretical assumption is that, in the recall of phrases, each phrase served as a previously learned chunk and also was too long to allow effective rehearsal; thus, by aiming the focus of attention at the phrase level, four such phrases could be recalled despite their inclusion of many more units on a sub-chunk level. In the recall of isolated words, in contrast, given that each word was much shorter than a phrase, it was presumably possible to use rehearsal to reactivate memory (and possibly to form new chunks larger than a single word) and therefore to increase the number of words recalled above what would be expected if each word were a separate chunk. This reasoning is supported by the fact that about 4 unconnected spoken words can be recalled when rehearsal is blocked, as shown in Table 2.
Jones, Farrand, Stuart, and Morris (1995) carried out an experiment that reveals a capacity limit, though that was not the purpose of the experiment. On each trial, a series of dots was presented one at a time at different spatial locations on the computer screen. After a variable test delay, the response screen included all of the dots and the task was to point to them in the serial order in which they had been presented. There was very little loss of information over retention intervals of up to 30 s. The authors suggested that this stability of performance across test delays indicates that some sort of "rehearsal" process was used. I would suggest that the so-called rehearsal process used here does not contaminate the estimate of storage because it is not a true rehearsal process. Instead, it may be a process in which some of the items, linked to serial position or order, are held in the capacity-limited store. Each list presented by Jones et al. included 4, 7, or 10 dots. It can be estimated from their paper (Jones et al., Fig. 2, p. 1011) that these three list lengths led to means of 3.5, 3.8, and 3.2 items recalled in a trial, respectively. These estimates were obtained by calculating the mean proportion correct across serial positions and multiplying it by the number of serial positions.
Several studies of the memory for unrehearsable material produce estimates lower than 3.0. Glanzer and Razel (1974) examined the free recall of proverbs and estimated the short-term storage capacity using the method developed by Waugh and Norman (1965), based on the recency effect. The estimate was 2.0 proverbs in short-term storage on the average. Glanzer and Razel also estimated the contents of short-term storage for 32 different free recall experiments, and found a modal value of 2.0 - 2.4 items in storage, very comparable to what they found for the proverbs. However, there is a potential problem with the Waugh and Norman (1965) method of estimating the contents of short-term storage. They assumed that the most recent items are recalled from either of two sources: short-term storage or long-term storage. The estimate of short-term storage is obtained by taking the list-medial performance level to reflect long-term memory and assuming that the recency effect occurs because of this same memory plus the additional contribution of short-term memory. This assumption is problematic, though, if the items in the recency positions are not memorized in the same way but are more often recalled only with the short-term store and not with the same contribution of long-term storage that is found for the earlier list items. This possibility is strengthened by the existence of negative recency effects in the final free recall of lists that previously had been seen in immediate recall (Craik, Gardiner, & Watkins, 1970). Glanzer and Razel consequently may have overcorrected for the contribution of long-term memory in the recency positions.
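The method criticized here can be stated compactly. Under the independence assumption usually attributed to Waugh and Norman (1965), an item at a recency position i is recalled if it is available from either store, so P_i = S_i + L - S_i L, where L is the list-medial (long-term) level; solving gives S_i = (P_i - L) / (1 - L), summed over the recency positions. The sketch below uses hypothetical proportions, not values from Glanzer and Razel (1974), and clamps negative contributions to zero as an added assumption. It also shows the concern raised above: if recency items actually receive less long-term support than the medial level implies, subtracting the medial level yields a smaller short-term estimate.

```python
def short_term_contents(recency_props, long_term_level):
    """Sum of S_i = (P_i - L) / (1 - L) over recency positions, assuming the
    two stores contribute independently. Negative terms are clamped to 0."""
    if not 0.0 <= long_term_level < 1.0:
        raise ValueError("long-term level must be in [0, 1)")
    return sum(max(0.0, (p - long_term_level) / (1.0 - long_term_level))
               for p in recency_props)

# Hypothetical free-recall proportions for the last four serial positions.
recency = [0.40, 0.55, 0.75, 0.90]

medial_based = short_term_contents(recency, 0.30)  # L taken from list middle
lower_ltm = short_term_contents(recency, 0.10)     # if recency items get less LTM support

print(f"{medial_based:.2f} vs. {lower_ltm:.2f}")  # 2.00 vs. 2.44
```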
Another low estimate was obtained for unrehearsable material by Zhang and Simon (1985) using Chinese. In their Experiment 1, the mean number of items recalled was 2.71 when the items were radicals without familiar pronounceable names, and 6.38 (like the usual English memory span) when the items were characters with pronounceable, rehearsable names, within which radicals were embedded. A lower estimate for unrehearsable items is to be expected. However, the fact that it was lower than 3 would not be expected if, as the authors asserted, there are over 200 such radicals and "educated Chinese people can recognize every radical" (p. 194), making each radical a single visual chunk. It seems possible that there are visual similarities among three or more radicals that tend to make them interfere with one another in memory when radicals are presented in a meaningless series, preventing them from serving as independent chunks. Although this analysis is speculative, the basis of the discrepancy between these few estimates below 3.0 and the estimates obtained in the many other experiments taken to reflect a capacity limit (in the 3 - 5 chunk range) is an important area for future research.
3.3. Capacity limits estimated with performance discontinuities. Although subjects in some procedures may be able to perform when there are more than 4 items, the function describing the quality or speed of performance sometimes shows a discontinuity when one reaches about 4 items (e.g., a much longer reaction time cost for each additional item after the fourth item). Presumably, in these circumstances, some optional processing mechanism must be used to supplement the capacity-limited store only if the stimuli exceed the capacity. This can occur in several ways as shown below.
3.3.1. Errorless performance in immediate recall. Broadbent (1975) noted that we usually measure span as the number of items that can be recalled on 50% of the trials. However, he cites evidence that the number of items that can be recalled reliably, with a very high accuracy, is about 3 or 4 and is much more resistant to modifications based on the nature of the items (Cardozo & Leopold, 1963; see also Atkinson & Shiffrin, 1968). That is, there is a flat performance function across list lengths until 3 or 4 items. It stands to reason that when items beyond 4 are remembered, it is through the use of supplementary mnemonic strategies (such as rehearsal and chunking), not because of the basic storage capacity.
3.3.2. Enumeration reaction time. The ability to apprehend a small number of items at one time in the conscious mind can be distinguished from the need to attend to items individually when a larger number of such items are presented. This point is one of the earliest to be noted in psychological commentaries on the limitations in capacity. Hamilton (1859) treated this topic at length and noted (Vol. 1, p. 254) that two philosophers decided that six items could be apprehended at once, whereas at least one other (Abraham Tucker) decided that four items could be apprehended. He went on to comment: "The opinion [of six] appears to me correct. You can easily make the experiment for yourselves, but you must be aware of grouping the objects into classes. If you throw a handful of marbles on the floor, you will find it difficult to view at once more than six, or seven at most, without confusion; but if you group them into twos, or threes, or fives, you can comprehend as many groups as you can units; because the mind considers these groups only as units, -it views them as wholes, and throws their parts out of consideration. You may perform the experiment also by an act of imagination." When the experiment actually was conducted, however, it showed that Hamilton's estimate was a bit high. Many studies have shown that the time needed to count a cluster of dots or other such small items rises very slowly as the number of items increases from 1 to 4, and rises at a much more rapid rate after that. Jevons (1871) carried out what was probably the first actual study, noting that Hamilton's conjecture was "one of the very few points in psychology which can, as far as we yet see, be submitted to experiment." He picked up handfuls of beans and threw them into a box, glancing at them briefly and estimating their number, which was then counted for comparison. After over a thousand trials, he found that numbers up to 4 could be estimated perfectly, and up to 5 with very few errors.
Kaufman, Lord, Reese, and Volkmann (1949) used the verb "subitize" to describe the way in which a few items apparently can be apprehended and enumerated in a very rapid fashion (as if these items enter the focus of attention at the same time). In contrast, when there are more items, the reaction time or the time necessary for accurate counting increases much more steeply as the number of items increases (as if these items must enter the focus of attention to be counted piecemeal, not all at once). Mandler and Shebo (1982) described the history of the subitizing literature. As they note, subitizing has been observed via two main procedures: one in which the duration of an array is limited and the dependent measure is the proportion of errors in estimating the number of items in the array, and another method in which the array stays on and the primary dependent measure is the reaction time to respond with the correct number. The results from the first of these methods seem particularly clear. For example, in results reported by Mandler and Shebo (1982, p. 8), the proportion of errors was near zero for arrays of 1-4 items (or for 1-3 items with a presentation duration as short as 200 msec) and increased steeply after that, at a rate of about 15% per additional item until nearly 100% error was reached with an array size of 11. The reaction time increased slowly with array sizes of 1-3 and more steeply for array sizes of 5-8. After that it leveled off (whereas it continued to increase at the same rate, for much higher array sizes, in procedures in which the array stayed on and the dependent measure was the time to produce the number). The average response was identical to the correct response for array sizes of 1-8, with an increasing degree of underestimation as array size increased from 9 to 20. 
From the present viewpoint, it would appear that 3 or 4 items were subitized initially and about 3 or 4 more could be added to the subitized amount without losing track of which items had been counted.
Alternative hypotheses about enumeration and related processes must be considered. Trick and Pylyshyn (1994a) put forth a theory of subitizing suggesting that it is capacity-limited (hence the limit to 4 items), but still not attention-demanding, and that it takes place at a point in processing intermediate between unlimited-capacity automatic processes and serial or one-at-a-time attentive processes. It was called the FINST (finger of instantiation) theory because there are a limited number of "fingers" of instantiation that can be used to define individual items in the visual field. This theory is specific to vision, and it was contrasted with a working memory theory in which subitizing is said to occur because of a limit in the number of temporary memory slots.
The evidence used by Trick and Pylyshyn (1994a) to distinguish between the theories is open to question. First, it was shown that items could be subitized only if they were organized in a way that made them "pop out" of the surroundings (the evidence of Trick & Pylyshyn, 1993). Certainly, this suggests that there is a pre-attentive stage of item individuation, but perhaps the subitization occurs only afterward, contingent not only on this rapid item individuation as Trick and Pylyshyn said, but contingent also on the availability of slots. One reason to make this distinction is that the phenomenon of popout clearly is not limited to four items; it obviously occurs for much larger numbers of items. For example, when one looks inside a carton of eggs, all of the eggs appear to pop out against the surrounding carton. It is the inclusion of individuated items in the enumeration routine that is limited to about 4. Another type of evidence used by Trick and Pylyshyn (1994a) was that there was said to be no effect of a memory load on subitization, unlike counting. Logie and Baddeley (1987) were the main authors cited in this regard, though subitization was not the focus of their study. Logie and Baddeley did find that two distractor tasks (articulatory suppression from repetition of the word "the," and tapping) had little effect in the subitizing range, whereas articulatory suppression had an effect in the counting range. However, these tasks can be carried out relatively automatically and would not be expected to require much working memory capacity (Baddeley, 1986). For example, unlike counting backward as a distractor task, which causes severe forgetting of a consonant trigram over an 18-s distractor-filled period (Peterson & Peterson, 1959), articulatory suppression causes almost no loss over a similar time period (Vallar & Baddeley, 1982). 
Interference with articulatory processing can explain why articulatory suppression interfered with counting, for items over 4 in the array task and also for every list length within another task that involved enumeration of sequential events rather than simultaneous spatial arrays. The data of Logie and Baddeley thus do seem to support the distinction between subitizing and counting, but they do not necessarily support the FINST theory over the working memory limitation theory of subitizing.
Another type of evidence (from Trick & Pylyshyn, 1994b) involved a cue validity paradigm (a variation of the procedure developed by Posner, Snyder, & Davidson, 1980). On each trial in most of the experiments, two rectangles appeared, and dots were to appear in only one rectangle. The task was to count the dots. Sometimes, there would be a cue (an arrow pointing to one rectangle or a flashing rectangle) to indicate slightly in advance which rectangle probably would contain the dots. The cue was valid (giving correct information) on 80% of the cued trials and invalid (giving incorrect information) on 20% of the cued trials. On other trials, no informative cue was given. The validity of the cue affected performance in the counting range more than in the subitizing range, leading Trick and Pylyshyn (1994a) to view the results as supportive of the FINST theory. However, there was still some effect of cue validity in the subitizing range, so the result is less than definitive in comparing the FINST and working memory accounts of subitizing.
Atkinson, Campbell, and Francis (1976) and then Simon and Vaishnavi (1996) investigated enumeration within afterimages so that subjects would be unable to shift their gaze in a serial fashion using eye movements. Both studies found that the subitizing limit remained at 4 items, with errors in enumeration only above that number, even though subjects had a long time to view each afterimage. Therefore, it seems that a focal attention strategy involving eye movements is important for visual enumeration of over 4 items, but not at or below 4 items, the average number that subjects may be able to hold in the limited-capacity store at one time.
3.3.3. Multi-object tracking. Another, more recent line of research involves "multi-object tracking" of dots or small objects that move around on the computer screen (Pylyshyn & Storm, 1988; Yantis, 1992; for a recent review see Pylyshyn, Burkell, et al., 1994). In the basic procedure, before the objects move, some of them flash several times and then cease flashing. After that all of them wander randomly on the screen and, when they stop, the subject is to report which dots had been flashing. The flavor of the results is described well by Yantis (1992, p. 307): "Performance deteriorated as the number of elements to be tracked increased from 3 to 5 [out of 10 on the screen]; tracking three elements was viewed by most subjects as relatively easy, although not effortless, while tracking 5 of 10 elements was universally judged to be difficult if not impossible by some subjects." As in subitizing, one could use either FINST or working memory theories to account for this type of finding.
3.3.4. Proactive interference in short-term memory. One can observe proactive interference (PI) in retrieval only if there are more than 4 items in a list to be retained (Halford et al., 1988). This presumably occurs because 4 or fewer items are, in a sense, already retrieved; they reside in a limited-capacity store, eliminating the retrieval step in which PI arises. Halford et al. demonstrated this storage capacity limit in a novel and elegant manner. They used a variant of Sternberg's (1966) memory search task, in which the subject receives a list of items and then a probe item and must indicate as quickly as possible whether the probe appeared in the list. In their version of the task, modeled after Wickens, Moody, and Dow (1981), lists came in sets of three, all of which were similar in semantic category (Experiment 1) or rhyme category (Experiment 2). Thus, the first trial in each set of three was a low-PI trial, whereas the last trial in the set was a high-PI trial. Experiment 1 showed that with lists of 10 items, there were PI effects. With a list length of 4, there was no PI. Experiment 2 showed that PI occurred for lists of 6 or more items, but not lists of 4 items. Presumably, the items within a list of 4 did not have to be retrieved because they all could be present within a capacity-limited store at the same time. Also consistent with this sort of interpretation, in 8- to 9-year-old children PI was observed with 4 items, but not 2 items in a list. The magnitude of growth of a capacity limit with age in childhood matches what was observed by Cowan et al. (1999) with a very different procedure (see Figure 2).
In the Halford et al. (1988) study, it was the length of the target list that was focused upon. We can learn more by examining also the effect of variations in the length of the list causing PI. Wickelgren's (1966) subjects copied a list of PI letters, a single letter to be recalled, and then a list of retroactive interference (RI) letters. The subject was to recall only the target letter. There were always 8 letters in one of the interference sets (the PI set for some subjects, the RI set for others), whereas the other interfering set could contain 0, 4, 8, or 16 letters. There was a large effect of the number of RI letters, with substantial differences between any two RI list lengths. In contrast, when it was the PI set that varied in length, there was a difference between 0 and 4 PI letters but very little effect of additional PI letters beyond 4. Wickelgren suggested that PI and RI both generate associative interference, whereas RI additionally generates another source of forgetting (either decay or storage interference). Thus, associative interference would have been limited primarily to the 4 closest interfering items on either side of the target.
A mechanism of PI in these situations can be suggested. It seems likely that excellent, PI-resistant recall occurs when the active contents of the limited-capacity store are to be recalled. When the desired information is no longer active, the long-term memory record of the correct former state of the limited-capacity store can be used as a cue to the recall of the desired item(s). If several former limited-capacity states were similar in content, it may be difficult to select the right one. Moreover, if the limited-capacity store serves as a workspace in which items become associated with one another (Baars, 1988; Cowan, 1995), then it might be difficult to select the correct item from among several present in the limited-capacity store simultaneously. The PI results described above could then be interpreted as follows. For studies in Halford et al. (1988), target lists of more than 4 items could not be held entirely within limited-capacity storage, so that a former state of the store had to be reconstituted. This could cause PI because some of the target items may have shared a former limited-capacity state with nearby items from a prior list, or because some of the other former limited-capacity states would have contained items resembling the correct item. For subjects in Wickelgren's (1966) study who received a variable number of PI letters, the target item would have been removed from the limited-capacity store by the presentation of 8 following RI letters. Therefore, at the time of recall, the subject would have had to identify the former state of the limited-capacity store that contained the single target letter. This same former state may also have included several of the adjacent letters, which could become confused with the target letter. 
Only 3 or so letters adjacent to the target letter usually would have been in the limited-capacity store at the same time as the target letter, and thus only those letters would contribute much to associative interference. In a broader context, this analysis may be one instance of a cue-overload theory of PI (cf. Glenberg & Swanson, 1986; Raaijmakers & Shiffrin, 1981; Tehan & Humphreys, 1996; O.C. Watkins & M.J. Watkins, 1975) asserting that recall is better when fewer test items are associated with the cue used to recall the required information.
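The cue-overload idea can be made concrete with a ratio sampling rule of the kind used in SAM (Raaijmakers & Shiffrin, 1981), in which the chance of sampling an item is its cue strength divided by the summed strength of all items associated with that cue. The minimal sketch below assumes, purely for illustration, that all items have equal strength; the item names and strength values are hypothetical. With 3 equally strong competitors (roughly the number of letters assumed above to share the target's former limited-capacity state), the target is sampled only 1 time in 4.

```python
def sampling_probability(strengths, target):
    """Ratio rule: the target's cue strength over the summed strength of every
    item associated with the cue."""
    return strengths[target] / sum(strengths.values())

def with_competitors(n_competitors, strength=1.0):
    """Hypothetical cue: one target plus n equally strong competing items."""
    strengths = {"target": strength}
    strengths.update({f"competitor_{i}": strength for i in range(n_competitors)})
    return strengths

# More items attached to the same cue means a lower chance of sampling the target.
for n in (0, 3, 7):
    print(n, sampling_probability(with_competitors(n), "target"))
```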
3.4. Capacity limits estimated with indirect effects. So far we have discussed effects of the number of stimulus items on a performance measure directly related to the subject's task, in which recall of items in the focus of attention is required. It is also possible to observe effects that are related to the subject's task only indirectly by deriving a theoretical estimate of capacity from the presumed role of the focus of attention in processing.
3.4.1. Chunk size in immediate recall. The "magical number 4" lurks in the background of the seminal article by Miller (1956) on the magical number 7 ± 2, which emphasized the process of grouping elements together to form larger meaningful units or "chunks." The arrangement of telephone numbers with groups of 3 and then 4 digits would not appear to be accidental, but rather an indication of how many elements can be comfortably held in the focus of attention at one time to allow the formation of a new chunk in long-term memory (Baars, 1988; Cowan, 1995). Several investigators have shown that short-term memory performance is best when items are grouped into sublists of no more than 3 or 4 (Broadbent, 1975; Ryan, 1969; Wickelgren, 1964).
The grouping limit even seems to apply for subjects who have learned how to repeat back strings of 80 or more digits (Ericsson, Chase, & Faloon, 1980; Ericsson, 1985). These subjects did so by learning to form meaningful chunks out of small groups of digits, and then learning to group the chunks together to form "supergroups." At both the group and the supergroup levels, the capacity limit seems to apply, as described by Ericsson et al. (1980, p. 1182) for their first subject who increased his digit span greatly: "After all of this practice, can we conclude that S.F. increased his short-term memory capacity? There are several reasons to think not...The size of S.F.'s groups were almost always 3 and 4 digits, and he never generated a mnemonic association for more than 5 digits...He generally used three groups in his supergroups and, after some initial difficulty with five groups, never allowed more than four groups in a supergroup." Ericsson (1985) reviewed details of the hierarchical grouping structure in the increased-digit-span subjects and he reviewed other studies of memory experts, which also revealed a similar grouping limit of 3-5 items. This limit to the grouping process would make sense if the items or groups to be further grouped together must reside in a common, central workspace so that they can be linked. A limited-capacity store might be conceived in this way, as a workspace in which items are linked together (Baars, 1988; Cowan, 1995). The focus of attention, the presumed locus of limited-capacity storage, presumably must shift back and forth from the super-group level to the group level.
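The grouping constraint described for S.F. can be expressed as a size limit applied at two levels. The sketch below is hypothetical: it ignores the meaningful (mnemonic) content S.F. used to form his groups and simply splits a digit string into groups of at most 4 digits, then splits those groups into supergroups of at most 4 groups, using as few units as possible and keeping their sizes as even as possible. The function name and the digit string are illustrative only.

```python
def split_evenly(items, max_size=4):
    """Split a sequence into consecutive groups of at most `max_size` elements,
    using as few groups as possible and keeping their sizes as even as possible."""
    if not items:
        return []
    n_groups = -(-len(items) // max_size)  # ceiling division
    base, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        groups.append(items[start:start + size])
        start += size
    return groups

digits = "314159265358979323"       # 18 digits
groups = split_evenly(digits)       # digit groups of 3 or 4
supergroups = split_evenly(groups)  # supergroups of at most 4 groups
print(groups)
print([len(s) for s in supergroups])
```

With 18 digits this yields three 4-digit groups and two 3-digit groups, organized into one supergroup of three groups and one of two, echoing the group and supergroup sizes quoted above.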
3.4.2. Cluster size in long-term recall. As Shiffrin (1993) pointed out, in a sense every cognitive task is a STM task because items must be active in a limited-capacity store at the time of recall, even though that sometimes can come about only through the reactivation of long-term memories. Assume, therefore, that it is necessary to represent items in the limited-capacity store to prepare them to be recalled. That sort of mechanism should apply not only to immediate recall, but also to long-term recall. Bursts of responses should be observed as an individual fills the limited short-term capacity, recalls the items held within that capacity, and then refills it with more information retrieved from long-term storage.
Broadbent (1975) used similar reasoning to motivate a study of grouping in long-term recall. He asked subjects to recall members of a learned category: the Seven Dwarfs, the seven colors of the rainbow, the countries of Europe, or the names of regular television programs. There were some important differences in the fluency of recall for these categories. However, measurement of the timing of recall showed bursts of 2, 3, or 4 items clustered together, and occasionally 5 items in a cluster. One of the 10 subjects produced a run of 6 rainbow colors, but otherwise the cluster sizes were below 6. Thus, this study provides evidence for a short-term memory capacity limit even as applied to recall from a category in long-term memory. Wilkes (1975) reviewed similar results in a more detailed study of pausing within recall. Bower and Winzenz (1969, cited in Wilkes, 1975) showed that repetitions of a digit string within a longer series resulted in improved recall over time only if the grouping structure did not change each time the repetition occurred.
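Bursts of this kind are identified from the timing of responses. A minimal sketch, assuming a fixed pause threshold (the exact criterion used by Broadbent is not given here), splits a sequence of response onsets into clusters wherever the gap between successive responses exceeds that threshold and reports the cluster sizes. The onset times and the 2-s threshold below are hypothetical.

```python
def clusters_from_times(times, pause_threshold):
    """Split response onset times (s) into clusters wherever the gap between
    successive responses exceeds `pause_threshold`; return the cluster sizes."""
    if not times:
        return []
    sizes, current = [], 1
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev > pause_threshold:
            sizes.append(current)  # a long pause closes the current burst
            current = 1
        else:
            current += 1
    sizes.append(current)
    return sizes

# Hypothetical onsets for seven recalled items with one long pause.
onsets = [0.0, 0.6, 1.1, 4.0, 4.5, 5.0, 5.4]
print(clusters_from_times(onsets, pause_threshold=2.0))
```

The choice of threshold matters: too short a threshold fragments genuine bursts, and too long a threshold merges separate ones, so cluster-size estimates should be checked across a range of plausible thresholds.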
Mandler (1967) suggested that the size of the limited-capacity store is 5 ± 2 items. He focused on a number of experiments in which the items to be recalled could be divided into a number of categories (e.g., fruits, clothing, vehicles). Recall is superior when a list is recalled by categories; in a free recall task, subjects tend to recall clusters organized by category even when the items were not ordered by category in the stimulus list. Mandler suggested that subjects left to their own devices typically could recall 5 ± 2 categories and 5 ± 2 items from each category that was recalled (e.g., "apple, orange, banana, grape; shirt, pants, hat," and so on). (Mandler also relied on Tulving & Patkau, 1962; and Tulving & Pearlstone, 1966).
The recall of separate items from a higher-level category in memory might be considered analogous to the retrieval of separate items from an array represented in sensory memory. In the case of the sensory array, the items are related in that they all are part of a common visual field, but are nevertheless separate perceptually (assuming they are not organized into larger perceptual objects such as a cluster of rounded letters surrounded by angular letters). In memory, the items are related in that they are associated to a common semantic category, but are nevertheless separate conceptually (assuming they are not organized into clusters such as salt-and-pepper within the spice category). The clustering of items in recall presumably depends on the absence of an automatized routine for recall. Thus, one should not expect clustering into groups of about 4 items for a task like recitation of the alphabet. The same applies to extensive intra-category knowledge that can result in the recall of large chunk structures or templates (Gobet & Simon, 1996, 1998).
When one focuses on flawless recall, the number is closer to 3 or 4. Thus, Mandler's (1967, Fig. 7) summary of recall per category shows that when there were only 1-3 items in a category, these items were recalled flawlessly (provided that the category was recalled at all). The proportion recalled within the category declined rather steadily thereafter, from about 80% recalled from categories with 4-6 items to about 20% recalled from categories of about 80 items. The growing absolute number of items recalled from larger categories might be attributed to covert subcategorization or to long-term learning mechanisms, but these will not allow the recall of all items.
Why should similar constraints apply to the recall of items within a category and to the recall of categories? One explanation is that the limited-capacity store can be used in more than one iteration. At one moment the categories are drawn into mind, and at the next moment the capacity is consumed with items within the first category to be recalled. An obvious question about such an account is how the categories are retained while the limited capacity is being used for items within a category. Presumably, one immediate consequence of using the limited-capacity store is the formation of a better-organized long-term representation of the stimulus set. Thus, once a set of categories is brought into mind, these categories are combined into a long-term memory representation that can be accessed again as needed in the task. Ericsson and Kintsch (1995) provided a detailed account of that sort of process.
Finally, it should be noted that the capacity limit provides only an upper bound for the clustering of items in recall. If the rate of retrieval from memory is too slow, it might make sense for the individual to recall the items deposited in a capacity-limited store before that capacity has been used up. Gruenewald and Lockhead (1980) obtained a pattern of results in which the long-term recall of category exemplars occurred (according to their criteria) most often without clustering and in clusters of two, three, four, or five in decreasing order of frequency. In contrast, Graesser and Mandler (1978, p. 96) observed a limit in items per cluster hovering around 4.0 throughout a long recall protocol. A prediction of the present analysis is that if a particular procedure results in a limit smaller than about 4 items, extended practice with the task should eventually lead to a plateau in performance at about 4 items per cluster.
3.4.3. Positional uncertainty in recall. In a theoretical mechanism discussed above and used earlier by Raaijmakers and Shiffrin (1981), items held simultaneously in the limited-capacity store become associated with one another. Additionally, their serial positions may become associated with one another. In free recall, the associations can be helpful because the thought of one item elicits the associated items. However, in serial recall, the associations can present a problem because it may be difficult to retrieve the order of simultaneously held items. This type of account might explain positional uncertainty in serial recall. It predicts that an item typically should be recalled no more than 3 positions away from its correct position (assuming that sets of at most 4 items are present in limited-capacity storage at any moment). This prediction matches the data well. For example, Nairne (1991) presented words aloud with 2.5-s onset-to-onset times between the words. Five lists of 5 words each were followed by a 2-min distractor task and then a test in which the words were to be placed into their correct lists and locations within lists. The results showed that when an item was placed within the correct list, its serial position was confused with at most 3 other serial positions in the list. Nairne (1992) found a flattening of the error functions with delayed testing. This change over time is compatible with increased difficulty of accessing information from a particular prior state of the limited-capacity store, but with no evidence of a spread of uncertainty to a larger range of serial positions. Within-list confusions still occurred across about 3 items.
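The prediction that positional confusions stay within about 3 positions can be checked with a small simulation. It assumes, for illustration only, that a list is held as consecutive, non-overlapping sets of at most 4 items and that order information is lost only within a set; under that assumption no item can be recalled more than 3 positions from where it belongs. Real co-resident sets need not be aligned this neatly, and the shuffle probability is an arbitrary, hypothetical choice.

```python
import random

def recall_order(n_items, set_size=4, shuffle_prob=0.5, seed=0):
    """Return the order in which list positions are recalled, assuming the list was
    held as consecutive sets of at most `set_size` co-resident items and that order
    is uncertain only within a set (a hypothetical simplification)."""
    rng = random.Random(seed)
    order = []
    for start in range(0, n_items, set_size):
        block = list(range(start, min(start + set_size, n_items)))
        if rng.random() < shuffle_prob:
            rng.shuffle(block)  # positions within one co-resident set get confused
        order.extend(block)
    return order

def max_displacement(order):
    """Largest distance between an item's recalled position and its true position."""
    return max(abs(pos - item) for pos, item in enumerate(order))

worst = max(max_displacement(recall_order(20, seed=s)) for s in range(1000))
print(worst)  # bounded by set_size - 1
```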
3.4.4. Analysis of the recency effect in recall. Watkins (1974) reviewed research in which a long list of verbal items was presented on each trial and the subject was to recall as many of those items as possible, without regard to the order of items. In such studies, recall is typically best for the most recent items. This recency effect has been viewed as the result of the use of dual memory mechanisms, with a short-term memory mechanism used only for the last few items (which typically are recalled first). Underlying this view is the finding that the recency effect is quickly eroded if a distractor-filled delay is imposed between the list and the recall cue (Glanzer & Cunitz, 1966; Postman & Phillips, 1965), whereas the rest of the recall function is unaffected. Several investigators have reasoned that it would be possible to estimate the contents of short-term memory by subtracting out the contribution of long-term memory, but it is not clear exactly what assumptions one should make in order to do so. Watkins ruled out some methods on the basis of logical considerations, and compared the results of several favored methods (Tulving & Colotla, 1970; Tulving & Patterson, 1968; Waugh & Norman, 1965). Under a variety of test situations, these methods produced estimates of short-term memory capacity ranging from 2.21 to 3.43 (Watkins, 1974, Table 1). For the method judged most adequate (Tulving and Colotla, 1970) the estimates ranged from 2.93 to 3.35.
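The logic of subtracting out the long-term contribution can be illustrated with the Waugh and Norman (1965) approach, one of the methods Watkins compared. It assumes that an item can be recalled from primary (short-term) memory or from secondary (long-term) memory independently, so that P(recall) = P(PM) + P(SM) - P(PM)P(SM), which can be solved for P(PM) at each serial position and summed over the recency positions. The sketch below additionally assumes that P(SM) equals the flat level of recall in the middle of the list; the serial-position values are invented for illustration and are not taken from any study reviewed by Watkins (1974).

```python
def primary_memory_share(p_recall, p_sm):
    """Solve P(R) = P(PM) + P(SM) - P(PM) * P(SM) for P(PM) at one serial position."""
    return (p_recall - p_sm) / (1.0 - p_sm)

def estimate_capacity(curve, asymptote_slice, n_recency):
    """Sum P(PM) over the last `n_recency` positions, taking P(SM) as the mean
    recall in `asymptote_slice` (hypothetically, the flat middle of the curve)."""
    middle = curve[asymptote_slice]
    p_sm = sum(middle) / len(middle)
    return sum(max(0.0, primary_memory_share(p, p_sm)) for p in curve[-n_recency:])

# Invented 12-position free-recall curve (proportion correct by serial position).
curve = [0.60, 0.45, 0.35, 0.30, 0.30, 0.30, 0.30, 0.35, 0.50, 0.70, 0.85, 0.95]
capacity = estimate_capacity(curve, slice(3, 7), n_recency=5)
print(round(capacity, 2))  # about 2.64 for these made-up values
```

As Watkins noted, the answer depends on such choices (which positions define the long-term baseline, how many recency positions are counted), which is why different methods yield somewhat different capacity estimates.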
One apparent difficulty for this interpretation of the recency effect is that one can obtain it under filled test delays that are too protracted to permit the belief that a short-term store is still in place. Bjork and Whitten (1974) presented a series of printed word pairs for immediate free recall, with the pairs separated by silent intervals of 12 s and with up to 42 s following the last item pair. These silent intervals were filled with a distracting arithmetic task to prevent rehearsal. Under these conditions, a recency effect of 3-5 items emerged. The theoretical framework for understanding these results, further developed by Glenberg and Swanson (1986), was one having to do with the ratio between the inter-item interval and the retention interval, which could influence the temporal distinctiveness of the items at the time of recall. The temporal distinctiveness is higher under Bjork and Whitten's conditions for the last few item pairs than for previous item pairs, and the same can be said of the last few list items within the immediate recall conditions that Watkins (1974) had considered. Although the long-term recency effect challenges a time-limited memory explanation of the recency effect, it need not challenge a temporary memory capacity limit. A capacity-limited store could work in combination with the distinctiveness principles. The recall process could proceed in phases, each of which may involve the subject scanning the memory representation, transferring several items to the capacity-limited store, recalling those items, and then returning to the representation for another limited-capacity "handful" of items. It would make sense for the subject to recall the most recent, most distinctive items in the first retrieval cycle so as to avoid losing the distinctiveness advantage of those items. 
Thus, the long-term recency effect is consistent with the assumption that a capacity-limited store (presumably the focus of attention) must intervene between a memory representation and recall.
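The ratio principle mentioned in connection with Bjork and Whitten (1974) and Glenberg and Swanson (1986) can be illustrated with a simple index, assumed here for illustration and not taken from either paper: the spacing between successive items divided by an item's age at test. Because the index depends only on the ratio of the inter-item interval to the retention interval, scaling both intervals by the same factor leaves it unchanged, which is one way to see why the last few items can remain distinctive after a long filled delay when the items were also widely spaced.

```python
def temporal_ratio(positions_from_end, ipi, ri):
    """Hypothetical distinctiveness index: inter-item spacing divided by an item's
    age at test, where age = retention interval + spacing of the later items.
    Item durations are ignored for simplicity."""
    return [ipi / (ri + (k - 1) * ipi) for k in positions_from_end]

# Spacing and maximum delay as described for Bjork and Whitten (1974): 12 s and 42 s.
delayed = temporal_ratio(range(1, 6), ipi=12.0, ri=42.0)
# Shrinking both intervals tenfold leaves every value unchanged.
compressed = temporal_ratio(range(1, 6), ipi=1.2, ri=4.2)
print([round(x, 3) for x in delayed])
```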
3.4.5. Sequential effects in implicit learning and memory. Implicit learning is a process in which information is learned without the awareness of the subject, and implicit memory is learned information that is revealed in an indirect test, without the subject having been questioned explicitly about the information. A limited-capacity store appears to play some role in implicit learning and memory, though the nature of that role is not yet clear (see Frensch & Miner, 1994; Nissen & Bullemer, 1987; Reber & Kotovsky, 1997). It is possible that the role of a limited-capacity store depends on whether learning and memory require associations between items that go beyond a simple chain of association between adjacent items. There are data supporting this conjecture and suggesting that implicit learning can take place if one need hold no more than 4 items in the capacity-limited store at one time.
Lewicki, Czyzewska, and Hoffman (1987) examined one situation in which contingencies were spread across seven trials in a row, but the capacity-limited store need not encompass all of those items. Lewicki et al. presented sets of 7 trials in a row in which the subject was to indicate which quadrant of the screen contained the target item, the digit 6 (using a keypress response). The first six trials in the set included only the target, but the seventh trial also included 35 foils distributed around the screen. Moreover, unbeknownst to the subject, the locations of targets on Trials 1, 3, 4, and 6, taken together, indicated where the target would be on the seventh, complex trial. Under these circumstances, subjects succeeded in learning the contingencies and there was a drop in performance when the contingencies were changed. However, as Stadler (1989) pointed out, any three of the four critical trials were enough to determine the location of the target on Trial 7 in a set. If subjects remembered the outcomes of Trials 3, 4, and 6 and considered them together, they theoretically could predict the outcome of Trial 7. Given that subjects probably did not know which trials were predictive, they might only be able to use Trials 4, 5, and 6, the last three trials, for prediction of Trial 7. These trials by themselves were predictive on a probabilistic, though not an absolute, basis. Thus, a limited-capacity store of 4 items could serve for this purpose. Stadler (1989) extended the result and verified that the learning was implicit and not available to subjects' awareness.
Cleeremans and McClelland (1991) demonstrated more precisely that a sequence of 4 items in limited-capacity storage at one time may be enough to allow learning of the contingencies between those items. The task was to press one of six keys corresponding to stimuli at six screen locations. These locations were stimulated according to a finite state grammar on 85% of the trials, of which subjects were unaware. However, on the remaining, randomly selected 15% of the trials, the expectations according to the grammar were violated. The nature of the grammar was such that a prediction could be made only if one took into account a sequence of several previous stimuli. A sequential analysis of the reaction times showed that, after 13 experimental sessions, subjects became able to use a series of three stimuli to predict the location of the next, fourth stimulus. Even after 20 sessions, though, they remained unable to use a series of four stimuli to predict the location of the next, fifth stimulus. Thus, four seems to be the asymptotic value. (The authors presented a different theoretical account based on the diminishing predictive value of remote associations. However, the two accounts may not be mutually exclusive.)
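The idea that prediction requires holding a run of preceding stimuli together can be illustrated by tabulating which stimulus follows each context of the previous k stimuli. This is not Cleeremans and McClelland's grammar or their model; the location sequence below is hypothetical and was built so that what follows location "A" depends on the location before it. A context of one prior stimulus then predicts imperfectly, whereas a context of two predicts perfectly, analogous in spirit to the finding that a longer window of prior stimuli supports better prediction.

```python
from collections import Counter, defaultdict

def build_context_table(sequence, order):
    """Count which element follows each run of `order` preceding elements."""
    table = defaultdict(Counter)
    for i in range(order, len(sequence)):
        table[tuple(sequence[i - order:i])][sequence[i]] += 1
    return table

def prediction_accuracy(sequence, order):
    """Share of positions where the most frequent continuation of the preceding
    context matches the actual next element (fit and scored on the same data)."""
    table = build_context_table(sequence, order)
    hits = 0
    for i in range(order, len(sequence)):
        best, _count = table[tuple(sequence[i - order:i])].most_common(1)[0]
        hits += best == sequence[i]
    return hits / (len(sequence) - order)

# Hypothetical location sequence: what follows "A" depends on the item before it.
locations = list("ABAC" * 5)
for k in (1, 2, 3):
    print(k, round(prediction_accuracy(locations, k), 2))
```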
McKone (1995) demonstrated a capacity limit in the sequences that can contribute to repetition priming in lexical decision or word naming. Series of words or nonwords were presented and there were repetitions of items with a variable number of different items intervening between the two instances of the repeated word (ranging from 0 to 23 intervening items). The measure of priming was a decrease in reaction time for the repeated word, suggesting that the representation of the first instance of that word was still active in memory. McKone concluded (p. 1108) that "for words, a large short-term priming component decayed rapidly but smoothly over the first 3 items" intervening between the instances of the repeated word, and then reached "a stable long-term value." This appears to be evidence of a series of about 4 consecutive items present in a limited-capacity store at any time, though a long-term store also contributes to priming as shown by the asymptotic level of residual priming at longer repetition lags.
An unresolved question stemming from McKone (1995) is why the priming declined smoothly over the last few items. When the presentation is sequential and there is no deliberate effort to retain any but the current item, as in this study, it is possible that the more recent items tend to be more strongly represented in the limited-capacity store. It also is possible that some of the most recent 4 items sometimes are replaced in the limited-capacity store by items from elsewhere in the list or from extraneous thoughts.
3.4.6. Influence of capacity on the properties of visual search. Fisher (1984) observed what appears to be a limit in the ability to examine items in a visual array in search of a well-learned target item. In previous work, Shiffrin and Schneider (1977) and Schneider and Shiffrin (1977) distinguished between variably-mapped searches (in which a foil on one trial might become the target on another trial) and consistently-mapped searches (in which the items that serve as targets never serve as foils). On each trial in their experiments, the subject knows what target or targets to search for, and indicates as rapidly and accurately as possible the presence or absence of the target item(s) in a visual array of items. It was proposed that variably-mapped searches require capacity-limited or "controlled" processing, whereas over many trials, a consistently-mapped search task comes to be performed automatically, without using controlled processing. Under those circumstances, it was shown that processing took place on all items in parallel, so that the reaction time to detect a target item was nearly unaffected by the number of items in the array.
Fisher (1984) proposed that capacity limits still might appear if the required rate of perceptual processing were fast enough. He examined this possibility using a task in which there were 20 stimulus arrays in rapid succession on each trial (with a pattern mask preceding the sequence of 20 and another following it). The duration of each array varied across trial blocks (with 10 durations between 40 and 200 msec per array), as did the array size (with 4 or 8 stimuli per array). The arrays were letters except for a single target item within one array, which was always the digit "5". The task was to indicate the spatial location of this target item out of 8 possible locations. This is a type of consistently-mapped task situation in which little practice is needed to achieve an automatic search because the digit and letter categories have been learned before the experiment. The results were analyzed in light of a mathematical model, the "steady-state limited-channel model," based on the following defining assumptions (Fisher, 1984, p. 453): "(1) encoded stimuli in the visual cortex are scanned once for placement on a comparison channel; (2) the time between arrivals of stimuli to the comparison channels is exponentially distributed with rate parameter λ; (3) the time to compare a stimulus with a prespecified target is exponentially distributed with rate parameter μ; (4) at most k comparison channels can execute in parallel; (5) stimuli in iconic memory are equally likely to be replaced by the characters or masks which appear next to the input streams; (6) masks are not placed on the comparison channels; and (7) the system is in a steady state. Note that it is assumed that the two dimensional coordinates of a stimulus are retained in the visual cortex..." The critical idea is that the representation of a stimulus is lost if that stimulus is offered to the comparison process at a time when there are no comparison channels free.
Thus, the capacity limit in this situation is defined by the number of comparison channels, k. Within the field of queuing theory, Erlang's Loss Formula describes the problem and, using that formula, Fisher found that the data fit the formula best with k = 4. Assuming that the comparison channels are actually slots within some short-term storage mechanism, this result serves as another indication of its limited-capacity nature (also see Schweickert, Hayt, Hersberger, & Guentert, 1996).
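Erlang's Loss Formula gives the probability that an arrival finds all k channels busy when arrivals are Poisson (exponential inter-arrival times, as in assumption 2 above) and arrivals that find no free channel are lost. The sketch below computes that blocking probability from k and the offered load, defined as the arrival rate divided by the comparison (service) rate. It covers only this queuing component; the iconic-replacement and masking assumptions of Fisher's full model are not modeled, and the load value is illustrative rather than one estimated by Fisher (1984).

```python
from math import factorial

def erlang_b(k, load):
    """Erlang's loss formula: probability that an arriving stimulus finds all k
    comparison channels busy (and is lost). load = arrival rate / comparison rate."""
    numerator = load**k / factorial(k)
    denominator = sum(load**i / factorial(i) for i in range(k + 1))
    return numerator / denominator

def erlang_b_recursive(k, load):
    """Same quantity via the standard recursion B(0) = 1,
    B(i) = load * B(i-1) / (i + load * B(i-1)), which avoids large factorials."""
    b = 1.0
    for i in range(1, k + 1):
        b = load * b / (i + load * b)
    return b

# Illustrative offered load of 2 (not a value estimated by Fisher, 1984).
for k in range(1, 7):
    print(k, round(erlang_b(k, 2.0), 3))
```

With a load of 2, the loss probability falls from about 0.67 with one channel to about 0.095 with four, so the number of channels strongly constrains how many stimuli survive at fast presentation rates.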
3.4.7. Influence of capacity on mental addition reaction time. Logan (1988, Experiment 4) developed a task in which subjects had to verify equations like "B + 3 = E" (true in this example because E is 3 letters after B in the alphabet). The addend could be 2, 3, 4, or 5. Practice effects in this novel task followed a power function for addends of 2 through 4. However, for an addend of 5, the fit was much worse, and there was a discontinuity in the learning curves after about 24 presentations in which the reaction times for this addend dropped sharply. This discontinuity was linked to a strategy shift that many subjects reported. They reported that problems with an addend of 5 were much harder and led to a more deliberate learning strategy in which particular instances were memorized. Logan and Klapp (1991) replicated this finding. It might be further speculated that the discontinuity could occur because the numbers 1-4 can be visualized more clearly during the problem, serving as a place-holder during the adding process. Numbers of 5 and above may be difficult because the visualization of items to be added is hindered by the capacity limitation.
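The power function of practice is commonly written in the form below, with N the number of presentations of a problem, a the asymptotic reaction time, b the amount of speed-up available with practice, and c the learning rate. This is a standard parameterization assumed here for illustration; Logan's fitted values are not reproduced. The discontinuity reported for an addend of 5 would appear as a departure from a single curve of this form.

```latex
\mathrm{RT}(N) = a + b\,N^{-c}, \qquad a, b, c > 0
```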
3.4.8. Mathematical modeling parameters. Attesting to the potential importance of the pure capacity limit, several articles presenting mathematical models of complex cognitive processes have assumed that 4 items can be saved in a short-term store. These include, for example, the Kintsch and van Dijk (1978) model of text comprehension, the Raaijmakers and Shiffrin (1981) model of memory search (SAM), and the recent model of processing capacity by Halford et al. (1998). These models presumably use a limit of 4 because it maximizes the ability of the model to fit real data.
3.5. Empirical summary. In this review, care was taken to exclude situations in which the chunking of items was unclear (yielding compound STM estimates). The results from the remaining situations, which are quite varied, suggest that about 4 chunks can be held in a pure capacity-limited STM (presumably the focus of attention). The experimental means for groups of adults generally range from about 3 to 5 chunks, whereas individual subject means range more widely, from 2 to 6 chunks.
3.6. Testability of the theoretical analysis. There are several ways in which future research could invalidate the capacity estimates of 3 to 5 chunks that have been derived from the many phenomena described above. First, one could show that performance in these studies results from a grouping process in which multiple items contribute to a chunk. In many cases the argument against this was limited to a theoretical rationale for why such chunking should be absent in particular circumstances (see Section 1.2); few studies actually have provided direct evidence of chunk size. Second, one could develop conditions in which more is done to limit chunking and find smaller capacity estimates. Third, one could find that there are hidden storage demands and that, when they are eliminated, substantially larger capacity estimates arise (e.g., more than 6 chunks). Fourth, in a different vein, one could find low or zero correlations between different estimates of storage capacity despite large individual differences in the estimates. This last finding would not necessarily challenge the notion that there are fixed capacity limits in a given domain, but it would challenge the concept of a central capacity mechanism (e.g., the focus of attention) that is the seat of the capacity limit across all domains.
4. Theoretical Account of Capacity Limits: Unresolved Issues
Below, I will address several fundamental theoretical questions about capacity limits. (1) Why does the capacity limit occur? (2) What is the nature of this limit: Is there a single capacity limit or are there multiple limits? (3) What are the implications of the present arguments for alternative theoretical accounts? (4) Finally, what are the boundaries of the central-capacity-limit account? An enigma in Miller (1956), regarding absolute judgments, will be touched upon to examine the potential breadth of the present framework.
4.1. Why the capacity limit? Future research must establish why there is a limit in capacity. One possible reason is that the capacity limit has become optimized through adaptive processes in evolution. Two relevant teleological arguments can be made on logical grounds. Recently, as well, arguments have been made concerning the physiological basis of storage capacity limits. Any such account must consider details of the data including the individual variability in the capacity limit estimates that have been observed as discussed in Section 3.1.3 above. These issues will be addressed in turn.
4.1.1. Teleological accounts. Several investigators have provided mathematical arguments relevant to what the most efficient size of working memory would be. Dirlam (1972) asked if there is any one chunk size that is more efficient than any other if it is to be the basis of a memory search. He assumed that STM is a multi-level, hierarchically structured system and that the search process is random. The nodes at a particular level are searched only until the correct one is found, at which time the search is confined to subnodes within that node. This process was assumed to continue until, at the lowest level of the hierarchy, the searched-for item is identified. In other words, the search was said to be self-terminating at each level of the hierarchy. Dirlam then asked what rule of chunking would minimize the expected number of total node and item accesses regardless of the number of items in the list, and calculated that the minimum would occur with an average chunk size of 3.59 items at each level of the hierarchy, in close agreement with the capacity of short-term memory that has been observed empirically in many situations (see above).
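The 3.59 figure can be recovered from a simplified, continuous version of this argument, offered here as a reconstruction rather than as Dirlam's own derivation. Assume n items in a uniform hierarchy with c nodes per level, so there are ln n / ln c levels, and assume a random self-terminating search inspects on average (c + 1)/2 nodes per level. The expected total, (c + 1)/2 · ln n / ln c, is minimized where c ln c = c + 1. Because n only scales the total, the optimum does not depend on list length, and the equation's root is about 3.59. The function names are illustrative.

```python
import math


def expected_accesses(chunk_size: float, n_items: float) -> float:
    """Expected node accesses for a self-terminating random search through a
    uniform hierarchy with `chunk_size` nodes per level (continuous approximation)."""
    levels = math.log(n_items) / math.log(chunk_size)
    return levels * (chunk_size + 1) / 2


def optimal_chunk_size(lo: float = 2.0, hi: float = 5.0, tol: float = 1e-9) -> float:
    """Solve c*ln(c) = c + 1 by bisection; this is where d/dc[(c + 1)/ln c] = 0."""

    def excess(c: float) -> float:
        return c * math.log(c) - c - 1  # increasing for c > 1

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"{optimal_chunk_size():.2f}")  # prints 3.59
```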
MacGregor (1987) asked a slightly different question: What is the maximal number of items for which a one-level system is more efficient than a two-level system? The consequences of both self-terminating and exhaustive search assumptions were examined. A concrete example illustrates the comparison. Suppose that one received a list of eight items. Further suppose that one had the option of representing this list either in an unorganized manner or as two higher-level chunks, each containing 4 items. With a self-terminating search, if one had to search for a particular item in the unorganized list, one would search on average through 4.5 of the items (the average of the numbers 1 through 8). If the list were organized into two chunks, one would have to search on average through 1.5 chunks to find the right chunk and then an average of 2.5 items within that chunk to find the right item, or 4.0 accesses in all. On average, then, the hierarchical organization would be more efficient. With an exhaustive search, one would have to search through all 8 items of the unorganized list. For the organized list, one would need 2 searches to find the right chunk and then 4 searches to find the right item within that chunk, or 6 accesses in all. Again, the organized list would be more efficient on average. In contrast, consider a self-terminating search for a list of 4 items that could be represented in an unorganized manner or as 2 chunks of 2 items each. The unorganized list would require an average of 2.5 searches, whereas the organized list would require that 1.5 chunks and 1.5 items within the chosen chunk be examined, for a total of 3.0 searches. In this case, the unorganized list is more efficient on average.
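The worked example can be checked with two small functions, assuming (as the example does) that the target is present and equally likely to be at any position, and that accessing a chunk costs the same as accessing an item. The function names are hypothetical.

```python
def flat_cost(n_items: int, exhaustive: bool) -> float:
    """Average accesses to find a target in an unorganized list."""
    return float(n_items) if exhaustive else (n_items + 1) / 2


def grouped_cost(n_chunks: int, chunk_size: int, exhaustive: bool) -> float:
    """Average accesses when the list is stored as equal-size chunks:
    first locate the chunk, then the item within it."""
    return flat_cost(n_chunks, exhaustive) + flat_cost(chunk_size, exhaustive)


if __name__ == "__main__":
    cases = [
        ("8 items, self-terminating", flat_cost(8, False), grouped_cost(2, 4, False)),
        ("8 items, exhaustive", flat_cost(8, True), grouped_cost(2, 4, True)),
        ("4 items, self-terminating", flat_cost(4, False), grouped_cost(2, 2, False)),
    ]
    for label, flat, grouped in cases:
        print(f"{label}: unorganized {flat}, organized {grouped}")
```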
MacGregor calculated that organizing list items into higher-level chunks is beneficial with an exhaustive or a self-terminating search when there are more than 4 or 5.83 items, respectively.
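Both break-even points follow from a continuous approximation in which the two-level list is split into √n chunks of √n items, the split that minimizes accesses under either search rule. For exhaustive search, organization pays when 2√n < n, that is, n > 4; for self-terminating search, it pays when √n + 1 < (n + 1)/2, that is, n > (1 + √2)² ≈ 5.83. The sketch below finds the same points numerically. It is a reconstruction under these assumptions, not MacGregor's own procedure, and the function names are hypothetical.

```python
import math


def one_level_cost(n: float, exhaustive: bool) -> float:
    """Average accesses for an unorganized list of n items."""
    return n if exhaustive else (n + 1) / 2


def two_level_cost(n: float, exhaustive: bool) -> float:
    """Average accesses with sqrt(n) chunks of sqrt(n) items (continuous approximation)."""
    c = math.sqrt(n)
    per_level = c if exhaustive else (c + 1) / 2
    return 2 * per_level  # one search among chunks, one within the chosen chunk


def break_even(exhaustive: bool, lo: float = 2.0, hi: float = 20.0, tol: float = 1e-9) -> float:
    """List length above which the two-level organization becomes cheaper."""

    def gain(n: float) -> float:
        return one_level_cost(n, exhaustive) - two_level_cost(n, exhaustive)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gain(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"exhaustive: {break_even(True):.2f}")  # 4.00
    print(f"self-terminating: {break_even(False):.2f}")  # 5.83
```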
Although these theoretical findings depend on some untested assumptions (e.g., that the difficulty of search is the same at every level of a hierarchy), they do provide useful insight. The empirically observed capacity limit of about 4 chunks corresponds to what has been predicted for how many items can be advantageously processed in an ungrouped manner when the search is exhaustive (MacGregor, 1987). These theoretical and empirical limits may correspond because very rapid searches of unorganized lists are, in fact, exhaustive (Sternberg, 1966). However, slower, self-terminating searches along with more elaborate mental organization of the material also may be possible, and probably are advantageous if there is time to accomplish this mental organization. That possibility can help to explain why the empirically observed limit of about 4 chunks is close to the optimal chunk size when multiple levels of organization are permitted in a self-terminating search (Dirlam, 1972).
Another teleological analysis can be formulated on the basis of Kareev (1995). He suggested that a limited working memory is better than an unlimited one for detecting imperfect correlations between features in the environment. To take a hypothetical example, there could be a population with a correlation of .70 between the height of an individual and the pitch of his or her voice. In a statistical procedure, when one uses a limited sample size to estimate the correlation (e.g., an observation of 4-8 individuals), the modal value of the observed correlation is larger than the population value. The smaller the sample size, the higher the modal value. Thus, a smaller sample size would increase the chances that a moderate correlation would be noticed at all. In human information processing, the limit in the sample size could be caused by the capacity limit of the observer's short-term memory; more samples may have been observed, but the observer bases his or her perceived estimate of the correlation on only the number of examples that fit into the focus of attention at one time. Kareev, Lieberman, and Lev (1997) showed that, in fact, low-working-memory subjects were more likely to notice a population correlation of .2 to .6. In this regard, it bears mention that in the statistical sampling procedure, the modal value of the sample correlations for sample sizes of 6 and 8 was shown to be only moderately greater than the true population value (which was set at .6 or .7); but for a sample size of 4, the modal value of the sample correlations was almost 1.0. Here, then, is another reason to believe that a basic capacity limit of 4 could be advantageous: it could take a moderate correlation in the real world and turn it into a perceived strong correlation. At least, this could be advantageous to the extent that decisiveness in decision-making and definiteness in the perception of stimulus relationships are advantageous.
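The sampling argument can be explored with a small Monte Carlo sketch. It assumes a bivariate normal population with correlation .7, draws many samples of a given size, and reports the center of the most frequent histogram bin of sample correlations as the modal value. The population model, bin width, trial count, and function names are all assumptions made for illustration; this is not Kareev's exact procedure. With these settings the modal bin for samples of 4 should sit near 1.0, and the one for samples of 8 should sit lower.

```python
import math
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def modal_sample_r(sample_size: int, rho: float = 0.7, trials: int = 20000,
                   bin_width: float = 0.05, seed: int = 1) -> float:
    """Center of the most frequent histogram bin of sample correlations.

    Each trial draws `sample_size` pairs from a bivariate normal population
    with correlation `rho` and computes the sample correlation.
    """
    rng = random.Random(seed)
    n_bins = round(2 / bin_width)
    counts = [0] * n_bins
    for _ in range(trials):
        xs, ys = [], []
        for _ in range(sample_size):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(z1)
            ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        r = pearson_r(xs, ys)
        counts[min(int((r + 1) / bin_width), n_bins - 1)] += 1
    best = max(range(n_bins), key=counts.__getitem__)
    return -1 + (best + 0.5) * bin_width


if __name__ == "__main__":
    for n in (4, 8):
        print(f"sample size {n}: modal sample r = {modal_sample_r(n):.3f}")
```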
For example, it makes sense to walk away from someone displaying traits that are moderately correlated with injurious behavior, and it makes sense to perceive that people usually say please when they are asking for a favor.
There is a strong similarity between the theoretical analysis of Kareev and earlier proposals that a large short-term memory capacity can be a liability rather than a strength in the early stages of language learning. Newport (1990) discussed a "less is more" hypothesis to explain why language learners who are developmentally immature at the time of initial learning have an advantage over more mature learners for some language constructs. An alternative to the nativist theory of language learning, this theory states that immature language learners grasp only small fragments of language at a time, which helps them to break up a complex language structure into smaller parts. Consistent with this proposal, Elman (1993) found that a computer implementation of a parallel distributed processing model of cognition learned complex language structure more easily if the short-term memory capacity of the model started out small and increased later in the learning process, rather than taking on its mature value at the beginning of learning.
Below, neurophysiological accounts of capacity limits will be reviewed. The teleological arguments still will be important to the extent that they can be seen as being consistent with the physiological mechanisms underlying capacity limits (or, better yet, motivating them).
4.1.2. Neurophysiological accounts. In recent years, a number of investigators have suggested a basis of capacity limits that can be traced back to information about how a single object is represented in the brain. In a theoretical article on visual shape recognition, Milner (1974, p. 532) suggested that "cells fired by the same figure fire together but not in synchrony with cells fired by other figures...Thus, features from a number of figures could be detected and transmitted through the network with little mutual interference, by a sort of time-sharing arrangement." In support of this hypothesis, Gray, König, Engel, and Singer (1989), in an experiment on cats, found that two columns of cortical cells that represented different portions of the visual field were active in a correlated manner only if they were stimulated by different portions of the same object, and not if they were stimulated by different objects. This led to the hypothesis that the synchronization of activity for various features represents the binding of those features to form an object in perception or STM. More recently, these findings have been extended to humans. Tiitinen et al. (1993) found that the 40-Hz oscillatory cycle, upon which these synchronizations are thought to ride, is enhanced by attention in humans. Rodriguez et al. (1999) reported electrical synchronizations between certain widely separated scalp locations 180-360 msec after a stimulus was presented when an object (a silhouetted human profile) was perceived, but not when a random field (actually an upside-down profile not detected as such) was perceived. The scalp locations appeared to implicate the parietal lobes, which Cowan (1995) also proposed to be areas involved in the integration of features to form objects. Miltner, Braun, Arnold, Witte, and Taub (1999) further showed that the binding can take place not only between perceptual features, but also between a feature and an activated mental concept.
Specifically, cyclic activity in the gamma (20-70 Hz) band was synchronized between several areas of the brain in the time period after the presentation of a conditioned stimulus (CS+), a color illuminating the room, but before the presentation of the unconditioned stimulus (UCS), an electric shock that, as the subjects had learned, followed the conditioned stimulus. No such synchronization occurred after a different color (CS-) that did not lead to electric shock.
If objects and meaningful events can be carried in the synchronized gamma-band activity of the brain, then the question for STM capacity becomes, "How many objects or events can be represented simultaneously in the brain?" Several investigators have addressed this question. Lisman and Idiart (1995) suggested that "each memory is stored in a different high-frequency ('40 hertz') subcycle of a low-frequency oscillation. Memory patterns repeat on each low-frequency (5 to 12 hertz) oscillation, a repetition that relies on activity dependent changes in membrane excitability rather than reverberatory circuits." In other words, the number of subcycles that fit into a low-frequency cycle would define the number of items that could be held in a capacity-limited STM. Lisman and Idiart intended this suggestion to motivate the existence of a memory span of about seven items (e.g., [40 subcycles / sec] / [5.7 cycles / sec] = 7 subcycles / cycle). However, it could just as well be used to motivate a basic capacity of about 4 items (e.g., [40 subcycles / sec] / [10 cycles / sec] = 4 subcycles / cycle). The proposal also was intended to account for the speed of retrieval of information stored in the capacity-limited STM but, again, it fits the 4-item limit just as well. If 40 subcycles occur per second, then each subcycle takes 25 msec, a fair estimate of the time it takes to search one item in STM (Sternberg, 1966). Luck and Vogel (1998) made a proposal similar to that of Lisman and Idiart but made explicit that the representation of each item in STM would involve the synchronization of neural firing representing the features of the item. The STM capacity limit would occur because two sets of feature detectors that fire simultaneously would produce a spurious synchronization, corrupting memory by making features from different objects seem to come from one object.
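The arithmetic behind this proposal can be made explicit. The sketch below, with hypothetical function names, computes how many fast subcycles fit into one slow cycle and how long each lasts. It then assigns items round-robin to the available subcycles, an illustrative assumption meant to echo the Luck and Vogel idea that items forced to share a subcycle would fire together and be confused. It does not reproduce either group's actual model.

```python
import math


def subcycles_per_cycle(fast_hz: float, slow_hz: float) -> float:
    """Number of fast (gamma) subcycles that fit in one slow oscillation cycle."""
    return fast_hz / slow_hz


def subcycle_ms(fast_hz: float) -> float:
    """Duration of one fast subcycle in milliseconds."""
    return 1000.0 / fast_hz


def assign_to_subcycles(items: list[str], fast_hz: float = 40.0,
                        slow_hz: float = 10.0) -> tuple[list[list[str]], list[list[str]]]:
    """Give each item its own subcycle until the slots run out.

    Items are placed round-robin; any subcycle holding more than one item
    stands for features firing together, i.e., a spurious binding.
    """
    slots = math.floor(subcycles_per_cycle(fast_hz, slow_hz))
    if slots < 1:
        raise ValueError("the fast rhythm must be at least as fast as the slow one")
    schedule: list[list[str]] = [[] for _ in range(slots)]
    for i, item in enumerate(items):
        schedule[i % slots].append(item)
    collisions = [slot for slot in schedule if len(slot) > 1]
    return schedule, collisions


if __name__ == "__main__":
    print(subcycles_per_cycle(40, 5.7))            # about 7 subcycles per cycle
    print(subcycles_per_cycle(40, 10))             # 4.0
    print(subcycle_ms(40))                         # 25.0
    print(assign_to_subcycles(list("ABCDE"))[1])   # [['A', 'E']]
```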
Other theorists (Hummel & Holyoak, 1997; Shastri & Ajjanagadde, 1993) have applied this neural synchronization principle more abstractly. Their approach offers an alternative that is compatible with Halford et al.'s (1998) basic notion of a limit on the complexity of relations between concepts, though Halford et al. instead worked with a more symbolically based model in which "the amount of information that can be represented by a single vector is not significantly limited, but the number of vectors that can be bound in one representation of a relation is limited" (p. 821). Shastri and Ajjanagadde (1993) formulated a physiological theory of working memory very similar to that of Lisman and Idiart (1995), except that the theory was meant to explain "a limited-capacity dynamic working memory that temporarily holds information during an episode of reflexive reasoning" (p. 442), meaning reasoning that can be carried out "rapidly, spontaneously, and without conscious effort" (p. 418). The information was said to be held as concepts or predicates that were in the form of complex chunks; thus, it was cautioned, "Note that the activation of an entity together with all its active superconcepts counts as only one entity" (p. 443). It was remarked that the bound on the number of entities in working memory, derived from facts of neural oscillation, falls in the 7 ± 2 range; but the argument was not precise enough to distinguish that range from the lower estimate offered in the present paper. Hummel and Holyoak (1997) brought up similar concepts in their theory of thinking with analogies. They defined "dynamic binding" (a term that Shastri and Ajjanagadde also relied upon to describe how entities came about) as a situation in which "units representing case roles are temporarily bound to units representing the fillers of those roles" (p. 433). They estimated the limit of dynamic binding links as "between four and six" (p. 434).
In both the approaches of Shastri and Ajjanagadde (1993) and Hummel and Holyoak (1997), these small limits were supplemented with data structures in long-term memory or "static bindings" that appear to operate in the same manner as the long-term working memory of Ericsson and Kintsch (1995), presumably providing the "active superconcepts" that Shastri and Ajjanagadde mentioned.
One problem for the interpretation of synchronous oscillations of nervous tissue is that they can be observed even in lower animals in situations that appear to have little to do with the possibility of conscious awareness of particular stimuli (e.g., Braun, Wissing, Schäfer, & Hirsch, 1994; Kirschfeld, 1992). This, in itself, need not invalidate the role of oscillations in binding together the features of an object or the objects in a capacity-limited store in humans. It could be the case that mechanisms already present in lower animals form the basis of more advanced skills in more advanced animals, just as the voice apparatus is necessary for speech but is present even in non-speaking species. Thus, von der Malsburg (1995, p. 524) noted that "As to the binding mechanism based on temporal signal correlations, its great advantage [is] being undemanding in terms of structural requirements and consequently ubiquitously available and extremely flexible..."
4.1.3. Reconciliation of teleological and neurophysiological accounts. One concern here is whether the teleological and physiological accounts of capacity limits are consistent or inconsistent with one another. The process of scanning through the items in STM has been employed theoretically by both the teleological and the physiological theorists. For example, the teleological argument that MacGregor (1987) built using an exhaustive scan resulted in the conclusion that the scan would be most efficient if the number of items per group were 4. This conclusion was based on the assumption that the amount of time it takes to access a group to determine whether a particular item is present within it is equal to the amount of time it then takes to access each item within the appropriate group once that group is selected, so as finally to identify the probed item. This concept can be mapped directly onto the concept of the set of items (or chunks) in capacity-limited STM being represented by a single cycle of a low-frequency oscillation (5 to 12 Hz), with each item mapped onto a different subcycle of a 40-Hz oscillation riding on top of the 5 to 12 Hz oscillation. These figures are in line with the teleological arguments and memory capacity data reviewed above if the rate of the slow oscillation is close to 10 Hz, so that four items would fit in each slow cycle. As suggested by Shastri and Ajjanagadde (1993) and others, the cyclic search process could be employed recursively. For example, at one point in a probed recognition process there could be up to 4 chunks in the capacity-limited store. Once the correct chunk is identified, the contents of STM would be replaced by the items contained within that chunk, now "unpacked," so that the contents of the chunk can be scanned in detail. In present theoretical terms, the focus of attention need not encompass multiple levels of representation at the same time.
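A minimal sketch of this recursive, unpack-and-rescan process follows. It assumes a focus capacity of 4 chunks, an exhaustive scan at each level, and equal cost for every access (MacGregor's assumption). The `contains` check stands in for whatever recognition process decides that a chunk holds the probe and is a simplification; all names are hypothetical. Only one level of the hierarchy occupies the focus at any moment.

```python
from typing import Union

Node = Union[str, list]  # a leaf item (str) or a chunk (list of nodes)


def contains(node: Node, target: str) -> bool:
    """True if the target is this leaf or lies anywhere inside this chunk."""
    if isinstance(node, str):
        return node == target
    return any(contains(child, target) for child in node)


def recursive_scan(chunks: list, target: str, capacity: int = 4) -> tuple[bool, int]:
    """Probe recognition by scanning the focus, then unpacking the matching chunk.

    Returns (found, accesses). Every element in the focus is inspected at each
    level (exhaustive scan), and all accesses are assumed to cost the same.
    """
    focus = list(chunks)
    accesses = 0
    while True:
        if len(focus) > capacity:
            raise ValueError("more chunks than the focus of attention can hold")
        accesses += len(focus)
        match = next((node for node in focus if contains(node, target)), None)
        if match is None:
            return False, accesses
        if isinstance(match, str):
            return True, accesses
        focus = list(match)  # replace the focus contents with the unpacked chunk


if __name__ == "__main__":
    memory = [list("ABCD"), list("EFGH"), list("IJKL"), list("MNOP")]
    print(recursive_scan(memory, "G"))  # (True, 8)
    print(recursive_scan(memory, "Z"))  # (False, 4)
```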
4.1.4. What is the basis of individual differences? We will not have a good understanding of capacity limits until we are able to understand the basis of the marked developmental and individual differences in measured capacity that were observed by Cowan et al. (1999) and comparable individual differences observed in other procedures (Henderson, 1972; Luck & Vogel, personal communication, January 18, 1999). One possible basis would be individual differences in the ratio of fast to slow oscillatory rhythms. Miltner et al. found most of the rapid oscillatory activity at 37-43 Hz, but some residual activity at 30-37 Hz and 43-48 Hz. One can combine a 12-Hz slow cycle with a 30-Hz rapid cycle to predict the low end of the range of memory capacities (30/12 = 2.5 items), or one can combine an 8-Hz slow cycle with a 48-Hz fast cycle to predict the high end of the range (48/8 = 6 items). According to these figures, however, one would not expect the slow cycle to go below 8 Hz, given the capacity limits observed empirically. Here, then, is a physiological prediction based on a combination of existing physiological and behavioral results. An important next step may be the acquisition of data that can help to evaluate the psychological plausibility of the theoretical constructs surrounding this type of theory. As one promising example, the finding of Tiitinen et al. (1993) that the 40-Hz neural cycle is enhanced by attention is consistent with the present suggestion that the fundamental storage capacity limit of about 4 items is based on the 40-Hz cycle and is in essence a limit in the capacity of the focus of attention. It is easy to see how research on this topic also could clarify the basis of individual differences in capacity. Specifically, one could determine whether individual differences in oscillatory rates mirror behavioral differences in the limited storage capacity.
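The range computation above can be written out as a short sketch, taking capacity to be the fast rate divided by the slow rate. The 8-12 Hz slow band and the 30-48 Hz fast band are the values used in the paragraph; the function names are hypothetical. The second function shows where the 8-Hz floor comes from: with a ceiling of 6 items and a fastest rhythm of 48 Hz, any slower cycle would predict more than 6 items.

```python
def capacity_range(slow_hz: tuple[float, float],
                   fast_hz: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest items-per-cycle ratio (fast / slow) over the given bands."""
    slow_lo, slow_hi = slow_hz
    fast_lo, fast_hi = fast_hz
    return fast_lo / slow_hi, fast_hi / slow_lo


def slowest_cycle_hz(max_items: float, fastest_hz: float) -> float:
    """Lowest slow-cycle rate consistent with a capacity ceiling of `max_items`."""
    return fastest_hz / max_items


if __name__ == "__main__":
    print(capacity_range((8.0, 12.0), (30.0, 48.0)))  # (2.5, 6.0)
    print(slowest_cycle_hz(6.0, 48.0))                 # 8.0
```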
It remains to be explained why memory for attended speech shows such an intriguing, simple relationship to memory for unattended speech (Figure 4), in which recall of attended speech exceeds the unattended-speech limit by a variable amount. This figure makes it apparent that individuals use the same processes in both conditions, plus supplementary processes for attended speech. The difference might be accounted for most simply by chunking (the formation of inter-item associations) during attended list presentations.
It is important not to become too reductionistic in the interpretation of biological effects. It is possible that stimulus factors and/or behavioral states modulate biological cycle frequencies under some circumstances. Some studies with an automatized response or a rapid response have resulted in smaller individual differences. The highly trained subjects of Sperling (1960) appeared to produce capacity (whole report) estimates deviating from the population mean by no more than about 0.5 items, although there were few subjects. In an enumeration task in which a reaction time measure defined the subitizing range, Chi and Klahr (1975) found no difference in the subitizing range between 5- and 6-year-olds and adults. Perhaps there is an intrinsic, baseline capacity of the focus of attention that shows few differences between individuals, and perhaps under some circumstances but not others, the level and direction of effort at the time of recall modulate that capacity. Further study of individual differences in memory capacity is thus likely to be important theoretically.
4.2. Central capacity or separate capacities? In most of the research that I have discussed, the capacity limit is examined with a coherent field of stimulation. I have not directly tackled the question of whether there is one central capacity limit or whether there are separate limits for domains of cognition (e.g., separate capacities for the visual versus auditory modalities; for verbal versus spatial representational codes; or, perhaps, for two collections of items distinguished by various other physical or semantic features). According to the models of Cowan (1988, 1995) and Engle et al. (1999), the capacity limit would be a central one (the focus of attention). Some fine points must be kept in mind on what should count as evidence for or against a central limit.
Ideally, evidence for or against a central capacity limit could be obtained by looking at the number of items recalled in two tasks, A and B, and then determining whether the total number of items recalled on a trial can be increased by presenting A and B together and adding the number of items recalled in the two tasks. For example, suppose that one can recall 3 items in Task A and 4 items in Task B, and that one can recall 6 items all together in a combined, A + B task. Although performance on the component tasks is diminished when the tasks are carried out together, the total number of items recalled is greater than for either task presented alone. This savings would serve as initial evidence for the existence of separate storage mechanisms (with or without an additional, central storage mechanism). Further, if there were no diminution of performance in either task when they were combined, that would serve as evidence against the central storage mechanism or capacity limit.
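The decision logic of this paragraph can be summarized in a few lines, subject to the limitations discussed in the next paragraph. The inputs are mean items recalled for each task alone and for each task within the combined A + B task. The third outcome ("no savings") is not stated in the text and is added here as an assumed contrast case; the function name and the 2.5/3.5 split used in the usage line are hypothetical.

```python
def interpret_dual_task(alone_a: float, alone_b: float,
                        combined_a: float, combined_b: float) -> str:
    """Apply the additivity logic to mean items recalled.

    alone_*: recall when each task is given by itself.
    combined_*: recall of each task's items within the combined A + B task.
    """
    if combined_a >= alone_a and combined_b >= alone_b:
        return "no diminution: evidence against a central capacity limit"
    if combined_a + combined_b > max(alone_a, alone_b):
        return "savings: initial evidence for separate storage mechanisms"
    return "no savings: consistent with a single central capacity limit"


if __name__ == "__main__":
    # The example in the text: 3 and 4 items alone, 6 items in all when combined.
    print(interpret_dual_task(3, 4, 2.5, 3.5))
```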
This type of reasoning can be used only with important limitations, however. As discussed above, several different mechanisms contribute to recall, including not only the capacity-limited focus of attention, but also the time- or interference-limited sources of activation of long-term memory (sensory stores, phonological and spatial stores, and so on). If the focus of attention could shift from examining one source of activation to examining another dissimilar source, it would be possible to recall items from Task A and then shift attention to activated memory representations of the items in Task B, bringing them into the focus of attention for recall in turn. If all of the information need not be entered into the focus of attention at one time, performance in the combined task would overestimate central storage capacity. This possibility contaminates many types of evidence that initially look as if they could provide support for multiple capacity-limited stores. These include various studies showing that one can recall more in two tasks with different types of materials combined than in a single task, especially if the modalities or types of representations are very different (Baddeley, 1986; Frick, 1984; Greene, 1989; Henderson, 1972; Klapp & Netick, 1988; Luck & Vogel, 1997; Martin, 1980; Penney, 1980; Reisberg, Rappaport, & O'Shaughnessy, 1984; Sanders & Schroots, 1969; Shah & Miyake, 1996).
Theoretically, it should be possible to overcome methodological problems in order to determine if there are true multiple capacity limits. One could make it impossible for the subject to rehearse items during presentation of the materials by using complex arrays of two types concurrently; perhaps concurrent visual and auditory arrays. It would also be necessary to make sure that the focus of attention could not be used recursively, shifting from one type of activated material to the next for recall. If the activated representations were sensory in nature, this recursive recall might be prevented simply by backward-masking one or both of the types of materials. These requirements do not seem to have been met in any extant study. Martin (1980) did use simultaneous left- and right-sided visual and auditory lists (4 channels at once, only 2 of them meaningful at once, with sequences of 4 stimuli presented at a fast, 2 / sec rate on each of the 4 channels). She found that memory for words presented concurrently to the left and right fields in the same modality was, on the average, 51.6% correct, whereas memory for pairs containing one printed and one spoken word was 76.9% correct. However, there was nothing to prevent the shifting of attention from visual to auditory sensory memory in turn.
Another methodological possibility is to document the shifting of attention rather than preventing it. This can be accomplished with reaction time measures. One enumeration study is relevant. Atkinson, Francis, and Campbell (1976) presented two sets of dots separated by their organization into lines at different orientations, by two different colors, or by their organization into separate groups. Separation by spatial orientation or grouping was capable of eliminating errors when there was a total of 5 to 8 dots. Color separation reduced, but did not eliminate, errors. However, the grouping did not reduce the reaction times in any of these studies. It seems likely that some sort of apprehension process took place separately for each group of 4 or fewer dots and that the numbers of dots in each group were then added together. Inasmuch as the reaction times were not slower when the items were grouped, one reasonable interpretation is that subitizing in groups and then adding the groups is the normal enumeration process for fields of 5 or more dots, even when there are no physical cues for the groups. The addition of physical cues simply makes the subitizing process more accurate (though not faster). This study provides some support for Mandler's (1985) suggestion that the capacity limit is for sets of items that can be combined into a coherent scheme. By dividing the sensory field into two coherent, separable schemes, the effective limit can be increased; but the different schemes or groups can enter the focus of attention only one at a time, which explains why perceptual grouping cues increase accuracy without altering the reaction times.
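The subitize-then-add interpretation can be sketched as follows. The sketch assumes a subitizing limit of 4 dots per apprehension and treats each group as one apprehension step; it models only the number of steps, not accuracy, and all names are hypothetical. Whether the groups come from physical cues or from an internal split, a 7-dot display takes two apprehensions, which is one way to see how cues could improve accuracy without changing reaction time.

```python
SUBITIZING_LIMIT = 4  # assumed maximum number of dots apprehended at once


def split_without_cues(dots: list, size: int = SUBITIZING_LIMIT) -> list[list]:
    """Hypothetical internal partition used when the display offers no grouping cues."""
    return [dots[i:i + size] for i in range(0, len(dots), size)]


def count_by_subitizing(groups: list[list]) -> tuple[int, int]:
    """Subitize each group in turn and add the results.

    Returns (total dots, number of apprehension steps), one step per group.
    """
    total = 0
    steps = 0
    for group in groups:
        if len(group) > SUBITIZING_LIMIT:
            raise ValueError("group exceeds the subitizing range")
        total += len(group)
        steps += 1
    return total, steps


if __name__ == "__main__":
    dots = list(range(7))
    cued = [dots[:3], dots[3:]]        # groups marked by orientation or spacing
    uncued = split_without_cues(dots)  # groups formed internally
    print(count_by_subitizing(cued), count_by_subitizing(uncued))  # (7, 2) (7, 2)
```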
Physiological studies also may help if they can show a reciprocity between tasks that do not appear to share specific processing modes. One study using event-related potentials (ERPs) by Sirevaag, Kramer, Coles, and Donchin (1989) is relevant. It involved two tasks with very little in common, both of which were effortful. In one task, the subject controlled a cursor using a joystick, in an attempt to track the movement of a moving target. The movement could be in one or two dimensions, always in discrete jumps, and the cursor could be controlled by either the velocity or the acceleration of the joystick, resulting in four levels of task difficulty. In the second task, administered concurrently, the subject heard a series of high and low tones and was to count the number of occurrences of one of the tones. The P300 component of ERP responses to both tasks was measured. This component is very attention-dependent. The finding was that, across conditions, the P300 to the tracking targets and the P300 to the tones exhibited a reciprocity. The larger the P300 was to the tracking targets, the smaller it was to the tones, and vice versa. The sum of the P300 amplitudes was practically constant across conditions. The simplest interpretation of these results is that there is a fixed capacity that can be divided among the two tasks in different proportions, and that the relative P300 amplitudes reflect these proportions.
In sum, the existing literature can be accounted for with the hypothesis that there is a single capacity-limited store that can be identified with the focus of attention. This store is supplemented with other storage mechanisms that are not capacity limited although they are limited by the passage of time and/or the presentation of similar interfering material. The focus of attention can shift from one type of activated memory to another and will recoup considerable information from each type if the materials represented are dissimilar.
4.3. Implications for Alternative Accounts of Information Processing
In light of the information and ideas that have been presented, it is important to reconsider alternative accounts of information processing and the question of their continued viability and plausibility.
4.3.1. The magical number seven plus or minus two. Although Miller (1956) offered his magical number only as a rhetorical device, the number did serve to characterize performance in many tasks. It has been taken more literally as a memory limit by many researchers (e.g., Lisman & Idiart, 1995). The present stance is that the number seven estimates a commonly obtained, compound capacity limit, rather than a pure capacity limit in which chunking has been eliminated. It occurs in circumstances in which the stimuli are individually attended at the time of encoding and steps have not been taken to eliminate chunking. What is needed, however, is an explanation of why this particular compound limit crops up fairly often when rehearsal is not prevented. One possibility is that this number reflects a certain reasonable degree of chunking. Most adults might be able to learn at most three chunks of information rapidly, each with perhaps three units, leading to a span of 9. The slightly lower estimates that are often obtained could result from the inability to learn the chunks quickly enough. However, these speculations are intended only to provoke further research into the basis of commonly obtained compound capacity limits. What is essential to point out for the present account is that these compound limits are too high to describe performance in the situations in which it can be assumed that the combination of items into higher-order chunks was severely limited or prevented.
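The speculative arithmetic in this paragraph treats a compound span as the sum of the units contained in each chunk. A minimal sketch, assuming (only for illustration) that chunk sizes can be listed explicitly; the partially chunked sizes are hypothetical:

```python
def compound_span(chunk_sizes):
    """Total items recalled when each chunk contributes its number of units.

    chunk_sizes: one entry per chunk (hypothetical sizes for illustration).
    """
    return sum(chunk_sizes)


fully_chunked = [3, 3, 3]      # three chunks of about three units each, as in the text
partially_chunked = [3, 2, 2]  # hypothetical: chunks not fully formed in time

print(len(fully_chunked), compound_span(fully_chunked))
print(len(partially_chunked), compound_span(partially_chunked))
```

The number of chunks (the first printed value) is the same in both cases. Only the compound item count differs, which is why a span measured in items cannot by itself reveal the chunk capacity.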
4.3.2. The time-limitation account. The view that working memory is limited by the duration of unrehearsed information in various short-term buffers is exemplified by the model of Baddeley (1986). The research reviewed in the present article leaves open the question of whether time limitations exist (as explained in Section 1). However, whereas some have assumed that time limits can take the place of capacity limits, the evidence described in this article cannot be explained in this manner. In Baddeley's theory, memory span was said to be limited to the number of items that could be rehearsed in a repeating loop before their representations decay from the storage buffer in which they are held (in about 2 sec in the absence of rehearsal). If rehearsal is always articulatory in nature, though, this notion is inconsistent with findings that the number of idioms people can recall would take much longer to rehearse than the number of individual characters they can recall (e.g., see Glanzer & Razel, 1974). Something other than just the memory's duration and the rate of articulatory rehearsal must limit recall.
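The loop reasoning attributed to Baddeley's theory can be sketched as two simple relations: predicted span is the decay window divided by the time to articulate one item, and an observed span implies a loop time equal to span multiplied by articulation time. The ~2 s window follows the text; the articulation times and the idiom span used below are hypothetical placeholders, not data from Glanzer and Razel (1974).

```python
# Sketch of the time-based (articulatory loop) reasoning.
# DECAY_WINDOW_S follows the ~2 s figure in the text; the articulation
# times and the idiom span below are hypothetical placeholders, not data.

DECAY_WINDOW_S = 2.0


def predicted_span(articulation_time_s):
    """Items that can be cycled through the loop before the first one decays."""
    return DECAY_WINDOW_S / articulation_time_s


def implied_loop_time(observed_span, articulation_time_s):
    """Time needed to rehearse an observed span once, articulating every item."""
    return observed_span * articulation_time_s


CHARACTER_TIME_S = 0.3  # hypothetical time to say one character
IDIOM_TIME_S = 1.2      # hypothetical time to say one idiom

print(f"predicted span, characters: {predicted_span(CHARACTER_TIME_S):.1f}")
print(f"predicted span, idioms: {predicted_span(IDIOM_TIME_S):.1f}")
# If (hypothetically) 3 idioms were recalled, purely articulatory
# rehearsal would need a loop well beyond the 2 s window.
print(f"implied loop for 3 idioms: {implied_loop_time(3, IDIOM_TIME_S):.1f} s")
```

The specific numbers do not matter; the argument only needs the implied loop time for idioms to exceed the decay window by a wide margin.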
The time-based account might be revived if a different means of rehearsal could be employed for idioms than for words. For example, subjects might be able to scan semantic nodes, each representing an idiom, and quickly reactivate them in that way without articulatory rehearsal of the idioms. According to a modified version of Baddeley's account, this scanning would have to be completed in about 2 sec to prevent decay of the original memory traces. However, even that modified time-based theory seems inadequate to account for situations in which the material to be recalled is presented in an array so quickly that rehearsal of any kind can contribute little to performance (e.g., Luck & Vogel, 1997). Also, any strictly time-based account has difficulty explaining why there is an asymptotic level of recall in partial report approximating 4 items with both auditory and visual presentation of characters, even though it takes much longer to reach that asymptote in audition (at least 4 s: Darwin et al., 1972) than in vision (at most 1 s: Sperling, 1960). The only way to preserve a time-based account would be to assume that the rate of extraction of information from sensory storage in the two modalities is a limiting factor and is, for some mysterious reason, inversely proportional to the duration of sensory storage, resulting in an asymptotic limit that does not depend on the duration of storage. It seems far simpler to assume a capacity limit.
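The final argument of this paragraph is arithmetic: under a purely time-based account, items extracted equal extraction rate multiplied by storage duration. The asymptote (about 4 items) and the durations (at most 1 s for vision, at least 4 s for audition) come from the text; treating the durations as exactly 1 s and 4 s is a simplifying assumption for this sketch.

```python
# Under a purely time-based account, items extracted = rate * storage duration.
# Asymptote and durations follow the text; using exactly 1 s and 4 s is a
# simplifying assumption.

ASYMPTOTE_ITEMS = 4.0
durations_s = {"vision": 1.0, "audition": 4.0}


def required_rate(asymptote, duration):
    """Extraction rate (items/s) needed for rate * duration to equal the asymptote."""
    return asymptote / duration


for modality, duration in durations_s.items():
    rate = required_rate(ASYMPTOTE_ITEMS, duration)
    print(f"{modality}: {rate:.2f} items/s x {duration:.0f} s = {rate * duration:.0f} items")
```

To land on the same asymptote, the auditory extraction rate would have to be one quarter of the visual rate, exactly offsetting the fourfold longer storage. That coincidence is what the time-based account would have to assume.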
4.3.3. The unitary storage account. Some theorists (e.g., Crowder, 1993) have assumed that there is no special short-term memory mechanism and that all memory may be explained according to a common set of rules. In one sense the present analysis is compatible with this view, in that the capacity limit applies not only to the recall of recently presented stimuli, but also to the recall of information from long-term memory (see Section 3.4.2). However, any successful account must distinguish between the vast information potentially obtainable from an individual, on one hand, and the small amount of information that can be obtained from that individual, or registered with the individual, in a short segment of time, on the other; the latter is the capacity limit. The focus of attention, which serves as the proposed basis of the capacity limit in the present approach, has not played a major role in unitary accounts that have been put forward to date, though it could be added without contradiction.
Given a unitary memory view expanded to consider the focus of attention, one could account for the 4-chunk limit on the grounds that every chunk added to the focus diminishes the distinctiveness of all of the chunks. Such a mechanism of indistinctiveness would be analogous to the one that has been used previously to account for the recency effect in serial recall (e.g., Bjork & Whitten, 1974), except that the dimension of similarity between chunks would be their concurrent presence in the focus of attention, not their adjacent serial locations within a list. One article written from a unitary memory view (Brown, Preece, & Hulme, in press) does attempt to account for the number of chunks available in one situation. Specifically, their account, based on oscillatory rhythms that become associated with items and contexts, correctly predicted that the serial recall of a nine-item list with overt rehearsal is optimal when the list is rehearsed in groups of three items. The explanation offered was that "This represents the point at which the optimal balance between across-group errors and within-group errors is reached in the model." The account of the 4-chunk limit offered earlier in this target article on the basis of neural oscillatory rhythms (Section 4.1.2) is similar, albeit at a neural level of analysis. It states that when too many chunks are represented neurally at the same time, their representations begin to become confusable (e.g., Luck & Vogel, 1998). The critical difference between the two explanations is that the neural account offered by Luck and Vogel and the present article refers to particular frequencies of neural oscillation, whereas Brown et al. allowed various oscillators and did not make predictions constrained to particular frequencies of oscillation.
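The idea of an "optimal balance between across-group errors and within-group errors" can be illustrated with a deliberately simple toy cost function. The linear error terms and equal weights below are assumptions made only for illustration; this is not the oscillator-based model of Brown, Preece, and Hulme, and it is not meant to reproduce their simulations.

```python
# Toy illustration of balancing across-group and within-group errors when
# grouping a 9-item list. Linear error terms and equal weights are
# assumptions; this is NOT the oscillator model of Brown, Preece, and Hulme.


def toy_error(n_items, n_groups, across_weight=1.0, within_weight=1.0):
    """Across-group errors grow with the number of groups; within-group errors with group size."""
    group_size = n_items / n_groups
    return across_weight * n_groups + within_weight * group_size


n_items = 9
# Only equal-sized groupings: 1 group of 9, 3 groups of 3, 9 groups of 1.
candidates = [g for g in range(1, n_items + 1) if n_items % g == 0]
best = min(candidates, key=lambda g: toy_error(n_items, g))

for g in candidates:
    print(g, "groups ->", toy_error(n_items, g))
print("best:", best, "groups of", n_items // best)
```

With equal weights the toy optimum falls at groups of three for a nine-item list; changing the relative weights would shift it, so the sketch shows only the shape of the trade-off, not why the optimum takes any particular value in human data.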
4.3.4. The scheduling account. It has been proposed that supposed capacity limits might be attributable to limits in the rate at which subjects can produce responses in a multi-task situation without risking making responses in the incorrect order (Meyer & Kieras, 1997). That theory appears more applicable to some situations than to others. In situations in which the limit occurs during reception of materials and fast responding is not required (e.g., Luck & Vogel, 1997), the theory seems inappropriate. That seems to be the case with most of the types of phenomena examined in the present article. It is unclear how a scheduling account could explain these phenomena without invoking a capacity notion.
4.3.5. The multiple-capacity account. Some theorists have suggested that there is not a single capacity limit, but rather limits in separate capacities (e.g., visual and auditory or spatial and verbal; see Wickens, 1984). I would suggest that, although there may well be various types of distinct processes and storage facilities in the human brain, there is no evidence that they are limited by capacity per se (as opposed to other limitations such as those imposed by decay and interference). Sections 1 and 3 of the present paper should illustrate that strict conditions must apply in order for chunk capacity limits to be clearly observed at all, free of other factors. Moreover, the finding of Sirevaag et al. (1989) of a tradeoff between tasks in the P300 response magnitude in event-related potentials (discussed in Section 4.2) seems to indicate that very disparate types of processes still tap a common resource. Even the left and right hemispheres do not appear to operate independently. Holtzman and Gazzaniga (1982) found that split-brain patients are impeded in responses made with one hemisphere when a concurrent load is imposed on the other hemisphere, despite the breakdown in informational transmission between the hemispheres through the corpus callosum. There thus appears to be some central resource that is used in disparate tasks, and by both hemispheres.
4.3.6. The storage versus processing capacities account. Daneman and Carpenter (1980), like many other investigators, have noted that a working memory storage load does not interfere with processing nearly as much as would be expected if storage and processing relied upon a common workspace. Halford et al. (1998) noted the storage limit of about 4 items but also proposed, parallel to that limit but separate from it, a processing limit in which the complexity of relations between items being processed is limited to 4 dimensions in adults (and to fewer dimensions in children). Within processing, "complexity is defined as the number of related dimensions or sources of variation" (p. 803). For example, transitive inference is said to be a ternary relation because it can be reduced to such terms: "the premises 'Tom is smarter than John, John is smarter than Stan' can be integrated into the ternary relational instance monotonically-smarter (Tom, John, Stan)" (p. 821), that is, a relation with three arguments. The parallel between processing and storage was said to be that "both attributes on dimensions [in processing] and chunks [in storage] are independent units of arbitrary size" (p. 803). However, the model did not explain why the processing and storage limits should coincide, at about 4 units each.
According to the present view, both processing and storage would be assumed to rely on a common capacity limit. The reason is that, ultimately, what we take to be stored chunks in short-term memory (and what I have, for simplicity, described as such up to this point) actually are relations between chunks. It is not chunks per se that have to be held in short-term memory (as they in fact are part of long-term memory), but rather chunks in relation to some concept. For example, "in-present-array (x, q, r, b)" could describe the quaternary relation leading to a whole report response in Sperling's (1960) procedure. "Monotonically-later (3-7, x, 2, 4-8)" could describe a quaternary relation leading to partially correct serial recall of an attended list of digits for which 3-7 is a memorized initial chunk; x represents a placemarker for a digit that cannot be recalled; 2 represents an unchunked digit; and 4-8 represents another memorized chunk.
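The relational notation used in this paragraph can be expressed as a small data structure: a named relation with an ordered tuple of arguments, where the number of arguments is what must fit within the capacity limit. The `Relation` class and its `fits_in_focus` method are hypothetical illustrations; the example relations are the ones given in the text (including the transitive-inference example quoted from Halford et al., 1998).

```python
from dataclasses import dataclass
from typing import Tuple

CAPACITY = 4  # approximate limit on simultaneously related chunks, per the text


@dataclass(frozen=True)
class Relation:
    """A concept together with the chunks bound to it (illustrative only)."""
    name: str
    args: Tuple[str, ...]

    @property
    def arity(self) -> int:
        return len(self.args)

    def fits_in_focus(self, capacity: int = CAPACITY) -> bool:
        """True if all arguments could be held in the focus of attention at once."""
        return self.arity <= capacity


examples = [
    Relation("in-present-array", ("x", "q", "r", "b")),
    # 3-7 and 4-8 are memorized chunks, x marks an unrecalled digit, 2 is unchunked.
    Relation("monotonically-later", ("3-7", "x", "2", "4-8")),
    Relation("monotonically-smarter", ("Tom", "John", "Stan")),
]

for rel in examples:
    print(rel.name, rel.arity, rel.fits_in_focus())
```

Representing a multi-digit chunk such as 3-7 as a single argument is the point of the notation: the chunk's internal content lives in long-term memory, and only its slot in the relation counts against the limit.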
If this analysis is correct, there is no reason to expect a separation between processing and storage. The reason why a storage load does not much interfere with processing is that the storage load and the process do not have to be expanded in the focus of attention at the same time. Although both are activated at the same time, there is no capacity limit on this activation, only on its use (cf. Schneider & Detweiler, 1987). The subject might hold in the focus of attention only a pointer to the activated, stored information while carrying out the processing, and then shift the focus to the stored information when necessary to recall the memory load.
4.3.7. The task-specific capacities account. A skeptic might simply assume that although there are capacity limits, they vary from situation to situation for reasons that we cannot yet understand. This type of view probably cannot be answered through reasoned discourse, as it depends on a different judgment of the presented evidence. The view that there is a fixed underlying capacity could, however, be strengthened by subsequent research in which new conditions are tested and found to conform to the capacity limit (or weakened if they are found to violate it). Numerous examples of novel conditions leading to the predicted limit can be given; two of them are as follows. First, in the research by Cowan et al. (1999), a capacity limit for ignored speech was expected to be similar to those that have been obtained for attended visual arrays (Luck & Vogel, 1997; Sperling, 1960) on the grounds that the task demands were logically analogous, even though the materials were very different. That expectation was met. Second, it was expected that one could observe the capacity limit by limiting rehearsal for spoken lists, and that expectation was borne out by a very similar limit in numerous published experiments (as shown in Table 2). A third example has yet to be tested. Specifically, it was predicted (in Section 1.2.1) that the capacity limit could be observed in a modified n-back task in which subjects must only indicate, as rapidly as possible, if a particular item has been included in the stimulus set previously, and in which some items would be repeated in the set but other, novel items also would be introduced.
4.4. Boundaries of the Central-Capacity-Limit Account
The boundaries of the present type of analysis have yet to be examined. For example, Miller (1956) indicated that absolute judgments in perception are limited in a way that is not yet clear, and apparently not in terms of chunks, as other types of phenomena are. For unidimensional stimuli, the limit appears to be up to about 7 categories that can be used consistently, but the total number of usable categories is considerably higher for multidimensional stimuli (e.g., judgments of tones differing in both intensity and pitch). One possibility is that the subject need only retain, in short-term memory, pointers to the dimensions while accessing category divisions one dimension at a time. Because faculties that are not specifically capacity-limited, such as sensory memory, can be used for supplementary storage, the focus of attention is free to shift, so that the capacity-limited focus can be used sequentially to judge the stimulus on different dimensions, one at a time. This analysis might be tested with absolute judgments for backward-masked stimuli, as backward masking would prevent sensory storage from holding information while the focus of attention shifts from one dimension to another. As this example suggests, the capacity concept might have a broad scope of application.
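The proposed dimension-at-a-time strategy for absolute judgments can be sketched as follows: the stimulus is categorized on one dimension, only the resulting category index is kept, and the focus then moves to the next dimension. The category boundaries, units, and function name are hypothetical. Under the idealization that per-dimension judgments combine freely, the number of response categories is the product across dimensions. The text claims only that the multidimensional total is "considerably higher", so this product should be read as an upper bound of the toy model, not as a prediction.

```python
from bisect import bisect_right
from math import prod

# Hypothetical category boundaries; each dimension stays within ~7 categories.
BOUNDARIES = {
    "pitch": [200, 400, 800, 1600],  # Hz: 4 boundaries -> 5 categories
    "intensity": [40, 55, 70],       # dB: 3 boundaries -> 4 categories
}


def judge(stimulus, dimensions=("pitch", "intensity")):
    """Categorize one dimension at a time, keeping only each category index."""
    judgment = []
    for dimension in dimensions:  # the focus attends to a single dimension per step
        judgment.append(bisect_right(BOUNDARIES[dimension], stimulus[dimension]))
    return tuple(judgment)


# Idealized upper bound on distinguishable combinations in this toy model.
total_categories = prod(len(b) + 1 for b in BOUNDARIES.values())

print(judge({"pitch": 500, "intensity": 60}))
print(total_categories)
```

The sketch relies on the stimulus values remaining available while the second dimension is judged; in the proposed masking test, that is exactly the supplementary storage that backward masking would remove.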
5. Conclusion
In this target article I have stressed several points. The first is the remarkable degree of similarity in the capacity limit in working memory observed with a wide range of procedures. A restricted set of conditions is necessary to observe this limit. It can be observed only with procedures that allow assumptions about what the independent chunks are, and that limit the recursive use of the limited-capacity store (in which it is applied first to one kind of activated representation and then to another type). The preponderance of evidence from procedures fitting these conditions strongly suggests a mean memory capacity in adults of 3 to 5 chunks, whereas individual scores appear to range more widely from about 2 up to about 6 chunks. The evidence for this pure capacity limit is considerably more extensive than that for the somewhat higher limit of 7 ± 2 stimuli; that higher limit is valid nevertheless as a commonly observed, compound STM limit for materials that allow on-line rehearsal, chunking, and memorization, for which the exact number of chunks in memory cannot be ascertained. The fundamental capacity limit appears to coincide with conditions in which the chunks are held in the focus of attention at one time; so it is the focus of attention that appears to be capacity-limited.
When the material to be remembered is diverse (e.g., some items spoken and some printed; some words and some tones; or some verbal and some nonverbal items), the scene is not coherent and multiple retrievals result in considerably better recall. All of this suggests that the focus of attention, as a capacity-limited storage mechanism, can shift from one type of material to another or from one level of organization to another, and that the individual is aware of only the handful of separate units of a related type within a scene at any one moment (Cowan, 1995; Mandler, 1985).
References
Anderson, J.R., & Matessa, M. (1997). A production system theory of serial memory. Psychological Review, 104, 728-748.
Atkinson, J., Campbell, F.W., & Francis, M.R. (1976). The magic number 4 ± 0: A new look at visual numerosity. Perception, 5, 327-334.
Atkinson, J., Francis, M.R., & Campbell, F.W. (1976). The dependence of the visual numerosity limit on orientation, colour, and grouping of the stimulus. Perception, 5, 335-342.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2, pp. 89-195). New York: Academic Press.
Avons, S.E., Wright, K.L., & Pammer, K. (1994). The word-length effect in probed and serial recall. Quarterly Journal of Experimental Psychology, 47A, 207-231.
Baars, B.J. (1988). A cognitive theory of consciousness. London: Cambridge University Press.
Baddeley, A.D. (1986). Working memory. Oxford: Clarendon Press.
Baddeley, A. (1992). Working memory. Science, 255, 556-559.
Baddeley, A., Lewis, V., & Vallar, G. (1984). Exploring the articulatory loop. The Quarterly Journal of Experimental Psychology, 36A, 233-252.
Besner, D. (1987). Phonology, lexical access in reading, and articulatory suppression: A critical review. Quarterly Journal of Experimental Psychology, 39A, 467-478.
Bjork, R.A., & Whitten, W.B. (1974). Recency-sensitive retrieval processes in long-term free recall. Cognitive Psychology, 6, 173-189.
Bower, G.H., & Winzenz, D. (1969). Group structure, coding and memory for digit series. Journal of Experimental Psychology Monographs, 80, 1-17.
Braun, H.A., Wissing, H., Schäfer, K., & Hirsch, M.C. (1994). Oscillation and noise determine signal transduction in shark multimodal sensory cells. Nature, 367, 270-273.
Broadbent, D.E. (1958). Perception and communication. London: Pergamon Press.
Broadbent, D.E. (1975). The magic number seven after fifteen years. In A. Kennedy & A. Wilkes (Eds.), Studies in long-term memory (pp. 3-18). Wiley.
Brown, G.D.A., & Hulme, C. (1995). Modeling item length effects in memory span: No rehearsal needed? Journal of Memory & Language, 34, 594-621.
Brown, G.D.A., Preece, T., & Hulme, C. (in press). Oscillator-based memory for serial order. Psychological Review.
Cardozo, B.L., & Leopold, F.F. (1963). Human code transmission. Ergonomics, 6, 133-141.
Chase, W.G., & Simon, H.A. (1973). The mind's eye in chess. In W.G. Chase (Ed.), Visual information processing (pp. 215-281). New York: Academic Press.
Cherry, E.C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5), 975-979.
Chi, M.T.H., & Klahr, D. (1975). Span and rate of apprehension in children and adults. Journal of Experimental Child Psychology, 19, 434-439.
Cleeremans, A., & McClelland, J.L. (1991). Learning the structure of event sequences. Journal of Experimental Psychology: General, 120, 235-253.
Cohen, J.D., Perlstein, W.M., Braver, T.S., Nystrom, L.E., Noll, D.C., Jonides, J., & Smith, E.E. (1997). Temporal dynamics of brain activation during a working memory task. Nature, 386, 604-608.
Corballis, M.C. (1967). Serial order in recognition and recall. Journal of Experimental Psychology, 74, 99-105.
Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychological Bulletin, 104, 163-191.
Cowan, N. (1995). Attention and memory: An integrated framework. Oxford Psychology Series, No. 26. New York: Oxford University Press.
Cowan, N., Cartwright, C., Winterowd, C., & Sherk, M. (1987). An adult model of preschool children's speech memory. Memory and Cognition, 15, 511-517.
Cowan, N., Lichty, W., & Grove, T.R. (1990). Properties of memory for unattended spoken syllables. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 258-269.
Cowan, N., Nugent, L.D., Elliott, E.M., Ponomarev, I., & Saults, J.S. (1999). The role of attention in the development of short-term memory: Age differences in the verbal span of apprehension. Child Development, 70, 1082-1097.
Cowan, N., Saults, J.S., & Nugent, L.D. (1997). The role of absolute and relative amounts of time in forgetting within immediate memory: The case of tone pitch comparisons. Psychonomic Bulletin & Review, 4, 393-397.
Cowan, N., Winkler, I., Teder, W., & Näätänen, R. (1993). Memory prerequisites of the mismatch negativity in the auditory event-related potential (ERP). Journal of Experimental Psychology: Learning, Memory, & Cognition, 19, 909-921.
Cowan, N., Wood, N.L., Nugent, L.D., & Treisman, M. (1997). There are two word length effects in verbal short-term memory: Opposed effects of duration and complexity. Psychological Science, 8, 290-295.
Cowan, N., Wood, N.L., Wood, P.K., Keller, T.A., Nugent, L.D., & Keller, C.V. (1998). Two separate verbal processing rates contributing to short-term memory span. Journal of Experimental Psychology: General, 127, 141-160.
Craik, F., Gardiner, J.M., & Watkins, M.J. (1970). Further evidence for a negative recency effect in free recall. Journal of Verbal Learning and Verbal Behavior, 9, 554-560.
Crowder, R.G. (1993). Short-term memory: Where do we stand? Memory & Cognition, 21, 142-145.
Daneman, M., & Carpenter, P.A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning & Verbal Behavior, 19, 450-466.
Daneman, M., & Merikle, P.M. (1996). Working memory and language comprehension: A Meta-Analysis. Psychonomic Bulletin & Review, 3, 422-433.
Darwin, C.J., Turvey, M.T., & Crowder, R.G. (1972). An auditory analogue of the Sperling partial report procedure: Evidence for brief auditory storage. Cognitive Psychology, 3, 255-267.
Dirlam, D.K. (1972). Most efficient chunk sizes. Cognitive Psychology, 3, 355-359.
Elman, J.L. (1993). Learning and development in neural networks: the importance of starting small. Cognition, 48, 71-99.
Engle, R.W., Kane, M.J., & Tuholski, S.W. (1999). Individual differences in working memory capacity and what they tell us about controlled attention, general fluid intelligence, and functions of the prefrontal cortex. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control. Cambridge, UK: Cambridge University Press.
Ericsson, K.A. (1985). Memory skill. Canadian Journal of Psychology, 39, 188-231.
Ericsson, K.A., Chase, W.G., & Faloon, S. (1980). Acquisition of a memory skill. Science, 208, 1181-1182.
Ericsson, K.A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211-245.
Fisher, D.L. (1984). Central capacity limits in consistent mapping, visual search tasks: Four channels or more? Cognitive Psychology, 16, 449-484.
Frensch, P.A., & Miner, C.S. (1994). Effects of presentation rate and individual differences in short-term memory capacity on an indirect measure of serial learning. Memory & Cognition, 22, 95-110.
Frick, R.W. (1984). Using both an auditory and a visual short-term store to increase digit span. Memory & Cognition, 12, 507-514.
Glanzer, M., & Cunitz, A.R. (1966). Two storage mechanisms in free recall. Journal of Verbal Learning & Verbal Behavior, 5, 351-360.
Glanzer, M., & Razel, M. (1974). The size of the unit in short-term storage. Journal of Verbal Learning & Verbal Behavior, 13, 114-131.
Glenberg, A.M., & Swanson, N.C. (1986). A temporal distinctiveness theory of recency and modality effects. Journal of Experimental Psychology: Learning, Memory, & Cognition, 12, 3-15.
Gobet, F., & Simon, H.A. (1996). Templates in chess memory: A mechanism for recalling several boards. Cognitive Psychology, 31, 1-40.
Gobet, F., & Simon, H.A. (1998). Expert chess memory: Revisiting the chunking hypothesis. Memory, 6, 225-255.
Graesser II, A., & Mandler, G. (1978). Limited processing capacity constrains the storage of unrelated sets of words and retrieval from natural categories. Journal of Experimental Psychology: Human Learning and Memory, 4, 86-100.
Gray, C.M., König, P., Engel, A.K., & Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334-337.
Gray, J.A., & Wedderburn, A.A.I. (1960). Grouping strategies with simultaneous stimuli. Quarterly Journal of Experimental Psychology, 12, 180-184.
Greene, R.L. (1989). Immediate serial recall of mixed-modality lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 266-274.
Gruenewald, P.J., & Lockhead, G.R. (1980). The free recall of category examples. Journal of Experimental Psychology: Human Learning and Memory, 6, 225-240.
Guttentag, R. E. (1984). The mental effort requirement of cumulative rehearsal: A developmental study. Journal of Experimental Child Psychology, 37, 92-106.
Halford, G.S., Maybery, M.T., & Bain, J.D. (1988). Set-size effects in primary memory: An age-related capacity limitation? Memory & Cognition, 16, 480-487.
Halford, G.S., Wilson, W.H., & Phillips, S. (1998). Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology. Behavioral and Brain Sciences, 21, 723-802.
Hamilton, W. (1859). Lectures on metaphysics and logic. Vol. 1. Edinburgh: W. Blackwood.
Henderson, L. (1972). Spatial and verbal codes and the capacity of STM. Quarterly Journal of Experimental Psychology, 24, 485-495.
Hitch, G.J., Burgess, N., Towse, J.N., & Culpin, V. (1996). Temporal grouping effects in immediate recall: A working memory analysis. Quarterly Journal of Experimental Psychology, 49A, 116-139.
Holtzman, J.D., & Gazzaniga, M.S. (1982). Dual task interactions due exclusively to limits in processing resources. Science, 218, 1325-1327.
Hulme, C., Maughan, S., & Brown, G.D.A. (1991). Memory for familiar and unfamiliar words: Evidence for a long-term memory contribution to short-term memory span. Journal of Memory & Language, 30, 685-701.
Hummel, J.E., & Holyoak, K.J. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427-466.
Jacoby, L.L., Woloshyn, V., & Kelly, C. (1989). Becoming famous without being recognized: Unconscious influences of memory produced by divided attention. Journal of Experimental Psychology: General, 118, 115-125.
James, W. (1890). The principles of psychology. NY: Henry Holt.
Jevons, W.S. (1871). The power of numerical discrimination. Nature, 3, 281-282.
Jones, D., Farrand, P., Stuart, G., & Morris, N. (1995). Functional equivalence of verbal and spatial information in serial short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1008-1018.
Kareev, Y. (1995). Through a narrow window: Working memory capacity and the detection of covariation. Cognition, 56, 263-269.
Kareev, Y., Lieberman, I., & Lev, M. (1997). Through a narrow window: Sample size and the perception of correlation. Journal of Experimental Psychology: General, 126, 278-287.
Kaufman, E., Lord, M., Reese, T., & Volkmann, J. (1949). The discrimination of visual number. American Journal of Psychology, 62, 498-525.
Kintsch, W., & van Dijk, T.A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363-394.
Kirschfeld, K. (1992). Oscillations in the insect brain: Do they correspond to the cortical γ-waves of vertebrates? Proceedings of the National Academy of Sciences, 89, 4764-4768.
Klapp, S.T., & Netick, A. (1988). Multiple resources for processing and storage in short-term working memory. Human Factors, 30, 617-632.
LaPointe, L.B., & Engle, R.W. (1990). Simple and complex word spans as measures of working memory capacity. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 1118-1133.
Lewicki, P., Czyzewska, M., & Hoffman, H. (1987). Unconscious acquisition of complex procedural knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 523-530.
Lisman, J.E., & Idiart, M.A.P. (1995). Storage of 7 ± 2 short-term memories in oscillatory subcycles. Science, 267, 1512-1515.
Lockhead, G.R. (1970). Identification and the form of multi-dimensional discrimination space. Journal of Experimental Psychology, 85, 1-10.
Logan, G.D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492-527.
Logan, G.D., & Klapp, S.T. (1991). Automatizing alphabet arithmetic: I. Is extended practice necessary to produce automaticity? Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 179-195.
Logie, R.H., & Baddeley, A.D. (1987). Cognitive processes in counting. Journal of Experimental Psychology: Learning, Memory, & Cognition, 13, 310-326.
Logie, R.H., Gilhooly, K.J., & Wynn, V. (1994). Counting on working memory in arithmetic problem solving. Memory & Cognition, 22, 395-410.
Longoni, A.M., Richardson, J.T.E., & Aiello, A. (1993). Articulatory rehearsal and phonological storage in working memory. Memory & Cognition, 21, 11-22.
Luck, S.J., & Vogel, E.K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279-281.
Luck, S.J., & Vogel, E.K. (1998). Response from Luck and Vogel. (A response to "Visual and auditory working memory capacity," by N. Cowan, in the same issue.) Trends in Cognitive Sciences, 2, 78-80.
MacGregor, J.N. (1987). Short-term memory capacity: Limitation or optimization? Psychological Review, 94, 107-108.
Mandler, G. (1967). Organization and memory. In K.W. Spence & J.T. Spence (Eds.), The psychology of learning and motivation (Vol. 1, pp. 327-372). New York: Academic Press.
Mandler, G. (1975). Memory storage and retrieval: Some limits on the reach of attention and consciousness. In P.M.A. Rabbitt & S. Dornic (Eds.), Attention and performance V. New York: Academic Press.
Mandler, G. (1985). Cognitive psychology: An essay in cognitive science. Hillsdale, NJ: Erlbaum.
Mandler, G., & Shebo, B.J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111, 1-22.
Martin, M. (1980). Attention to words in different modalities: Four-channel presentation with physical and semantic selection. Acta Psychologica, 44, 99-115.
McGeoch, J.A. (1932). Forgetting and the law of disuse. Psychological Review, 39, 352-370.
McKone, E. (1995). Short-term implicit memory for words and nonwords. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 1108-1126.
McLean, R.S., & Gregg, L.W. (1967). Effects of induced chunking on temporal aspects of serial recitation. Journal of Experimental Psychology, 74, 455-459.
Melton, A.W. (1963). Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 2, 1-21.
Meyer, D.E., & Kieras, D.E. (1997). A computational theory of executive processes and multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3-65.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97. http://cogprints.soton.ac.uk/archives/psyc/papers/199807/199807022/doc.html/miller.html
Miller, G.A., & Selfridge, J.A. (1950). Verbal context and the recall of meaningful material. American Journal of Psychology, 63, 176-185.
Milner, P.M. (1974). A model for visual shape recognition. Psychological Review, 81, 521-535.
Miltner, W.H.R., Braun, C., Arnold, M., Witte, H., & Taub, E. (1999). Coherence of gamma-band EEG activity as a basis for associative learning. Nature, 397, 434-436.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Murray, D.J. (1968). Articulation and acoustic confusability in short-term memory. Journal of Experimental Psychology, 78, 679-684.
Nairne, J.S. (1991). Positional uncertainty in long-term memory. Memory & Cognition, 19, 332-340.
Nairne, J.S. (1992). The loss of positional certainty in long-term memory. Psychological Science, 3, 199-202.
Neath, I. (1998). Human memory: An introduction to research, data, and theory. Pacific Grove, CA: Brooks/Cole.
Neath, I., & Nairne, J.S. (1995). Word-length effects in immediate memory: Overwriting trace decay. Psychonomic Bulletin & Review, 2, 429-441.
Newport, E.L. (1990). Maturational constraints on language learning. Cognitive Science, 14, 11-29.
Nissen, M.J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from performance measures. Cognitive Psychology, 19, 1-32.
Pashler, H. (1988). Familiarity and visual change detection. Perception & Psychophysics, 44, 369-378.
Penney, C.G. (1980). Order of report in bisensory verbal short-term memory. Canadian Journal of Psychology, 34, 190-195.
Peterson, L.R., & Johnson, S.T. (1971). Some effects of minimizing articulation on short-term retention. Journal of Verbal Learning and Verbal Behavior, 10, 346-354.
Peterson, L.R., & Peterson, M.J. (1959). Short-term retention of individual verbal items. Journal of Experimental Psychology, 58, 193-198.
Pollack, I. (1953). The information in elementary auditory displays. II. Journal of the Acoustical Society of America, 25, 765-769.
Pollack, I., Johnson, I.B., & Knaff, P.R. (1959). Running memory span. Journal of Experimental Psychology, 57, 137-146.
Posner, M.I. (1969). Abstraction and the process of recognition. In G.H. Bower & J.T. Spence (Eds.), The psychology of learning and motivation, Vol. 3. New York: Academic Press. (pp. 43-100)
Posner, M., Snyder, C., & Davidson, B. (1980). Attention and detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Postman, L., & Phillips, L.W. (1965). Short-term temporal changes in free recall. Quarterly Journal of Experimental Psychology, 17, 132-138.
Poulton, E.C. (1954). Eye-hand span in simple serial tasks. Journal of Experimental Psychology, 47, 403-410.
Pylyshyn, Z., Burkell, J., Fisher, B., Sears, C., Schmidt, W., & Trick, L. (1994). Multiple parallel access in visual attention. Canadian Journal of Experimental Psychology, 48, 260-283.
Pylyshyn, Z.W., & Storm, R.W. (1988). Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision, 3, 179-197.
Raaijmakers, J.G.W., & Shiffrin, R.M. (1981). Search of associative memory. Psychological Review, 88, 93-134.
Reber, P.J., & Kotovsky, K. (1997). Implicit learning in problem solving: The role of working memory capacity. Journal of Experimental Psychology: General, 126, 178-203.
Reisberg, D., Rappaport, I., & O'Shaughnessy, M. (1984). Limits of working memory: The digit-digit span. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 203-221.
Rensink, R.A., O'Regan, J.K., & Clark, J.J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8, 368-373.
Richman, H.B., Staszewski, J.J., & Simon, H.A. (1995). Simulation of expert memory using EPAM IV. Psychological Review, 102, 305-330.
Rodriguez, E., George, N., Lachaux, J.-P., Martinerie, J., Renault, B., & Varela, F.J. (1999). Perception's shadow: long-distance synchronization of human brain activity. Nature, 397, 430-433.
Ryan, J. (1969). Grouping and short-term memory: Different means and patterns of groups. Quarterly Journal of Experimental Psychology, 21, 137-147.
Sanders, A.F. (1968). Short term memory for spatial positions. Psychologie, 23, 1-15.
Sanders, A.F., & Schroots, J.J.F. (1969). Cognitive categories and memory span. III. Effects of similarity on recall. Quarterly Journal of Experimental Psychology, 21, 21-28.
Scarborough, D.L. (1971). Memory for brief visual displays: The role of implicit speech. Paper presented to the Eastern Psychological Association, New York.
Schneider, W., & Detweiler, M. (1987). A connectionist/control architecture for working memory. In G.H. Bower (ed.), The psychology of learning and motivation (vol. 21). New York: Academic Press.
Schneider, W., & Shiffrin, R.M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84, 1-66.
Schweickert, R., & Boruff, B. (1986). Short-term memory capacity: Magic number or magic spell? Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 419-425.
Schweickert, R., Hayt, C., Hersberger, L., & Guentert, L. (1996). How many words can working memory hold? A model and a method. In S.E. Gathercole (ed.), Models of short-term memory. East Sussex, U.K.: Psychology Press.
Service, E. (1998). The effect of word length on immediate serial recall depends on phonological complexity, not articulatory duration. Quarterly Journal of Experimental Psychology, 51A, 283-304.
Shah, P., & Miyake, A. (1996). The separability of working memory resources for spatial thinking and language processing: An individual differences approach. Journal of Experimental Psychology: General, 125, 4-27.
Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables, and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16, 417-494.
Shiffrin, R.M. (1993). Short-term memory: A brief commentary. Memory & Cognition, 21, 193-197.
Shiffrin, R.M., & Nosofsky, R.M. (1994). Seven plus or minus two: A commentary on capacity limitations. Psychological Review, 101, 357-361.
Shiffrin, R.M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Simon, H.A. (1974). How big is a chunk? Science, 183, 482-488.
Simon, T.J., & Vaishnavi, S. (1996). Subitizing and counting depend on different attentional mechanisms: Evidence from visual enumeration in afterimages. Perception & Psychophysics, 58, 915-926.
Simons, D.J., & Levin, D.T. (1998). Failure to detect changes to people during a real-world interaction. Psychonomic Bulletin & Review, 5, 644-649.
Sirevaag, E.J., Kramer, A.F., Coles, M.G.H., & Donchin, E. (1989). Resource reciprocity: An event-related brain potentials analysis. Acta Psychologica, 70, 77-97.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74 (Whole No. 498).
Sperling, G. (1967). Successive approximations to a model for short-term memory. Acta Psychologica, 27, 285-292.
Stadler, M. (1989). On learning complex procedural knowledge. Journal of Experimental Psychology: Learning, Memory, & Cognition, 15, 1061-1069.
Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652-654.
Tehan, G., & Humphreys, M.S. (1996). Cuing effects in short-term recall. Memory & Cognition, 24, 719-732.
Tiitinen, H., Sinkkonen, J., Reinikainen, K., Alho, K., Lavikainen, J., & Näätänen, R. (1993). Selective attention enhances the auditory 40-Hz transient response in humans. Nature, 364, 59-60.
Toms, M., Morris, N., & Ward, D. (1993). Working memory and conditional reasoning. Quarterly Journal of Experimental Psychology, 46A, 679-699.
Trick, L.M., & Pylyshyn, Z.W. (1993). What enumeration studies can show us about spatial attention: Evidence for limited capacity preattentive processing. Journal of Experimental Psychology: Human Perception and Performance, 19, 331-351.
Trick, L.M., & Pylyshyn, Z.W. (1994a). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101, 80-102.
Trick, L.M., & Pylyshyn, Z.W. (1994b). Cueing and counting: Does the position of the attentional focus affect enumeration? Visual Cognition, 1, 67-100.
Tulving, E., & Colotla, V. (1970). Free recall of trilingual lists. Cognitive Psychology, 1, 86-98.
Tulving, E., & Patkau, J.E. (1962). Concurrent effects of contextual constraint and word frequency on immediate recall and learning of verbal material. Canadian Journal of Psychology, 16, 83-95.
Tulving, E., & Patterson, R.D. (1968). Functional units and retrieval processes in free recall. Journal of Experimental Psychology, 77, 239-248.
Tulving, E., & Pearlstone, Z. (1966). Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 5, 381-391.
Vallar, G., & Baddeley, A.D. (1982). Short-term forgetting and the articulatory loop. Quarterly Journal of Experimental Psychology, 34A, 53-60.
Vogel, E.K., Luck, S.J., & Shapiro, K.L. (1998). Electrophysiological evidence for a postperceptual locus of suppression during the attentional blink. Journal of Experimental Psychology: Human Perception and Performance, 24, 1656-1674.
von der Malsburg, C. (1995). Binding in models of perception and brain function. Current Opinion in Neurobiology, 5, 520-526.
Watkins, M.J. (1974). Concept and measurement of primary memory. Psychological Bulletin, 81, 695-711.
Watkins, O.C., & Watkins, M.J. (1975). Build-up of proactive inhibition as a cue-overload effect. Journal of Experimental Psychology: Human Learning and Memory, 1, 442-452.
Waugh, N.C., & Norman, D.A. (1965). Primary memory. Psychological Review, 72, 89-104.
Wickelgren, W.A. (1964). Size of rehearsal group and short-term memory. Journal of Experimental Psychology, 68, 413-419.
Wickelgren, W.A. (1966). Phonemic similarity and interference in short-term memory for single letters. Journal of Experimental Psychology, 71, 396-404.
Wickens, C.D. (1984). Processing resources in attention. In R. Parasuraman & D.R. Davies (Eds.), Varieties of attention (pp. 63-102). New York: Academic Press.
Wickens, D.D., Moody, M.J., & Dow, R. (1981). The nature and timing of the retrieval process and of interference effects. Journal of Experimental Psychology: General, 110, 1-20.
Wilkes, A.L. (1975). Encoding processes and pausing behaviour. In A. Kennedy & A. Wilkes (eds.), Studies in long-term memory. London: Wiley. (pp. 19-42)
Wood, N., & Cowan, N. (1995). The cocktail party phenomenon revisited: How frequent are attention shifts to one's name in an irrelevant auditory channel? Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 255-260.
Yantis, S. (1992). Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology, 24, 295-340.
Zhang, G., & Simon, H.A. (1985). STM capacity for Chinese words and idioms: Chunking and acoustical loop hypotheses. Memory & Cognition, 13, 193-201.
Acknowledgments
This project was supported by NICHD Grant R01 21338. I thank Monica
Fabiani, Gabriele Gratton, and Michael Stadler for helpful comments. Address
correspondence to Nelson Cowan, Department of Psychology, University of
Missouri, 210 McAlester Hall, Columbia, MO, USA. E-mail: CowanN@Missouri.edu.