1. The Challenge of Everyday Memory
Much of the traditional laboratory research on memory conducted in the past century has followed Ebbinghaus (1895) in using tightly controlled experiments that facilitate the quantification of memory (see Baddeley, 1990; Schacter, 1989). This tradition has been strongly criticized in the past two decades, however, most notably by Neisser (1978), who provocatively dismissed the laboratory research of the past 100 years as largely worthless for answering "the important questions about memory," and called for a shift to the "realistic" study of memory. Since Neisser's call, there has been a growing number of studies on such varied topics as autobiographical memory, eyewitness testimony, prospective memory, "flashbulb" memory, memory for action, memory for faces, memory for places, etc. (see, e.g., Cohen, 1989; Davies & Logie, 1993; Gruneberg, Morris, & Sykes, 1988; Harris & Morris, 1984; Neisser & Fivush, in press; Neisser & Winograd, 1988; Rubin, 1986; Winograd & Neisser, 1992). This new wave of everyday memory research has resulted in a proliferation of research methods that are quite removed from those traditionally employed in the laboratory.
The rift between proponents of naturalistic and laboratory memory research, as well as efforts at reconciliation, may be seen in the lively debate (to which American Psychologist devoted its January 1991 issue) sparked by Banaji and Crowder's (1989) paper. It is apparent from the commentaries that "everyday memory" is an ill-defined category (Klatzky, 1991), and that the dimensions of the controversy are not simple to specify. In general, the battles appear to be raging on three distinct fronts: what memory phenomena should be studied, how they should be studied, and where.
For some researchers the major issue seems to involve the content ("what") of memory research. This is reflected, for example, in the title of Neisser's (1978) leading paper, "Memory: What are the important questions?" Thus, everyday memory research has been characterized by its attempt to understand "the sorts of things people do every day" (Neisser, 1991, p. 35), by its choice of topics having "obvious relevance to daily life" (Klatzky, 1991, p. 43), and in particular, by its concern with the practical applications of memory research (e.g., Gruneberg & Morris, 1992). This is in contrast to the alleged irrelevance of traditional memory research, which has "chiefly focused on explicit recognition or recall of isolated items from lists" (Neisser, 1991, p. 35; but see Roediger, 1991).
Other discussions have treated the controversy as being over the proper research policy (the "how" question), that is, about "the most valuable ways of gaining knowledge and understanding about memory" (Loftus, 1991, p. 16; see Banaji & Crowder, 1989; Tulving, 1991). Proponents of the naturalistic study of memory have questioned the ecological validity of much laboratory experimentation (e.g., Aanstoos, 1991), whereas laboratory proponents have stressed the importance of experimental control and generalizability of results. Banaji and Crowder (1989), for instance, argue that because naturalistic research methods often lack experimental control, the "ecological validity of the methods as such is unimportant and can even work against generalizability" (p. 1187; see also Morton, 1991; Roediger, 1991). In general, naturalistic memory researchers acknowledge the desirability of controlled experimentation, but claim that a strict adherence to this methodology would leave out many interesting memory phenomena (Conway, 1991, 1993; Gruneberg & Morris, 1992).
Finally, still other researchers have underscored the "where" as being a fundamental, inseparable aspect of memory phenomena. For example, Neisser (1988a) has stressed the affinity between the ecological approach to the study of everyday memory and the ethological approach to studying animal behavior, both of which focus on organism-environment interactions (see also Ceci & Bronfenbrenner, 1991). He therefore emphasizes the social-functional context of remembering, stating that "the theory we require will have to deal with persons, motives, and social situations . . . Most of all, it will have to deal with functional issues" (1988b, p. 553; see also Baddeley, 1988; Barclay, 1993; Bruce, 1985, 1989, 1991; Fivush, 1988, 1993; Neisser, 1978, 1991; Winograd, 1988). The implication is that studying the same phenomena in the laboratory and in natural settings may lead to very different conclusions. Indeed, Gruneberg, Morris, and Sykes (1991) point to findings (Morris, Tweedy, & Gruneberg, 1985) in which "the real-life nature of the experience made a considerable difference to memory processing" (1991, p. 74; see also Aanstoos, 1991; Bahrick, 1992; Baker-Ward, Ornstein, & Gordon, 1993; Ceci & Bronfenbrenner, 1985, 1991; Conway, 1991, 1993).
Importantly, however, although the three dimensions--the what, how, and where dimensions--are correlated in the reality of memory research, they are not logically interdependent. For instance, many everyday memory topics can be studied in the laboratory (Neisser, 1991; Roediger, 1991), and memory research in naturalistic settings may be amenable to strict experimental control (Conway, 1991; e.g., Ceci & Bronfenbrenner, 1985; Koriat, Fischhoff & Razel, 1976; Loftus, 1979a). Therefore, we sought a further dimension of the controversy that might lie concealed behind the commonly debated issues.
We propose that the everyday-laboratory controversy harbors what appears to be a more fundamental breach--a difference in the very metaphor of memory implicitly espoused by each camp (see also Koriat & Goldsmith, 1994a; in press). These metaphors, the storehouse and correspondence metaphors, embody two essentially different ways of thinking about memory and about how memory should be evaluated: The storehouse metaphor, which likens memory to a depository of input elements, implies an evaluation in terms of the quantity of items remaining in store. In contrast, the correspondence metaphor, which treats memory as a perception or description of the past, implies an evaluation in terms of the accuracy or faithfulness of that description.
In this article, we delineate the two contrasting metaphors, and examine their respective quantity-oriented and accuracy-oriented approaches to the evaluation of memory. We believe that this analysis can help tie together some of the various aspects of the everyday-laboratory controversy. More importantly, however, we contend that the distinction between the two metaphors, with their ensuing approaches to memory, is a crucial distinction in its own right, with serious implications that span across the two camps. Thus, our primary aim is to explicate the unique metatheoretical foundation of the accuracy-oriented approach to memory, as opposed to the traditional, quantity-oriented approach, in order to promote a more effective exploitation of the correspondence metaphor in both naturalistic and laboratory research contexts.
The structure of the presentation is as follows: In Section 2, we delineate the distinction between the storehouse and correspondence metaphors, and show how this distinction can capture part of the friction between proponents of everyday and traditional laboratory research. In subsequent sections we use the everyday-laboratory controversy as a backdrop, focusing primarily on the correspondence metaphor, and examining its potential to serve as a productive metaphor for memory research. Thus, in Section 3 we consider how a correspondence view of memory seems to be emerging in current memory theorizing. In Section 4, we explicate the unique logic of the correspondence metaphor for the evaluation of memory, and outline several possible approaches to correspondence-oriented memory assessment. In Section 5, we illustrate the utility of the correspondence-storehouse distinction by reviewing recent experimental work that addresses some of the troubling issues that arise in attempting to reconcile accuracy-oriented, naturalistic results with traditional, quantity-oriented laboratory findings. In particular, the correspondence metaphor is shown to call for a more serious consideration of the active role of the subject in controlling the faithfulness of his or her memory report. Finally, in Section 6, we return to the everyday-laboratory controversy, and outline a scheme for capturing the interrelationships between conceptual metaphors, on the one hand, and the content (what), context (where), and methods (how) of memory research, on the other. In light of our analysis of the role of metaphors in memory research, we address the issue of whether the differences between the two approaches to memory may ultimately be reconciled.
2. Two Competing Metaphors of Memory
The study of memory is replete with a variety of metaphors for conceptualizing different aspects of memory and remembering (see Kolers & Roediger, 1984; Malcolm, 1977; Marshall & Fryer, 1978; Roediger, 1980; see also Gentner & Grudin, 1985). Roediger (1980) compiled a "fairly complete, but certainly not exhaustive" list of 36 memory metaphors used by psychologists and philosophers from Plato until modern times, sometimes in jest, but more often quite seriously. On the lighter side, Hintzman (1974) has ironically compared memory to a cow's stomach, and more seriously, but even less flatteringly, Landauer (1975) has used the analogy of a garbage can. Although some students of memory have expressed reservations regarding the use of metaphors to conceptualize memory (e.g., Ebbinghaus, 1895; Roediger, 1980; Tulving, 1979), there is no question that such metaphors have exerted a considerable influence on memory research and theory.
We concentrate here on the contrast between two general types of memory metaphors--the storehouse metaphor, which has played a dominant role in guiding traditional laboratory memory research, and the correspondence metaphor, which seems to be gaining impetus in the new wave of everyday memory research. With regard to the former metaphor, Roediger (1980) observed that "the conception of the mind as a mental space in which memories are stored and then retrieved by a search process has served as a general and powerful explanation of the phenomena of human memory. There is currently no other general conception of the mind or memory that rivals this view" (p. 238). We begin, then, with an exposition of this metaphor, focusing on its implications for memory assessment. We have chosen to present a rather strict version of the storehouse conception, to serve as a contrasting background against which to introduce the correspondence metaphor in the section that follows. Although perhaps no investigator today would endorse such an extreme version, it is important nonetheless to confront its implicit logic, which still pervades much contemporary research and thinking about memory. In this regard, we subscribe to the rationale offered by Jussim (1991) for his critical analysis of the strong constructivist perspective in social perception: "Regardless of whether anyone actually believes in the [strict storehouse metaphor], clearly many choose research topics, write, and interpret research as if they believed it" (p. 55, brackets substituted for original text).
2.1. The Storehouse Metaphor and its Implications
Despite the skepticism expressed by Ebbinghaus himself regarding the utility of memory metaphors, much of the Ebbinghaus tradition of laboratory memory research has been guided by a metaphor of memory as a storehouse of discrete, elementary "units." The origin of this metaphor may be traced back as far as Plato (see Herrmann & Chaffin, 1988; Marshall & Fryer, 1978), but its more modern development may be seen in the British empiricist philosophies and their associative-atomistic conception of the mind as a store of elementary "ideas" and "associations" (see Mandler & Mandler, 1964; O'Neill, 1968). Thus, according to Locke (1690), memory "is as it were the storehouse of our ideas. . . . a repository to lay up those ideas" (Book II, Ch. 10). In this conception, a multitude of stimuli is assumed to impinge upon the senses, and discrete impressions of these stimuli are retained as memory units for later retrieval. As a result of decay or interference some of the units may become lost, weakened, or otherwise inaccessible.
With regard to memory assessment, the storehouse metaphor has legitimized the use of discrete, elementary stimuli as experimental input, allowing for the quantification of memory (Schacter, 1989). This approach permeates a vast number of studies carried out in the past century using a variety of experimental paradigms and memory measures (see, e.g., Crowder, 1976; Gregg, 1986; Murdock, 1974). Indeed, the prototypical list-learning paradigm, the workhorse of the laboratory tradition, essentially simulates the course of events assumed to take place when the input elements are first "deposited" in the memory store and later "recollected" or "retrieved." The stimuli, typically referred to as "items," consist of nonsense syllables, words, etc., whose salient characteristic is their countability--they allow measures of memory effectiveness based on the number of recovered elements. Implicit in this approach is the conception of forgetting as information loss, either the loss of the elementary units themselves (item information) or the loss of the associative links between them (associative information; see Murdock, 1974). Thus, the most natural measure of memory is simply how many of the units of information originally presented can be recovered on a given memory test. In fact, in the great majority of laboratory studies on free recall, incorrect responses (i.e., commission errors) are simply ignored. Also, it makes no difference, for instance, whether HAT was remembered and GUN was forgotten, or vice versa: All elements retrieved from the memory store are equivalent, that is, interchangeable as far as the total memory score is concerned. In sum, what matters is not what is remembered, but rather how much.
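The scoring logic described above can be illustrated with a minimal sketch. The function name quantity_score and the word lists are hypothetical and are not drawn from any of the cited studies; the sketch assumes only that a free-recall score is the count of studied items that appear in the report.

```python
def quantity_score(studied, reported):
    """Quantity-oriented score: how many studied items appear in the report.

    Commission errors (reported items that were never studied) are ignored,
    as in the free-recall scoring convention described in the text.
    """
    return len(set(studied) & set(reported))


# Hypothetical study list and three hypothetical recall protocols.
studied = ["HAT", "GUN", "CAR", "DOG"]

recall_a = ["HAT", "CAR"]            # HAT remembered, GUN forgotten
recall_b = ["GUN", "CAR"]            # GUN remembered, HAT forgotten
recall_c = ["GUN", "CAR", "KNIFE"]   # as recall_b, plus one commission error

for label, report in (("a", recall_a), ("b", recall_b), ("c", recall_c)):
    print(label, quantity_score(studied, report))
```

Under this scoring the three protocols are indistinguishable: neither the identity of the recalled items nor the intrusion of KNIFE affects the total.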
The storehouse metaphor, with its associated quantity-oriented approach to memory, has had a pervasive impact on the mainstream of traditional memory research. It has directed researchers' thinking towards such aspects of memory as storage capacity, the internal architecture of the store, the transfer of units from one department to another, competition between units, and, of course, information loss. It has also shaped both the experimental paradigms used to study memory (e.g., list learning, paired associates; see Puff, 1982), and the type of phenomena investigated (e.g., the effects of list length, retention interval, spacing, and serial order). Moreover, although the storehouse conception is perhaps most firmly rooted in the verbal-learning tradition, its influence has extended to other research traditions as well. Importantly, this conception appears to have been well-suited for the more recent, information-processing framework and its associated metaphor, the digital computer (see Lachman, Lachman, & Butterfield, 1979; Tulving, 1979). The computer metaphor adds greater sophistication in terms of internal organization, coding, processing, and transformation, yet it has served primarily to reinforce the fundamental storehouse features--the input, storage, and retrieval of discrete units of information. In fact, the "modal model" of separate memory stores (e.g., Atkinson & Shiffrin, 1968) appears to represent but one example of a more modern generation of computerized storehouses (see Marshall & Fryer, 1978; Roediger, 1980).
It may be seen, then, that as memory research has progressed, the "pure" strain of storehouse metaphor has evolved into a variety of related species, adapted to fit the different requirements posed by different memory phenomena. For instance, some discussions imply a differentiated, organized storehouse made up of many departments, with items stocked under specific addresses (cf. "library" or "dictionary" metaphors; Broadbent, 1971; Loftus, 1977; Marshall & Fryer, 1978). Others imply the stacking of items in layers, one on top of the other, so that "buried" items are more difficult to reach (Bekerian & Bowers, 1983). Still others assume that some of the departments are more limited in space than others, so that items must be pushed from one to another (Atkinson & Shiffrin, 1968; Waugh & Norman, 1965). Also, the "items" themselves may be more or less complex, ranging from simple features, verbal units, ideas and associations, to propositions, chunks, and templates (see Malcolm, 1977).
Nevertheless, although modern treatments of memory have come a long way from the simplistic storehouse conception, many of the respected theories and experimental paradigms still bear the stamp of their storehouse ancestry in adhering to the quantity-oriented approach to memory (see Schacter, 1989). Indeed, the staying-power of the storehouse metaphor remains apparent not only in the models themselves, but also in the classical experimental tools that continue to dominate laboratory memory research, the "memory drum" and its modern descendants. We should point out, however, that these tools and paradigms are often used with different aims than those for which they were originally intended (see below), and in connection with memory phenomena that would appear to call for a different type of metaphor. Not coincidentally, then, laboratory procedures that seem to break with the storehouse mold have met with relative approval from advocates of everyday memory research, who admit that they are "appreciably closer (than the old methods were) to the sorts of things people do every day" (Neisser, 1991, p. 35). In fact, laboratory proponents have had to contest the placement of entire areas of laboratory research under the banner of everyday memory, such as "research on the tip-of-the-tongue phenomenon . . ., the feeling-of-knowing experience . . ., eyewitness testimony . . ., and reality monitoring" (Roediger, 1991, p. 38). Such research does not just deal with memory phenomena that occur every day: We suggest that there is an additional, more fundamental aspect which tends to distinguish such studies from the traditional laboratory approach--the implicit metaphor of memory on which they are based.
2.2. The Correspondence Metaphor and its Implications
Despite its dominance in guiding traditional laboratory research, the storehouse metaphor would seem to have limited value for the study of many everyday memory phenomena. Consider, for example, a situation in which a person on the witness stand is asked to report what she can regarding the circumstances of a crime. This situation, like many other real-life situations, motivates a different way of thinking about memory, one in which the intrinsic quality of memory is not its storage capability, but rather, its ability to faithfully represent the past. Thus, the basic criterion for evaluating memory is not the quantity of items remaining in store, but rather, the correspondence between what the person reports and what actually happened (see Winograd, in press). Unfortunately, there appears to be no single concrete metaphor (like the storehouse) which alone can provide the essential features for such an alternative conception. Therefore, we have chosen to explicate a more abstract, correspondence metaphor (Koriat & Goldsmith, 1994a, in press), in terms of the following interrelated attributes:
First, memory is conceived as being about some past event, that is, as constituting a representation or description of the past episode (see, e.g., Conway, 1991, 1993). Consequently, memory reports are treated as propositions that have a truth value, i.e., that can be judged as right or wrong, or as being more or less "true" to aspects of the actual event (e.g., the actual speed of the car).
Second, as just stated, the essential feature of memory is its ability to faithfully represent past events. Thus, memory is evaluated in terms of its accuracy, that is, its "fit" with past events, the extent to which it accords with reality (or some other criterion; see Ross, in press), rather than in terms of the number of items remaining in store. Likewise, forgetting is conceived as a loss of correspondence between the memory report and the actual event, i.e., as a deviation from veridicality, rather than as just a loss of items. Thus, this conception entails a unique concern with the many different types of qualitative memory distortions--fabrication, confabulation, simplification, and the like (see, e.g., Alba & Hasher, 1983; Bahrick, Hall, & Dunlosky, 1993; Bartlett, 1932; Brewer & Nakamura, 1984; Dawes, 1966; Goldmeier, 1982; Loftus, 1979a, 1979b, 1982; Neisser, 1981, 1988c; Riley, 1962; Wells & Loftus, 1984).
Third, memory correspondence is content laden. Unlike the storehouse metaphor, which engenders a predominant concern with how much is remembered, the correspondence metaphor (and virtually all real-life memory situations) entails an additional concern with the quality of memory (Schacter, 1989), that is, with what is remembered (Conway, 1991, 1993). In the courtroom, for instance, it might make a crucial difference whether the witness remembered that the burglar "had a gun," but forgot that he "wore a hat," rather than vice versa. Thus, functional considerations are intrinsic to the evaluation and study of memory correspondence (see below).
Fourth, in contrast to the evaluation of memory "storage," the evaluation of memory correspondence is inherently output bound: Rather than begin with the input and ask how much of it is recovered in the output, one naturally begins with the output (i.e., the memory report) and examines to what extent it accords with the input. In general, accuracy is meaningful only for what a person reports (e.g., the color of a shirt, the speed of a car), not for what is omitted. Thus, while under the storehouse view subjects are held accountable primarily for what they fail to report, under the correspondence view subjects are accountable primarily for what they do report.
Finally, the correspondence conception of memory has much in common with the way we think about perception. In perception, interest lies in the correspondence between what we perceive and what is out there, that is, in the (output-bound) veridicality of our perceptions, and in the various ways in which they may deviate from reality (e.g., illusions). Likewise, under the correspondence metaphor, memory may be conceived as the perception of the past, and the question then becomes to what extent this perception is dependable (cf. "memory psychophysics," Algom, 1992).
Indeed, many of the metaphors underlying perceptual theory would appear to imply their counterparts in correspondence views of memory. Just as perception has been viewed alternatively as a passive reflection of the external environment (Locke, 1690), as an active construction of reality (e.g., Neisser, 1967; Rock, 1983), or as a direct "resonance" to ecological "affordances" (Gibson, 1979), likewise, memory may be conceived as mirroring past experience (see Brewer, in press; Malcolm, 1977), as an active reconstruction of past events (e.g., Bartlett, 1932; Neisser, 1967), or even as a "stage-setting" attunement (Bransford, McCarrell, Franks & Nitsch, 1977). Also, with regard to assessment, under the correspondence metaphor, as in perception, one is generally not concerned with how much of the impinging information is remembered (perceived), but rather, with the output-bound correspondence or "goodness of fit" (see Section 4.1, below) between what is remembered (perceived) and what actually occurred.
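The contrast between input-bound and output-bound evaluation can be sketched in a few lines. This is an illustrative simplification, not a measure taken from the text: it assumes that each reported item can be scored as simply right or wrong, whereas the correspondence metaphor also allows reports to be more or less true. The function names and the example reports are hypothetical.

```python
def input_bound_quantity(event_items, reported):
    """Start from the input: what proportion of it appears in the report?"""
    event = set(event_items)
    return len(event & set(reported)) / len(event)


def output_bound_accuracy(event_items, reported):
    """Start from the report: what proportion of it accords with the input?"""
    report = set(reported)
    if not report:
        return None  # accuracy is undefined when nothing is reported
    return len(report & set(event_items)) / len(report)


# Hypothetical details of a witnessed event.
event = ["gun", "hat", "red car", "two men", "night", "rain", "alley", "shout"]

# A cautious witness reports little, but everything reported is correct.
cautious = ["gun", "two men"]

# A liberal witness reports more, half of it wrong.
liberal = ["gun", "two men", "hat", "night", "knife", "blue car", "snow", "dog"]

for name, report in (("cautious", cautious), ("liberal", liberal)):
    print(name,
          input_bound_quantity(event, report),
          output_bound_accuracy(event, report))
```

On the input-bound score the liberal witness does better, because more of the event is recovered. On the output-bound score the cautious witness does better, because everything reported can be relied upon.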
Collectively, these aspects of the correspondence metaphor characterize what we have called an accuracy-oriented approach to memory (Koriat & Goldsmith, 1994a; in press). This approach is reflected in much of the work on everyday memory, particularly in areas such as autobiographical memory and eyewitness testimony, which disclose a pervasive preoccupation with the faithfulness and dependability of memory for past events (e.g., Barclay, 1988, 1993; Barclay & Wellman, 1986; Brewer, 1988, in press; Deffenbacher, 1988, 1991; Hilgard & Loftus, 1979; Loftus, 1979a, 1979b, 1982; Neisser, 1981, 1988b; Neisser & Fivush, in press; Ross, 1989; Ross, in press; Ross & Buehler, in press; Rubin, 1986; Wells & Loftus, 1984; Winograd & Neisser, 1992; Winograd, in press).
This preoccupation is not arbitrary. The affinity between the correspondence metaphor and everyday memory research appears to stem from the basic character of memory in everyday life, where what is being remembered is certainly no less important than how much, and where memory reports are naturally considered to be about personally experienced, past events and states (see Conway, 1991, 1993). The difference between treating memory as being about something (correspondence metaphor) and treating it as the mere retrieval of something (storehouse metaphor) is so obvious that it can easily be overlooked. A recalled list of words, for instance, need not be considered as being about anything; it can simply be treated as the retrieval of the items that remain in store. Thus, as Neisser (1981) observed, traditional laboratory research has generally studied memory for "material that has no reference beyond itself" (p. 4). By contrast, stressing the intentionality ("aboutness") of real-life memory, Conway (1991) goes as far as to propose that the study of everyday memory may require a different theory of mind than one which would have us "study human memory as if it were a chemical reaction--like dough rising." He asserts that "one difference between mental and physical states is that mental states have content, whereas physical states do not. Thus, my memory of dough rising is about something, some representation of an event I once experienced. But actual dough rising is not about anything; it is simply what it is--dough rising" (p. 24, emphasis in original).
Conway's remarks reflect a view where memory does not serve merely as a depository of isolated, lifeless units, but rather, affords a meaningful representation of real-life events that can be effectively utilized in future interactions. This functional perspective (e.g., Baddeley, 1988; Barclay, 1993; Bruce, 1985, 1989, 1991; Fivush, 1988, 1993; Neisser, 1978, 1988b, 1991; Nilsson, 1979; Winograd, 1988) motivates a concern for the dependability of memory, the extent to which it can be counted on to faithfully reflect past events.
Furthermore, when memory is viewed with reference to past events, it becomes clear that memory representations can deviate from reality in many different ways. Thus, the recent wave of everyday memory research has brought a renewed interest in memory errors, particularly in the qualitative changes that occur in memory for complex, meaningful material (Bartlett, 1932). This interest is inherent to the correspondence view. Indeed, because real-life experience is made up of richly structured scenes and events (see McCauley, 1988; Neisser, 1986, 1988c), many naturalistic errors pertain to holistic and relational changes that cannot be readily captured in terms of the mere loss of "items" (see, e.g., Alba & Hasher, 1983; Bartlett, 1932; Brewer & Nakamura, 1984; Dawes, 1966; Neisser, 1981, 1986, 1988c). Also, such changes may reflect social, motivational, and functional biases that are quite foreign to the passive storehouse conception (see, e.g., Boon & Davies, 1988; Neisser, 1988a, 1988b; Nigro & Neisser, 1983; Ross, 1989; Ross & Buehler, in press).
At the extreme, the concerns of correspondence-oriented and storehouse-oriented researchers may be so different as to seem almost unbridgeable. Consider, for example, the following quote from Neisser (1981), regarding the quality of John Dean's memory (all emphases added in order to highlight the correspondence way of thinking):
"Analysis of Dean's testimony does indeed reveal some instances of memory for the gist of what was said on a particular occasion. Elsewhere in his testimony, however, there is surprisingly little correspondence between the course of a conversation and his account of it. Even in those cases, however, there is usually a deeper level at which he is right. He gave an accurate portrayal of the real situation, of the actual characters and commitments of the people he knew, and of the events that lay behind the conversations he was trying to remember. Psychology is unaccustomed to analyzing the truthfulness of memory at this level, because we usually work with laboratory material that has no reference beyond itself" (p. 4).
We leave it to the reader to consider how the concerns expressed by Neisser might be accommodated within the storehouse conception.
In sum, there seems to be more brewing within the everyday-laboratory controversy than just the what, how, and where issues. Clearly, the uprising of the everyday memory camp does not stem from a disdain of controlled experimentation, nor is it simply a reaction against the laboratory context as such. Rather, we contend that it reflects, at least partly, an acute disillusionment with the kind of thinking about memory that has permeated the traditional laboratory approach. This particular way of thinking--the storehouse conception--is embodied in the established laboratory tools and paradigms used in the quantity-oriented study of memory. Of course the relationship between the metaphors and the everyday-laboratory affiliations is more a correlation than a perfect mapping--much work on memory correspondence has been (and hopefully will be) conducted by laboratory researchers (see Section 3), and much everyday memory research continues to submit to the alluring power of the storehouse metaphor (see Section 6). Nonetheless, the correspondence metaphor sketched above consolidates many of the objections levelled against traditional memory research, and seems to underlie the type of accuracy-oriented approach now gaining impetus in the study of everyday memory.
In the following sections, then, we focus on the emerging correspondence conception of memory, and examine its potential as a viable alternative to the storehouse conception in providing a productive framework for memory research. In doing so, we have several aims: First, to document the emergence of the correspondence metaphor in current memory theorizing. Second, to explicate the unique logic of the correspondence metaphor, and to pursue its implications for the study of memory. Third, to show how the storehouse-correspondence distinction can help clarify some troubling issues that arise when comparing laboratory (quantity-oriented) and naturalistic (accuracy-oriented) research findings. Finally, and more generally, to illustrate the way in which a conceptual metaphor can help shape both the theories and the methods of scientific research.
3. The Correspondence Metaphor in Memory Research and Theory
In the foregoing discussion we argued that the new wave of everyday memory research discloses a correspondence-oriented approach to memory, which differs fundamentally from the storehouse-oriented approach that has dominated traditional, laboratory research. The correspondence view of memory, however, appears to be gaining influence in current laboratory-based research and theorizing as well. Indeed, signs of a general shift away from storehouse-guided theorizing towards a correspondence-oriented metatheory may be discerned in a wide variety of contemporary approaches, including the reconstructive, attributional, ecological, functional, nonmediational, procedural, and connectionist approaches to memory. We now consider each of these approaches in turn, and briefly discuss how each seems to manifest different facets of the correspondence metaphor.
The first serious proposal for a correspondence-oriented view of memory was advanced by Bartlett (1932). According to Bartlett, "remembering is not the re-excitation of innumerable fixed, lifeless and fragmentary traces. It is an imaginative reconstruction or construction" (1932, p. 213). Bartlett conceived of remembering as an attempt to make sense of experience by applying cognitive structures, called "schemata" (a concept introduced by Head, 1920). These structures constitute "an active organization of past reactions, or of past experiences" operating as a "unitary mass" (1932, p. 201). Bartlett's reconstructive approach to memory was given further impetus by Neisser (1967), and today this approach clearly encompasses a substantial amount of both everyday and laboratory memory research. Indeed, many current theoretical notions, such as schemata, frames, scripts, plans, MOP's, TOP's, mental models, and story grammars (e.g., Johnson-Laird, 1983; Kintsch & van Dijk, 1978; Mandler, 1979; Minsky, 1975; Rumelhart, 1975, 1980; Schank, 1982; see Rumelhart & Norman, 1988) reflect the basic assumption that remembering is an active, constructive "effort after meaning" (Bartlett, 1932, p. 20).
The implications of the reconstructive view have been investigated experimentally using a wide range of rich and complex stimulus materials and tasks, including memory for sentences, stories, and real-life events (for reviews, see Alba & Hasher, 1983; Brewer & Nakamura, 1984). In eyewitness research, for instance, Loftus (1979a, 1979b, 1982) and her colleagues have been very influential in demonstrating the many ways in which memory for witnessed events can be distorted by reconstructive inference, particularly inference based on post-event information (e.g., Loftus, Miller, & Burns, 1978; Loftus & Palmer, 1974). Also, in autobiographical memory research, the "self-schema" (Markus, 1977) has been used extensively to explain both the accuracy and inaccuracy of memory constructions for personally experienced states and events (e.g., Barclay, 1986, 1988, 1993; Barclay & Wellman, 1986; Markus, 1980; Neisser, 1988b; Ross, 1989; Ross & Buehler, in press; Winograd, in press). More generally, the reconstructive view has inspired the postulation and study of a variety of selective, integrative, and interpretive processes in memory (e.g., Bower, Black, & Turner, 1979; Bransford & Franks, 1971; Bransford & Johnson, 1972; Dooling & Christiaansen, 1977; Johnson, Bransford, & Solomon, 1973; Morris, Stein, & Bransford, 1979; Pichert & Anderson, 1977; Seifert, Robertson, & Black, 1985; Spiro, 1980; Wagenaar & Boer, 1987; but see Alba & Hasher, 1983, for reservations).
On the whole, the reconstructive approach goes far beyond the storehouse conception of memory, in emphasizing the active role of the rememberer in creating a meaningful and organized representation of past events, and in admitting a variety of qualitative ways in which this representation can deviate from reality. This approach, then, is perhaps the most clear and productive example of a contemporary, correspondence-oriented approach to memory.
A fundamental criticism that has been leveled against the reconstructive approach, however, is that it has not moved far enough away from the atomistic and mediational assumptions of the storehouse metaphor. For instance, Marshall and Fryer (1978) contend that "currently, Bartlett is often cited in support of the notion that 'much of what is remembered is reconstructed from stored fragments' (Fodor, 1975). . . . but this is not to impugn the storehouse metaphor at all, it is merely to offer a variation on its contents" (p. 8). This criticism is perhaps somewhat overstated. Bartlett himself was emphatic in dissociating his ideas from the storehouse metaphor, stressing that "a storehouse is a place where things are put in the hope that they may be found again when they are wanted exactly as they were when first stored away. The schemata are . . . living, constantly developing, affected by every bit of incoming sensational experience of a given kind. The storehouse notion is as far removed from this as it well could be" (1932, p. 200). However, as will be discussed below, this criticism does seem to hold for some of the specific experimental practices and evaluative procedures employed by students of memory reconstruction, among others (see Section 4).
More far-reaching departures from the storehouse view of memory seem to be emerging, however. For example, Jacoby and his associates (e.g., Jacoby, 1988; Jacoby, Kelley, & Dywan, 1989; Jacoby, Lindsay, & Toth, 1992; Kelley & Jacoby, 1990) have promoted a constructive-attributional approach which places a special emphasis on the subjective experience of remembering. This experience is seen to result from attributions to the past that are evoked by a feeling of familiarity for a present stimulus. When the fluent processing of a present stimulus, and hence its subjective familiarity, actually does derive from previous exposure, then the attribution to the past should result in correct or veridical remembering. However, a feeling of familiarity may also derive from other sources, and when improperly attributed to the past, may give rise to confabulations and memory illusions. Thus, in investigating the genesis of memorial experiences, Jacoby and coworkers have demonstrated how false memories may be created by altering perceptual processes independently of past experience. In their view, "the conscious experience of remembering is not to be found in a memory trace. Rather, remembering is an inference based on internal and situational cues" (Kelley & Jacoby, 1990, p. 49). This work, then, displays a concern with both veridical and nonveridical remembering, and suggests possible reasons why people's current "perceptions" of the past might deviate from reality.
Another road, which also stresses the link between perception and memory, has been taken by researchers advocating an ecological or "direct" approach to memory, along the lines proposed by Gibson (1979; see, e.g., Bransford et al., 1977; Neisser, 1986, 1988a, 1988c). Neisser (1986), for instance, has renounced his earlier (1967) conception of memory reconstruction for a more direct, ecological view:
Events and extenditures [extended events] have nested levels of structure, and so does our experience of them. How does that correspondence come about? This question does not even arise in traditional information-processing theories, because they do not discuss the real environment at all. They simplify matters by assuming that, whatever the world may actually be like, only very molecular information about it is available to perceivers. . . . The ecological approach assumes, in contrast, that molar events are often perceived just as directly as molecular ones. The environment is equally real at different levels of analysis, and events at those different levels are typically specified by different kinds of information. . . . It is my hypothesis, that these several levels of analysis are each represented in memory, leaving more or less independent "traces" behind (pp. 75-76).
Neisser, then, emphasizes the structured correspondence between perception and memory, on the one hand, and reality, on the other. He also suggests a natural starting point for investigating that correspondence: "Rather than beginning with the hypothetical models of mental functioning, ecological psychologists start with the real environment and the individual's adaptation to that environment" (Neisser, 1988a, p. 153).
The concern with functional issues expressed by Neisser represents another manifestation of current disillusionment with the storehouse metaphor. In fact, the functional approach to memory tends to be associated with two somewhat different emphases (see Winograd, 1988). The first, of course, is the heightened concern with memory function itself, i.e., what memory is for (Baddeley, 1988; see also Barclay, 1993; Bruce, 1985, 1989, 1991; Fivush, 1988, 1993; Neisser, 1978, 1988a, 1988b, 1991; Nilsson, 1979; Sherry & Schacter, 1987; Tulving, 1985). This concern may range from the adaptive role of memory at an evolutionary scale (e.g., Bruce, 1985; Sherry & Schacter, 1987; Tulving, 1985), down to the level of social or individual interests that are served (e.g., Barclay, 1993; Fivush, 1993; Neisser, 1981, 1988b). As we shall see below, such functional considerations are also critical in developing the kind of memory measures that follow uniquely from the correspondence metaphor (see Section 4).
The second aspect is the deemphasis of concern with the mediating mechanisms of memory. For instance, Bahrick (1987) explains, "by functional approaches, I refer to theories that attempt to establish parsimonious relations between manipulated variables and memory performance, without necessarily attempting to reach conclusions about internal processing" (pp. 389-90; also see Jacoby, 1988). Proponents of a nonmediational approach to memory have gone so far as to entirely reject the existence of a mediating "substrate of memory" (Watkins, 1990), particularly as it is embodied in the concept of "memory trace" (see also, Bransford et al., 1977; Craik, 1983, 1991; Crowder, 1993; Kolers, 1973; Kolers & Roediger, 1984; Kolers & Smythe, 1984; Lockhart & Craik, 1990; Malcolm, 1977). Such views imply a revolt against the storehouse metaphor, whose input-bound perspective naturally leads to a concern for the fate of the memory trace.
An additional proposal comes from those subscribing to a "proceduralistic" view of memory (e.g., Craik, 1983, 1991; Crowder, 1993; Kolers, 1973; Kolers & Roediger, 1984; Kolers & Smythe, 1984; Lockhart & Craik, 1990; Roediger, Weldon, & Challis, 1989), in which memory is assumed to involve the retention of information by the same units that processed it originally. According to this view, "the memory trace should be understood, not as the result of a specialized memory-encoding process, but rather as a by product or record of normal cognitive processes such as comprehension, categorization, or discrimination" (Lockhart & Craik, 1990, p. 89). This conception is based on evidence showing "that the means of acquisition of information form part of its representation in mind, that recognition varies with the similarity of procedures in acquisition and test, and that transfer between tasks varies with the degree of correspondence of underlying procedures" (Kolers & Roediger, 1984, p. 425).
Several facets of the proceduralistic view may be especially valuable in advancing theories of memory correspondence. The first is the emphasis on the context of remembering, in particular, on the congruence between the relevant processing activities of the rememberer at the time of remembering and at the time of witnessing the original event. Although laboratory research has typically focused on the importance of contextual congruence in affecting the quantity of information that can be recovered (e.g., the effects of "encoding specificity" and "state-dependent learning"; see, e.g., Eich, 1980; Fisher, 1981; Fisher & Craik, 1977; Mantyla, 1986; Morris, Bransford, & Franks, 1977; Smith, Glenberg, & Bjork, 1978; Tulving & Thomson, 1973; Watkins & Tulving, 1975), such congruence may also be critical in determining the correspondence between remembered and actual events (see, e.g., Fisher & Geiselman, 1992; Geiselman & Fisher, 1989). The second aspect is the emphasis on the active role of the subject, both in determining the way in which information is initially processed (elaborated, etc.), and in directing the subsequent act of remembering (e.g., the "levels of processing" approach; see Craik & Lockhart, 1972; Lockhart & Craik, 1990). The third is the conception of memory as embodied in global changes to the response tendencies of the rememberer, rather than in the storage of discrete memory traces. This conception too stresses the affinity between memory and perception:
If remembering is closely akin to perceiving, then it is perhaps no more likely that memory traces exist in the absence of remembering than percepts exist in the absence of perceiving: The activity must be studied while it is occurring. Clearly something in the system must change as a result of experience, but the changes may be diffuse and widespread modifications of the whole cognitive system so that the system now interacts with aspects of the environment in a different way, rather than events being recorded specifically and discretely like events on a video recorder (Craik, 1983, p. 356).
A similar conception, of course, is intrinsic to the connectionist, parallel distributed processing (PDP) approach to modeling human memory (e.g., Hinton & Anderson, 1981; McClelland & Rumelhart, 1986a, 1986b; Rumelhart & McClelland, 1986; for a review, see Hintzman, 1990). This approach, based on a brain metaphor (Rumelhart, 1989), offers a promising vehicle for the development of correspondence-based memory models. In PDP models, knowledge is not "stored" in any specific location, but rather, is embodied in the connections between a multitude of interacting processing units. Both the input and the output are represented as complex patterns of activity, and the system learns by adjusting the connection weights, not by incorporating new discrete elements or representations. Thus, memory is distributed across the entire system, and learning reflects changes in the response tendencies of the system as a whole.
Connectionist models treat memory, like perception, as essentially a problem in pattern recognition, based on a principle of global matching (see Hintzman, 1990). Thus, the accuracy of memory responses in such models is most naturally evaluated in terms of the overall correspondence between distributed patterns of activity, rather than in terms of the mere amount of information recovered (e.g., "resonance scores," Metcalfe, 1990; Metcalfe Eich, 1985). This approach is also well suited for investigating memory distortions (e.g., memory "blending"; Metcalfe, 1990). Indeed, both the way in which new inputs are incorporated into the existing network structure (assimilation), and the gradual modification of that structure which results from the processing itself (accommodation), make connectionism a natural choice for modelling the wholistic, schematic, and constructive aspects of memory processing (see, e.g., Grossberg, 1987; Hintzman, 1986; McClelland & Rumelhart, 1985, 1986b; Metcalfe, 1990; Rumelhart, Smolensky, McClelland, & Hinton, 1986). Moreover, connectionism may offer a solution to the troubling issue of intentionality or "aboutness" (Bechtel, 1988; Bechtel & Abrahamsen, 1991) which plagues traditional cognitive models (cf. the "symbol grounding" problem; Harnad, 1990). As argued by Bechtel and Abrahamsen (1991), because connectionist representations "constitute the system's adaptation to the input, there is a clear respect in which they would be about objects or events in the environment if the system were connected, via sensory-motor organs, to such an environment" (p. 129, emphasis in original; see also Harnad, 1990, 1992).
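The general idea that memory resides in connection weights, and that a memory response is evaluated by its overall pattern correspondence, can be illustrated with a toy sketch. The code below is our own minimal assumption: a small Hebbian autoassociator over +1/-1 units, with cosine similarity standing in for a global-match score. It is not an implementation of any of the published models cited above, and all function names are hypothetical.

```python
import math

def train_autoassociator(patterns):
    """Superimpose all patterns in one weight matrix (Hebbian outer products).

    No pattern is stored at a separate location; each one only nudges the
    shared connection weights.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, cue, steps=5):
    """Let the network settle from a cue; all units are updated synchronously."""
    state = list(cue)
    for _ in range(steps):
        state = [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
                 for row in weights]
    return state

def pattern_correspondence(a, b):
    """Cosine similarity between two activity patterns (1.0 means identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

event_a = [1, 1, 1, 1, -1, -1, -1, -1]
event_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_autoassociator([event_a, event_b])

degraded_cue = [-1, 1, 1, 1, -1, -1, -1, -1]  # event_a with one unit flipped
remembered = recall(weights, degraded_cue)
fit_before = pattern_correspondence(degraded_cue, event_a)
fit_after = pattern_correspondence(remembered, event_a)
```

In this sketch the output is scored as a graded fit between whole patterns rather than as a count of recovered items, which is the contrast drawn in the text.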
In sum, although quite varied in many aspects of their conception of memory, the approaches considered in this brief survey all seem to reflect a basic shift from the traditional storehouse conception towards a correspondence-oriented perspective, emphasizing the congruence between current remembering and past events, and specifying factors that may affect this congruence. Many of the approaches also embody a view of memory as the perception of the past, and are specifically concerned with the veridicality of this perception. Thus, although the preoccupation with memory correspondence is most salient in everyday memory research, the emergence of the correspondence view of memory would seem to represent a broad undercurrent that is gaining momentum in the study of memory generally.
It is curious, however, that despite the growing theoretical interest in memory correspondence, many of the methodological practices applied to the study of memory correspondence still seem to pay homage to the storehouse metaphor in the way that memory is evaluated. In fact, as we have noted elsewhere (Koriat & Goldsmith, in press), although many memory researchers today talk correspondence, they still practice storehouse. A couple of examples should suffice to illustrate this point. The reconstructive view of memory, for instance, might have been expected to promote the development of memory measures that uniquely reflect the kind of global correspondence (or miscorrespondence) between current memories and past events that is assumed to ensue from reconstructive processes. Indeed, Bartlett (1932) firmly rejected the applicability of Ebbinghaus-style memory measures. Yet, many predictions of the reconstructive view are commonly tested by focusing on the quantity of discrete items of information remembered (e.g., Bransford & Johnson, 1972; Brewer & Treyens, 1981; Dooling & Mullet, 1973; Kozminsky, 1977; Morris et al., 1979; Pichert & Anderson, 1977). Also, in naturalistic accuracy-oriented research, many of the memory measures are still based on the number of stimulus items correctly recalled or recognized, for instance, the number of correct propositions about a witnessed event (e.g., Fisher, Geiselman, & Amador, 1989; Ornstein, Gordon, & Larus, 1992; see Hilgard & Loftus, 1979), the number of correct mugshot or lineup identifications (e.g., Brown, Deffenbacher, & Sturgill, 1977; Gorenstein & Ellsworth, 1980; Lindsay & Wells, 1985; see Deffenbacher, 1991), and so forth. Although valuable information can certainly be gained from such measures, they do not capture many aspects of memory that are of unique concern in the study of memory correspondence.
There appears to be a serious gap, then, between the concern with memory correspondence, on the one hand, and the standard methods by which memory is in fact evaluated, on the other. This gap may derive from the failure by students of memory to realize that the focus on the faithfulness of memory implies a different memory metaphor--and hence a different approach to memory assessment--than the traditional focus on memory quantity. Indeed, it is unfortunate that in contrast to the storehouse-guided, quantity-oriented assessment of memory, which has benefitted from a great deal of methodological analysis, little systematic effort has been invested in clarifying the unique logic underlying the evaluation of memory correspondence. This lack of a methodological base can lead to difficulties and confusions in comparing research findings that cut across the quantity-oriented and accuracy-oriented approaches (see Section 5). More importantly, it is our belief that a careful examination of the unique logic underlying correspondence-oriented memory assessment can provide a first step towards a more effective exploitation of the correspondence metaphor in the practice of memory research, both in the laboratory and in naturalistic settings.
4. The Correspondence-Oriented Evaluation of Memory
In this section, we examine the implications of the correspondence metaphor for the evaluation of memory, and consider several ways in which this metaphor can actually be implemented in memory assessment. We distinguish between two general approaches: The analytic approach assumes that memory (or at least memory reports) can be meaningfully sliced into individual, isolated units. This assumption is shared with the storehouse conception, and therefore its adoption in the context of the correspondence metaphor allows both accuracy-based and quantity-based measures of memory to be derived and compared within a common framework. This is the approach taken in our own research, which will be sketched further below (see Section 5).
However, it is in the context of the wholistic approach that the correspondence metaphor finds its most unique expression. This approach attempts to avoid the segmentation of experience into separate units, and strives to reach an overall measure of correspondence for the memory output as a configured whole. We begin, then, with a discussion of this approach, which best illustrates the unique flavor of correspondence-oriented memory assessment.
4.1. The Wholistic Evaluation of Memory Correspondence
The logic of the storehouse metaphor is well reflected in the so-called "forgetting functions," which convey the percentage of items recoverable at any point in time. Turning to the correspondence metaphor, are there accuracy measures that can allow us to plot analogous correspondence-based functions, conveying reductions in the overall faithfulness of memory? Unfortunately, the derivation of such measures is no simple task. Once it is admitted that forgetting involves more than the mere omission of items, memory assessment becomes complicated by the fact that memory reports can deviate from the original event in many different ways. Thus, memory researchers emphasizing the qualitative changes that occur in memory have generally confined themselves to the study of specific types of distortions (see, e.g., Bartlett, 1932; Dawes, 1966; Goldmeier, 1982; Loftus, 1982; Riley, 1962) rather than tackle the serious problems inherent in deriving an overall "faithfulness" measure that cuts across various dimensions of miscorrespondence (but see Neisser & Harsch, 1992; Neisser, Winograd, & Weldon, 1991). What would the development of such a measure entail?
Consider a simple situation where several persons are exposed to the same event and each is asked to report what he or she remembers. How should an experimenter proceed who wishes to quantitatively evaluate the overall correspondence of each report to the actual event?
First, the experimenter must identify the relevant aspects or dimensions of the event, their relative importance (weight), and the relative criticality of different types of distortion and error. Clearly, these decisions will depend on the given memory domain, and will need to incorporate functional considerations pertaining to the reasons for remembering and the particular circumstances of the memory report (see Neisser, 1988b, 1988c). Furthermore, it may be necessary to take into account the level of detail or "grain" of the reports, as the correspondence of memory reports to past events can increase dramatically when responses are more general and less detailed (Neisser, 1988b; Yaniv & Foster, 1990, 1994). It also might be helpful to have some theory of the various ways in which memory can go wrong, to aid in the identification and measurement of the relevant dimensions of miscorrespondence (e.g., Bartlett, 1932; Dawes, 1966; Goldmeier, 1982).
Second, given that such decisions have been made, a quantitative assessment model will need to be developed in order to allow for the computation of an overall correspondence score. This model must specify the various aforementioned aspects, as well as operationally define (a) how each dimension of correspondence is to be measured (e.g., Mandler & Johnson, 1977), (b) how the correspondence score should be integrated across dimensions (e.g., Neisser & Harsch, 1992), and (c) how differences in the "grain size" of the reports are to be taken into account (e.g., Yaniv & Foster, 1990, 1994).
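To make the structure of such an assessment model concrete, the following sketch shows one possible form it could take. Everything in it is an illustrative assumption on our part: each dimension is assumed to be scored on a 0 to 1 scale, integration is assumed to be a weighted mean, and the treatment of grain size is reduced to a caller-supplied multiplier (`grain_credit`), which is only a placeholder. None of the cited studies is claimed to have used this formula.

```python
def overall_correspondence(dimension_scores, weights, grain_credit=1.0):
    """Combine per-dimension correspondence scores into one overall score.

    dimension_scores: dict mapping a dimension name to a score in [0, 1]
                      (how each dimension is measured is left to the domain).
    weights:          dict mapping the same names to non-negative importance
                      weights (hypothetical; chosen by the experimenter).
    grain_credit:     placeholder multiplier in [0, 1] for reports that are
                      correct only at a coarse grain (an assumption here).
    """
    if set(dimension_scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted_mean = sum(weights[d] * dimension_scores[d] for d in weights) / total_weight
    return weighted_mean * grain_credit

# Hypothetical report: the person is right, the place half right, the time wrong.
scores = {"who": 1.0, "where": 0.5, "when": 0.0}
importance = {"who": 2.0, "where": 1.0, "when": 1.0}
detailed_report = overall_correspondence(scores, importance)
coarse_report = overall_correspondence(scores, importance, grain_credit=0.8)
```

The point of the sketch is only that the weights, the dimension scores, and the grain adjustment are all explicit decisions of the experimenter, which is why such measures tend to be specific to a domain and purpose.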
A major obstacle to the development of wholistic correspondence measures, then, is that unlike the traditional measures of memory quantity (e.g., percent correct on a free-recall task), which may be applied across a wide variety of testing situations, overall measures of memory faithfulness may need to be theory-specific, domain-specific, and function-specific. (A similar view has been expressed with regard to evaluating the accuracy of social perception; see Kruglanski, 1989.) This, of course, may tend to limit the generalizability of results that are obtained with such measures (cf. Banaji & Crowder, 1989). Despite these difficulties, however, some attempts have been made to develop wholistic correspondence measures in certain circumscribed domains, primarily in the area of memory for visual and spatial information (see, e.g., Allen, Siegel & Rosinski, 1978; Hart, 1979, 1981; Pick & Lockman, 1981; Siegel, 1981; Siegel & Schadler, 1977; Waterman & Gordon, 1984). Here, measures of overall correspondence have been based on pattern-matching techniques to compute the goodness of fit between a particular target stimulus and its reconstruction from memory. Three illustrative procedures will now be considered, which convey some of the distinctive aspects of correspondence-based memory assessment. All pertain to the measurement of distortion in mental maps.
Waterman and Gordon (1984) had subjects draw the map of Israel from memory, and the correspondence of each map to the actual map was assessed with respect to eight clearly identifiable geographic points appearing in all of the map reproductions. After applying transformations designed to neutralize differences in rotation, translation, and scale, an overall "distortion index" was computed for each map in terms of the squared deviations (distances) between corresponding points on the output map and the criterion map. This index was normalized by expressing the measured distortion as the proportion of the maximum distortion possible given the assessment procedure.
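The logic of this kind of index can be sketched in a few lines. The code below is a generic two-dimensional Procrustes-style fit, written under our own assumptions, and is not a reconstruction of Waterman and Gordon's exact computation. Points are treated as complex numbers so that the best-fitting rotation and scale reduce to a single complex multiplier, and the residual squared deviation is expressed as a proportion of its maximum under this procedure. The function name and the example coordinates are hypothetical.

```python
import cmath
import math

def distortion_index(reported, criterion):
    """Normalized squared deviation after removing translation, rotation, scale.

    reported, criterion: equal-length lists of (x, y) for corresponding points.
    Returns a value in [0, 1]; 0 means the reported configuration matches the
    criterion exactly up to translation, rotation, and uniform scaling.
    Reflections are not neutralized in this sketch.
    """
    if len(reported) != len(criterion) or len(reported) < 2:
        raise ValueError("need at least two pairs of corresponding points")
    z = [complex(x, y) for x, y in reported]
    w = [complex(x, y) for x, y in criterion]
    # Neutralize translation by centering both configurations.
    z_mean, w_mean = sum(z) / len(z), sum(w) / len(w)
    z = [p - z_mean for p in z]
    w = [p - w_mean for p in w]
    zz = sum(abs(p) ** 2 for p in z)
    ww = sum(abs(p) ** 2 for p in w)
    if zz == 0 or ww == 0:
        raise ValueError("all points coincide in one configuration")
    # The least-squares rotation and scale is the complex number cross / zz;
    # the residual after applying it is ww - |cross|^2 / zz.
    cross = sum(a.conjugate() * b for a, b in zip(z, w))
    return max(0.0, 1.0 - abs(cross) ** 2 / (zz * ww))

criterion_map = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]

# A drawing that is rotated, enlarged, and shifted, but otherwise faithful.
turn = 2.5 * cmath.exp(1j * math.radians(30))
faithful = [((complex(x, y) * turn).real + 10, (complex(x, y) * turn).imag - 7)
            for x, y in criterion_map]

# A drawing with one landmark displaced.
distorted = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 9)]

faithful_score = distortion_index(faithful, criterion_map)
distorted_score = distortion_index(distorted, criterion_map)
```

Note that the choice of which transformations to neutralize is built into the measure: a drawing that is merely rotated and rescaled counts as undistorted here, which would be the wrong decision if orientation were itself of interest.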
In a similar vein, Siegel (1981) presented subjects with a simulated "campus walk" slide show, and then obtained ordinal distance ratings between nine landmarks by the method of multidimensional rank order (Subkoviak, 1975). Using nonmetric multidimensional scaling techniques, the ratings from each subject were transformed into a one-dimensional best-solution map, whose correspondence with the one-dimensional representation of the actual route was computed by a goodness-of-fit procedure that compares the coordinates of points between n-dimensional arrays without regard to array shrinkage, expansion or rotation. Using the resulting congruence index, Siegel found, for example, better memory for scenes with high versus low landmark potential, and for routes that were viewed twice rather than once.
A third example comes from Hart (1979, 1981), who used a somewhat different approach in assessing children's memory for the layout of their home town. Hart obtained memory "reports" from the children by having them build scale models. In evaluating the accuracy of the models, he identified and scored individual constellations of local features, as well as the more global configuration. The average local correspondence score was then multiplied by the global configuration score in order to obtain an "integrated map score." Hart noted, however, that the models varied greatly in the extent of the mapped area, that is, in the amount of information reproduced. He therefore multiplied the integrated score by an "extent of area" score, arriving at an overall "composite map score" for each child.
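The arithmetic of this scoring scheme, as described above, is a chain of multiplications. The sketch below follows that description only; the 0 to 1 scaling of the component scores and the example values are our assumptions, since the scales Hart used are not specified here.

```python
def hart_style_scores(local_scores, global_score, extent_score):
    """Return (integrated_map_score, composite_map_score).

    local_scores: scores for individual constellations of local features
    global_score: score for the global configuration
    extent_score: score for the extent of the mapped area
    All scores are assumed, for illustration, to lie in [0, 1].
    """
    if not local_scores:
        raise ValueError("need at least one local score")
    mean_local = sum(local_scores) / len(local_scores)
    integrated = mean_local * global_score   # accuracy of what was modeled
    composite = integrated * extent_score    # also credits how much was modeled
    return integrated, composite

# Hypothetical child: fairly accurate local clusters, weaker overall layout,
# and a model covering about half of the town.
integrated, composite = hart_style_scores([0.8, 0.6], global_score=0.5, extent_score=0.5)
```

Because the terms are multiplied, a model that is accurate but covers little area, and one that covers much area inaccurately, can receive similar composite scores; the integrated score keeps the accuracy component separate.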
These modest efforts illustrate some of the distinctive aspects of the wholistic assessment of memory correspondence: Most prominently, in each case the memory report is being assessed in terms of its overall fit with a complex target stimulus. In evaluating such a fit, the experimenter must always (at least implicitly) specify which features of possible correspondence are relevant and which are to be ignored. Thus, in the examples cited above, the overall orientation and scale of the reproduced maps were neutralized before correspondence was assessed, whereas in other cases orientation may be treated as a dimension of interest (e.g., Tversky, 1981). Also in some cases the completeness of the report (e.g., extent of mapped area, number of landmarks) may be important, and the correspondence measure must be adapted to take this aspect into account (e.g., Hart, 1981), or else the extent of the memory output may be controlled by the experimenter (e.g., Siegel, 1981). In other cases, however, the experimenter may be concerned only with evaluating the accuracy of the information contained in the output without regard to its completeness (e.g., Waterman & Gordon, 1984; and see below). Finally, the correspondence measure may need to be expressed in normalized units (e.g., Waterman & Gordon, 1984) in order to facilitate comparisons across different tasks. In sum, unlike traditional quantity measures, which have been designed as all-purpose tools, wholistic correspondence measures must be based on specific assessment models, tailored to particular purposes.
The task of deriving an overall correspondence measure can become even more complicated when attempting to assess the overall correspondence between a real-life event and its verbal reconstruction. Consider, as an extreme example, the derivation of a single quantitative score that reflects the faithfulness of each of the four Rashomon reports in Kurosawa's (1951) celebrated movie. The problem stems from the fact that real life events can submit to a multitude of different descriptions, each of which may be "accurate" depending upon the specific evaluative criteria employed (see McCauley, 1988; Neisser, 1981, 1986, 1988c; for a discussion of similar problems regarding the accuracy of social perception, see Funder, 1987; Kruglanski, 1989). Thus, in Rashomon, the four versions of the crime agreed in many details, but the "critical" aspects, particularly those open to subjective interpretation, differed drastically from report to report. Conversely, there are cases in which a memory report may be inaccurate in reproducing the details, or even the gist of what occurred on a specific occasion, but still convey a "true" impression of what was happening at the time (e.g., John Dean's memory in Neisser, 1981; see also Spence, 1982).
Clearly, then, the development of an overall assessment model capable of dealing adequately with the richness of real-life situations is a formidable task. In complex cases, a tempting option is to rely on subjective global accuracy ratings (e.g., Larsen, 1988; but see Neisser & Harsch, 1992, for reservations). Indeed, an ingenious variation on the use of subjective judgments has been devised for the assessment of memory for faces, by determining the proportion of correct target recognitions achieved by independent judges on the basis of the memory report alone (Ellis, Shepherd, & Davies, 1975; Wells & Turtle, 1988). This procedure has also been used to compare the effectiveness of verbal descriptions versus photofit reconstructions (Christie & Ellis, 1980).
Of course, other techniques might also be envisaged. In fact, it is odd that the problem of measuring accuracy has received so little attention from students of memory, compared, for instance, to the systematic analysis it has received from students of social perception and social judgment. In the latter domains, numerous papers have specifically addressed the conceptual and methodological issues that arise in measuring the overall accuracy and inaccuracy of interpersonal (and intrapersonal) judgments (e.g., Cronbach, 1955; Funder, 1987; Kenny, 1991; Kenny & Albright, 1987; Kruglanski, 1989; Sulsky & Balzer, 1988). This work should be of interest to memory researchers as well (see also, Ross, in press, for an interesting social-psychological treatment of the issue of memory accuracy). Sulsky and Balzer (1988), for example, compared five different conceptual and operational definitions of judgmental accuracy, all of which involved comparing a subject's judgments to a set of criterion judgments. An even wider range of conceptions of judgmental accuracy is examined by Kruglanski (1989; and see Ross, in press).
Overcoming the conceptual and methodological hurdles that impede the development of wholistic correspondence measures poses a crucial challenge for memory research. In principle, such measures could supply the needed tools for a bona fide psychology of memory correspondence, to parallel the quantity-oriented tradition. For example, they could enable researchers to trace the course of forgetting over time in the sense of a reduction in the overall faithfulness of memory, to examine the effects of a variety of factors on the rate of such forgetting, to study individual and group differences in memory accuracy, to explore the effectiveness of different questioning procedures in improving the faithfulness of memory reconstructions, and so forth. This is the type of approach that would seem to follow most naturally from many discussions of everyday memory, as well as from the correspondence metaphor itself (see further discussion in Section 6). Because such data would undoubtedly be of great value, their absence from the memory literature is conspicuous. Greater efforts in this direction are certainly called for.
4.2. The Analytic Evaluation of Memory Correspondence
The second, more common approach to the evaluation of memory correspondence is the analytic approach. Indeed, in light of the complexities described above, it is not surprising that most research on memory accuracy has shied away from a direct confrontation with the wholistic assessment of memory faithfulness. In the study of autobiographical memory, for instance, most researchers still treat memory "as if it were just a set of remembered concrete experiences" (Neisser, 1988c, p. 356). Such a treatment, which follows from the storehouse metaphor, is less suited to the evaluation of memory correspondence. Nevertheless, many of the unique concerns of correspondence-oriented memory assessment can still be expressed even when the analytic approach is adopted. Thus, particularly in view of the widespread use of analytic memory measures, it is important to examine how accuracy-oriented and quantity-oriented memory assessment differ in the context of this approach.
As a framework for our analysis, we assume an item-based memory testing situation (see Puff, 1982) in which the target information has been segmented into discrete units (items or propositions). Quantity-based and accuracy-based memory measures may then be distinguished as follows: Quantity measures are input-bound, assessing the likelihood of correctly remembering (recalling, recognizing, etc.) an input item. Accuracy measures, in contrast, focus on the dependability of the reported information. Hence, they are output-bound, reflecting the likelihood that a reported item is "correct," i.e., corresponds to the input (see Koriat & Goldsmith, 1994a). Whereas input-bound measures are traditionally used to estimate the amount of stored information that can be recovered, output-bound measures, being conditional on the output, evaluate the accuracy of the information that is reported. These latter measures are of particular interest in situations such as eyewitness testimony, in which the dependability of reported information is often no less important than its amount (see, e.g., Deffenbacher, 1988, 1991; Fisher et al., 1989; Hilgard & Loftus, 1979; Loftus, 1979a; Wells & Lindsay, 1985; Wells & Loftus, 1984). As we shall now show, in some cases the two types of measures yield equivalent scores, though each implies a different "attitude" in its interpretation, whereas in other cases they differ operationally as well.
4.2.1. The intent to measure accuracy
Consider a simple situation in which memory is tested for only one single item of information using a forced-choice procedure. For example, in the well-known study by Loftus, Miller, and Burns (1978), subjects were required to decide whether the traffic sign in the witnessed event was a stop or a yield sign. In that study and others like it (e.g., Boon & Davies, 1988; Wagenaar & Boer, 1987), memory accuracy is assessed simply by noting whether the provided answer is correct or incorrect. This might be compared to the hypothetical case in which a studied list of paired associates is followed by a single probe and two alternative responses, e.g., "SIGN - STOP/YIELD," and the intention is to assess memory quantity. Operationally, the two measures, accuracy and quantity, are equivalent; the difference between them is solely a matter of the experimenter's intent. Whereas in the former case the test is designed to examine whether the person's memory is a faithful reproduction of the witnessed event, in the latter case the intent has traditionally been to determine whether the designated item is still in store and accessible.
The same is also true when memory for a list of items is tested through a forced-choice procedure. For example, assuming that the memory for 20 items of information is tested, and 12 items are answered correctly, then the likelihood of correctly remembering each item (quantity) and the likelihood that each answer is correct (accuracy) are both .60. In general, then, when a forced-choice item-based procedure is employed, the same exact test can be used as a measure of either accuracy or quantity, and will yield the same memory performance score regardless of which property is intended.
How can the researcher's intent be distinguished in such cases? The intent to measure accuracy rather than quantity is sometimes explicitly stated by the investigator, as is typically the case in eyewitness research. In other cases, however, it can only be inferred from a variety of cues that disclose the implicit treatment of the subject's responses. For example, an analysis of memory errors often discloses a focus on accuracy. Also, asking subjects to report how confident they are in the answer they chose (see Lichtenstein, Fischhoff, & Phillips, 1982; Nelson & Narens, 1990) may imply that the subjects' responses are being treated as propositional statements having a truth value. In general, however, the differences can be quite subtle.
4.2.2. Distinguishing accuracy measures operationally
Notwithstanding these subtleties, there are two conditions in which item-based accuracy and quantity measures differ operationally as well. The first is when the stimulus information solicited from the subject may be evaluated on a continuous dimension, and the second is when the option to reply is under the subject's control.
4.2.2.1. Dimensional accuracy. As noted earlier, storehouse-inspired quantity measures are typically based on some type of counting operation, scoring individual items in a dichotomous (present/absent or correct/incorrect) fashion. Accuracy, in contrast, is more graded in nature, admitting different degrees of deviation from veridicality for any continuous or ordered dimension. For instance, given that the height of a burglar was actually 5 feet 8 inches, a report of 6 feet is clearly less accurate than a report of 5 feet 10 inches.
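The contrast between dichotomous and graded scoring can be illustrated with the burglar example. The following is a minimal sketch only: the helper names are hypothetical, and absolute deviation in inches is assumed here as one of many possible dimensional accuracy measures, not as a measure prescribed by the studies cited.

```python
def to_inches(feet, inches=0):
    """Convert a height given in feet and inches to inches."""
    return 12 * feet + inches


def dichotomous_score(reported, actual):
    """Storehouse-style scoring: 1 if the report is exactly right, else 0."""
    return int(reported == actual)


def absolute_deviation(reported, actual):
    """Graded scoring (an assumed measure): distance from the true value."""
    return abs(reported - actual)


actual_height = to_inches(5, 8)   # the burglar's true height
report_a = to_inches(6)           # report of 6 feet
report_b = to_inches(5, 10)       # report of 5 feet 10 inches

# Dichotomous scoring cannot tell the two reports apart: both are "wrong".
score_a = dichotomous_score(report_a, actual_height)
score_b = dichotomous_score(report_b, actual_height)

# Graded scoring orders them by how far each deviates from the true value.
deviation_a = absolute_deviation(report_a, actual_height)
deviation_b = absolute_deviation(report_b, actual_height)
```

Under these assumptions both reports receive the same dichotomous score, whereas the graded measure registers the 6-foot report as twice as far from the true value as the 5-foot-10-inch report.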
It is easy to overlook the fact that dimensional accuracy assessment is actually quite foreign to the storehouse metaphor, and implies a correspondence metaphor instead. Indeed, the measurement techniques themselves are often borrowed from the study of perception, and are most readily applied when the memory target is the value of some perceptual attribute. Thus, many studies of memory for visual form and spatial information have evaluated subjects' memory reports with respect to such biases as increased closure and symmetry, changes in orientation, angular and radial deviation, etc. (e.g., Bartlett, 1932; Byrne, 1979; Goldmeier, 1982; Huttenlocher, Hedges, & Duncan, 1991; McNamara, 1986; Nelson & Chaiklin, 1980; Riley, 1962; Tversky, 1981; Tversky & Schiano, 1989). Also, studies on the "psychophysics of memory" (e.g., Algom & Cain, 1991; Algom, Wolf, & Bergman, 1985; Kerst & Howard, 1978; Moyer, 1973; Moyer & Dumais, 1978), have shown, for example, how "memory scale values map onto their physical referents via the same functional relation (power transform) as perceptual scale values do" (Algom et al., 1985, p. 468). More generally, dimensional accuracy has been investigated with regard to memory for the date and time of past events (e.g., Baddeley, Lewis, & Ninno-Smith, 1978; Huttenlocher, Hedges, & Bradburn, 1990; Huttenlocher, Hedges, & Prohaska, 1988; Linton, 1975; Loftus & Marburger, 1983; White, 1982), and for a variety of other variables ranging from SAT scores (Bahrick et al., 1993) to the height of Mt. Everest (Yaniv & Foster, 1994).
Dimensional accuracy assessment was already implied in the preceding discussion of the wholistic assessment of memory correspondence. However, when only a single attribute is of concern, many of the problems involved in integrating information across dimensions can be avoided while still adhering to a correspondence type of measurement.
4.2.2.2. The option of free report: Input-bound versus output-bound measures. The second case in which accuracy and quantity measures operationally differ applies to the more standard item-based procedures in which memory is tested for a set of dichotomously scored items. As indicated above, with such procedures, a forced-choice memory report yields equivalent performance whether it is evaluated for quantity or accuracy. This evaluation can differ substantially, however, under free-report conditions, where subjects are free to volunteer or withhold information.
Consider, for example, an eyewitness who is asked to remember which people she saw at the scene of a crime, and reports that she saw A, B, and C. If A, B, and C were indeed present, then this testimony is entirely accurate. The fact that other people, D and E, were also present but were not reported by the witness will not detract from the (output-bound) accuracy of the information that was provided. In contrast, construed as a free-recall task intended to tap the (input-bound) amount of information that can be reproduced, reporting only three people out of five will obviously count against the reporter. Thus, for quantity-based measures, omission errors (failing to report an item of information, i.e., loss of information) are the more serious errors, whereas for accuracy measures, commission errors (falsely reporting something that did not occur, i.e., loss of dependability) are critical and omissions may be ignored.
In general, input-bound and output-bound measures will be equivalent when the output list (e.g., people reported as being present at the scene of a crime) is the same length as the input list (people actually present), and such is the case when a forced-report procedure is used. Under free-report conditions, however, the option to reply is controlled by the subject, and therefore input-bound and output-bound measures may be expected to differ. Thus, the operational distinction between output-bound accuracy (i.e., "dependability") and input-bound quantity is applicable in the context of the standard item-based approach, but only when the subject is free to decide whether to volunteer or withhold specific items of information.
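The operational distinction can be made concrete with the two examples used above. The sketch below is illustrative only: the function names and the set-based representation of input and output are assumptions introduced for this example, not a published scoring procedure.

```python
def quantity_score(input_items, reported_items):
    """Input-bound: proportion of input items that were correctly reported."""
    input_items, reported_items = set(input_items), set(reported_items)
    return len(input_items & reported_items) / len(input_items)


def accuracy_score(input_items, reported_items):
    """Output-bound: proportion of reported items that are correct."""
    input_items, reported_items = set(input_items), set(reported_items)
    if not reported_items:
        return None  # undefined when nothing is volunteered
    return len(input_items & reported_items) / len(reported_items)


# Free report: the witness volunteers A, B, and C; D and E were also present.
present = {"A", "B", "C", "D", "E"}
free_report = {"A", "B", "C"}
free_quantity = quantity_score(present, free_report)   # 3 of 5 input items
free_accuracy = accuracy_score(present, free_report)   # 3 of 3 reported items

# Forced report: 20 questions, every one answered, 12 answered correctly.
# Each item is a (question, answer) pair, so a wrong answer matches no target.
targets = {(q, "right") for q in range(20)}
forced_report = {(q, "right" if q < 12 else "wrong") for q in range(20)}
forced_quantity = quantity_score(targets, forced_report)  # 12 of 20
forced_accuracy = accuracy_score(targets, forced_report)  # 12 of 20
```

Because the forced report contains exactly as many items as the input, the two denominators coincide and the scores are identical; under free report the denominators differ, and so may the scores.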
The role of report option in differentiating accuracy-based and quantity-based memory measures illustrates how a concern with memory correspondence may bring to the fore issues that are less intrinsic to a storehouse framework--in this case, the active role of the rememberer in controlling his or her memory output. Indeed, the issue of subject control may be helpful in elucidating the proposed relationship between memory metaphors and the methods, content, and context of memory research. For example, as will be discussed later (see Section 6.1), not only are conceptual metaphors instrumental in dictating the methods by which memory should be studied and evaluated, but there is also a reciprocal relationship, in which the methods may illuminate and elaborate certain aspects of the metaphor. Thus, although subject control over memory performance is perhaps an optional aspect of the correspondence metaphor (cf. the "picture" or "copy" metaphors argued against by Neisser, 1967), as a practical matter, such control can have a substantial impact on measures of memory accuracy (particularly in naturalistic settings; see following section). This, in turn, calls for a more "active" correspondence conception, leading to a focus on further, substantive issues that might otherwise be overlooked.
In the following section, we summarize experimental work that we have done which demonstrates more concretely the importance of the correspondence-storehouse distinction in the reality of item-based memory research. Although the item-based approach is restrictive for the evaluation of memory correspondence, it has the advantage of allowing some of the important features that distinguish the accuracy-oriented and quantity-oriented approaches to memory (e.g., differences in subject control) to be directly compared within a common framework.
5. The Accuracy-Oriented and Quantity-Oriented Approaches in Item-Based Research
In the work reviewed below, we first show how interest in the two memory properties, accuracy and quantity, tends to be confounded not only with the contrast between the everyday and laboratory approaches, but also with other important dimensions of memory assessment. This analysis will help clarify some apparent inconsistencies that arise in comparing accuracy-oriented, naturalistic findings and quantity-oriented, laboratory findings. We then focus on the issue of subject control over memory reporting--its unique role in the accuracy-oriented study of memory, and the challenges that it provides for memory research and assessment. Finally, we consider some of the implications of this work for the everyday-laboratory controversy.
5.1. Dimensions of Memory Assessment
Our experimental work (Koriat & Goldsmith, 1994a, 1994b) was originally motivated by some apparent discrepancies between findings obtained in the laboratory and those obtained in naturalistic contexts. In attempting to resolve these discrepancies, we identified four dimensions of memory assessment which tend to be confounded in the reality of memory research. The first is the context of inquiry dimension--laboratory vs. real life. The second is the memory property of interest--quantity vs. accuracy. As discussed throughout this paper, much of the research carried out under the banner of everyday memory reveals a special concern with the accuracy or dependability of memory, in contrast to the predominant focus on memory quantity in traditional, laboratory-based research.
The third dimension is report option, the extent to which subjects are allowed to control their memory reporting by choosing which items of information to volunteer and which to withhold. As noted above, this dimension is crucial for the operational distinction between accuracy and quantity memory measures. Moreover, subject control over memory reporting is also generally confounded with the context-of-inquiry dimension: In traditional laboratory research, perhaps because of the high premium placed on experimental control (Banaji & Crowder, 1989; Nelson & Narens, 1994), subjects are generally given relatively little control over their memory reporting. In contrast, in naturalistic research situations, as in everyday life, people are typically allowed much more freedom in choosing what aspects of the event to relate, which to play down or ignore entirely, what perspective to adopt, how hard to try and get the details right, and so forth (see, e.g., Fisher & Geiselman, 1992; Hilgard & Loftus, 1979; Neisser, 1981, 1988b; Nigro & Neisser, 1983). Such control can substantially enhance output-bound accuracy, and indeed, in everyday contexts, for instance in the type of free-narrative reporting commonly used to obtain accurate reports from witnesses (see Hilgard & Loftus, 1979), it may be employed precisely towards that end.
The fourth dimension is test format. Test format refers to whether subjects produce their own answers, or instead must select or recognize a response from among those provided by the experimenter. As will be discussed shortly, this dimension, too, is often confounded with the previously mentioned factors, complicating further the comparison of results between laboratory and everyday settings.
A study by Neisser (1988b) can help illustrate some of the issues that stem from these confoundings. Neisser examined memory for real-life events that took place during the course of a seminar that he taught. Memory was assessed using either a cued recall or a multiple-choice recognition procedure. Neisser found recall memory to be much more accurate than recognition memory, and pointed out that such a finding might come as a surprise to traditional memory researchers, who are accustomed to the general superiority of recognition memory found in laboratory studies.
Neisser's finding brings to the fore some of the potential sources of confusion in the comparison of findings obtained in naturalistic and laboratory research contexts, and illustrates what we have called the recall-recognition paradox (Koriat & Goldsmith, 1994a). On the one hand, the established wisdom in eyewitness research holds that testing procedures involving recognition or directed questioning can have "contaminating" effects on memory (see, e.g., Brown et al., 1977; Gorenstein & Ellsworth, 1980; Hilgard & Loftus, 1979; Loftus, 1979a, 1979b; Loftus & Hoffman, 1989). In fact, the general recommendation is to elicit information initially in a free-narrative format before moving on to directed questioning, and even then, to place greater faith in the former (see Fisher, Geiselman & Raymond, 1987; Hilgard & Loftus, 1979). On the other hand, however, this body of evidence stands in seeming defiance of the well established superiority of recognition over recall memory in traditional, list-learning laboratory experiments (e.g., Brown, 1976; Shepard, 1967; but see Tulving & Thomson, 1973). Thus, this discrepancy, and Neisser's finding, could be taken as yet one more example of the importance of factors specific to the context of inquiry, real-life versus laboratory, supporting the claim that memory behaves differently in the two settings (e.g., Baker-Ward et al., 1993; Conway, 1991, 1993).
The discrepant findings, however, may also implicate the concern with the two different memory properties, accuracy versus quantity: In Neisser's (1988b) study, for instance, as in many naturalistic studies, the focus is on memory accuracy, whereas traditional memory research has focused almost invariably on memory quantity. Thus, Neisser's recall subjects were more accurate than the recognition subjects in the sense that what they reported was almost never wrong, but as Neisser also pointed out, they did not provide much information either. Such findings could therefore reflect an interaction between memory property and test format that would be obtained, perhaps, regardless of the research context: Recognition yields better quantity performance than recall testing, but recall yields better accuracy (see also Hilgard & Loftus, 1979; Lipton, 1977).
An additional complication, however, stems from the common confounding between test format and report option. This confounding is evident in the reality of both naturalistic and laboratory research: In naturalistic research, for instance, free-narrative reporting not only guards against "leading" or contaminating information (a test-format variable), it also allows the witness the freedom to choose what information to report, and at what level of generality. Directed questioning, on the other hand, often involves explicit or implicit demands that an answer be provided. Similarly, traditional item-based laboratory research almost invariably implements recognition testing as forced recognition in two distinct respects: Not only are subjects confined to the alternatives presented (test format), they are also forced to answer each and every item (report option). In contrast, recall testing typically allows subjects the freedom to decide both how and whether to report what they remember.
A further possibility, then, is that the recall-recognition paradox actually reflects an interaction between report option and memory property. Indeed, Neisser (1988b) pointed out that his recall subjects seemed to achieve greater accuracy by providing fewer answers. In addition, they might also have utilized a different aspect of report option to boost their accuracy: control over the "grain size" or generality of their responses (cf. Yaniv & Foster, 1990, 1994). Clearly, the correspondence between memory reports and past events can improve when the answers are more general and less detailed. Thus, Neisser observed that his recall subjects tended to choose "a level of generality at which they were not mistaken" (1988b, p. 553).
5.2. Disentangling Memory Property, Report Option, and Test Format within a Laboratory Research Context
The confoundings implicated in the foregoing discussion led us to propose a three-factor classification of item-based memory assessment methods in terms of memory property, report option and test format (see Figure 1). In proposing this scheme (Koriat & Goldsmith, 1994a), we tried to show how it could serve as a guide in disentangling some of the empirical confusions discussed above, and also provide an integrative framework that might be exploited in future research. In what follows, we sketch some findings obtained within this framework (Koriat & Goldsmith, 1994a, 1994b), focusing on their implications for the distinction between the accuracy-oriented and quantity-oriented approaches to memory.
----------------------------------------- Insert Figure 1 about here -----------------------------------------
In one experiment (Koriat & Goldsmith, 1994a, Experiment 1), subjects answered 60 general-knowledge questions in either a recall or a multiple-choice recognition format (all items required a one-word answer in order to equate the "grain size" of the answers across the two test formats). In addition to the standard tests of free recall and forced-choice recognition, however, two relatively uncommon procedures were added: forced recall (requiring subjects to respond to all questions), and free recognition (permitting subjects to skip items). In this design, then, test format and report option were orthogonally manipulated. A payoff schedule provided all subjects with a common performance incentive, essentially rewarding them for each correct answer, but penalizing them by an equal amount for each incorrect answer. Performance was scored both for quantity (input-bound percent correct) and for accuracy (output-bound percent correct).
The results, superimposed on the classification in Figure 1, disclose several trends: When comparing the standard memory measures, free recall and forced recognition, our results replicated the pattern implicated in the recall-recognition paradox: Recall was indeed superior to recognition on the accuracy measure (g vs. f), but recognition was superior to recall on the quantity measure (b vs. c). However, examination of the remaining means indicates that although memory quantity performance does vary with test format, recognition better than recall (b vs. a, and d vs. c), it is report option that is critical for memory accuracy: The option of free report increased accuracy performance for both recall and recognition testing (g vs. e, and h vs. f). In fact, under free-report conditions (in which memory accuracy and quantity measures can be operationally distinguished), test format had no effect at all on memory accuracy: Given equal opportunity to screen their answers, the recall and recognition subjects achieved virtually identical accuracy scores (g vs. h)!
This basic pattern was replicated in several further experiments, employing both list-learning (episodic) and general knowledge (semantic) memory tasks (Koriat & Goldsmith, 1994a, 1994b). What are the implications of these findings? First and foremost, the results demonstrate the importance of distinguishing between the two memory properties, memory quantity and memory accuracy. These properties were found to be dissociable: Test format affected quantity performance but not accuracy, whereas report option affected accuracy but not quantity. Thus, it is clear that one can neither compare nor investigate "memory performance" without specifying the memory property of interest and the particular testing conditions under which it is evaluated. Second, the results question the general belief among everyday memory researchers that recognition testing is necessarily detrimental to memory accuracy. Although this may often be the case, it could be due primarily--perhaps entirely--to the typical confounding of recognition testing with forced report. Third, because the superior accuracy of free recall over forced recognition characteristic of naturalistic research was obtained in these experiments within a typical laboratory setting, it would appear that at least some of the underlying dynamics are not uniquely tied to real-life contexts (see further discussion in Section 5.4).
More generally, the results highlight the utility of the distinction between the accuracy-oriented and quantity-oriented approaches to memory assessment even within the standard item-based research framework. Beyond helping to unravel some of the confusions involving memory property and the other factors, the results underscore the need for a more careful consideration of some of the unique emphases of the accuracy-oriented approach, most notably, the active role of the subject in controlling his or her output-bound memory correspondence. We shall now consider the ramifications of such control for the correspondence-oriented study of memory.
5.3. The Strategic Regulation of Memory Performance
It is perhaps not coincidental that the issue of subject control figures prominently both when comparing everyday and laboratory research in general, and when contrasting the accuracy-oriented and quantity-oriented approaches to memory in particular. As noted earlier, the investigation of real-life memory phenomena sometimes requires a compromise between the desire for strict experimental control, on the one hand, and the wish to remain true to the natural dynamics of the memory phenomena being investigated, on the other (see Gruneberg & Morris, 1992). Hence, in recent years there seems to be a greater willingness among students of real-life memory to allow subjects control over their memory reporting, as is seen, for instance, in the use of free-narrative and other open-ended questioning techniques which are seldom employed in traditional memory research (see Hilgard & Loftus, 1979).
In parallel, at the empirical level, the effects of subject control on memory performance also appear to differ markedly between the quantity-oriented and accuracy-oriented research contexts. On the one hand, quantity-oriented memory research suggests that subjects have little control over their memory performance: First, subjects cannot improve their memory quantity scores when given incentives to do so (e.g., Nilsson, 1987; Weiner, 1966a, 1966b). Second, encouraging or even forcing recall subjects to produce more items (by relaxing their criterion) does not seem to improve their memory quantity scores much or at all beyond what is obtained under standard instructions (e.g., Bousfield & Rosner, 1970; Erdelyi, Finks, & Feigin-Pfau, 1989; Roediger & Payne, 1985; Roediger, Srinivas, & Waddil, 1989). On the other hand, in sharp contrast to these results, accuracy-based findings (e.g., Barnes, Nelson, Dunlosky, Mazzoni, & Narens, 1994; Koriat & Goldsmith, 1994a, 1994b) indicate that accuracy performance is under strategic control: Not only can subjects improve their accuracy considerably when offered the option of free report (as discussed above), they can also increase their accuracy even further when given stronger incentives to do so. For example, in one experiment (Koriat & Goldsmith, 1994a, Experiment 3), we used the same free-report procedure described earlier, but this time subjects sacrificed all winnings if they volunteered even a single incorrect answer. Accuracy increased substantially compared to our earlier experiment, averaging over 90% for both recall and recognition (fully one quarter of the subjects were successful in achieving 100% accuracy)! This improvement, however, was attained at a cost in quantity performance (about a 25% reduction for both recall and recognition). Similar results were obtained using a 10:1 penalty-to-bonus payoff ratio (Koriat & Goldsmith, 1994b).
Clearly, then, subject control over memory reporting should be of special concern in correspondence-oriented memory research. In recounting their experiences, people can apparently regulate their reporting towards the enhancement of its accuracy (among other goals; see Neisser, 1988b; Ross & Buehler, in press): They may report only information about which they are confident (Barnes et al., 1994; Koriat & Goldsmith, 1994a, 1994b), or adopt a level of generality at which they are not likely to be wrong (Neisser, 1988b; Yaniv & Foster, 1990, 1994). This creates two fundamental challenges for students of memory correspondence: First, how can subject-controlled regulatory processes be made amenable to experimenter-controlled, scientific investigation? Second, given that memory correspondence is under the control of the subject, how can such control be accommodated by our methods of memory assessment?
5.3.1. Investigating subject control over memory reporting
As indicated above, in our work we attempted to address these questions by focusing on one specific type of subject regulation--the withholding or volunteering of particular items of information in free-report situations. As a framework for investigating such regulation, we put forward a model of monitoring and control processes which merges ideas from signal-detection theory with ideas from metamemory research (Koriat & Goldsmith, 1994b). We assume that when attempting to recount a past event, people monitor the subjective likelihood that an item of information that comes to mind is correct, and then apply a control threshold to the monitoring output in order to decide whether to volunteer that item or not. The setting of the control threshold is assumed to depend on the relative utility of providing complete versus accurate information. Several results supported the model: First, the tendency to report an answer increased strongly with increasing confidence in the correctness of the answer. Second, subjects given a high accuracy incentive (a 10:1 penalty-to-bonus ratio) were more selective in their reporting, adopting a stricter criterion than subjects given a more moderate incentive (a 1:1 ratio). Third, by employing these monitoring and control processes, subjects were able to enhance their memory accuracy under free-report conditions, but the accuracy improvement generally came at the expense of quantity performance.
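The monitoring-and-control account can be sketched in a few lines. This is a hedged illustration rather than the model's actual formulation: the rule tying the threshold to the payoff ratio assumes an expected-payoff-maximizing subject with well-calibrated confidence (an assumption made here for simplicity), and the ten candidate answers are hypothetical values chosen for the example, not data from any experiment.

```python
def control_threshold(penalty, bonus):
    """Assessed probability at which volunteering breaks even.

    Assumption: volunteering pays p * bonus - (1 - p) * penalty on average,
    which is non-negative exactly when p >= penalty / (penalty + bonus).
    """
    return penalty / (penalty + bonus)


def free_report_scores(candidates, threshold):
    """Volunteer only candidates whose assessed probability meets the threshold.

    candidates: list of (assessed_probability, is_correct) pairs.
    Returns (input-bound quantity, output-bound accuracy).
    """
    reported = [ok for p, ok in candidates if p >= threshold]
    n_correct = sum(reported)
    quantity = n_correct / len(candidates)
    accuracy = n_correct / len(reported) if reported else None
    return quantity, accuracy


# Hypothetical candidate answers: (monitoring output, actual correctness).
candidates = [
    (0.95, True), (0.95, True), (0.90, True), (0.80, True), (0.80, False),
    (0.70, True), (0.60, True), (0.60, False), (0.40, False), (0.20, False),
]

forced = free_report_scores(candidates, 0.0)                         # report all
moderate = free_report_scores(candidates, control_threshold(1, 1))   # 1:1 ratio
strict = free_report_scores(candidates, control_threshold(10, 1))    # 10:1 ratio
```

With these illustrative values, the 1:1 threshold screens out the two least confident (and incorrect) answers, and the 10:1 threshold admits only the two most confident answers, so accuracy rises with the incentive while quantity falls under the stricter criterion.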
These results can help shed light on some of the mechanisms underlying the strategic control of memory accuracy. In particular, one aspect of the work is especially useful in demonstrating how the correspondence-oriented concern with memory accuracy leads to very different research emphases than the traditional concern with memory quantity: Despite the intuitive appeal of a criterion-based quantity-accuracy tradeoff (implied by signal-detection theory), our work shows that neither the accuracy advantage that typically derives from subject control over memory reporting, nor the quantity costs of such control, are inevitable. Rather, these effects depend critically on both accuracy motivation and monitoring effectiveness. These two factors have been virtually ignored in traditional, quantity-oriented memory assessment (which might in fact account for Roediger, Srinivas, & Waddil's, 1989, observation that a recall-criterion effect on quantity performance is "intuitive, but remarkably little evidence for it exists," p. 255). Let us, then, expand briefly on these two factors.
Consider first accuracy motivation. In previous, quantity-oriented investigations of motivational effects on memory (e.g., Nilsson, 1987; Weiner, 1966a, 1966b), the incentive manipulations were explicitly designed to increase memory quantity performance, and null effects were taken to imply that motivation "does not affect memory performance" (Nilsson, 1987, p. 187). By contrast, our results indicate that when the focus is on accuracy performance, the effects of accuracy motivation on both accuracy and quantity measures can be substantial. Similarly, in the previous demonstrations of null or very small effects of recall criterion on memory quantity performance mentioned earlier, there was no special motivation for accuracy in either the experimental or control conditions (forced-report vs. standard free-recall instructions). Had those studies, like our experiments, included a condition with a strong incentive for accuracy, they too would undoubtedly have found the ensuing changes in criterion level to yield substantial effects on both quantity performance and accuracy performance. However, in those studies, accuracy performance, and hence accuracy motivation, were of no direct interest.
The second factor that should be of special concern in accuracy-oriented research is the effectiveness of subjects' memory monitoring. Although this factor has attracted much attention among students of metacognition (see Metcalfe & Shimamura, 1994), its performance consequences have received relatively little attention (but see Barnes et al., 1994; Bjork, 1994; Metcalfe, 1993; Nelson & Narens, 1994). In the context of our research, monitoring effectiveness refers to the correspondence between the subjects' confidence regarding a candidate answer and the actual probability that the answer is correct. Importantly, this factor is distinct from the amount of information retained. To illustrate, consider a relatively difficult memory test, for which a subject fails to remember the answers to many items. Even though memory retention (and hence quantity performance) may be poor, the subject's monitoring of the correctness of his or her answers could still be perfect. In that case, the option of free report would allow him or her to volunteer only (the few) correct answers, achieving perfect accuracy with no tradeoff. On the other hand, the subject's monitoring might be very poor as well, in which case utilizing the option of free report should not enhance his or her accuracy much or at all, and would only reduce quantity performance.
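The contrast between perfect and poor monitoring can be made concrete with a small numerical sketch. The code below is an illustration only, not the procedure used in the experiments: it assumes that each candidate answer carries a confidence value, that an answer is volunteered when its confidence reaches a report criterion, that quantity is scored as the proportion of all items answered correctly, and that accuracy is scored as the proportion of volunteered answers that are correct. The function name and the two hypothetical data sets are invented for the example.

```python
def free_report_scores(items, criterion):
    """Score one hypothetical subject under a given report criterion.

    items: list of (confidence, is_correct) pairs, one per test item
           (illustrative data format, assumed for this sketch).
    An answer is volunteered only if confidence >= criterion;
    criterion = 0.0 therefore corresponds to forced report.
    Returns (quantity, accuracy), where accuracy is None if nothing
    is volunteered.
    """
    volunteered = [correct for confidence, correct in items
                   if confidence >= criterion]
    n_correct = sum(volunteered)
    quantity = n_correct / len(items)            # proportion of all items
    accuracy = n_correct / len(volunteered) if volunteered else None
    return quantity, accuracy


# Two hypothetical subjects with identical retention (6 of 20 answers correct).
# Perfect monitoring: confidence is high exactly when the answer is correct.
perfect_monitoring = [(1.0, True)] * 6 + [(0.0, False)] * 14
# Poor monitoring: confidence is unrelated to correctness.
poor_monitoring = ([(0.9, True)] * 3 + [(0.9, False)] * 7 +
                   [(0.1, True)] * 3 + [(0.1, False)] * 7)

for label, data in [("perfect", perfect_monitoring), ("poor", poor_monitoring)]:
    forced = free_report_scores(data, criterion=0.0)
    free = free_report_scores(data, criterion=0.5)
    print(label, "forced:", forced, "free:", free)
```

Under these assumptions both hypothetical subjects obtain the same forced-report scores. With the report option, the perfectly monitoring subject reaches 100% accuracy at no quantity cost, whereas the poorly monitoring subject's accuracy stays at its forced-report level while quantity is halved.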
The important implication of this analysis is that monitoring effectiveness can influence memory performance, particularly memory accuracy, independent of what might be called memory "retention." Thus, in one experiment (Koriat & Goldsmith, 1994b, Experiment 2) we manipulated monitoring effectiveness by using two different sets of general-knowledge items: One set (the "poor" monitoring condition) consisted of items for which the subjects' confidence judgments were expected to be generally uncorrelated with the correctness of their answers (see Fischhoff et al., 1977; Gigerenzer, Hoffrage, & Kleinbolting, 1991; Koriat, in press), whereas the other set (the "good" monitoring condition) consisted of more typical items, for which the subjects' monitoring was expected to be more effective. As predicted, although the two sets were matched on retention, as indexed by forced-report quantity performance, the good monitoring condition allowed subjects to attain a far superior joint level of free-report accuracy and quantity performance: Much better accuracy performance was achieved while maintaining equivalent quantity performance, compared to the poor monitoring condition (see Figure 2).
----------------------------------------- Insert Figure 2 about here -----------------------------------------
Our work, then, indicates that free-report memory measures tap the operation of memory components that are not disclosed by forced-report measures, and that these components have a critical role in the strategic regulation of memory accuracy. In particular, accuracy motivation and monitoring effectiveness emerge as crucial factors in determining memory accuracy under free-report conditions, and in modulating the rate of the quantity-accuracy tradeoff. Thus, this work motivates a greater concern with both the determinants and the accuracy of metacognitive processes, as they affect memory performance (see also, Nelson & Narens, 1994).
5.3.2. Incorporating subject control into the assessment of memory correspondence
The issue of subject control also presents a dilemma with regard to memory assessment: How can we sensibly assess a person's memory for an event if memory performance, particularly memory accuracy, is under the person's control? This issue is not just methodological, but also metatheoretical: The question is whether intervening activities on the part of the subject, such as deciding to volunteer or withhold information, are to be conceived as operations that are superimposed on memory (see Klatzky & Erdelyi, 1985; Lockhart & Craik, 1990), or rather, as being part and parcel of memory itself (see Nelson & Narens, 1994; Tulving, 1983).
As alluded to earlier, traditional storehouse-guided research has tended to treat subject control as a nuisance variable that should be eliminated or partialled out in order to achieve a pure measure of "true" memory. Indeed, Nelson and Narens (1994) note that "ironically, although the self-directed [subject-controlled] processes are not explicitly acknowledged in most theories of memory, there is an implicit acknowledgment on the part of investigators concerning the importance of such processes. The evidence for this is that investigators go to such great lengths to design experiments that eliminate or hold those self-directed processes constant via experimental control!" (p. 8).
Even when self-directed processes are not experimentally controlled, an attempt is often made to partial out their effects in order to derive a pure measure of memory. For instance, the signal-detection measure of true memory (d') is designed to provide an estimate of memory strength (or sensitivity) that is "unbiased" by variation in β (commonly referred to as "response bias"). In this sense, signal-detection methods are often used like other techniques that correct for the effects of guessing (e.g., Budescu & Bar-Hillel, 1993; Cronbach, 1984; also see Koriat & Goldsmith, 1994a; and see further discussion below).
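For readers less familiar with how signal-detection methods separate sensitivity from response bias, the following sketch computes d' and the response criterion (conventionally written β) from a hit rate and a false-alarm rate in a forced old/new recognition test. It assumes the textbook equal-variance Gaussian model, which is not spelled out in the text; the function name and the two hypothetical subjects are illustrative.

```python
from math import exp
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def d_prime_and_beta(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian signal-detection estimates (assumed model).

    d'   = z(H) - z(F): separation of the "old" and "new" distributions.
    beta = likelihood ratio at the criterion = exp((z(F)^2 - z(H)^2) / 2).
    Both rates must lie strictly between 0 and 1.
    """
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    d_prime = z_hit - z_fa
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2)
    return d_prime, beta


# Two hypothetical subjects with the same sensitivity (d' = 1.5) whose
# criteria sit at different points on the strength axis.
neutral = d_prime_and_beta(_STD_NORMAL.cdf(0.75), _STD_NORMAL.cdf(-0.75))
strict = d_prime_and_beta(_STD_NORMAL.cdf(0.25), _STD_NORMAL.cdf(-1.25))
print("neutral criterion:", neutral)
print("strict criterion: ", strict)
```

In this sketch the two subjects differ markedly in hit and false-alarm rates, yet both yield the same d'; only β differs. This is the sense in which d' is said to be "unbiased" by the subject's response criterion.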
Such a treatment, however, would seem unsuitable for the evaluation of memory under the correspondence metaphor. Because subject-controlled metamemory processes actually constitute an important means of managing one's memory correspondence, they cannot simply be avoided or partialled out. On the contrary, when the researcher is explicitly concerned with the faithfulness of memory, and in particular, with the dependability of memory reports in real-life settings, it would seem imperative to treat the ongoing regulation of memory performance as an intrinsic aspect of memory functioning (Neisser, 1988b; see also Barnes et al., 1994; Koriat & Goldsmith, 1994a, 1994b; Metcalfe & Shimamura, 1994; Nelson & Narens, 1990, 1994). An important challenge, then, is to develop ways of making the contribution of metamemory processes explicit in the evaluation of memory performance.
One method that we have proposed was, in fact, already illustrated at the group level in Figure 2. Rather than seek a single point estimate of "true" memory, this method incorporates metamemory processes into memory assessment by charting memory performance profiles that take retention, monitoring, and control into account (Koriat & Goldsmith, 1994b). This approach resembles that of plotting memory operating characteristic (MOC) curves using signal-detection techniques (but see below). Like an MOC curve, the proposed quantity-accuracy profile (QAP) describes the joint levels of quantity and accuracy performance that can potentially be achieved under different conditions.
Like MOC curves, QAP's can also be plotted at the individual level. Consider, for example, the two QAP's depicted in Figure 3, which were computed for two selected recall subjects from our recent study (Koriat & Goldsmith, 1994b). If we were to look only at forced-report performance (criterion = 0) as a point-estimate of memory retention, Subject B's performance would clearly be better than A's. Similarly, if we were to look only at the subjects' actual free-report memory scores (ignoring or perhaps overlooking important differences in accuracy motivation), B would be seen to achieve about the same memory quantity as A, but far superior accuracy. The profiles, however, offer much more than this. First of all, the QAP makes quite clear that the two subjects adopted different criteria in controlling their actual free-report responding. (In fact, Subject A responded under a moderate accuracy incentive, whereas B responded under a strong accuracy incentive.) Second, looking at potential memory performance, not only is B's potential quantity performance superior to A's across the range of criterion levels, but B's better monitoring effectiveness (.87 vs. .64 on the ANDI measure; see Yaniv et al., 1991) allows a high level of accuracy to be achieved at virtually no cost in the number of correct answers provided. On the other hand, A is potentially able to achieve 100% accuracy (though at a substantial quantity cost) when strongly motivated for accuracy, while B's maximal accuracy falls somewhat short of this. Ultimately, then, in order to evaluate the relative effectiveness of A's and B's memory, we will need to take into account functional considerations pertaining to the circumstances of the memory report. For instance, for a key witness in a capital trial, we might actually prefer A's memory, because of the very high premium placed on memory accuracy in such situations.
----------------------------------------- Insert Figure 3 about here -----------------------------------------
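The logic of charting a quantity-accuracy profile for a single subject can be sketched as follows. This is a simplified illustration under stated assumptions, not the derivation reported in Koriat and Goldsmith (1994b): it assumes that a confidence rating is available for every forced-report answer, and that the potential performance at each criterion level is obtained by treating as volunteered only those answers whose confidence meets the criterion. The function name and the data are hypothetical.

```python
def quantity_accuracy_profile(items, criteria):
    """Chart potential free-report performance across criterion levels.

    items:    list of (confidence, is_correct) pairs from forced report
              (illustrative data format, assumed for this sketch).
    criteria: report-criterion levels to evaluate; 0.0 is forced report.
    Returns a list of (criterion, quantity, accuracy) triples, with
    accuracy set to None when no answer would be volunteered.
    """
    profile = []
    for criterion in criteria:
        volunteered = [correct for confidence, correct in items
                       if confidence >= criterion]
        n_correct = sum(volunteered)
        quantity = n_correct / len(items)        # proportion of all items
        accuracy = n_correct / len(volunteered) if volunteered else None
        profile.append((criterion, quantity, accuracy))
    return profile


# One hypothetical subject: ten answers with graded confidence.
subject = [(0.95, True), (0.90, True), (0.85, True), (0.80, False),
           (0.70, True), (0.60, True), (0.50, False), (0.40, True),
           (0.30, False), (0.10, False)]

profile = quantity_accuracy_profile(subject, criteria=[0.0, 0.5, 0.85])
for criterion, quantity, accuracy in profile:
    print(f"criterion={criterion:.2f}  quantity={quantity:.2f}  "
          f"accuracy={accuracy:.2f}")
```

For this hypothetical subject, raising the criterion trades quantity for accuracy: quantity falls from .60 to .50 to .30 while accuracy rises from .60 to roughly .71 to 1.00. How steep that tradeoff is depends on how well confidence discriminates correct from incorrect answers, which is the sense in which the profile reflects monitoring as well as retention and control.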
Compared to the standard point measures of memory performance, the derivation of quantity-accuracy profiles allows a more global evaluation of potential memory performance in terms of both accuracy and completeness. This approach also illustrates one way in which memory assessment can accommodate the contribution of subject-controlled metamemory processes to overt memory performance. Indeed, QAP's may be used to separate the sources of individual or group differences (e.g., developmental changes) and the effects of different manipulations on memory retention, monitoring, and control in a manner similar to the way in which signal-detection methods allow one to distinguish differential effects on d' and β (see Koriat & Goldsmith, 1994b).
This brings us to an important point, however, which we should now address. Clearly, there is an overall resemblance between our proposed framework and the signal-detection approach to memory measurement (see, e.g., Banks, 1970; Bernbach, 1967; Kintsch, 1967; Klatzky & Erdelyi, 1985; Lockhart & Murdock, 1970; Murdock, 1966; Norman & Wickelgren, 1969). Signal detection is often thought to represent an "accuracy-oriented" approach to memory, and indeed, it raises many of the same issues brought out here. Yet, despite the apparent similarities, there are several fundamental differences that should be emphasized.
First and foremost, the signal-detection methodology cannot be applied at all to free-report situations, precisely because in such situations subjects have the option to decide whether to volunteer or withhold information (see Lockhart & Murdock, 1970). In fact, the response criterion (β) addressed by the signal-detection methodology is not whether to respond or not, but rather, whether to respond "old" (studied) or "new" (foil) to each and every item (under forced report). Thus, while the signal-detection approach has contributed greatly to a consideration of the role of decision processes in forced-recognition memory, it actually has little to say regarding the accuracy of a person's freely reported remembrances.
Second, the signal-detection methods do not distinguish between "retention" and "monitoring" (see Koriat & Goldsmith, 1994b; Lockhart & Murdock, 1970). For instance, in the forced-report old/new paradigm to which signal-detection methods are typically applied, "control" is isolated in terms of the parameter β, yet "retention" (overall memory strength) and "monitoring effectiveness" (the extent to which the subject's confidence distinguishes "old" from "new" items) are operationally equivalent: Both are equally valid interpretations of d' (see, e.g., Banks, 1970; Lockhart & Murdock, 1970). By contrast, as discussed above, in our approach to free-report performance, these latter two aspects (as well as control) may be evaluated independently: One may have good monitoring resolution, yet very poor retention, or vice versa.
Third, although the logic of signal detection can be extended to free-report tasks (see Klatzky & Erdelyi, 1985; Koriat & Goldsmith, 1994b), the motivation for doing so has generally been to control for criterion effects (differences in accuracy) when comparing quantity measures (e.g., by using forced-recall procedures; Erdelyi & Becker, 1974), rather than to measure memory accuracy as a property of interest in its own right (see also, Koriat & Goldsmith, 1994a). Thus, for instance, in discussing the possible implications of a null effect of recall criterion on memory quantity performance, Erdelyi et al. (1989) observe:
"If response bias fails to affect recall performance level, forced-recall procedures for controlling shifts in productivity (e.g., Erdelyi & Becker, 1974), which subjects find tedious, can be dispensed with in favor of standard free recall. Of more importance, the difficulty of applying forced recall to complex, ecologically more valid materials than stimulus lists--such as prose passages or actual real-life events--becomes moot, and recall performance, irrespective of intrusion levels, could be trusted to reflect true recall level. Would that this were so, for no proven methodologies have been worked out for controlling response bias in the recall of complex stimuli for which forced recall is unwieldy or inapplicable . . ." (p. 246, emphasis added).
As already noted, this same quantity-oriented attitude is also generally evidenced in the evaluation of forced-report recognition memory, in which case the signal-detection methodology can actually be applied.
Overall, then, our work demonstrates how some rudimentary aspects of subject control in situations of free memory reporting can be experimentally studied and taken into account in the assessment of memory correspondence. It also illustrates how even within an item-based framework, the accuracy-oriented approach to memory brings to the fore questions that might be neglected when the focus is strictly on memory quantity. Of course, in everyday life, people have many more means available to manage their memory correspondence than just the simple option of volunteering or withholding specific pieces of information (cf. John Dean's memory in Neisser, 1981). Thus, a better understanding of the faithfulness of memory in real-life contexts will require greater efforts to bring these other aspects of subject control under systematic investigation.
5.4. Implications for Naturalistic and Laboratory-Based Research
In concluding this section, let us reconsider how the research discussed above bears upon the context-of-inquiry issue, naturalistic versus laboratory.
As noted earlier, many everyday memory researchers stress factors related to the context of research in explaining differences in memory performance (e.g., Baker-Ward et al., 1993; Fivush, 1993; Conway, 1991, 1993; Neisser, 1988b). These factors generally have to do with the functional role of remembering in naturalistic settings. Neisser (1988b), for instance, argued that in real-life situations, forces operating at the time of remembering may have much more impact than has been acknowledged in traditional laboratory-based research. In his words, "we must take into account not only the stimuli present at retrieval but the reason for retrieval; the theory we require will have to deal with persons, motives, and social situations . . ." (1988b, p. 553).
These remarks suggest the possibility of inherent differences in the dynamics of remembering between everyday and laboratory contexts, which may limit the generalizability of results across the two contexts (see also Conway, 1993). Thus, for example, in noting the relatively high accuracy of eyewitness memory in naturalistic field studies, Fisher et al. (1989) observe:
"It is interesting that the accuracy corroboration rates in the three field studies of eyewitness memory were considerably higher than their laboratory counterparts. If this difference between laboratory and field studies continues to appear, one may question the validity of describing in court the accuracy rates found in the laboratory as evidence of the general unreliability of eyewitness testimony in field cases." (p. 4, Note 6)
The important question, however, is whether some of the sources of the differences in memory performance between naturalistic and laboratory contexts can be identified.
Although context of inquiry was not manipulated in our research, the results nevertheless suggest some of the factors that might underlie discrepancies between naturalistic and laboratory findings. The most crucial factor, of course, is the memory property being assessed, quantity versus accuracy. Because most laboratory research has concerned itself with memory quantity, the focus of everyday memory research on accuracy could give the impression that memory performance is different in naturalistic settings than in the laboratory. Indeed, as discussed earlier, this factor partly accounts for the apparent discrepancy between the established superiority of recall over recognition in eyewitness research and the established superiority of recognition over recall in traditional laboratory research.
Other possible contributing factors are subject control and accuracy motivation: Functional and motivational factors operating at the time of remembering may not only be more salient in real-life than in laboratory settings (Baker-Ward et al., 1993; Neisser, 1988b), but also, the assessment methods commonly employed in naturalistic research may allow a greater opportunity for these factors to exert their influence. Consider, for instance, the open-ended, free-narrative methods of eliciting information recommended for questioning witnesses. As discussed earlier, such methods offer subjects much greater control over their memory reporting than is allowed in traditional, laboratory research, and this control can have a dramatic effect on memory performance, particularly, memory accuracy. Moreover, in naturalistic situations the functional incentives for accuracy are often much stronger than in typical laboratory experiments. In concert, then, these factors could explain the sometimes remarkable recall accuracy observed in naturalistic settings (Hilgard & Loftus, 1979; Neisser, 1988b). However, these factors can also produce high levels of accuracy under typical laboratory conditions, for such a banal task as memorizing a list of unrelated words (Koriat & Goldsmith, 1994a, Experiment 2).
Is this to say, then, that there are no actual differences between memory performance in real-life and laboratory settings? On the contrary, as just indicated, and in line with Neisser's (1988b) comments, our work in fact leads us to expect marked performance differences between the various social and functional contexts of remembering (especially between naturalistic and functionally "sterile" laboratory contexts), and also helps pinpoint some of the factors contributing to such disparities: differences in memory property, report option, accuracy motivation, monitoring effectiveness, and other aspects of subject control (e.g., control over the grain size of the report). Of course, there are undoubtedly many other important variables that were not addressed in our research. The point is that only by identifying and experimentally investigating such variables can at least some of the differences in memory dynamics between naturalistic and laboratory contexts be demystified and ultimately understood.
Taken as a whole, then, the present article essentially delivers a double message regarding the context-of-inquiry issue: First, as just pointed out, many disparities may be expected between everyday and laboratory findings, but some of these can be clarified by considering the different assessment approaches and functional concerns that are characteristic of each context.
Second, however, the methodological biases prevalent in the study of everyday memory appear to reflect a more fundamental departure from the laboratory tradition, a departure in the very metaphor of memory espoused. This shift towards the correspondence metaphor is expressed in the preference for complex stimulus materials having an internal structure, in the focus on the many qualitative ways in which memory can change over time and on the processes underlying these changes, in allowing for the contribution of subject variables and subject control to memory performance, in the study of motivational and functional factors that may affect such contributions, and of course, in the memory property of interest.
In the following, final section, we return to consider the broader ramifications of the distinction between the correspondence and storehouse metaphors, both for the everyday-laboratory controversy, and for the study of memory generally.
6. Memory Metaphors and the Everyday-Laboratory Controversy
In introducing the controversy between proponents of laboratory and naturalistic memory research, we identified three dimensions of the conflict, referred to as the "what," the "how," and the "where" of memory research: The first of these concerns the content of memory study, i.e., the substantive topics deemed worthy of investigation, the second concerns the proper methodology, and the third involves the appropriate context of inquiry. We stressed that although the three dimensions are intercorrelated in the reality of current research practices, they are not logically interdependent, and therefore we sought an implicit common denominator at the metatheoretical level. We then tried to show how part of the cleavage between the traditional laboratory approach and everyday memory research might be captured by the contrast between the storehouse and correspondence metaphors, and their respective quantity-oriented and accuracy-oriented approaches to memory. In particular, we focused on the emerging correspondence metaphor, and on the unique implications of this conception for the study and assessment of memory. In concluding this article, then, we first return to examine how this conceptualization helps bind together some of the issues pertaining to the what, where, and how of memory research. We then consider the implications of this analysis for the everyday-laboratory controversy itself.
6.1. The Guiding Role of Memory Metaphors
Figure 4 sketches a rough scheme depicting some of the interrelationships we assume to exist between memory metaphors, substantive questions (what), research methodology (how), and context of inquiry (where). Each aspect is represented by a separate node, and the links between the nodes indicate mutual influences and/or constraints. In addition, more specific features of memory research may be seen to fall into one of four quadrants, representing areas of interaction between the neighboring nodes.
-------------------------------------- Insert Figure 4 about here --------------------------------------
This scheme is based on the premise that, as in other areas of scientific inquiry (see, e.g., Arbib & Hesse, 1986; Black, 1962; Hesse, 1966; Hoffman, 1980; Kuhn, 1979; Leary, 1990; Oppenheimer, 1956), conceptual metaphors play a primary role in shaping memory research (e.g., Roediger, 1980; Tulving, 1979; Watkins, 1990). Thus, we assume that a memory metaphor combines a pretheoretical point of view with the desire to capture the nature of some memory phenomena. For example, at the metatheoretical level, the storehouse metaphor embodies the empiricist view of the mind as a passive depository of discrete elementary ideas, rather than as an active agent with intentions and goals (see Brewer & Nakamura, 1984). At the same time, the metaphor responds to certain basic properties of memory, and represents an attempt to understand the intangible mental processes of learning and remembering by drawing an analogy with better-understood aspects of the physical world. Thus, the storehouse metaphor embodies the mental analogue of object permanence, the fact that objects deposited in a particular place persist and may later be retrieved. This analogy allows memory to be understood in terms of such familiar physical-spatial notions as depositing, retrieving, losing, searching, displacing, and so forth (Roediger, 1980). Note, however, that even a more abstract metaphor, such as the notion of correspondence, may supply its own useful concepts, such as schemata, "goodness of fit," and distortion.
Once adopted, a memory metaphor provides a structured framework within which memory phenomena are analyzed and explained. It can help in abstracting the critical aspects of the phenomena, in defining the substantive questions of interest, and in choosing the methods of investigation. In fact, the metaphor facilitates the development of more specific explanatory theories and models of memory by supplying the basic terms and concepts, the very language of thought in which they are cast. This shared language allows theories based on the same general metaphor to be compared and pitted against each other. When such a common metatheoretical foundation is lacking, however, comparison between theories is much more difficult, if not impossible.
Moreover, together with its associated theories and models, the metaphor will guide the choice of research methodology. For instance, as mentioned earlier, many of the traditional, list-learning procedures constitute experimental simulations of the physical process of depositing and recovering elements of information from a memory store. Here, both the experimental techniques and the measures themselves are designed to tap the amount of information retained. By contrast, we have seen that the correspondence metaphor implies very different procedures, those that may aid in understanding the factors affecting the congruence between a person's memory of an event and his or her initial perception of that event. Hence, the preference for output-bound memory measures, complex stimulus materials, subject control, and so forth.
Of course, the picture is more complicated than the discussion so far would indicate, because many of these relationships actually involve mutual constraints. Thus, there is an interplay between the metaphor and the phenomena, so that each brings into focus the most compatible aspects of the other. Although the choice of metaphor is certainly constrained by the phenomena, the metaphor itself is selective in drawing attention to those memory phenomena, processes and variables falling within its "focus of convenience" (Kelly, 1955); other topics may be left out.
Furthermore, the research methods and paradigms originally shaped by the metaphor may in turn illuminate and elaborate certain aspects of the metaphor itself. For instance, we showed earlier how the criticality of subject control in affecting accuracy-based measures stimulates a more "active" correspondence conception, leading to a focus on further, substantive issues (e.g., metacognitive contributions) that might otherwise be overlooked. On the negative side, however, when the experimental tools and paradigms are too well established, they may become functionally autonomous (Allport, 1961), restricting the range of research topics, and constituting a target of study in their own right. In that case, many of the questions asked will be dictated more by the nature of the experimental tools themselves, than by any direct relevance to naturally-occurring memory phenomena (cf. Conway, 1993; Tulving, 1979).
With regard to the context dimension, the foregoing discussion implies a distinction between two separate aspects: the context of the phenomena, and the research setting. As noted above, an important advantage of adopting a conceptual metaphor is that it can serve as a stepping stone leading from concrete natural phenomena to their conceptualization within a more abstract theoretical framework. Thus, although a metaphor may sometimes induce an alienating remoteness of the theories and methods from the original context of the phenomena, it can also encourage a healthy detachment. In fact, such detachment has enabled the great bulk of traditional, storehouse-guided research to be conducted in the laboratory, taking advantage of increased experimental control. In the absence of a sufficiently articulate metaphor, it may be more difficult to develop general theories and standard experimental procedures, and the methods of investigation may tend to be more closely tied to the concrete phenomena themselves, and to their natural contexts.
When viewed from this perspective, the study of everyday memory in natural settings may perhaps be characterized as "phenomena in search of a metaphor," much as the storehouse tradition might be regarded as "a metaphor in search of phenomena." Although the revolt against the storehouse metaphor has enabled everyday memory research to address many new and important memory topics, at the same time, such research seems to suffer from the lack of guidance that a well-articulated alternativ