Below is the unedited preprint (not a quotable final draft) of:
Shanks, D.R. & St. John, M.F. (1994). Characteristics of
dissociable human learning systems.
Behavioral and Brain Sciences 17 (3): 367-447.
The final published draft of the target article, commentaries and
Author's Response are currently available only in paper.
1. Introduction
A classic issue faced by researchers attempting to understand the fundamental laws of learning is whether there is more than one basic learning mechanism. Can all of the phenomena of learning be accommodated by a unitary mechanism, or do we need to posit the existence of independent and dissociable human learning systems? In this article, we consider some of the experimental evidence -- much of it very recent -- that has addressed this issue.
We will focus in particular on two dimensions on which it has been suggested that functionally distinct learning systems differ. The first concerns the role of awareness during learning. Many authors (e.g., Hayes & Broadbent 1988; Lewicki, Czyzewska, & Hoffman 1987; Reber 1989) have argued that in addition to a learning system whose functioning is accompanied by concurrent awareness of what is being learned, humans possess a quite separate system that operates independently of awareness. The second dimension, which turns out to be closely related, is concerned with the content of learning. Specifically, it refers to the idea that distinct learning systems encode very different sorts of information, one system inducing rules (e.g., Lea & Simon 1979; Nosofsky, Clark, & Shin 1989), while a second system memorizes instances (e.g., Brooks 1978; Medin & Shaffer 1978).
We believe that an evaluation of the current evidence for and against the multiple-systems view is important at the present time for at least two reasons. First, each of the separate systems that has been hypothesized has tended to encourage researchers to develop a set of explanatory constructs which are unique to that system and which allow the main phenomena of the domain to be explained. But a drawback is that it is not uncommon to find experimental results interpreted just in terms of these restricted concepts, without consideration of whether they might also be understood (and possibly understood better) in terms of more general principles.
The second and perhaps more pressing reason for evaluating the evidence for dissociable learning systems is that there has in the last few years been an extraordinary amount of interest in the question of whether there are dissociable memory systems (for reviews, see Richardson-Klavehn & Bjork 1988; Schacter 1987; 1989; Squire 1992). The mounting evidence for this view comes from a variety of sources. For instance, amnesic patients have been shown to be dramatically impaired on certain direct tests of memory, such as free recall, but less impaired or even unimpaired on indirect tests of memory such as motor-skill tasks (see Squire 1992). Although dissociations between performance on direct and indirect tests do not force the conclusion of dissociable memory systems (e.g., Jacoby & Kelley 1991; Roediger 1990), some researchers have argued at length that the experimental results, together with current understanding of brain functioning, strongly encourage the conclusion of separable underlying systems (e.g., Schacter 1989; Squire 1992).
Few would argue that learning and memory can be studied in isolation from one another. On the contrary, consideration of the possible characteristics of dissociable learning systems should inform research on the issue of dissociable memory systems, and vice versa. Indeed, if there really are dissociable memory systems, it would seem highly likely that there are dissociable learning systems that supply them with information. Yet as several authors have noted (e.g., Berry & Dienes 1991; Reber 1989), research in the areas of learning and memory has tended to proceed independently. We hope that by examining carefully the question of whether there are distinct learning systems, we may be able to assist memory researchers in their attempts to understand the processes of information storage and retrieval, by setting out the characteristics of the learning mechanisms determining the acquisition of that information.
1.1. Proposed distinctions between types of learning
Distinctions between different types of learning have been common in psychology for many years. One such distinction is between declarative and procedural learning, where declarative learning refers to the acquisition of factual knowledge and procedural learning to the acquisition of skills (e.g., Cohen & Squire 1980; Morris 1984; Winograd 1975). Other distinctions include the acquisition of "habits" versus "memories" (Mishkin, Malamut, & Bachevalier 1984) and "taxon" versus "locale" learning (O'Keefe & Nadel 1978). Of course, if independent memory systems require independent learning mechanisms, then many more distinctions might be needed. For instance, we might require separate learning systems to feed semantic and episodic memory stores (Neely 1989; Tulving 1983).
Of these distinctions, the one between declarative and procedural learning has probably attracted the most attention, with a variety of empirical phenomena being interpreted within that framework. For example, Cohen and Squire (1980) suggested that amnesics have normal or near-normal procedural learning but impaired declarative learning, a theoretical notion that has been widely taken up by other researchers in the amnesia field. However, this distinction has in recent years been largely eclipsed by the alternative distinction between "explicit" and "implicit" learning. (Note that some authors have replaced the original declarative/procedural distinction with the terms "declarative" and "nondeclarative" [e.g., Shimamura & Squire 1989; Squire 1992].) The main reason for the shift in terminology and emphasis towards the terms "explicit" and "implicit" is dissatisfaction with the original terminology, the term "procedural" apparently being too narrow to encompass the relevant learning effects. For instance, the learning that is preserved in amnesia is not always of a "procedural" nature: It includes a variety of priming effects involving, for instance, the ability to complete word-stems (Graf, Squire, & Mandler 1984) and the increase in the likelihood of judging a non-famous name famous as a result of prior exposure (e.g., Squire & McKee 1992).
The term "implicit learning" was first coined by Reber (1967), who is responsible for much of the recent interest in the issue of distinct learning systems (see Reber 1989 for a review). While different authors have used a variety of different definitions to capture the fine detail of the explicit/implicit learning distinction (see Mathews, Buss, Stanley, Blanchard-Fields, Cho, & Druhan 1989 for examples), the key factor is the idea that implicit learning occurs without concurrent awareness of what is being learned, and represents a separate system from that which operates in more typical learning situations, where learning does proceed with concurrent awareness (i.e., explicitly). At the same time, it is clear that many authors have been concerned with the possibility that different learning tasks might give rise to different sorts of knowledge (e.g., Mathews et al. 1989; Reber 1989; Vokey & Brooks 1992), where one sort is abstract or rule-based, and the other is based on separate fragments or instances. For Reber (1989), implicit learning is not only unconscious but also involves the acquisition of abstract information.
The paradigm case is language learning, where people are assumed to be able implicitly to learn abstract grammatical rules. Few non-linguists are aware of or able to articulate the grammatical rules supposed to underlie their linguistic performance, and so it makes sense to imagine that those rules are acquired, if at all, without ever being directly represented in consciousness. The rules are abstract in the sense that they apply equally to any linguistic tokens, including novel ones, that come from the appropriate syntactic categories.
Because the aware/unaware and rules/instances dimensions are logically distinct, we believe that they must be treated independently, and accordingly, in this article we review evidence for these two dimensions separately. In what follows, we will reserve the term "unconscious learning" for learning without awareness, regardless of what sort of knowledge is being acquired. At the same time, we will use the terms "rule learning" and "instance learning" to refer to the acquisition of abstract and fragmentary knowledge, respectively, regardless of whether such learning is conscious or not.
The organization of the article is as follows. The major part of the article asks whether unconscious learning is really supported by empirical evidence. In section 2, we survey a wide range of learning paradigms, from subliminal learning phenomena to Pavlovian conditioning to artificial grammar learning and serial reaction time tasks. The stimuli and specific processes involved in performing and learning each of these tasks differ widely, and they may share some fundamental characteristics or they may exhibit some fundamental differences. We find, across these diverse paradigms, little actual support in any of them for unconscious rule induction (i.e., for implicit learning), or for the unconscious learning of any other type of information. However, in section 3 we do find evidence for a dissociation between a rule induction system and an instance memorization system, and we review evidence for this dissociation obtained in explicit or conscious learning tasks. Within each system, the range of different processes and information is still large, but they nevertheless seem to form two distinct types: slow, effortful hypothesis testing on the one hand, and fast, efficient memorization of instances and fragments of instances on the other.
We concentrate throughout this article on data from normal subjects. However, it is clear that amnesic patients have learning difficulties, and these difficulties have been widely interpreted within the explicit/implicit framework (e.g., Squire 1992). For our present purposes, the data from such subjects are tangential because the question of awareness during learning has not particularly been considered in amnesics (but see Knopman 1991). In section 4 we briefly comment on the interpretation of learning data from this population of subjects.
2. Can learning occur without awareness?
Proponents of the implicit/explicit distinction have argued that there are clear demonstrations of subjects' ability to encode new information without being aware of that information, and hence that awareness is the key dimension on which separable learning systems differ. In fact, the question of whether learning can occur without awareness goes back many decades (e.g., Adams 1957; Dulany 1961; Eriksen 1960; Krasner 1958; Thorndike & Rock 1934). In addition to the recent work of Reber, which we consider below, the last 5 or 6 years have seen a large number of sequence learning reaction time studies which have adopted an interesting and novel technique for assessing the relationship between awareness and learning. A substantial part of our review concerns results obtained using this task. We also consider evidence from a variety of conditioning procedures. We begin with some comments on experimental methodology.
2.1. The Logic of Dissociations
Almost all studies of unconscious learning have adopted a very constrained version of the logic of dissociation. Using separate indices of learning and awareness, one attempts to find circumstances in which exposure to some set of stimuli leads to detectable learning unaccompanied by any reliable degree of awareness. On the face of it, such an approach may lead to unequivocal evidence of unconscious learning, but researchers using similar logic to try to establish the existence of unconscious perception have noted several problems (e.g., Reingold & Merikle 1988). What counts as a suitable test of awareness? Can we discount the possibility that our index of awareness is contaminated by unconscious information? Can we be sure it is sufficiently sensitive to exhaustively detect all conscious information? As we shall see, these are deep problems, and researchers have adopted a variety of strategies to try to circumvent them.
Rather than relying on this particular dissociation paradigm, it is possible that firmer evidence for unconscious learning may emerge from experiments adopting alternative methodologies. In the unconscious perception field, for instance, Reingold and Merikle (1988) have proposed an interesting and novel procedure in which one looks for greater sensitivity to some variable in an indirect test, in which instructions make no reference to the variable, than in an otherwise identical direct test in which the instructions do refer to the variable. Alternatively, one could try to demonstrate the independence of two learning systems by trying to establish qualitative differences between them (e.g., Merikle & Reingold 1992), such that for example one system is affected in one way by a variable, while the other is affected in the opposite way.
We know of only one study that has even come close to establishing such qualitative differences, and this study is therefore worth considering in some detail. Hayes and Broadbent (1988) began by postulating two independent systems: An unconscious system that would slowly accumulate information about predictive events in the environment, and a conscious system that would test hypotheses. Hayes and Broadbent further assumed that the conscious system would be highly dependent on a limited-capacity working memory system, while the unconscious system would be independent of any such limited-capacity memory system.
A rather straightforward prediction emerges from this plausible model of the cognitive system. Since the conscious learning mechanism relies on working memory, there should be situations where learning is profoundly affected by loading the working memory system with a secondary task such as generating random numbers. At the same time, since the unconscious system is not dependent on working memory, other (implicit) learning tasks should be unaffected by such a secondary task. Indeed, Hayes and Broadbent went so far as to say that unconscious learning might even be facilitated by a secondary task if it prevented the conscious system from exerting an interfering influence on the unconscious system. The importance of the Hayes and Broadbent (1988) study is that, in accordance with their model, they appeared to have found two learning tasks which differed in only a minor way, but for which one was inhibited by a secondary task and the other was facilitated.
In their experiments Hayes and Broadbent contrasted performance in two versions of the computer "person" task. On each trial, the subject entered an attitude (e.g., polite) to the computer, which then responded with its attitude (e.g., unfriendly). The subject's task was to try to get the computer to be friendly. If we designate the 12 possible attitudes -- going from very unfriendly to loving -- with the numbers 1 to 12, then the computer's attitude on each trial was a simple numerical function of the subject's input. In one (No-Lag) condition, the computer's attitude ($O_t$) on each trial was a function of the subject's attitude ($I_t$) on the same trial:
$O_t = I_t - 2 + r$ (1)
where $r$ is a random number (-1, 0, or 1) and the attitudes have the 12 numerical values mentioned above. In the other (Lag) condition, $I_t$ was replaced by $I_{t-1}$, so that the computer's attitude was determined by the subject's attitude on the preceding trial:
$O_t = I_{t-1} - 2 + r$ (2)
Performance was measured in terms of the number of trials in which the subject's input was one that could (given the random element) have produced a friendly response from the computer person. While learning occurred in both groups, Hayes and Broadbent found that subjects could give highly accurate verbal reports about the No-Lag task, indicating that their learning had been accompanied by awareness, whereas the verbal reports of subjects in the Lag version were very poor. This result encourages the view that learning in the No-Lag task can be readily achieved by the explicit system, but that the Lag task requires the implicit system. Thus we might predict that a concurrent secondary task would have an effect on learning in the No-Lag condition but not the Lag condition.
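The two task rules and the scoring criterion can be made concrete with a short simulation. The following is only a minimal sketch under stated assumptions, not the procedure actually used by Hayes and Broadbent: the function names, the `target` attitude value, the clipping of out-of-range values to the 1 to 12 scale, and the treatment of the first Lag trial are all hypothetical choices made for illustration.

```python
import random

N_ATTITUDES = 12  # attitudes are coded 1..12, from very unfriendly to loving


def clip(value):
    # Assumption: out-of-range results are clipped to the 1..12 scale.
    return max(1, min(N_ATTITUDES, value))


def computer_attitude(driving_input, offset=-2, rng=random):
    """Equations (1)/(2): the driving input plus offset plus r, r in {-1, 0, 1}."""
    r = rng.choice((-1, 0, 1))
    return clip(driving_input + offset + r)


def could_produce(subject_input, target, offset=-2):
    """True if some value of r maps this input onto the target attitude."""
    return any(clip(subject_input + offset + r) == target for r in (-1, 0, 1))


def run_block(inputs, lag, target, offset=-2, seed=0):
    """Simulate one block; return the computer's outputs and the score.

    lag=False drives each output with the current input (No-Lag condition);
    lag=True drives it with the preceding input (Lag condition).
    The score counts inputs that could have yielded the target attitude.
    """
    rng = random.Random(seed)
    outputs, hits = [], 0
    for t, current in enumerate(inputs):
        if could_produce(current, target, offset):
            hits += 1
        if lag and t == 0:
            outputs.append(None)  # assumption: no preceding input on trial 1
            continue
        driver = inputs[t - 1] if lag else current
        outputs.append(computer_attitude(driver, offset, rng))
    return outputs, hits
```

For example, `run_block([9, 9, 3, 9], lag=False, target=7)` scores three of the four inputs as potentially successful, because an input of 9 can yield 6, 7, or 8, whereas an input of 3 cannot reach 7. The `offset` argument is exposed as a parameter so that a different constant can be substituted into the same rule.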
To test this, Hayes and Broadbent (1988) gave subjects a block of learning trials using either Equation (1) (No-Lag group) or Equation (2) (Lag group). After 30 trials in the No-Lag condition and 50 in the Lag condition, performance was approximately equated, and at this point Hayes and Broadbent changed the rules by replacing the -2 in the equations with +2. They then presented a further 30 (No-Lag group) or 50 (Lag group) re-learning trials. Under single-task conditions (Hayes & Broadbent 1988, Experiment 1), performance in the Lag condition was affected more detrimentally than performance in the No-Lag condition by this rule change. In contrast, when subjects were required to perform a concurrent secondary task (generating random letters or digits; Experiments 2 and 3), a change in the rule interfered more with performance in the No-Lag than in the Lag task, exactly the opposite of the result obtained when there was no secondary task. The results conform to Hayes and Broadbent's theory -- and hence to their conception of separate implicit and explicit learning systems -- if we simply assume that the secondary task occupied the conscious working memory system and therefore interfered with the explicit system, while removal of the working memory system meant that the implicit system could operate without any interfering influence from the explicit system.
Unfortunately, Green and Shanks (1993) were unable to replicate Hayes and Broadbent's results. In the single-task groups, Green and Shanks found that the introduction of the equation change had similar effects on performance in the No-Lag and Lag groups, thus failing to replicate Hayes and Broadbent's (1988, Experiment 1) finding that performance was more detrimentally affected in the Lag condition. Under dual-task conditions the situation was the same: Performance was approximately equally affected in the two groups. There was not the slightest hint that performance in the Lag group was less affected by the equation change, and hence Hayes and Broadbent's (1988, Experiments 2 and 3) dual-task results were not replicated. Green and Shanks (1993) suggest that Hayes and Broadbent may have obtained the results they did as a result of the inappropriate inclusion of subjects who had learned very little prior to the equation change.
Hayes and Broadbent's dissociation posed a genuine problem for theories of learning relying on a single learning mechanism. Because the secondary task appeared to have opposite effects on the two tasks, Hayes and Broadbent appeared to have data that genuinely supported the claim that there exist dissociable learning systems. Obviously, the fact that their results could not be replicated undermines those conclusions.
With the exception of this study, implicit learning experiments have universally adopted the dissociation logic of attempting to demonstrate learning in the absence of any detectable degree of awareness. As we shall see, various methodological problems with the dissociation procedure make it doubtful whether unconscious learning has yet been established. However, it is worth bearing in mind that future experiments using alternative methods may license stronger inferences concerning the dissociability of learning systems. We now begin our discussion of the empirical evidence.
2.2. Unconscious learning with subliminal stimuli
Most studies of unconscious learning have asked whether people can learn about relationships between stimuli without being aware of those relationships, but before discussing the results of such studies, we will briefly consider evidence from experiments asking a more direct question: Can people learn about stimuli when they are unaware of the existence of these stimuli, that is, when the stimuli are subliminal? One situation in which unconscious learning would, on the face of it, be fairly straightforward to establish is one in which a subject is entirely unaware that the critical stimulus in the learning phase is present at all, yet still shows evidence of learning something about that stimulus.
There have, of course, been a huge number of experiments in which subjects are presented brief or low-intensity stimuli intended to be below the threshold of awareness and which attempt to measure effects of such stimulation on subsequent behavior. We ignore much of this literature for two reasons. First, in some cases such effects may be only tenuously related to learning. For instance, many subliminal activation experiments ask whether the way in which a stimulus is interpreted may be biased by a supposedly subliminal stimulus presented a few hundred milliseconds previously (e.g., Marcel 1983). However, it is doubtful that such biasing effects would occur over longer intervals: Instead, they are typically interpreted as examples of some sort of short-lived facilitation. Needless to say, it is difficult to draw a sharp line between perception and learning, but if unconscious learning is to have any real significance, it must be demonstrable over reasonable intervals of time (at the very least, seconds or minutes rather than milliseconds). Secondly, many subliminal activation experiments which do appear to show longer-lasting effects (e.g., Eich 1984) have already been the subject of extensive criticism in this journal (see Holender 1986, and accompanying commentaries). We have no wish to repeat those arguments except to point out that it is extremely difficult to be confident in such experiments that the stimuli really are below the threshold of conscious perception.
Accordingly, we focus in this section on studies which avoid these problems. Andrade (in press), Bornstein (1992), Ghoneim and Block (1992), Greenwald (1992), and Schacter (1987) review a number of relevant studies examining learning with subliminal stimuli. While there have been some positive results, a corresponding number of negative findings leads us to suggest that unconscious learning with subliminal stimuli has not yet been conclusively demonstrated.
Subliminal stimuli may be presented to awake subjects as auditory messages at extremely low intensity or in some scrambled form, or as images presented for very brief durations or embedded in other figures; alternatively, they may be presented to subjects while asleep or anesthetized. There is a widespread belief amongst the public in the ability of such subliminal messages to condition attitudes or preferences, or to otherwise influence behavior. Indeed, the belief is so powerful that the families of two young men who died from self-inflicted gunshot wounds sought more than 6 million dollars in damages from the rock group Judas Priest on the grounds that subliminal messages on one of the group's records had caused the men to commit suicide (see Loftus & Klinger 1992). However, recent investigations suggest that the concern is misplaced. Controlled experiments attempting to see whether subliminal messages can influence behavior or whether people can use self-help audiotapes as learning aids have yielded exclusively negative results (British Psychological Society 1992; Greenwald, Spangenberg, Pratkanis, & Eskenazi 1991; Vokey & Read 1985). It seems unlikely that unconscious learning can occur in such conditions.
Several investigations of spared cognitive functions under general anesthesia have obtained evidence of small but reliable amounts of learning, but these are matched by a comparable number of negative results (see Andrade in press, and Ghoneim & Block 1992, for reviews). If the anesthetic has been adequately administered and renders the patient entirely unconscious, then spared learning must in turn be unconscious. A typical positive result was reported by Jelicic, Bonke, Wolters, and Phaf (1992). They gave anesthetized patients repeated auditory presentations of two words (e.g., yellow, green) from a semantic category. Later, when the anesthetic had worn off, subjects were asked in a priming test to generate members of those categories. Subjects were significantly more likely to produce the pre-exposed words than were control subjects who had not been read the words during anesthesia. Thus some information does seem to have been encoded while the subjects were unconscious.
Another positive result was reported by Kihlstrom, Schacter, Cork, Hurt, and Behr (1990). They gave anesthetized patients lists of strongly associated cue-target word pairs, with each list being presented about 67 times during the operation. Later, when the anesthetic had worn off, subjects were given a cued recall and a recognition memory test, while in a third test, they were read the cue words and had to say the first word that came to mind. Although the recall and recognition tests yielded no evidence of retention, subjects were more likely on the generation test to produce target items to pre-exposed cue words than to non-preexposed cue words, whether the test was relatively soon after the exposure phase (median 87 min) or was much later (median 14 days). Thus, again, some degree of unconscious registration seems to have occurred.
Against this are the many negative results that have been published. Some of these results are particularly revealing because they come from experiments using procedures very similar to those of studies that have found positive results. For instance, Cork, Kihlstrom, and Schacter (1992) failed to replicate the Kihlstrom et al. (1990) results using a different anesthetic but with otherwise identical procedures. Furthermore, despite the likelihood that sleep renders a person less unconscious than general anesthetic, in a well-controlled experiment Wood, Bootzin, Kihlstrom, and Schacter (1992) were unable to obtain evidence of learning during sleep, with procedures again similar to those used in the Kihlstrom et al. (1990) study. Similarly, Ghoneim, Block, and Fowles (1992) found no evidence of Pavlovian conditioning in anesthetized patients, using experimental procedures that did reveal conditioning in non-anesthetized subjects.
Of course, this pattern of results might simply indicate that learning under anesthesia is a genuine phenomenon, but that relatively subtle methodological factors determine whether a given study will or will not obtain evidence of it. However, Andrade (in press) discusses a large number of studies, including over 20 published reports of failures, and is unable to find any clear factors that determine whether learning will or will not occur. For instance, it does not seem to be especially related to the type of stimuli used. More significantly, it remains an open possibility that many positive results have been due to inadequately administered anesthetic that left some or all of the patients at least partially conscious. It is worth noting that in the Cork et al. study 3 subjects were excluded from the analysis because they had explicit memory of the study items! As Cork et al. (1992) say, "...the extent to which implicit expressions of memory are affected by general anesthesia remains uncertain" (p. 897).
Conclusions. Experiments in which subjects are presented with stimuli of which they are likely to be unaware at the time of exposure yield some evidence of unconscious learning, but this is offset by a substantial body of negative evidence. At present, it would be premature to conclude from the available studies that unconscious learning is feasible.
2.3. Criteria for establishing unconscious learning with supraliminal stimuli
In the rest of this section we focus on situations where the stimuli are above the threshold for detection and identification. In such situations subjects may be unaware of the relationships between stimuli even though they are aware of the stimuli themselves. Learning of inter-stimulus relationships may therefore be unconscious.
We will argue that essentially all unconscious learning experiments with supraliminal stimuli can be conceptually reduced to the arrangement shown in Figure 1. The figure illustrates an associative learning episode in which subjects have the opportunity to learn that two events, A and B, stand in a predictive relationship. Event A might be a tone conditioned stimulus (CS) and event B a shock unconditioned stimulus (US), and the measure of learning might be a galvanic skin response (GSR) at time t2 when the CS is presented again. Or event A might be a feature or set of features, event B might be a category, and the measure of learning might be the probability of making the category response at t2. We are interested in whether subjects can learn the predictive relationship in the absence of concurrent awareness of that relationship. We assume for the sake of simplicity that there is just one learning trial.
(insert Figure 1 about here)
Learning itself presumably takes place during and/or after presentation of event B, and we wish to ascertain the subject's state of awareness during this learning episode. Unfortunately, there are likely to be profound technical difficulties involved in assessing awareness of a predictive relationship at just the moment that learning itself occurs. Apart from anything else, asking the subject at time t1 whether he or she is aware of the relationship between stimuli A and B is likely to direct his or her attention to that relationship. As an illustration, in a study by Baeyens, Eelen, and van den Bergh (1990) that will be discussed in more detail later, the proportion of A-B relationships of which the subjects appeared to be aware on a post-conditioning recognition test increased from 18% to 77% when subjects also gave concurrent estimates of awareness during the learning stage. Clearly, the concurrent index of awareness directed subjects' attention to the relationship and affected the very entity it was designed to measure.
Hence, we will usually have to settle for assessing awareness after the target learning trial. At this time (t2 in Figure 1), suppose we present event A (a tone previously paired with shock) and both measure the GSR and also ask the subject whether he or she has any particular expectancy of event B. If we obtain a GSR but no evidence of a conscious expectancy of event B, then we have obtained the crucial finding that lies at the heart of all attempts to demonstrate implicit learning with supraliminal stimuli. For, if the subject has no expectancy of event B at t2, we have some basis for inferring that he or she was not aware of the A-B relationship at t1.
While this might seem like a very strong inference, we believe that such inferences will inevitably have to be accepted if unconscious learning is to be established. It is simply very difficult to assess awareness concurrently with learning, and so one is forced to rely on some later test. Of course, we also make a backward inference concerning learning itself: If performance at time t2 is no better than we would expect by chance, we often infer that learning did not occur at t1. Conversely, if performance is better at t2 than we would expect by chance, then we conclude that learning did occur.
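The backward inference from test performance at t2 can be sketched in code. This is only an illustrative sketch under assumptions that are not part of the argument above: it supposes that both the learning index and the awareness index are scored as counts of correct responses against a known chance rate, it uses a one-sided exact binomial test with a conventional cutoff, and all function names are hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """Probability of at least `successes` hits if responding were at chance."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def above_chance(successes, trials, chance, alpha=0.05):
    """True if performance is reliably better than chance (assumed cutoff)."""
    return binomial_tail(successes, trials, chance) < alpha


def classify(learning, awareness):
    """Each argument is a (successes, trials, chance) triple scored at t2."""
    learned = above_chance(*learning)
    aware = above_chance(*awareness)
    if not learned:
        return "no evidence of learning at t1"
    if aware:
        return "learning accompanied by awareness"
    # A null awareness result is only as convincing as the test is sensitive.
    return "candidate dissociation: learning without detectable awareness"
```

For instance, `classify((18, 20, 0.5), (10, 20, 0.5))` returns the candidate-dissociation label. Note that the final branch rests on a failure to detect awareness, so it inherits whatever insensitivity the awareness test may have.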
2.3.1. The relationship between unconscious learning and implicit memory. The basic design shown in Figure 1 allows us to see the intimate relationship between unconscious learning and implicit retrieval: Demonstrations of unconscious learning are a proper subset of the larger set of demonstrations of implicit retrieval.
Implicit retrieval is defined as the ability of information from some prior episode to be retrieved and hence to influence current processing, but in the absence of conscious recollection of that prior episode (e.g., Schacter 1987; we use the term "implicit retrieval" rather than the more common term "implicit memory" to emphasize that we are specifically considering what happens during the retrieval process). Thus implicit retrieval requires the absence of a conscious re-experience of the study episode. Now, lack of awareness of a contingency at t2 presumably means the absence of any consciously-recallable episodic memory traces in which that contingency is embedded, and hence any piece of evidence that allows us to infer unconscious learning must also be an example of implicit retrieval: This is case (iii) shown in Figure 1. However, the converse does not hold; an example of implicit retrieval does not necessarily represent evidence of unconscious learning.
Suppose that a subject emits a GSR when presented at test with a tone stimulus. There are three possible scenarios, shown in Figure 1:
(i) The subject remembers the study episode, in which case the GSR does not count as an example of implicit retrieval according to Schacter's (1987) definition. Because remembering the episode entails remembering the content of that episode (i.e., the A-B contingency), the learning could not have been implicit either.
(ii) The subject does not remember the study episode, but is aware -- that is, has semantic knowledge -- that this tone predicts shock (cf. source amnesia). Although this qualifies as a case of implicit retrieval, we would not infer that learning itself had been unconscious, since the subject at t2 is aware that A predicts B. (Note that this ignores the possibility that the subject could have been unaware of the A-B relationship at t1, but aware of it at t2 as a result, for instance, of observing his or her own behavior. Observation of a GSR in response to the tone might lead the subject to believe that the tone must therefore predict shock. Quite how one might exclude this possibility is a difficult problem.)
(iii) The subject neither remembers the study episode nor has conscious semantic knowledge of the A-B relationship. This final case again qualifies as implicit retrieval. More importantly, we now have evidence that is relevant to unconscious learning, since lack of awareness of the relationship at retrieval licenses the inference that learning too took place without awareness.
Thus in order for us to infer unconscious learning from implicit retrieval, it is necessary that the subject be unaware of the relevant relationship that occurred in the study episode, as well as unaware of the episode itself. In summary, an unconscious learning experiment just is an implicit retrieval experiment, but with the added component of meeting this further condition. For researchers in the field of implicit retrieval, all that is of interest is whether the subject is unaware of the relevant study episode as in cases (ii) and (iii). But it is only case (iii) that is relevant to the question of unconscious learning; the subject must also be unaware of the relationship that occurred in that episode. It is for this reason, we argue, that much of the data obtained from amnesics is irrelevant to the question of unconscious learning (see section 4).
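The logic of the three cases can be restated as a small decision function. The Python sketch below is purely illustrative: the names `Classification` and `classify_test_response` and the two boolean inputs are hypothetical labels introduced here, and the sketch assumes that the subject's memory for the episode and awareness of the contingency at t2 have already been assessed by some adequate test.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    scenario: str                        # "i", "ii", or "iii" as in the text
    implicit_retrieval: bool             # counts as implicit retrieval?
    bears_on_unconscious_learning: bool  # licenses the backward inference?


def classify_test_response(remembers_episode: bool,
                           aware_of_contingency: bool) -> Classification:
    """Classify a conditioned response observed at t2 (hypothetical helper)."""
    if remembers_episode:
        # Case (i): remembering the episode is taken to entail remembering
        # the A-B contingency, so the second argument is irrelevant here.
        return Classification("i", False, False)
    if aware_of_contingency:
        # Case (ii): no episodic memory, but semantic knowledge that A predicts B.
        return Classification("ii", True, False)
    # Case (iii): neither episodic memory nor awareness of the contingency.
    return Classification("iii", True, True)
```

Only the third outcome sets both flags, which is the sense in which demonstrations of unconscious learning form a proper subset of demonstrations of implicit retrieval.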
2.3.2. Dissociation of task performance and verbal reports. Within the dissociation paradigm (Reingold & Merikle 1988), many studies have shown that subjects can acquire information without being able at a later time (t2) to verbally report it, and have used such findings in support of the claim of unconscious learning. Suppose that subjects are presented with some information at time t1, and that a subsequent performance test indicates that they have encoded this information. We will argue that if the aim is to establish what the subjects' state of awareness was at t1, examining the content of their verbal reports at t2 is certainly not the only -- and may not be the best -- way to do this.
To illustrate this, note that the condition mentioned above (that the backwards inference must be valid) can be made more specific by dividing it into two further criteria. The first concerns the match between the information that is responsible for performance changes and the information that is revealed by the test of awareness. We call this the Information Criterion. The second criterion concerns the sensitivity of the test for awareness. We call this the Sensitivity Criterion.
Information Criterion: Before concluding that subjects are unaware of the information they have learned and which is influencing their behavior, the experimenter must be able to establish that the information he or she is looking for in the awareness test is indeed the information responsible for performance changes.
This criterion is intended to exclude situations like the following. Suppose the experimenter sets up a task in which performance can be improved if the subjects learn information I. Performance does indeed improve, and subjects are apparently unaware at time t2 that they have learned I. However, an adequate explanation of the improvement in performance is that subjects are not learning I, but instead I*. By the experimenter's criteria, awareness of I* would be disregarded as irrelevant, and so the experimenter would erroneously conclude that the subjects' performance was under the control of some information or knowledge of which they were unaware. The Information Criterion is closely related to the notion of "correlated hypotheses" introduced by Adams (1957) and Dulany (1961), which will be discussed in section 2.6.1.
Our second criterion is far from new (e.g., Brewer 1974; Brody 1989; Dawson & Schell 1985; Ericsson & Simon 1980; 1984; Eriksen 1960; Reingold & Merikle 1988). It is simply that tests of unaware learning must achieve an adequate level of sensitivity:
Sensitivity Criterion: In order to show that two dependent variables (in this case, tests of conscious knowledge and task performance) relate to dissociable underlying systems, we must be able to show that our test of awareness is sensitive to all of the relevant conscious knowledge.
Unless this criterion is met, the fact that subjects are able to transmit more information in their task performance than in a test of awareness may simply be due to the greater sensitivity of the performance test to whatever conscious information the subject has encoded. Let us take as our null hypothesis the claim that there is a single source of conscious knowledge that can manifest itself both on the performance and on the awareness test. If performance is above chance, but there is no detectable awareness, then an immediate inference is that our test of awareness is simply less sensitive than the performance test to the available resource of conscious information. Or, to put it another way, there is conscious knowledge which is not being detected by the supposed test of awareness but is contributing to task performance.
To rule out this possibility, we must have either (i) some independent reason to believe that the test of awareness is sensitive to all of the potentially-relevant conscious information, or (ii) some reason to believe that the awareness test is at least as sensitive as the performance test in terms of its ability to detect relevant conscious information. The first of these requires proving that the awareness test is exhaustive, something that Reingold and Merikle (1988) have noted is likely to be very difficult to establish. In contrast, the second can be met if we try to make the performance and awareness tests as similar as possible in terms of retrieval context, but differ merely in terms of task instructions. If the instructions in the awareness test encourage the subject to retrieve as much conscious information as possible, and if the retrieval contexts in the two tests are approximately matched, then the Sensitivity Criterion may be met since it is unlikely that the performance test would retrieve more conscious information than the awareness test when the latter has provided subjects with a stronger motivation to do so. If we still obtain a dissociation between performance and awareness under such circumstances, then we will have good evidence of unconscious learning (see Note 1).
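The null hypothesis described above can be made concrete with a toy model. The following Python sketch assumes, purely for illustration, that each subject has a single strength of conscious knowledge and that each test detects that knowledge only when the strength exceeds a test-specific threshold. The function names, threshold values, and strengths are all hypothetical and are not drawn from any experiment.

```python
def detected(knowledge_strength: float, threshold: float) -> bool:
    """A test registers conscious knowledge only above its own threshold."""
    return knowledge_strength > threshold


def apparent_dissociations(strengths, performance_threshold, awareness_threshold):
    """Return the strengths that show up in performance but not in the awareness test."""
    return [
        k for k in strengths
        if detected(k, performance_threshold)
        and not detected(k, awareness_threshold)
    ]


# Hypothetical strengths of ONE conscious knowledge source in five subjects.
strengths = [0.1, 0.3, 0.5, 0.7, 0.9]

# Insensitive awareness test (threshold 0.6) versus sensitive performance test (0.2):
insensitive = apparent_dissociations(strengths, 0.2, 0.6)   # [0.3, 0.5]

# Awareness test at least as sensitive as the performance test:
matched = apparent_dissociations(strengths, 0.2, 0.2)        # []
```

In this toy model the "dissociation" in the first case arises although there is only one knowledge source, and it vanishes once the awareness test is made at least as sensitive as the performance test, which is the situation option (ii) is meant to secure.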
As an illustration of the application of these criteria, consider a widely-cited implicit learning study by Lewicki, Czyzewska, and Hoffman (1987). In the first phase, each trial consisted of the presentation of a target item in one of the four quadrants of a computer screen and the subjects' task was simply to press a button corresponding to that quadrant as quickly as possible. If we designate the quadrants as A, B, C, and D, then the basic idea of these experiments can be simply stated: The quadrant the target appeared in on each trial was nonrandom, and the question is whether the subjects are able to detect this nonrandomness.
Subjects were presented with sequences of 7 trials, with rules constructed so that target location on the 7th trial could be predicted from its locations on trials 1, 3, 4, and 6. On each of the first 6 trials, the digit 6 appeared on its own in one of the quadrants of the screen, but on trial 7 (the "complex" trial), it was embedded in a display containing 36 digits. Reaction time on the 7th trial was the measure of interest. The rules specifying target location were deterministic: If the target appeared in locations C, A, D, and B on trials 1, 3, 4, and 6 respectively, then on trial 7 the target would be in location 1.
In common with many other such results (which will be reviewed in section 2.7 below), Lewicki et al. (1987, Experiment 1) found that RTs on the target trials decreased significantly across 4,608 complex trials. In addition, RTs increased significantly when, towards the end of the experiment, the rules were changed so that the target now appeared in the diagonally opposite quadrant on the complex trials from where it had appeared previously. This latter finding rules out nonspecific factors as the locus of the speed-up effect. In a second experiment, Lewicki et al. applied deterministic rules only on 2 out of 3 sets of 7 trials; on the remaining sets, target location on trial 7 was random. Here, a change in the rules only affected RTs in the sets which were rule-determined, and not in those that were not.
Lewicki et al. found that none of their subjects came even close to being able to report any of the rules. In fact "none of the subjects were even able to correctly specify which four out of six simple trials were the crucial ones" (p. 529). Thus we appear to have good evidence of a dissociation between performance and reports. However, it is highly doubtful whether these results meet either the Information or the Sensitivity Criterion. With regard to the former, Lewicki et al. required subjects to try to report "at least one pair of co-occurring elements (i.e., a sequence of four target locations in simple trials and the corresponding location of the target in the subsequent matrix-scanning [complex] trial)" (p. 528). Thus a subject would only have been classified as able to report something about the sequence, and hence aware, if they were able to specify a complete sequence of 4 simple trials and 1 complex trial. But the problem with this classification is that to show a speed-up in RT, complete knowledge of the sequences was not necessary.
For instance, analysis of the sequences shows that even the last simple trial on its own was informative about target location on the 7th trial: If the target was in quadrant A on trial 6, it was twice as likely to be in quadrant A or D on trial 7 as in quadrant B or C. Trial 6 provides a great deal of information on its own about target location on trial 7. Knowledge about trials 4 and 6 provides yet more information about target location on trial 7, but if the subjects could report this sort of regularity, it would still not have counted as correct according to Lewicki et al.'s criterion. It is true that knowledge of the sequence across the 4 relevant simple trials provides absolute certainty about the 7th trial, but our point is that considerable amounts of speed-up in RT could be attributable to fragmentary knowledge of "micro-rules" that Lewicki et al. would not have counted as evidence of awareness, even if the subjects could articulate them.
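A rough calculation suggests how such a micro-rule could speed responding. The Python sketch below takes the trial-6 regularity just described (after a target in quadrant A, quadrants A and D each have probability 1/3 and quadrants B and C each have probability 1/6) and assumes, hypothetically, that a subject scans the complex display one quadrant at a time, most likely quadrant first. The serial-search model and the function name are illustrative assumptions, not part of Lewicki et al.'s procedure or analysis.

```python
from fractions import Fraction


def expected_quadrants_searched(probabilities):
    """Expected number of quadrants inspected when searching most-likely-first."""
    ordered = sorted(probabilities.values(), reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


# No knowledge of the rules: all four quadrants equally likely.
no_knowledge = {q: Fraction(1, 4) for q in "ABCD"}

# Fragmentary "micro-rule" knowledge based on trial 6 alone.
micro_rule = {"A": Fraction(1, 3), "D": Fraction(1, 3),
              "B": Fraction(1, 6), "C": Fraction(1, 6)}

# Complete knowledge of a four-trial rule (target location certain).
full_rule = {"A": Fraction(1), "B": Fraction(0),
             "C": Fraction(0), "D": Fraction(0)}

print(expected_quadrants_searched(no_knowledge))  # 5/2
print(expected_quadrants_searched(micro_rule))    # 13/6
print(expected_quadrants_searched(full_rule))     # 1
```

Under these assumptions, knowledge of trial 6 alone already reduces the expected search from 2.5 to about 2.17 quadrants, even though a subject reporting only that regularity would have been scored as unaware.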
Turning to the Sensitivity Criterion, we may ask whether the verbal report test is an adequate measure of the subject's awareness in this procedure. We suggest that it is not. First, we cannot be sure that the performance and awareness tests are matched in terms of the conscious information they pick up, because quite different retrieval contexts are provided for the two tests. In the case of RTs, performance is elicited in a context where (i) stimuli are presented on the computer screen, (ii) responses are made on the keyboard, (iii) a horizontal and a vertical line appear on the screen dividing it into quadrants, (iv) a response is made very shortly after the preceding response, and so on. All of these cues are pertinent, in that they were present during the learning phase (which just is the RT task). In the case of verbal report, none of these cues is present. Instead, the subject is required to retrieve the sequence rules from memory, without the aid of any of the aforementioned cues.
Secondly, we have little reason to believe that the test of verbal report provides an exhaustive index of conscious information, since there are other tests such as recognition which manifestly detect information left undetected by verbal report tests. For instance, Nelson (1978) compared the sensitivity of recognition and verbal recall in the following way. Suppose we have two memory tests A and B. Subjects learn a list of items and then are given test A. Then, test B is applied to only those study items which test A failed to detect. If test B detects any of these items, it is said to be a more sensitive test than test A. It is important to also apply the tests in the reverse order, test B then test A, and fail to observe an increase in sensitivity. Using such a procedure, Nelson (1978) showed that recognition tests can detect items not detected by free recall tests, but the converse was not true. Hence recognition is a more sensitive test than free recall, and the latter is therefore not exhaustive.
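The comparison logic of this two-test procedure can be sketched in a few lines of Python. The sketch simplifies the procedure by assuming that the full set of items each test would detect is known in advance; the function names and the item lists are hypothetical and are not Nelson's materials or data.

```python
def detects_residual(first_test_hits: set, second_test_hits: set, studied: set) -> set:
    """Items the first test failed to detect that the second test does detect."""
    missed_by_first = studied - first_test_hits
    return missed_by_first & second_test_hits


def more_sensitive(test_a_hits: set, test_b_hits: set, studied: set) -> bool:
    """True if test B recovers items missed by test A, and A recovers none missed by B."""
    b_recovers = detects_residual(test_a_hits, test_b_hits, studied)
    a_recovers = detects_residual(test_b_hits, test_a_hits, studied)
    return bool(b_recovers) and not a_recovers


# Hypothetical study list and test outcomes, for illustration only.
studied = {"apple", "chair", "river", "stone"}
recall_hits = {"apple"}
recognition_hits = {"apple", "chair", "river"}

print(more_sensitive(recall_hits, recognition_hits, studied))  # True
print(more_sensitive(recognition_hits, recall_hits, studied))  # False
```

Both orders of testing must be checked, as the second call shows: a test counts as more sensitive only if the asymmetry holds in one direction and fails in the other.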
Moreover, note that it is possible that subjects might misinterpret free report questions to mean that they should only report rules. They might believe that fragmentary information is not supposed to be reported. Many researchers have attempted to avoid this problem by asking more and more specific questions about what stimuli may begin or end a sequence, and so on. Such questions are somewhat better from a sensitivity standpoint because they are more specific (and provide more cues), and may be better from an informational standpoint if they ask about the information that subjects actually learn.
In sum, we suggest that the Information and Sensitivity Criteria are not met in Lewicki et al.'s experiment. The default hypothesis -- that there is only a single resource of conscious information -- may be correct, with less of that knowledge being detected by the verbal report test than by the RT task. There is no evidence that the knowledge used to perform the RT task is any different or was in any way acquired independently of the knowledge that the subjects' reports are based on. Verbal reports are impoverished compared to task performance simply because less of the available information is retrieved in the test of reportable knowledge. If the subject were given enough retrieval cues, there is every reason to believe that this knowledge could be brought to consciousness and reported; it is simply that a normal test of verbal report does not do this. Lastly, if sufficient cues could make the information conscious, then there is every reason to believe that it was conscious at the time of encoding.
It is important to note that we are not denying the empirical fact that performance and verbal reports can be dissociated. On the contrary, we acknowledge that there have been numerous satisfactory demonstrations of this (for instance, in Lewicki et al.'s experiment), and that this has interesting implications for applied psychology. Subjects' performance indicates that they have learned something, yet they are poor at verbally articulating what they have learned. Instead, we are suggesting that this dissociation is only very weak evidence for the claim that the original learning was unconscious, and that it provides no evidence at all for the functional dissociation of conscious and unconscious learning. Its status is exactly the same as the difference that commonly emerges between tests of recall and recognition. For the same reason, amnesic patients' inability to recall information which an earlier test shows they had learned (e.g., Nissen & Bullemer 1987) is not evidence in its own right of unconscious learning. Since we are claiming that a dissociation between performance and verbal report is not compelling evidence for unaware learning, we place special weight below on studies that have tried to use more sensitive tests of awareness.
It is also important to recognize that our criteria do not make unconscious learning unprovable. As Bowers (1984) has noted, it is pointless to argue about a possible unconscious process if one's criteria for its existence make it a logical impossibility. But the Information Criterion can readily be met in any study that establishes unequivocally what it is that the subject is learning, and the Sensitivity Criterion can be met by tests that adequately reinstate the learning context or that attempt to be exhaustive with respect to conscious information. Indeed, we will see below in section 2.7 that a replication of Lewicki et al.'s experiment by Stadler (1989) met both of these criteria by using an alternative test of awareness. Furthermore, successful demonstrations of unconscious perception have been possible in experiments that use tests of awareness that meet these criteria (e.g., Merikle & Reingold 1990). In sum, Lewicki, Czyzewska, and Hoffman's (1987) experiments demonstrate the dangers of asking the wrong questions and of ignoring substantial differences between different types of test.
With these considerations in mind, we now turn to an examination of other evidence for learning without awareness. In the following sections, we focus on four areas of experimental evidence: Conditioning, artificial grammar learning, instrumental learning, and sequential pattern acquisition.
2.4. Awareness and conditioning
2.4.1. Pavlovian conditioning. We begin with a consideration of whether classical or Pavlovian conditioned responses can be acquired in the absence of awareness of the scheduled contingency of reinforcement. Since many researchers regard conditioning as representing a relatively primitive learning system (see Boakes 1989), it is plausible to imagine that learning without awareness can occur in this context. However, the conclusion from a huge number of studies is quite the opposite: There is no compelling evidence for conditioning in human subjects without awareness of the reinforcement contingency. This conclusion was first reached in a classic review by Brewer (1974), and more recent studies have not changed the situation (see Boakes 1989, and Dawson & Schell 1985, for reviews). Such conclusions have not always been heeded, however, since there are still claims in the literature to the effect that conditioning can occur without awareness (e.g., Musen, Shimamura, & Squire 1990, p. 1074) and is therefore an instance of implicit, unconscious learning.
There have been two general approaches to examining the relationship between conditioning and awareness. First, some studies have sought to ascertain whether instructions to the subject concerning the nature of the relationship between a cue and a reinforcer affect conditioning as measured, for instance, by galvanic skin responses. The rationale is that if conditioning is a relatively automatic form of learning that can proceed independently of awareness, then changes in the subjects' conscious beliefs ought to have little effect on their behavior. Using this logic, Grings, Schell, and Carey (1973), for example, presented subjects with two conditioned stimuli (CSs), one of which (CS+) was followed by a shock unconditioned stimulus (US), and one of which (CS-) was not. At the end of the training stage, CS+ elicited a larger conditioned GSR than did CS-. Prior to the second stage, subjects were correctly told that the relationship between stimuli and shocks would now be reversed, with shocks following CS- but not CS+.
As has been observed in many other studies, these instructions had a powerful effect on conditioned responding. Grings et al. found that their subjects responded on the first trial of the second stage to CS- but not to CS+, indicating that the subjects' knowledge at least partially controlled their responding. Significantly, responding to CS+, a stimulus that had been paired several times with shock, was no greater than to a control stimulus that in the first stage had been presented with uncorrelated USs. Similar effects of verbal instruction have been obtained in experiments using phobic stimuli such as pictures of snakes (Davey 1992), where it was once thought that conditioned responding could proceed independently of instructions (e.g., Hugdahl & Ohman 1977).
While such results are unsupportive of the notion that conditioning can proceed without awareness, they do not address the issue directly because awareness itself is not examined. A recent experiment by Lovibond (1992) exemplifies the approach of eliciting measures of awareness concurrently with conditioned responses. Lovibond presented subjects with two stimuli (slides depicting flowers or mushrooms), one of which (the CS+) was paired with shock while the other (CS-) was nonreinforced. Awareness of the relationship between the stimuli and shock was measured in two ways. First, during the learning phase subjects continually adjusted a pointer to indicate their moment-by-moment expectation of shock (note that asking for a rating of shock expectancy does not specifically direct attention to the A-B relationship), and secondly, at the end of the experiment they were given a structured interview designed to assess their awareness.
It should be apparent how the design conforms to the basic procedure depicted in Figure 1, except that there are 4 learning trials. In Lovibond's experiments each of trials 2-4 in fact represents a new learning trial, an assessment of whether learning occurred on the preceding trial(s), and an assessment of the subject's awareness on the preceding trial(s). The Information Criterion should not raise particular problems here, since there is little doubt that the information the subjects learn (the contingency between the CS and US) corresponds with what the awareness test asks them to report.
In each of the experiments, some subjects gave no indication on either of the tests of awareness that they associated the CS+ with shock to a greater extent than the CS-. Critically, these subjects also gave no hint of stronger conditioned responding to the CS+ than to the CS-. For subjects who were aware of the conditioning contingencies, GSRs were stronger to the CS+ than to the CS-. Thus on these results we would have to conclude that learning about a CS-shock relationship does not occur in the absence of awareness of that relationship. It is also worth noting that Lovibond's experimental design is well-suited to demonstrating that our criteria for implicit learning do not make it a logical impossibility. If his results had been different -- something which is simply an empirical matter -- the criteria would have been met and implicit learning could have been firmly established.
Other studies have tried to mask the CS-US relationship and again compare awareness and conditioning. The results have been clear: So long as awareness is measured by an immediate test, usually a recognition test, significant conditioning only occurs in situations where the subject is aware of the contingency (see Boakes 1989; Dawson & Schell 1985). One recent experiment serves to illustrate the typical result. Marinkovic, Schell, and Dawson (1989) presented their subjects with a recognition memory task for odors. On each trial, one odor was presented for 8 sec as a "target," followed in succession by 3 further odors. Subjects' primary task was to say which of the 3 was the same as the target. One of the 3 recognition odors was in fact either the CS+ or the CS-. If it was CS+, a shock was presented at its offset; skin conductance was measured as the conditioned response. The question of interest was whether acquisition of GSRs could occur without concurrent awareness of the contingency between the CS+ and the shock. Marinkovic et al. measured awareness with a test in which subjects were required to indicate their expectancy of the shock during each odor on a 7-point scale. Because awareness was measured during the CSs, this again represents a concurrent assessment of awareness, rather than a post hoc one.
The outcome was that differential conditioning to CS+ was only observed in subjects classified as aware, indicating that awareness is necessary for conditioning. In addition, Marinkovic et al. obtained some evidence that when conditioned responding did occur, it only commenced after the onset of awareness. In sum, results from conditioning experiments appear to contradict the notion that this type of learning can proceed without concurrent awareness.
For a variety of reasons, some researchers have questioned whether galvanic skin responses condition in the same way as other responses such as the eyeblink or salivary reflexes. Thus it is worth noting that correspondences between awareness and conditioning seem to occur with other response systems as well (e.g., for eyelid conditioning, Baer & Fuhrer 1982).
The conclusion from these studies is clear, and confirms Brewer's (1974) earlier analysis: Pavlovian conditioning, which is often cited as a fundamental form of learning, does not seem to occur in the absence of awareness of the reinforcement contingency.
2.4.2. Evaluative conditioning. Evaluative conditioning refers to a form of learning which manifests itself in changes in affective responding to a stimulus (Martin & Levey 1978). Specifically, it refers to the transfer of affect from a US to a CS. Some authors (e.g., Baeyens, Eelen, & Van den Bergh 1990; Martin & Levey 1987) have suggested that -- unlike standard Pavlovian conditioning -- this form of learning can proceed in the absence of awareness of the CS-US relationship. We briefly review some of the relevant evidence.
Baeyens, Eelen, and Van den Bergh (1990) presented subjects with 10 repetitions of a CS-US pair in which the CS slide had previously been evaluated by the subject as affectively neutral and the US slide as either liked, neutral, or disliked. Evaluative conditioning was observed in that on a post-conditioning test of affect, the CS slides became affectively positive (liked) if they had been paired with a liked US, negative (disliked) if they had been paired with a disliked US, and remained neutral if they had been paired with another neutral stimulus.
As a test of awareness, at the end of the learning phase Baeyens et al. (1990) showed the subjects each of the CS pictures and asked them to identify which picture had been the relevant US. If subjects failed to respond correctly they were then asked whether the US had been liked, neutral or disliked. They were classed as "unaware" of the CS-US relationship if they failed on both of these questions. Evidence that evaluative conditioning occurred without awareness emerged in the observation that conditioning was the same for CS-US pairs regardless of whether or not the subject was aware of the relationship.
Of course, the test of awareness may have been an insensitive one. Baeyens et al. tried, therefore, to use a more sensitive concurrent measure of awareness. One group of subjects was required to indicate during the 4 sec interval between the onset of the CS and US slides whether they expected a liked, neutral, or disliked US stimulus on that trial. Subjects were classified as "unaware" if they failed to respond correctly on the final 3 pairings of each stimulus combination. Unfortunately, results from this group undermine the notion of unaware learning. As discussed in section 2.3, subjects could accurately report most of the pairings, and for those few they could not report, there was no significant evaluative conditioning. Further, in another study, Baeyens, Eelen, Crombez, and Van den Bergh (1992) found that while the magnitude of evaluative conditioning increased in groups of subjects given increasing numbers of CS-US pairings, awareness as measured by a post-conditioning test did too. In sum, these studies of evaluative conditioning have failed to prove that it can occur unconsciously. (See Shanks & Dickinson 1990 for some further criticisms of this research.)
Although they are not usually classified as studies of evaluative conditioning, Lewicki's (1986; Lewicki, Hill, & Sasaki 1989) experiments on the learning of nonsalient contingencies can be readily conceived as such. Lewicki presented subjects with photographs of people accompanied by personality descriptions such as "kind" or "capable." In fact, for some subjects all "kind" people had long hair and all "capable" people had short hair, while for other subjects the opposite was the case. Lewicki reported that on test trials in which subjects had to affirm or disconfirm statements classifying new people as either "kind" or "capable," they responded "yes" more often when the description preserved the study-phase correlation than when it broke the correlation. (They also took reliably longer to answer "yes" when the correlation was preserved.)
Lewicki's subjects were apparently unaware of the relationship between hair-length and personality description, since "not one subject mentioned haircut or anything connected with hair" (p. 138) in a test of verbally reportable knowledge. If we take the personality description as being an evaluative response conditioned to the cue of hair-length, then the results would again appear to suggest unconscious evaluative learning. However, that conclusion requires us to assume, without any supportive evidence, that the Sensitivity Criterion has been met in these studies. In addition, some of Lewicki's results have proven hard to replicate (see de Houwer, Hendrickx, Baeyens, & van Avermaet in press; Dulany & Poldrack 1991), and so we must at this stage reserve judgment on whether this form of learning really can occur unconsciously.
Conclusions. In experiments examining the relationship between learning and awareness in Pavlovian conditioning, researchers have striven to meet the Sensitivity Criterion by using multiple tests of awareness. The Information Criterion does not raise particular problems, since there is little doubt that the information the subjects learn (the contingency between the CS and US) corresponds to what the awareness test asks them to report. Thus these studies provide a reasonably good test of the role of awareness in learning. The results we have surveyed give little reason to believe that unconscious learning can occur in these situations. For evaluative conditioning the evidence is less clear-cut, but we have few reservations in suggesting that unaware evaluative learning has not yet been adequately established.
2.5. Awareness in artificial grammar learning tasks
Studies of subjects learning artificial grammars present the classic pattern of unconscious learning: Subjects clearly learn something about the domain, but they appear unable either to report the rules of the grammar or to explain their performance. Such studies provide evidence of unconscious learning if learning involves rule induction. In this section, we examine the evidence for unconscious learning of artificial grammars and conclude both that memorization rather than rule induction is the principal process involved and that the evidence for unconscious learning is weak. Later, in section 3.5, we review several further studies that have examined conscious hypothesis testing in artificial grammar tasks.
In the prototypical experiment, Reber (1967) required subjects to memorize either a series of letter strings generated from a small finite-state grammar, or a series of strings generated at random (see Figure 2). Subjects who learned the rule-governed strings then performed a grammaticality test in which they were asked to accept novel strings that fit the rules and reject novel strings that did not. They categorized 79% of the 44 test strings correctly, which is significantly above chance. Yet, these subjects were unable to report the rules they had apparently learned and then used in the grammaticality task.
(insert Figure 2 about here)
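As an illustration of how strings of this kind can be produced and scored, the following minimal Python sketch generates strings from a small finite-state grammar and checks whether a test string is grammatical. The transition table TOY_GRAMMAR is a hypothetical example invented for this sketch; it is not the grammar of Figure 2 or of Reber (1967), and the function names are likewise assumptions.

```python
import random

# Hypothetical toy grammar (NOT the grammar of Figure 2). Each state maps to
# a list of (letter, next_state) transitions; reaching END finishes a string.
START, END = 0, "END"
TOY_GRAMMAR = {
    0: [("M", 1), ("V", 2)],
    1: [("T", 1), ("V", 3)],      # T loops on state 1
    2: [("X", 2), ("R", 3)],      # X loops on state 2
    3: [("M", END), ("X", END)],
}


def generate(rng, max_len=8):
    """Random walk through the grammar; retry if the string grows too long."""
    while True:
        state, letters = START, []
        while state != END and len(letters) <= max_len:
            letter, state = rng.choice(TOY_GRAMMAR[state])
            letters.append(letter)
        if state == END and len(letters) <= max_len:
            return "".join(letters)


def is_grammatical(string):
    """True if the string traces a complete path from START to END."""
    state = START
    for letter in string:
        moves = dict(TOY_GRAMMAR.get(state, []))
        if letter not in moves:
            return False          # no legal transition for this letter
        state = moves[letter]
    return state == END


if __name__ == "__main__":
    rng = random.Random(0)
    print([generate(rng) for _ in range(5)])
    print(is_grammatical("MTVX"), is_grammatical("MXVX"))
```

In a grammaticality test of the kind described above, novel strings produced by such a generator would serve as the grammatical items, and strings for which the check fails would serve as the nongrammatical items.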
Reber's (e.g., 1967; 1989) account of such grammar learning results, endorsed by many other investigators since then, proposes that subjects use an unconscious, or implicit, rule induction mechanism. This mechanism creates a knowledge-base of rules that may be used in a grammaticality task but that is inaccessible to conscious report. As with the other unconscious learning paradigms, we believe that there is another way to interpret the data. We can raise two questions. The first (the Sensitivity Criterion) is whether retrospective verbal report is sufficiently sensitive to test for conscious knowledge of the rules. More sensitive measures of subjects' knowledge, such as concurrent thinking-aloud protocols and recognition tests, might reveal marginal or uncertain knowledge. The second question (the Information Criterion) concerns what the subjects are learning from the training strings. If subjects have learned something other than rules, then asking them about rules may lead to erroneous conclusions. On the other hand, if we ask the subjects questions about what they did in fact learn, we may get reasonable answers. It may be that usable knowledge is always both consciously learned and consciously applied. The experimenter's job is to discern what information subjects are aware of during training and whether that information is used to perform the grammaticality task.
2.5.1. Types of knowledge. The literature has identified three types of knowledge that might be acquired by subjects: Rules, memory for whole strings, and knowledge of the frequency and position of substrings such as pairs of letters. There are several problems with rules. First, it is not really clear what a "rule" would be like: Is it a re-write rule or a transition graph? How complex can it be, and how many are there? Second, such rules would be very difficult for any but very sophisticated subjects to articulate even if they did explicitly acquire them. Third, it is not clear what sort of mechanism is capable of acquiring such rules, particularly since it must ex hypothesi operate outside consciousness. In the face of these questions, it seems sensible to consider other types of knowledge first, and determine to what extent they can account for subjects' performance. We return to the evidence for knowledge of rules in artificial grammar learning tasks in section 2.5.3.
The picture with regard to memory for whole strings and knowledge of substrings seems reasonably clear. Such knowledge is easy to articulate, and in fact there is ample evidence that subjects do acquire this information, since they do articulate it. These types of knowledge are also consistent with a variety of contemporary memory models, such as chunking (Servan-Schreiber & Anderson 1990), distributed memory (Cleeremans & McClelland 1991), and memory array models (e.g., Estes 1986; Hintzman 1986; Nosofsky 1986). Further, these models have been shown to approximate subjects' grammaticality test performance. For instance, Dienes (1992) compared a number of these memory models on a set of grammaticality judgment data and was able to achieve good fits, particularly with distributed memory models. We return to this topic in section 3.3.
With these different knowledge types in mind, we can now ask what sort of information subjects in artificial grammar learning tasks actually acquire, and whether they are conscious of it. A number of studies have asked these questions using several methods and have asked them at various points during training and testing. Mathews et al. (1989) periodically interrupted subjects during training and asked them to instruct an imaginary confederate how to distinguish the grammatical strings. These instructions were then given to yoked subjects, who used them to perform the grammaticality test. The trained subjects performed better on the grammaticality test than did the yoked subjects, suggesting that not all of the trained subjects' knowledge was explicit and reportable. This verbal report procedure, however, is essentially uncued recall, and so is unlikely to evoke all of the subjects' knowledge of the grammar. More interestingly, though, the verbal instructions that subjects did report consisted mainly of legal bigrams and other short sequences, sometimes coded by their positions in legal strings.
In a study by Servan-Schreiber and Anderson (1990), subjects were trained on grammatical strings using a recall task. For training, the strings were divided into substrings using gaps (T PPP TX VS). Servan-Schreiber and Anderson hypothesized that subjects in all grammar learning tasks encode the strings into substring chunks, and the gaps were used to ensure consistent chunkings across subjects. Subjects' written recall preserved these gaps. Servan-Schreiber and Anderson suggested that this phenomenon demonstrates that subjects were in fact encoding the strings as sequences of short strings in accord with the gaps. The subjects' later grammaticality judgments supported this contention as follows. Servan-Schreiber and Anderson constructed ungrammatical strings that consisted of illegal sequences of legal substrings (e.g., PPPTXTVS). If subjects were in fact learning just the substrings, then these strings would be falsely accepted as legal strings. They were: 50% of these strings were mistakenly accepted. On the other hand, test strings that violated specific substrings were correctly rejected. Only 26% of these strings were mistakenly accepted. Both subjects' written protocols during training and their test performance, then, support the hypothesis that subjects learn simple substring information in grammar learning tasks. Note that the fact that only 50% rather than 100% of the strings containing illegal sequences of legal substrings were accepted does not imply that knowledge of substrings is insufficient to completely account for performance. Compared to grammatical strings, these nongrammatical strings (by definition) still contain illegal bigrams (e.g., XT in the example above). Further, subjects' knowledge at test time is clearly incomplete: Previously-seen grammatical strings were only accepted 79% of the time.
Moreover, Servan-Schreiber and Anderson went on to build a model that acquired chunks and then used them to evaluate the grammaticality of test strings. The model performed at the level of trained subjects (r = 0.935). This result supports their claim that subjects are learning and using chunks by demonstrating that chunks are learnable and sufficient to account for the level of performance of subjects on the grammaticality task.
It is possible that Servan-Schreiber and Anderson's presentation technique, placing gaps in the training strings, biased subjects' learning procedure. A similar experiment by Perruchet and Pacteau (1990), however, used the standard, no-gap, format during training and found similar results. Subjects were trained on strings generated from the same grammar that Reber and Allen (1978) used. To test for awareness of simple substrings, trained subjects performed a recognition test on letter pairs present in the training strings. Subjects performed quite well: Only 3 out of 25 old pairs were judged less familiar than any new pair. The correlation between recognition scores and the frequency of occurrence of pairs in the training strings was 0.61. By the results of the recognition test, then, subjects were aware of the relative frequencies of letter pairs. Similarly, Dulany, Carlson, and Dewey (1984) concluded that a recognition test of awareness could elicit as much knowledge as was projected in the grammaticality test.
Perruchet and Pacteau also constructed test strings that either contained illegal orders of legal pairs or that contained illegal pairs. If subjects only had information about legal pairs on which to judge the grammaticality of test strings, then the illegal pairs should have been rejected, but the illegal orders of legal pairs should have been mistakenly accepted as grammatical. This is the pattern of results Perruchet and Pacteau obtained. Discriminability, measured in D scores (zero indicates random responding), was 22 for illegal pairs but only 7 for illegal orders. Therefore, these results further support the hypothesis that subjects are aware of and make use of only simple substring information.
Perruchet and Pacteau then considered a model that used pair frequency information to make grammaticality judgments. The model produced the same level of performance as subjects, except in one respect. Subjects were sensitive to the beginnings and endings of strings, but the model was not. Perruchet and Pacteau concluded that subjects primarily knew letter pairs, but also knew which pairs could legally start and end strings. Together with the behavior of Servan-Schreiber and Anderson's chunking model, these results show that simple fragment-memorization systems can be sufficient to account for subjects' imperfect performance on grammaticality tests.
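The logic of a fragment-based account of this kind can be sketched in a few lines of Python. This is a deliberately simplified, all-or-none illustration and not Perruchet and Pacteau's actual model, which used pair frequency information. The training strings, test strings, function names, and the use of "^" and "$" as start and end markers are all assumptions made for the sketch.

```python
def bigrams(string, mark_edges=False):
    """Set of adjacent letter pairs; optionally add start/end markers."""
    s = f"^{string}$" if mark_edges else string
    return {s[i:i + 2] for i in range(len(s) - 1)}


def learn(training, mark_edges=False):
    """Memorize every letter pair that occurs in the training strings."""
    known = set()
    for string in training:
        known |= bigrams(string, mark_edges)
    return known


def accepts(string, known, mark_edges=False):
    """Judge a string 'grammatical' if it contains only memorized pairs."""
    return bigrams(string, mark_edges) <= known


# Hypothetical training strings, invented for illustration.
TRAINING = ["MTVX", "VXRM", "MVM", "VRX"]

if __name__ == "__main__":
    pairs_only = learn(TRAINING)
    with_edges = learn(TRAINING, mark_edges=True)
    # "MXVM" contains a pair (MX) never seen in training: rejected.
    print(accepts("MXVM", pairs_only))
    # "TVXR" is an unfamiliar ordering of familiar pairs: accepted.
    print(accepts("TVXR", pairs_only))
    # With edge markers, the unfamiliar starting letter is detected: rejected.
    print(accepts("TVXR", with_edges, mark_edges=True))
```

A judge that knows only pairs rejects strings containing novel pairs but accepts novel orderings of familiar pairs. Adding start and end markers is one simple way to capture knowledge of which pairs can legally begin and end a string.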
Dienes, Broadbent, and Berry (1991) also found evidence that subjects were sensitive to more than just pairs. Following training and a grammaticality task, subjects were given incomplete letter sequences varying in length from zero letters upwards (e.g., VXT...) and asked to judge which single-letter continuations (M? V? X? R? T?) were acceptable at the next location in the string. In this sequential letter dependencies (SLD) task, which was hypothesized to be sensitive to conscious knowledge of the grammar, subjects were sensitive to illegal orders of legal pairs even in the middle of strings. Dienes et al. showed that the knowledge that subjects demonstrated in the completion task correlated with their grammaticality judgments and could be used to model the grammaticality judgment data. They found in addition that knowledge gleaned from subjects' free reports also correlated with their grammaticality judgments, but that less knowledge was reported in the free report task than in the continuation task. These correlations suggest that a single knowledge source is tapped by both tasks, but that the free report task, uncued recall, is less sensitive.
Reber and Allen (1978) asked subjects to retrospectively describe their learning experience and to concurrently justify their grammaticality judgments. Overall, subjects justified their classifications on 821 out of 2000 test strings. Subjects reported using a variety of information in making their grammaticality judgments. The violation or nonviolation of bigrams was the most common justification, especially concerning the first bigram of a string. String-initial bigrams accounted for fully 30% of the justifications. Violations of single letters, particularly the first or last letter of a string, and violations of trigram or longer sequences were also reported, as well as recognition of and similarity to whole training strings. The grammaticality responses to the remaining unjustified cases presumably consisted of guessing or knowledge that could not be elicited by verbal report.
So much for substring knowledge. Vokey and Brooks (1992; Brooks & Vokey 1991) have argued that subjects can encode whole-item information in addition to substring information. They found that the similarity of test strings to specific whole study strings is an important factor in subjects' grammaticality judgments. When the grammaticality and the similarity of the test strings were varied independently, they were shown to be additive factors on grammaticality judgments. Vokey and Brooks argued that such a result indicates that subjects have encoded the whole strings and can determine similarity relationships between strings.
Brooks and Vokey's evidence for whole string information raises no particular problems for our interpretation of the artificial grammar learning data, since subjects are clearly aware of their whole string knowledge just as they are aware of the substring knowledge; the study task, after all, requires the subjects specifically to memorize the whole strings. However, as Brooks and Vokey (1991, p. 321) themselves concede, their results can at least in principle be explained without reference to whole item knowledge. Just as grammatical test strings tend to contain more studied bigrams than nongrammatical strings (Perruchet & Pacteau 1990), so it is also the case that a test string that is highly similar to a study string will contain more studied bigrams than one that is less similar. In fact, Vokey and Brooks' results have been challenged by Perruchet (in press-a), who has shown that both the effect of similarity and the apparently independent effect of grammaticality that Vokey and Brooks obtained can in turn be reduced to substring knowledge. Grammatical test strings tend to contain more substring components that were part of the training strings than do nongrammatical items. The same is true for similar and dissimilar test items, with similar items tending to contain more substring components from the study strings.
A final piece of evidence supports the view that grammaticality judgments are controlled by comparison to memorized substring or whole-item information. On such a view, but not on an abstraction account, it is likely that judgments would be relatively susceptible to changes in the superficial characteristics of the studied strings. To test this, Whittlesea and Dorken (1993) required subjects to pronounce the training strings from one grammar and to spell the training strings from another grammar. At test, subjects were asked to either pronounce or spell test strings and to judge their grammatical status. Subjects were more likely to assign test strings to grammars when the encoding task matched the task for the test string than when they differed. Test strings that were equally similar to strings in both grammars were assigned to the grammar where the encoding and test tasks matched. Such results, while consistent with the idea that judgments are based on a comparison to a set of items in memory that represent the study items in a relatively unanalyzed form, would clearly not be anticipated if what was encoded were the underlying abstract rules of the grammar.
Our conclusion from this section, then, is that subjects use their memory system to acquire knowledge of (possibly) whole strings and (certainly) their parts, and that this simple information is conscious both during acquisition and testing. The results reported by Dulany et al. (1984), Perruchet and Pacteau (1990), and Dienes et al. (1991) show that the knowledge subjects can consciously retrieve in a recognition test is sufficient to explain their grammaticality judgments. From the evidence we have considered, we do not need to assume the existence of an additional implicit knowledge base, and conclusions to the contrary have arisen because of failures to meet the Information Criterion.
Our interpretation rests on the results of a variety of tests of conscious knowledge which have attempted to address the Sensitivity Criterion. Dienes et al.'s (1991) SLD test, for instance, which required subjects to judge which continuations of a sequence of letters were legal, was actually found in a signal detection analysis to be more sensitive than the implicit grammaticality test itself. Thus if such a test is accepted as a measure of explicit knowledge, no evidence of a dissociation between learning and awareness emerges. Of course, an alternative (see Reber, Allen, & Regan 1985, and the reply by Dulany, Carlson, & Dewey 1985) is to argue that performance on these explicit tests is contaminated by unconscious influences: Subjects may choose a correct continuation on the SLD test as a result of some implicit knowledge to which they do not have conscious access (see note 2). But the problem with this interpretation is that it means that we would have to abandon the test as an index of conscious learning and rely instead on verbal reports, in which case it is hard to see how the Sensitivity Criterion can ever be met. And if that criterion cannot be met, then how are defenders of unconscious learning ever going to unconfound test type from sensitivity and hence establish the existence of unconscious learning?
In fact, we believe that it is rather unlikely that unconscious influences do play a significant role in the SLD test. Presenting subjects with a letter sequence (e.g., VXT...) and asking them to judge, under no time pressure, whether a given letter (e.g., M) could continue the sequence would seem to be a prototypical example of a task requiring conscious reflection, even if it involves mere conscious recollection of studied strings. Nevertheless, to claim that the SLD test is only sensitive to conscious information does require adopting what Reingold and Merikle (1988) call the "exclusiveness" assumption: the assumption that performance on a test of awareness is only affected by conscious influences. This, of course, is a very strong assumption and one that may well not be correct.
2.5.2. Learning systems. In addition to the question of awareness, a second issue is whether whole item, bigram, and possibly rule information are acquired by a single learning system or by separate systems. If they are acquired by separate systems, perhaps those systems interfere with each other's operation? To examine this possibility, Reber and Allen (1978) manipulated the training task. Subjects either performed observation training, where they observed the strings without any explicit task, or they performed a paired associate task where each string was paired with a different city name. The idea was that the paired associate task would require better item encoding, thereby facilitating item knowledge but potentially inhibiting other learning processes.
The paired associate task produced several significant differences from the observation task. Overall, paired associate subjects were less accurate on their grammaticality judgments: 72.4% vs. 81.2% accurate for observation subjects. Paired associate subjects produced twice as many recognition justifications as did observation subjects (77 vs. 40), and paired associate subjects' probability of making consistent errors suggested they were more likely to develop unrepresentative knowledge than were observation subjects.
Clearly, the two training tasks affected the quantity of whole item and substring knowledge that was acquired, but the underlying learning processes do not appear to be in opposition. The verbal reports show that both groups justified their responses with the same knowledge sources, just to differing degrees. It appears, then, that whole item learning is compatible with substring learning. Vokey and Brooks (1992) examined a range of encoding tasks that produce differences in the extent of item knowledge, but they also found no reliable interference between item knowledge and substring knowledge.
Finally, Dienes, Broadbent, and Berry (1991) required subjects to generate random digits during training. Their goal was to test whether this task would interfere selectively with subjects given explicit instructions to search for rules to describe the study strings, but not interfere with subjects given implicit instructions simply to observe the study strings. Instead, Dienes et al. found equivalent reductions in learning for both implicitly and explicitly instructed subjects.
2.5.3. Implicit rule induction. While the considerable evidence presented above supports the conclusion that subjects' knowledge consists of simple substrings (or whole strings), there are two pieces of evidence that support the conclusion that subjects learn rules. The first piece of evidence supporting rule learning was reported by Reber and Lewis (1977). Subjects were trained on a subset of strings and then solved "anagrams" based on the remaining strings generated from the grammar -- that is to say, they took strings of letters and rearranged them to make grammatical strings. The frequencies of bigrams produced by subjects in the anagram task were tabulated and compared to the frequencies of the bigrams in the training set and in the full set of grammatical strings. If subjects were learning bigram frequencies from the training strings, then the correlation between the frequencies of bigrams in the training strings and in the solved anagram strings should be high. While this was the case, Reber and Lewis found that the correlation between the frequencies of bigrams in the solved anagrams and in the whole grammar was actually higher. This result suggests that the subjects went beyond the training set to learn the rules of the grammar.
Perruchet, Gallego, and Pacteau (1992), however, argued that Reber and Lewis' (1977) result must obtain on statistical grounds alone. The anagrams demand the production of certain bigrams and not others, in fact exactly those bigrams which are underrepresented in the training set. Suppose, for example, that VT is a bigram in the grammar that is underrepresented among the training strings. VT must then be overrepresented among the solved anagram strings since the training and correctly-solved anagram strings together constitute the complete set of grammatical strings. It is no wonder, then, that the correlation between the frequencies of anagram bigrams and training bigrams is low and that the correlation between anagram bigrams and the full grammar bigrams is higher. Perruchet et al. (1992) went on to demonstrate this fact empirically by training subjects only on the individual bigrams from the training strings. Subjects under these circumstances could not be learning rules because they only saw bigrams, yet as with Reber and Lewis' subjects the frequencies of their anagram bigrams also correlated better with the full grammar bigrams than with the training string bigrams. The original conclusion, therefore, that subjects go beyond the training strings to learn rules, appears to have been an artifact of the experimental design.
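The statistical point can be made concrete with a small worked example in Python. The bigram counts below are hypothetical, invented purely for illustration, and are not data from Reber and Lewis (1977) or Perruchet et al. (1992). The sketch assumes, as in the argument above, that the full-grammar count of each bigram is the sum of its training count and its anagram count. Under that assumption the covariance of the anagram counts with the full-grammar counts equals their covariance with the training counts plus their own variance, so it is necessarily the larger of the two.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical bigram counts, invented purely for illustration.
TRAINING = {"MT": 6, "TV": 2, "VX": 4, "XR": 1}
ANAGRAM = {"MT": 3, "TV": 3, "VX": 2, "XR": 1}
# Assumption: training strings plus solved anagrams make up the full set.
FULL = {b: TRAINING[b] + ANAGRAM[b] for b in TRAINING}

order = sorted(FULL)
anagram = [ANAGRAM[b] for b in order]
r_train = pearson(anagram, [TRAINING[b] for b in order])
r_full = pearson(anagram, [FULL[b] for b in order])

if __name__ == "__main__":
    print(f"anagram vs. training bigrams:     r = {r_train:.2f}")
    print(f"anagram vs. full-grammar bigrams: r = {r_full:.2f}")
```

With these counts the anagram bigrams correlate positively with the training bigrams but more strongly with the full-grammar bigrams, even though no rule learning is involved anywhere in the calculation.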
The second and more compelling piece of evidence for abstraction is the fact that subjects show some degree of transfer to strings governed by the same underlying grammar but formed from a new set of letters or from a completely new set of stimuli such as tones. Reber (1969) trained subjects to recall grammatical strings, and when he switched to a new set of letters, subjects showed no increase in recall errors. This result suggested that subjects had learned abstract rules that were easily instantiated with different letters. More impressively, Altmann, Dienes, and Goode (in press) required subjects to observe a set of letter strings generated from the grammar shown in Fig. 2 prior to making grammaticality judgments concerning sequences of tones. Some of the tone sequences could be generated from the grammar by substituting a tone for a letter (e.g., middle C for the letter M). Altmann et al. found that exposure to letter sequences allowed grammatical and nongrammatical tone sequences to be discriminated at better-than-chance levels. Although the improvement was generally small (about 5% increase in correct classifications), this result strongly suggests that at least some aspects of the abstract structure of the letter sequences had been abstracted and were available to aid classification of the tone sequences.
It is important to note that the change of stimulus set did have a detrimental effect on performance, though. Compared to a situation in which the study and test items were from the same set (both letters or both tones), classification performance was significantly impaired when the study and test sets differed. Thus abstract knowledge is plainly not the sole source of information subjects were relying on -- specific memorized fragments or strings must also have been playing a role. A study by Mathews et al. (1989) confirms this conclusion. In Mathews et al.'s study, over a series of training sessions, subjects were either trained on a single string set or on different sets generated from the same underlying grammar. Subjects in the same set condition learned better, and a final switch to a new set doubled the error rates in the single set training condition. Such a result would not be expected if an abstract set of underlying rules was the sole factor guiding classification, since the rules would apply equally to the new and to the original letter set.
What is the significance of these results for unconscious learning? To the extent that subjects might be poor at describing what they have abstracted, such results may imply that unconscious learning is taking place. But given the rather small improvement in classification performance that results from training and testing on different sets of items, it is quite likely that what is abstracted is fairly limited (e.g., only two initial symbols are legal, the first two symbols of a string cannot be the same, etc.), and it is quite possible that subjects, if asked, would be able to report such simple regularities. In sum, while the data from transfer studies do suggest that some aspects of the underlying structure can be abstracted, from the point of view of unconscious learning the significance of these findings has yet to be established.
Conclusions. These studies indicate that relatively simple information is to a large extent sufficient to account for subjects' behavior in artificial grammar learning tasks. Further, and most importantly, this knowledge appears to be reportable by subjects. Knowledge of the grammar does not to any major degree seem to be acquired by explicit hypothesis testing or other complex analytic processes (although we return in section 3.2 to consider some rather different cases where grammars appear to be learned explicitly). Instead, knowledge seems to be mainly accumulated over training by simple memory mechanisms that collect frequency statistics on bigrams, slightly longer sequences, and possibly whole items.
2.6. Awareness in instrumental learning tasks
In contrast to the conditioning and artificial grammar studies described above, which arrange relationships between external cues, instrumental tasks establish some contingency between an action the subject performs and an associated outcome. Learning is measured as a change across trials in the propensity to perform the action. Naturally, the question we may again ask is whether such learning can occur without awareness. As in his review of Pavlovian learning studies, Brewer (1974) concluded that the answer to this question is "no." Some recent reports have further investigated the role of awareness in instrumental learning: We consider separately the results from tasks in which the instrumental contingency is simple and those in which it is more complex. By "simple" we mean any task in which there is ostensibly just one action available to the subject.
2.6.1. Simple instrumental learning tasks. Svartdal (1989; 1991) has reported a number of studies in which subjects are led to believe that there is a relationship between a reinforcer and one aspect of responding, when in fact the critical variable is some other aspect of responding. For example, Svartdal (1991) presented subjects with brief trains of between 4 and 17 auditory "clicks." Subjects immediately had to press a response button exactly the same number of times and were instructed that feedback would be presented when the number of presses matched the number of clicks. In fact, though, feedback was contingent on the rate of responding: For some subjects, feedback was given when the inter-response times (IRTs) were lower than in a baseline phase, while for others it was given when IRTs were higher.
Svartdal (1991) obtained evidence of learning, in that IRTs adjusted appropriately to the reinforcement contingencies. But subjects seemed to be unaware that it was rate of responding that was important. A structured questionnaire revealed no evidence of awareness of the contingency between response rate and feedback in subjects whose response rate had adjusted appropriately.
Such demonstrations appear superficially to be quite compelling, especially as the contingency to be learned is such a simple one. However, it is unclear that the Information Criterion is met in these and similar studies, because it is very difficult to rule out the possibility that subjects acquire "correlated" hypotheses about the reinforcement contingency which are incorrect from the experimenter's point of view but happen to produce response profiles that are correlated with those generated by the correct hypothesis. For example, suppose that subjects learn that resting their hand in a certain position increases reinforcement rate. This could be a true experienced contingency if that hand position was conducive to a fast or slow response rate. Such an "incorrect" hypothesis would generate behavior that is very similar to what would be produced by the correct hypothesis, yet a subject who reported hand position as the crucial variable would be regarded by the experimenter as "unaware" of the reinforcement contingency.
While such a criticism is undoubtedly post hoc, there is good evidence of subjects' behavior being under the control of such correlated hypotheses. In the 1950s, a number of studies asked subjects to generate words ad libitum and established that the probability with which they would produce, say, plural nouns was increased if each such word was followed by the experimenter saying "umhmm" (e.g., Greenspoon 1955), and as with Svartdal's experiment, this result occurred in subjects apparently unable to report the reinforcement contingency. However, in an elegant study, Dulany (1961) showed that subjects were hypothesizing that reinforcement was contingent on generating a word in the same semantic category as the previous one. Although incorrect, this hypothesis is correlated with the true one, in that if the subject said "emeralds" and was reinforced, then staying in the same semantic category meant they were more likely to produce another plural noun ("rubies") than if they shifted categories. Thus the subjects were perfectly aware of the contingency that was controlling their behavior, namely the contingency between staying in the same semantic category and reinforcement.
In sum, even ignoring possible insensitivity in the test of verbal awareness, results such as Svartdal's cannot be taken as conclusive evidence of unaware learning. Subjects may learn a rather different contingency from that explicitly programmed by the experimenter, and the Information Criterion may therefore not be met. The problem is particularly worrisome in operant studies because, by definition, the experimenter has little control over the subject's behavior and therefore over the contingencies that may be present. In non-operant tasks the problem can be avoided because the experimenter can in principle eliminate all reinforcement contingencies except the one of interest. For this reason, it seems that clear evidence for unconscious learning is likely to be difficult to establish in instrumental learning tasks.
In contrast to such apparent dissociations between learning and awareness, Shanks and Dickinson (1991) have argued that there are a number of variables which seem to have rather similar effects on performance assessments of learning and on awareness. In two studies, subjects performed a simple operant learning task in which pressing a key on a computer keyboard was related, via a schedule of reinforcement, to a triangle flashing on the screen. Subjects were exposed to a reinforcement contingency in which they scored points whenever the triangle flashed, but lost points for each response, so that they were encouraged to adapt their response rate to the reinforcement schedule. Learning was demonstrated by changes in subjects' rates of responding. As a measure of awareness, subjects were asked to report on a scale from 0 to 100 what they thought the relationship was between the response and the reinforcer.
Shanks and Dickinson (1991) found that response rate was sensitive both to the degree of contiguity between the response and reinforcer, and also to the degree of contingency between them. At the same time, subjects' judgments were equally sensitive to these factors. Furthermore, certain judgmental illusions also manifested themselves in performance measures. For instance, a frequently seen phenomenon is that subjects judge an action and an outcome to be related when in fact they are not. Shanks and Dickinson found that this effect appears in performance measures like response rate as well as in verbal judgments. Of course, the appearance of a bias in two behavioral measures strongly suggests that they are mediated by a common underlying process.
The notion that learning and awareness proceed in tandem is corroborated to the extent that they are affected in similar ways by various manipulations. Shanks and Dickinson's results indicate that -- at least for the two important factors of contingency and contiguity -- this is exactly the case. Shanks (1993) discusses some further apparent concordances.
The human operant learning literature provides perhaps the most convincing evidence that learning and awareness are associated in simple learning tasks. A wealth of data shows concordances between response rate and verbal reports under different schedules of reinforcement (e.g., Catania, Shimoff, & Matthews 1989; Rosenfarb, Newland, Brannon, & Howey 1992; see also Skinner 1984, and accompanying commentaries). For instance, Rosenfarb et al. required subjects to press a button on either a differential-reinforcement-of-low-rate schedule, in which reinforcers were delivered for a response provided that 5 sec had elapsed since the preceding response, or on a fixed-ratio schedule in which 8 responses were required to earn a reinforcer. Rosenfarb et al. found that subjects' verbal reports concerning the programmed contingency accorded very well with the actual contingencies. Furthermore, there was a strong correlation between the time at which responding became appropriate for a schedule and the time at which verbal reports indicated awareness of the reinforcement contingency operating in that schedule.
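As a concrete illustration of the two schedules just described, the following Python sketch marks which responses would earn a reinforcer. It is not Rosenfarb et al.'s procedure: the function names are hypothetical, and timing the first response from the start of the session is an assumption made only so that the example is complete.

```python
def drl_reinforced(response_times, min_gap=5.0):
    """Differential-reinforcement-of-low-rate schedule.

    Returns one bool per response: True if at least `min_gap` seconds
    have elapsed since the preceding response.
    Assumption: the first response is timed from session start (t = 0).
    """
    outcomes = []
    previous = 0.0
    for t in response_times:
        outcomes.append(t - previous >= min_gap)
        previous = t  # every response, reinforced or not, restarts the interval
    return outcomes


def fr_reinforced(n_responses, ratio=8):
    """Fixed-ratio schedule: True on every `ratio`-th response."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


# Responses at 6, 8, 14 and 19 s: only the one at 8 s follows too quickly
print(drl_reinforced([6.0, 8.0, 14.0, 19.0]))
# Sixteen responses on FR 8: the 8th and 16th are reinforced
print(fr_reinforced(16))
```

The sketch makes the opposing demands of the schedules explicit: under the first, rapid responding forfeits reinforcers, whereas under the second, rapid responding earns them sooner.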
2.6.2. Complex instrumental control tasks. Several experiments have investigated the relationship between learning and awareness in more complex instrumental learning tasks where the subject has to learn to control an interactive system. Again, the basic idea is as shown in Figure 1, with some learning episode followed by an assessment of awareness. In most of these tasks awareness at time t2 is measured by verbally questioning the subject.
Berry and Broadbent (1984) conducted an influential and widely-cited experiment in which there was an apparent dissociation between learning and awareness. As in Hayes and Broadbent's (1988) study, one of the tasks they used required subjects to interact with a computer "person." On each trial, the subject entered an attitude (e.g., polite) to the computer, which then responded with its attitude (e.g., unfriendly). The subject's task was to try to get the computer to be friendly. The computer's attitude on each trial was a simple numerical function of the subject's input on that trial and the computer's previous attitude. Inclusion of the computer's attitude on the previous trial makes the task quite a difficult one to learn.
Berry and Broadbent (1984, Experiment 1) found, not surprisingly, that performance improved with practice: Significantly more trials on target occurred during a second block of 30 trials than during the first block. However, scores on a structured questionnaire designed to assess the subjects' reportable knowledge of the task were no better after the second block than after the first one. Hence here we have apparent evidence that learning to perform a task can take place without any change in awareness of the underlying structure of the task. Similar results have been obtained in a number of other studies (e.g., Berry & Broadbent 1987; 1990; Broadbent, FitzGerald, & Broadbent 1986; Hayes & Broadbent 1988; Stanley, Mathews, Buss, & Kotler-Cope 1989).
On the other hand, a detailed examination by Sanderson (1989) found evidence of associations rather than dissociations between performance and reports. Sanderson argued that because subjects often have complex prior beliefs about the interactions within a large system, and because these beliefs may be erroneous in a laboratory version of the system, it is possible for their mental models to undergo considerable revision without yielding an overall improvement in accuracy. It is only with prolonged practice that mental models, and hence the verbal reports based on them, begin to show noticeable improvement. Consistent with this, Sanderson (1989) was able to obtain significant performance improvements at the same time as weak improvements in the overall accuracy of verbal reports in a complex transportation task, but showed that the detailed nature of the verbal reports was changing very considerably.
A further experiment by Berry and Broadbent (1984) found the converse of the previous dissociation, namely reportable knowledge improving without corresponding improvements in task performance. One group simply completed two sets of trials, while between the two sets another group received detailed verbal instructions about the nature of the input-output relationship. These instructions essentially represented a verbal description of the equation governing the computer's attitude. When questioned at the end of the experiment, subjects who had received instructions outscored those who had not, yet the groups were indistinguishable in terms of number of trials on target. Thus a change in "awareness" (or at least a change in reportable knowledge of the task) was not accompanied by a change in task performance.
What are we to make of such dissociations? One possibility is that it is not only possible for learning to proceed without awareness, but in addition the system responsible for implicit learning is quite independent of another (explicit) system in which learning is accompanied by awareness. Such a "systems" account would then be able to explain why we can obtain double dissociations of the sort reported by Berry and Broadbent: Learning to perform the control task involves the implicit system, and proceeds without awareness, while a change in awareness involves the explicit system and can proceed without any benefit in task performance.
While double dissociation results are certainly consistent with the notion that there are two learning systems, one conscious and the other unconscious, we feel that an alternative account is equally feasible: There may be two systems, both of which are conscious, but which encode different types of knowledge. The basic problem is that we do not know that the sort of knowledge that subjects in Berry and Broadbent's experiments acquire when learning to perform the task is at all the same as the knowledge they require to score well on the test of reportable knowledge (i.e., the results may fail to meet the Information Criterion). Suppose, for the sake of argument, that good task performance simply depends on learning an unrelated set of stimulus-response (S-R) pairs or instances (evidence for such a possibility certainly exists: see Cleeremans 1993). It is then not hard to imagine that although practice provides the subjects with more and more knowledge of this sort, they might be hard pressed to use such knowledge when faced with questions about possible structural rules underlying the task. At the same time, giving the subjects detailed instructions about the task may improve their knowledge of the rules, and hence their questionnaire scores, but might not transfer to better performance on the task itself since S-R knowledge is required for that. But of course the subjects' inability to describe the rules underlying the task does not imply that the S-R learning occurred without awareness: If they were asked to report that knowledge, perhaps subjects would be able to do so. In sum, there are ways of interpreting such data that do not appeal to unconscious learning (see Stanley et al. 1989, for an examination of some of the alternative types of knowledge that subjects may encode).
A second problem concerns the sensitivity of the test of awareness. Can we be certain that the questionnaire procedure exhausts the subject's knowledge of the task? Can we be confident that a failure on the questionnaire to express awareness of the nature of the task means that the subjects were unaware at the time they were learning? For instance, one alternative strategy would be to ask each subject to instruct a yoked "partner" how to perform the task. If the partner could then perform the task as well as the original subject, we would conclude that the original subject was in fact able to articulate all his or her task knowledge. Such procedures have been used with other learning tasks (e.g., Mathews et al. 1989, for grammar learning) and have proven highly sensitive.
Conclusions. Instrumental learning experiments arrange some relationship between the subject's actions and certain outcomes. Implicit learning would be demonstrated if learning, as indexed by changes in instrumental behavior, occurred in the absence of awareness of the reinforcement contingencies. Although some studies have found that subjects are apparently unaware of the relevant contingencies, reliance on verbal report means that the Sensitivity Criterion is unlikely to have been met. Furthermore, because the experimenter necessarily yields a certain degree of experimental control in an instrumental learning task, it is difficult to rule out the possibility that the subject is responding on the basis of a correlated hypothesis, in which case the Information Criterion is violated. Finally, even ignoring these considerations, a surprisingly large number of studies have documented impressive concordances between behavior and awareness.
2.7. Learning and awareness in serial reaction time tasks
Nissen and Bullemer (1987) and Lewicki, Czyzewska, and Hoffman (1987) introduced an ingenious and simple technique, the serial reaction time task, in an attempt to demonstrate unconscious learning. In Nissen and Bullemer's version, a stimulus is presented on each trial in one of four locations (A-D), and the subject simply has to press as fast as possible the button corresponding to that location. The subject is given instructions appropriate for a typical choice RT task, but in fact there is a sequence underlying the selection of the stimulus on each trial. The question is, can subjects learn the sequence without being aware of it? With respect to Figure 1, the subject is presented with a series of learning trials in which there are predictive relationships between stimuli. These are accompanied by both a concurrent assessment of learning (RT) and a later assessment of awareness.
Some of the most compelling evidence for unconscious learning using this technique comes from a later study by Willingham, Nissen, and Bullemer (1989), and this study is worth considering in some detail because of the heavy reliance placed upon it in recent discussions of conscious and unconscious processing (e.g., Velmans 1991). In their first experiment, Willingham et al.'s subjects performed a 4-choice RT task. The actual sequence of signals was DBCACBDCBA... which repeated many times with no break between cycles. Subjects' RT improved across a total of 400 trials. To see whether this improvement represented knowledge of the sequence or general nonspecific speed-up, Willingham et al. compared the speed-up of their subjects to that obtained in a group of subjects from the earlier study by Nissen and Bullemer for whom there had been no structured sequence; for these control subjects, target location was random from trial to trial, except that the same location never occurred on consecutive trials.
The improvement in RT was significantly greater in the sequence group than in the control group, apparently indicating that sequence learning had occurred. Furthermore, this was still true for subjects who subsequently reported no awareness of the existence of a sequence during the RT trials.
2.7.1. Problem of suitable control group. While such results suggest the possibility of unconscious learning, there are a number of significant problems with such experiments. First, the demonstration of sequence learning has typically involved one of the following two comparisons: (i) A comparison (e.g., Willingham et al. 1989) between a group exposed to the sequence versus one for whom the stimulus on each trial is chosen at random (with the constraint that stimuli never repeat on consecutive trials), or (ii) a within-subjects comparison (e.g., Hartman, Knopman, & Nissen 1989) between performance at the end of a long period of exposure to the sequence versus performance on a subsequent block of trials where the stimuli are chosen at random, again with the constraint that stimuli never repeat on consecutive trials. The problem with both of these comparisons is that performance can differ between the sequence and random trials without the subject having any knowledge -- implicit or otherwise -- of the sequence.
As a moment's reflection reveals, faster responses on the DBCACBDCBA sequence compared to a random sequence might simply be due to response biases developing during exposure to the sequence. The stimuli are not equally frequent (B and C occur 3 times, D and A twice) in the 10-trial sequence. Thus in the sequence but not the random conditions the subject is to some degree able to predict which stimuli are most probable, a fact which -- as has been extensively demonstrated (see Broadbent 1971) -- will allow fast responses to develop.
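The unequal frequencies can be checked directly. The short Python sketch below simply tallies the locations in the 10-trial sequence; the variable names are ours and nothing in it goes beyond the sequence as given above.

```python
from collections import Counter

SEQUENCE = "DBCACBDCBA"  # the repeating 10-trial sequence

counts = Counter(SEQUENCE)

# Probability of each location on an arbitrarily chosen trial of the repeating sequence
probabilities = {loc: counts[loc] / len(SEQUENCE) for loc in "ABCD"}

# Check for immediate repetitions, including the wrap-around from the final A to the initial D
has_repeat = any(a == b for a, b in zip(SEQUENCE, SEQUENCE[1:] + SEQUENCE[0]))

print(counts)
print(probabilities)
print(has_repeat)
```

The tally gives probabilities of 0.3 for B and C and 0.2 for A and D, with no immediate repetitions. A subject who has learned nothing but these base rates therefore has grounds for favoring B and C, which is the response bias at issue.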
Clearly, the appropriate comparison is with a group of subjects who receive a "pseudo-random" series constrained to have the same number of each of stimuli A, B, C, and D per 10 trials as appear in the sequence proper, and in which stimuli never repeat on consecutive trials. Such an experiment was reported by Shanks, Green, and Kolodny (in press). One group of subjects was presented with the normal sequence, another with the pseudo-random series, and a third with a "truly random" sequence in which again there was the constraint that stimuli never repeated on consecutive trials. The stimuli were dots arranged in a horizontal row and the general procedure followed that of Willingham et al. (1989).
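To make the difference between the two control series concrete, the following Python sketch generates one of each. The method by which the published series were actually constructed is not described here, so the approach below (shuffling each block of 10 and rejecting any shuffle that produces an immediate repetition) is an assumption, as are the function names.

```python
import random

SEQUENCE = "DBCACBDCBA"  # the structured sequence, used here only for its location counts


def truly_random_series(n_trials, rng):
    """Each location is drawn uniformly from A-D, excluding the previous location."""
    series = []
    for _ in range(n_trials):
        options = [loc for loc in "ABCD" if not series or loc != series[-1]]
        series.append(rng.choice(options))
    return "".join(series)


def pseudo_random_series(n_blocks, rng):
    """Blocks of 10 trials with the same location counts as SEQUENCE.

    Each block is reshuffled until no location repeats on consecutive
    trials (rejection sampling; an assumed method, not the published one).
    """
    series = []
    for _ in range(n_blocks):
        while True:
            block = list(SEQUENCE)
            rng.shuffle(block)
            joined = series[-1:] + block  # also guard the boundary with the prior block
            if all(a != b for a, b in zip(joined, joined[1:])):
                break
        series.extend(block)
    return "".join(series)


rng = random.Random(0)  # fixed seed so the example is reproducible
print(pseudo_random_series(2, rng))
print(truly_random_series(20, rng))
```

The pseudo-random series preserves the base rates of the four locations while destroying their order, so any advantage of the sequence group over it cannot be attributed to frequency information alone.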
After 400 RT trials, subjects in the sequence group were classified on the basis of a structured interview as having no knowledge of the sequence, some knowledge, or full knowledge. The prediction was that if the no knowledge subjects had indeed learned something about the sequence, they should have speeded up more than the pseudo-random subjects. In all but the truly random group the RT difference between the first and fourth block of 100 trials was significantly greater than zero. The normal-sequence/full-knowledge group speeded up more than any of the others; the difference between the normal-sequence/full-knowledge and pseudo-random groups confirms that the normal-sequence/full-knowledge subjects had indeed learned something about the sequence. However, there was no significant difference between the normal-sequence/no-knowledge and pseudo-random groups, though both speeded up more than subjects exposed to the truly random series. Thus, we suggest that with Willingham et al.'s stimuli and procedure, most if not all of the supposedly implicit learning in the normal-sequence/no-knowledge group is simply due to the development of response biases reflecting knowledge of the frequencies of the different stimuli.
As a consequence, Willingham et al.'s experiment fails to satisfy the Information Criterion. Subjects' inability to verbally articulate information about the sequence may have been due simply to the fact that they were not learning (in any sense) about the sequence. Instead, they were learning about the frequencies with which the different stimuli occurred. This is information they may, if asked, have been able to report.
2.7.2. Prediction tests as measures of awareness. The second problem is that, even ignoring the above considerations, we cannot of course rely just on the subjects' informal reports as assessments of their state of awareness some seconds or even minutes previously. Two somewhat different strategies have been advocated with regard to using more sensitive tests of awareness, namely recognition and prediction tests. We discuss recognition tests in the next section. Prediction tests, introduced by Nissen and Bullemer (1987), require the subject to try to predict the next element of the sequence, and such a test was used by Willingham et al. (1989) in addition to their verbal report test. After the RT phase of their experiment, Willingham et al. instructed subjects to try to predict on each trial where the stimulus would next appear, with no requirement for rapid responses. Subjects simply chose response keys on each trial until they picked the correct one, at which point they would then try to predict the next stimulus. Across many blocks of this prediction task, the subject again has the opportunity to learn the sequence. Evidence for explicit knowledge of the rule appears as savings (compared to the control group) in the number of trials required to learn the sequence in the prediction task (see footnote 3).
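One way in which performance on such a task might be scored is sketched below in Python. Scoring a trial as correct only when the first key chosen is the target, grouping trials into sets of 10, and expressing savings as a difference in accuracy between trained and control subjects are all assumptions adopted for illustration; the function names and the example data are hypothetical.

```python
def first_guess_accuracy(guesses_per_trial, targets, set_size=10):
    """Return, for each set of `set_size` trials, the proportion of trials
    on which the first key chosen matched the target location."""
    if len(guesses_per_trial) != len(targets):
        raise ValueError("need one list of guesses per target")
    hits = []
    for guesses, target in zip(guesses_per_trial, targets):
        if not guesses or guesses[-1] != target:
            # Subjects keep choosing until correct, so the final guess must be the target
            raise ValueError("each trial must end with the correct location")
        hits.append(guesses[0] == target)
    return [
        sum(hits[i:i + set_size]) / len(hits[i:i + set_size])
        for i in range(0, len(hits), set_size)
    ]


def savings(trained_accuracy, control_accuracy):
    """Per-set difference in first-guess accuracy (trained minus control)."""
    return [t - c for t, c in zip(trained_accuracy, control_accuracy)]


# Hypothetical four-trial record: the first guess is correct on trials 2 and 3 only
targets = "DBCA"
guesses = [["A", "D"], ["B"], ["C"], ["B", "A"]]
print(first_guess_accuracy(guesses, targets, set_size=4))
```

On this scoring, a positive savings value in the early sets would indicate that knowledge acquired during the RT phase had transferred to the prediction task.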
The rationale behind the prediction task is that if subjects are instructed to try to predict events, and are able to do so with above-chance accuracy, then this is evidence of conscious knowledge since their predictions must be based on conscious expectancies. As this task requires the subject to act on a conscious expectancy concerning which stimulus will appear next, it is apparently a test of awareness of elements of the sequence. This contrasts with the RT task, in which they have to respond as fast as possible to the current target. The prediction task is a good one in that it is immaterial whether the subjects believe or not that their performance in the RT phase was being affected by the sequence (indeed, they may not even be able to consciously report having detected a sequence). All that matters is whether any evidence of savings emerges in the prediction task, for according to the reasoning behind the task such savings must be due to conscious information about the sequence.
More to the point, failure at the prediction task would demonstrate a subject's inability to consciously draw on information about the sequence, thereby supporting the contention that the information really is implicit. Importantly in drawing such a conclusion, the prediction task satisfies the Sensitivity Criterion where verbal reports did not. The retrieval cues for the prediction task are virtually identical to those of the reaction time task. Therefore, we now have two tests which are almost identical, but in one the subject's performance (i.e., RT) is measured, and in the other awareness is assessed. This very much follows the rationale of recent experiments on unconscious perception (e.g., Merikle & Reingold 1992), where the test of awareness and the test of perception are designed to differ in little more than the instructions given to subjects. While the temporal arrangement of stimuli and responses is different in the two tasks, and the response metrics are quite different, the prediction task nonetheless represents an interesting new procedure for assessing awareness.
What are the results obtained from studies using the prediction task? Willingham et al. (1989, Experiment 1) discovered that subjects whom they had classified as unaware on the basis of their verbal reports not only speeded up in the RT phase, but also, according to Willingham et al., showed no evidence of awareness as assessed by the prediction task. Such a result appears to provide quite compelling evidence of implicit learning, even if Shanks et al.'s data suggest that learning probably involved frequency rather than sequence information. It is important to note that this dissociation of RT speed-up and prediction performance only applies to subjects who have been selected on the basis of their verbal reports as unaware. Across all subjects (regardless of their verbally reported awareness), RT speed-up and prediction performance tend to be closely associated, as experiments by Cleeremans and McClelland (1991) and Perruchet and Amorim (1992) have shown.
Willingham et al. compared the performance of their unaware, no knowledge subjects in the prediction task with that of a "no training" group who had not received the RT phase at all. This comparison was the critical evidence that the sequence learning in the normal sequence/no knowledge subjects was implicit. However, there are three problems with these results. First, while Willingham et al. claimed that there was no evidence of savings on the prediction task in their "no explicit knowledge" subjects, close inspection of their data reveals that these subjects did perform at a better level than naive subjects, albeit not significantly so. Over each of the first 6 sets of 10 trials of the prediction task, performance was better in the normal sequence/no explicit knowledge group than in the control group, by about 5% in each set (Willingham et al. 1989, Figure 3). On the first block of trials, the normal sequence/no explicit knowledge group scored 42.6% correct and the control group 38.7%. Although small, this trend is as much evidence for savings as it is for a dissociation between awareness and learning. A similar conclusion may be drawn for the data reported by Hartman, Knopman, and Nissen (1989) where small but consistent savings are also apparent.
Secondly, Perruchet and Amorim (1992) pointed out that Willingham et al. did not instruct their subjects that the stimulus sequence in the prediction phase would be the same as that in the RT phase. Subjects may not, therefore, have been maximally motivated to show transfer savings in the prediction task. The third and final problem is that Shanks, Green, and Kolodny (in press), in their replication and extension of Willingham et al.'s study, obtained savings that were of a statistically significant magnitude. Shanks et al.'s normal sequence/no knowledge subjects performed much better (mean 5.7 correct predictions) than the no training control subjects (mean 2.7) across the first 10 trials of the prediction phase, indicating that at least some of the knowledge they had acquired in the RT phase, but were unable to verbally report, was available for transfer to the prediction task. In sum, we conclude that the Willingham et al. study has failed to establish unconscious sequence learning.
A number of other studies have also used the sequence learning task. Several of these have adopted Willingham et al.'s procedure of classifying some subjects as unaware on the basis of their verbal reports and then examining their prediction task performance. Others have sought to obtain different dissociations between RT speed-up and prediction performance. Whatever the strategy used, we suggest that claims for implicit learning in these studies (e.g., Cohen, Ivry, & Keele 1990; Hartman, Knopman, & Nissen 1989; Howard & Howard 1989; Knopman 1991; Lewicki, Czyzewska, & Hoffman 1987; Lewicki, Hill, & Bizot 1988; Nissen & Bullemer 1987; Nissen, Knopman, & Schacter 1987; Stadler 1989) are difficult to interpret for many of the reasons we have raised concerning Willingham et al.'s experiment. These other studies either (i) fail to show that subjects have acquired any sequence knowledge in the RT phase, (ii) show small but consistent trends towards savings in the prediction task in supposedly unaware subjects, (iii) present control subjects in the prediction task with random rather than pseudorandom sequences, or (iv) do not provide feedback in the prediction test and hence run the risk of inducing forgetting of the sequence, which will lead to an underestimation of conscious knowledge. Caution suggests that these studies do not warrant the conclusion of reliable sequence learning in the absence of awareness.
Rather than reviewing all of these studies, we consider two widely-cited ones (Lewicki, Hill, & Bizot 1988; Stadler 1989) which illustrate some of the problems. Lewicki, Hill, and Bizot (1988) presented subjects with blocks of trials that were arranged into sequences of 5 trials. On each trial a target appeared in one of the four quadrants of the computer screen, and the subject had to respond by pressing the key appropriate to that quadrant. RTs were collected from a total of 4080 experimental trials experienced by each subject. On the first 2 trials, target location was random except that the target was never displayed twice in the same location. Target location on trial 3 was determined by what had happened on trials 1 and 2. If the movement on the first two trials had been horizontal, then the movement from trial 2 to trial 3 was vertical; if it was vertical, then the next was diagonal; and if it was diagonal, the next was horizontal.
Similarly, target location on trial 4 depended on target locations on trials 2 and 3, and target location on trial 5 depended on its locations on trials 3 and 4. The net effect is that target location on trials 3, 4, and 5 is entirely predictable from the underlying rules, but locations on trials 1 and 2 are random. Hence if the subjects were indeed learning something about the rules, this should manifest itself in a significantly greater reduction in RTs across blocks on trials 3, 4, and 5 than on trials 1 and 2, and this is exactly what Lewicki et al. found (in fact, they took as their dependent measure the number of correct responses with latencies less than 400 ms). Also, when the rules were changed towards the end of training, reaction times increased on trials 3, 4, and 5 but not on trials 1 and 2.
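The rule governing trial 3 can be written out explicitly. In the Python sketch below, the four quadrants are coded as (row, column) pairs, which is our own convention rather than Lewicki et al.'s, and the function names are hypothetical. Only the trial-3 rule is modeled, since the rules for trials 4 and 5 are not spelled out above.

```python
# Movement on trials 1-2 -> movement required from trial 2 to trial 3
NEXT_MOVE = {"horizontal": "vertical", "vertical": "diagonal", "diagonal": "horizontal"}


def movement(start, end):
    """Classify the movement between two different quadrants of a 2 x 2 display."""
    if start == end:
        raise ValueError("the target never appears twice in the same location")
    (r1, c1), (r2, c2) = start, end
    if r1 == r2:
        return "horizontal"
    if c1 == c2:
        return "vertical"
    return "diagonal"


def apply_move(position, move):
    """Return the quadrant reached from `position` by the given movement."""
    row, col = position
    if move == "horizontal":
        return (row, 1 - col)
    if move == "vertical":
        return (1 - row, col)
    return (1 - row, 1 - col)  # diagonal


def trial3_location(trial1, trial2):
    """Target location on trial 3, given the locations on trials 1 and 2."""
    return apply_move(trial2, NEXT_MOVE[movement(trial1, trial2)])


# Upper left -> upper right is horizontal, so the next movement is vertical
print(trial3_location((0, 0), (0, 1)))
```

Because each type of movement from a given quadrant leads to exactly one other quadrant, the trial-3 location is fully determined by trials 1 and 2. In this sketch it also never coincides with the trial-1 location, so the rule as stated excludes a return to the immediately preceding position on trial 3.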
Lewicki et al.'s (1988) subjects could apparently report next to nothing about the rules determining target location. In fact, "none of the subjects mentioned anything even close to the manipulated pattern of exposures" (p.33), although 8 of the 9 subjects did seem to be aware that their performance had dropped when the rules were changed. Lewicki et al. concluded that the subjects had implicitly or unconsciously learned the rules determining target location on trials 3-5.
Perruchet, Gallego, and Savy (1990), however, disputed Lewicki et al.'s (1988) conclusions. Essentially, the criticism is that the set of possible events that could occur on trials 3-5 was more constrained than the set of events that could occur on trials 1 and 2. By analyzing the rules that determined the permissible transitions from one trial to another, Perruchet et al. were able to show in their replication of Lewicki et al.'s (1988) experiment that speed-up in RT on trials 3-5 relative to trials 1 and 2 was in fact mainly due to relative speed-up only on trials 4 and 5, and furthermore was almost entirely due to two factors. First, on trials 1 and 2, but not trials 3-5, there were some occasions when the stimulus moved back to a location from which it had just come; these backwards movements led to a slowing of RTs simply because they increased the unpredictability of the movement. Second, on trials 1 and 2 there were infrequent horizontal movements, which again led to a slowing of RTs. On trials 4 and 5 horizontal movements were not permissible. Rather than learn rules like "If the movement from trial 1 to trial 2 was horizontal, then the next movement will be vertical," subjects need only have learned that the po