Published in Behavioral and Brain Sciences
Volume 25, Number 3: 297-330 (June 2002)

© 2002 Cambridge University Press


Below is the unedited, uncorrected, unquotable final draft preprint of a BBS target article that was accepted for publication. Please visit the Cambridge Journals Online BBS Home Page to order the full published treatment.

The self-organizing consciousness

Pierre Perruchet

Annie Vinter

Word counts:

  • abstracts: 128/644
  • main text: 32318
  • references: 4441
  • entire text: 40350

 

Running title: The self-organizing consciousness

 

Pierre Perruchet and Annie Vinter

Université de Bourgogne

LEAD/CNRS

6 Bd Gabriel

21000 Dijon, France

email: pierre.perruchet@u-bourgogne.fr or annie.vinter@u-bourgogne.fr

http://www.u-bourgogne.fr/LEAD/

 

Short Abstract

We propose that the isomorphism generally observed between the representations composing our momentary phenomenal experience and the structure of the world is the end-product of a progressive organization that emerges thanks to elementary associative processes that take our conscious representations themselves as the stuff on which they operate, a thesis that we summarize in the concept of Self-Organizing Consciousness (SOC). We show that the SOC framework accounts for the discovery of words and objects, and for word-object mapping. We then argue that isomorphic representations may underlie seemingly rule-governed behavior, as is observed in the areas of implicit learning of arbitrary structures, language, problem solving, and automatisms. This analysis provides support for the so-called "mentalistic" framework (e.g. Dulany, 1997), which avoids postulating the existence of unconscious representations and computations.

Long Abstract

The conventional cognitive framework rests on the existence of a powerful cognitive unconscious. Indeed, most psychological models heavily rely on the possibility of performing manipulations and transformations of unconscious representations using algorithms that are unable to operate while accommodating the functional constraints of conscious thought.

This paper explores the viability of an alternative framework which has its origins in the work of Dulany (1991, 1997). In this alternative, "mentalistic" framework, to borrow Dulany's terminology, the only representations people create and manipulate are those which form the momentary phenomenal experience. The main challenge is to explain why the phenomenal experience of adult people consists of perceptions and representations of the world which are generally isomorphic with the world structure, without needing recourse to a powerful cognitive unconscious. Our proposal is that this isomorphism is the end-product of a progressive organization that emerges thanks to elementary associative processes that take the conscious representations themselves as the stuff on which they operate. We summarize this thesis in the concept of Self-Organizing Consciousness (SOC).

We first provide evidence of self-organization in the context of an experimental example which concerns the progressive extraction of words from an artificial language presented as an unsegmented speech flow (e.g. Saffran et al., 1997). Our approach is supported by a computer-implemented model, PARSER, the details of which are presented elsewhere (Perruchet & Vinter, 1998b). A remarkable feature of PARSER is that the only representations generated by the model closely match the conscious representations people may have when performing the task. We then show that, provided that we accept a few simple assumptions about the properties of the world that are likely to capture subjects' attention, the rationale underlying PARSER may be extended to the discovery of the relevant units which form natural language and the physical world, and also accounts for word-object mapping.

We then apply the same principles to more complex aspects of the world structure. We show how the SOC framework can account for some forms of behavior seemingly based on the unconscious knowledge of the syntactical structure of the surrounding environment. This demonstration, which was originally stimulated by the literature on implicit learning of arbitrary structures, finds some echoes in the literature on language processing (notably in the so-called distributional approaches, e.g. Redington, Chater, & Finch, 1998), problem solving (for instance in the computation/representation trade-off proposed by Clark & Thornton, 1997), incubation (e.g. Mandler, 1994), decision making, and automatism (notably in the instance-based models, as proposed by Logan, e.g. 1988, and Tzelgov, e.g. 1997). We also show how the SOC framework, in conjunction with simple additional hypotheses, readily accounts for transfer between event patterns across sensory content, as shown for instance in the Marcus et al. (1999) study.

Finally, we argue against the empirical reliability of some additional phenomena that seemingly require the action of the cognitive unconscious. In this context, we critically examine the studies reporting that implicit memory and implicit learning can occur without any attentional processing of the material during the familiarization phase (e.g. Eich, 1984; Cohen, Ivry, & Keele, 1990), and the data allegedly demonstrating the possibility of unconscious processing of semantic information (e.g. Dehaene et al., 1998). Issues related to the apparent dissociation between performance and consciousness in neuropsychological syndromes, such as blindsight, are also briefly discussed.

Our analysis leads to the surprising conclusion that there is no need for the concepts of unconscious representations and knowledge and, a fortiori, the notion of unconscious inferences: Conscious mental life, when considered within a dynamic perspective, could be sufficient to account for adapted behavior. This alternative framework is more parsimonious than the prevalent conceptions in cognitive and developmental sciences because it manages to account for very sophisticated behavior while respecting the important constraints inherent to the conscious/attentional system, such as limited capacity, seriality of processing, and quick forgetting (and even takes advantage of these constraints).

KEYWORDS: Associative learning, automatism, consciousness, development, implicit learning, incubation, language, mental representation, perception, phenomenal experience.

 

Contents

1- Questioning the Cognitive Unconscious Postulate

1.1. The Computational View of Mind

1.2. The Cognitive Unconscious

1.3. An Alternative Framework

1.4. The Objectives of the Paper

2- The Notion of Self-Organizing Consciousness (SOC)

2.1. Complex Conscious Representations Account for Seemingly Rule-Governed Behavior

2.2. Conscious Representations Self-Organize.

2.3. Overview of the Sections 3-7.

3. The Case for Word Extraction

3.1. The Word-Extraction Issue

3.2. PARSER: The Principles of the Model

3.3. PARSER and Consciousness

3.4. PARSER and Alternative Computational Models

4- Learning the World Units

4.1. Word Extraction in Natural Language

4.2. The Formation of Objects and Word-Object Mapping

5. From Lexicon to Syntax

5.1. Studies Involving Artificial Grammar

5.2. Learning Syntax in Natural Language

5.3. Converging Lines of Evidence from Psycholinguistic Research

5.4. Unconscious Rule Processing Outside of the Language Area

6. Abstracting Away From the Sensory Content

6.1 Experimental Evidence for Abstraction

6.2 The Outline of a Reappraisal

6.3 Perceptual Primitives Can Be Abstract and Relational

6.4 Is Our Account of Transfer More Parsimonious?

6.5 Analyzing Transfer Limitations and Failure

7. Problem Solving, Decision Making, and Automaticity

7.1 Problem Solving and Incubation

7.2 Decision Making

7.3 Automaticity

8. Other Purported Evidence for the Cognitive Unconscious

8.1 Implicit memory and learning without attentional encoding

8.2 What About "The Unconscious Processing of Semantic Information"

8.3. Blindsight and Other Neuropsychological Disorders

9. Conclusion

9.1. Summary

9.2. Looking Towards the Future

1- Questioning the Cognitive Unconscious Postulate

In this introductory section, we point out that, in contradiction to the widespread idea that the issue of consciousness is computationally irrelevant (1.1), the prevalent computational view of mind is grounded on the postulate of an omnipotent cognitive unconscious, which has been tacitly present from the very beginnings of the information processing tradition (1.2). We then outline an alternative perspective, in which this postulate becomes useless (1.3). This section ends with the presentation of our objectives and an overview of the paper (1.4).

1.1. The Computational View of Mind

The objective of psychology, in the prevalent computational view of mind, is to study how human subjects process information. It should be noted that this objective includes no mention of the status, conscious versus unconscious, of the processed information. Addressing this issue is generally conceived of as unnecessary. Indeed, the nature of the representations and computations included in these models is such that they could, in principle, be either conscious or unconscious when implemented in a human brain. This contention holds irrespective of whether this processing is construed in terms of rule abstraction and application as in the mainstream tradition, or in terms of multivariate statistics computation as in the connectionist approach. The fact of being conscious or unconscious is, for a mental construct, a property that does not affect the way this construct intervenes within a processing sequence.

The function assigned to consciousness, when it is considered, generally consists in making certain parts of cognitive functioning accessible. To quote Baars (1998), "Many proposals about brain organization and consciousness reflect a single underlying theme that can be labeled the 'theater metaphor'. In these views, the overall function of consciousness is to provide very widespread access to unconscious brain regions." And elsewhere in the same paper: "A classical metaphor for consciousness has been a 'bright spot' cast by a spotlight on the stage of a dark theater... Nearly all current hypotheses about consciousness and selective attention can be viewed as variants of this fundamental idea". In keeping with this metaphor, the states and operations involved in information processing models occur in the same way whether they are concurrently accessed or not. This type of speculation is often summarized in the claim that consciousness is "computationally irrelevant".

1.2. The Cognitive Unconscious

We agree with the claim that qualifying as conscious or unconscious a representation or an operation involved in a computational model has no effect on the way this model works. But the computational irrelevance of consciousness can no longer be maintained if, instead of considering piecemeal aspects of the models, we consider their overall conditions of functioning. It appears then that most information processing models necessarily rely on a cognitive unconscious, for at least two reasons. First, the algorithms forming the models rarely match the phenomenal experience of the subjects running the tasks that, presumably, trigger these algorithms. Second, and more importantly, these algorithms are generally unable to work while accommodating the functional constraints of conscious thought, such as limited capacity, seriality, relative slowness of processing, and quick memory decay. As Lewicki, Hill, and Czyzewska (1992) wrote to emphasize the power of the cognitive unconscious: "Our conscious thinking needs to rely on notes (with flowcharts or lists of if-then statements) or computers to do the same job that our nonconscious operating processing algorithms can do instantly and without external help" (Lewicki et al., 1992, p.798).

Chomskyan psycholinguistics provides a striking illustration of these points. Whatever the fuzziness of the operational measures of consciousness, it is not tenable that the conscious mind is endowed with a Universal Grammar, makes assumptions about the properties of the ambient language, and tests hypotheses in order to set parameters at their appropriate values. To a lesser extent, similar remarks apply to most information processing models. For instance, it is quite common to assume the existence of a syntactic processing device. Even the discovery of words in the continuous speech stream has been conceived of as the product of a mathematical algorithm of optimization, performed thanks to a statistical inference method (e.g. Brent, 1996; see below Section 3.4). Some untaught rules of spelling are also assumed to be unconsciously abstracted (e.g. Bryant, Nunes, & Snaith, 2000).

Of course, the premise of a cognitive unconscious is not limited to studies of language. Let us consider the transcoding of numerals. One of the most influential transcoding models (McCloskey, 1992) assumes that all numerical inputs are translated into an amodal and abstract representation of quantity, associating every number with a power of ten (e.g. 4030 should be coded $(4)10^3, (3)10^1$). Motor activities are also of concern. For instance, how do fielders modulate their speed to catch a ball before it reaches the ground? According to McLeod and Dienes (1993), they run so that $d^2(\tan\alpha)/dt^2 = 0$, where $\alpha$ is the angle of elevation of gaze from fielder to ball. The authors wrote: "Children probably discover this somewhat obscure strategy... by extrapolating from their experience of watching balls thrown towards them... This strategy is obviously not available consciously. That its effectiveness is discovered demonstrates the power of the brain's unconscious problem-solving abilities" (McLeod & Dienes, 1993, p.23).
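To make the fielder criterion concrete, the following Python sketch checks it numerically in a deliberately simplified case. The assumptions are ours, not McLeod and Dienes's: the ball follows a drag-free parabola, and the fielder stands still in the plane of flight instead of running. Under these assumptions, an observer standing exactly at the landing point sees the tangent of the elevation angle grow linearly with time, so its second derivative vanishes, whereas an observer standing elsewhere does not. The function names and parameter values are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def tan_elevation(t, vx, vy, fielder_x):
    """Tangent of the gaze elevation from a fielder at fielder_x to the ball at time t."""
    ball_x = vx * t
    ball_y = vy * t - 0.5 * G * t * t
    return ball_y / (fielder_x - ball_x)


def second_difference(f, t, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at t."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


def check(vx=12.0, vy=20.0, offset=0.0):
    """Largest |d^2(tan alpha)/dt^2| sampled during the flight.

    offset is the fielder's distance (in metres) beyond the landing point;
    offset=0.0 means the fielder stands exactly where the ball will land.
    """
    flight_time = 2.0 * vy / G
    fielder_x = vx * flight_time + offset

    def tan_alpha(t):
        return tan_elevation(t, vx, vy, fielder_x)

    # Sample at 20%, 40%, 60% and 80% of the flight time.
    times = [0.2 * flight_time * k for k in range(1, 5)]
    return max(abs(second_difference(tan_alpha, t)) for t in times)
```

Calling `check()` returns a value that is zero up to rounding error, while `check(offset=10.0)` returns a clearly non-zero value. The sketch only verifies the geometric property that the criterion exploits; it says nothing about how a fielder might come to use it.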

These few examples make it clear that, by construing information processing as the main target of psychological science, regardless of the conscious status of the processed information, the prevalent view does not remain neutral with regard to the consciousness issue. This view rests in fact on the existence of a cognitive unconscious (Shevrin & Dickman, 1980; Kihlstrom, 1987). By this expression, we mean that the prevalent view takes for granted the existence of unconscious representations, together with the possibility of performing unconscious manipulations and transformations on these representations. By the same token, the concept of cognitive unconscious includes the assumption that the notions of unconscious knowledge and memory are meaningful, and most authors would probably add to this list the notions of unconscious rule abstraction, unconscious analysis, unconscious reasoning, unconscious inference, and so on.

1.3. An Alternative Framework

1.3.1. The "mentalistic" framework. This paper explores the possibility of an alternative framework, in which the cognitive unconscious has no place. Mental life is posited as co-extensive with consciousness. This idea, in fact, is not new. It has even occupied a respected position in the philosophical tradition since Descartes. More recently, this framework has been cogently articulated by Dulany (1991, 1997), who called it, for want of a better term, the "mentalistic" framework.

The mentalistic view does not challenge the notion of representation overall, nor the idea that rule abstraction, analysis, reasoning, and inferences can be performed on these representations. The conscious experience of each of us provides direct evidence for such operations. This evidence supports the conservative conclusion that we abstract rules and make various computations and inferences when we have direct experience of doing so. These aspects of mental life, which Dulany (1997) calls the "deliberative episodes", are not of focal concern in this paper, although we do not intend to play down their importance in any way.

A departure from the standard cognitive view arises when there is no conscious evidence of performing the cognitive operations that a psychological model stipulates. As pointed out above, the lack of concurrent subjective experience is not thought of as a problem in the information processing tradition, because consciousness is thought of as providing only an optional access to the product of unconscious computations. By contrast, the mentalistic view rejects the notions of unconscious rule abstraction, computation, analysis, reasoning, and inference. Because unconscious representations have no other function than to enter into these activities, eliminating the possibility of these activities actually makes the overall notion of unconscious representation objectless [Note 1]. Accordingly, the most salient feature of the mentalistic framework is the denial of the very notion of unconscious representations. The only representations that exist, in this view, are those that are embedded in the momentary phenomenal experience.

Representations, of course, are generated by neural processes, of which we are unaware. Thus, in the mentalistic framework, mental life comprises only two categories of events: the conscious representations, and the unconscious processes generating those representations. The two are linked like the head and the tail of a coin. To quote an earlier paper of ours: "Processes and mechanisms responsible for the elaboration of knowledge are intrinsically unconscious, and the resulting mental representations and knowledge are intrinsically conscious. No other components are needed." (Perruchet, Vinter, & Gallego, 1997, p.44; see also O'Brien and Opie, 1999a, for a link between the notions of representation and consciousness).

1.3.2. About terminology. Common sense knowledge of notions such as process, representation and computation, even if difficult to constrain within an exhaustive definition (as is the case for many other concepts), appears sufficient at this point, because the originality of the mentalistic perspective is anything but a matter of subtle terminological nuances. However, it may be useful to exclude one particular understanding of the notion of representation and computation.

It has become increasingly common to define any pattern of neural activity as a representation, especially in the connectionist framework (e.g. Elman et al., 1996, p.364). Given this approach, any biological consequence of the presentation of a stimulus is a representation of this stimulus. For example, the projection of the world on the retina of the eye provides a representation of the world. A logical consequence of this definition is that most representations are fully unconscious. Of course, such a definition has its own internal consistency: From the observer’s point of view, retinal images are indeed world representations. However, the meaning of the concept is different in the mentalistic framework. Throughout the present paper, the word "representation" designates a mental event that assumes the function of some meaningful component of the represented world (e.g.: a person, an object, a movement, a scene) within the representing world. At least two functions can be envisaged (Dulany, 1997). A representation may evoke other representations (the representation of a pencil may evoke the representation of a pencil box, an exercise book, and so on). It may also enter as an argument into deliberative mental episodes (the representation of a pencil may be involved in reasoning, inference, action planning, and other mental activities). In this terminology, the retinal projection of the pencil does not represent the pencil, because the mosaic of cells of the retinal surface activated by the light reflected by the pencil does not fulfill any of these functions.

Likewise, the notion of computation does not extend to any neural activity, but instead designates the mental operations that take representations as arguments. In the following, the term "computation" will be taken to be synonymous with expressions such as "computation on mental representations".

1.3.3. An illustration. The concrete implications of endorsing a mentalistic view will now be illustrated using a very simple situation. Let us assume that a stimulus S1, initially neutral with regard to its behavioral consequences, comes to elicit an avoidance reaction after its repeated pairing with an aversive stimulus S2. Everyone will have recognized here the schema of a classically conditioned reaction. A first interpretation may be that people have acquired some knowledge about the S1-S2 relationships, and then draw the inference "If S1 then S2", thus triggering an avoidance reaction when S1 is displayed. This is a version of the expectancy theory of conditioning, first proposed by Tolman (e.g. 1932), and nowadays largely accepted. This view is compatible with a mentalistic standpoint, as long as people have explicit knowledge about the S1-S2 relationships, and have explicitly drawn the inference that S2 is likely to occur when S1 occurs.

Let us now assume that people no longer remember the earlier S1-S2 pairings during the test, thus making impossible the explicit inference that S1 will be followed by S2. Experimental data suggest that a conditioned reaction can still occur in these conditions (e.g. Gruber, Reed & Block, 1968). In the standard cognitive view, the loss of explicit memory does not matter. People are now assumed to rely on their implicit memory of the S1-S2 pairings, and make the unconscious inference "If S1 then S2". Such an adjustment causes no difficulty, given that the presence or the absence of consciousness is held to be computationally irrelevant.

This interpretation obviously violates the premise of the mentalistic framework. However, is this interpretation mandatory? It is worth remembering here that an alternative interpretation of conditioned performance was proposed long ago. During the training phase, some subjects’ perceptual experiences comprise (at least) some features of S1 endowed with the negative valence triggered by S2. Elementary associative mechanisms are then sufficient to ensure that a negative valence becomes a new intrinsic property of S1. The conditioned avoidance reaction, in this interpretation, is directly elicited by S1. The crucial point is that the formation of knowledge about the stimulus relationships is no longer involved: the link between S1 and S2 has no need to be stored in memory and remembered, either explicitly or implicitly, nor exploited through inferential reasoning. What changes with training is the intrinsic representation of S1, which becomes negatively valenced.

There is overwhelming evidence that both interpretations are needed to account for all the reported conditioning data. Some paradigms certainly trigger one process more than the other. For instance, Garcia, Rusiniak, and Brett (1977) tease apart the behavior of rats preparing to cope with a painful reinforcer signaled by an auditory stimulus (a situation mainly involving the knowledge of the S1-S2 relationships), and the behavior of rats that acquire an aversion to a flavor previously associated with sickness (a situation mainly involving a change in the intrinsic representation of S1). However, most paradigms are presumably able to generate both forms of responding. In the discussion published in the pages following Garcia et al.'s (1977) contribution, Seligman distinguishes the learning of an "if-then" relationship from the acquisition of a hedonic shift. These two processes, even though very different, are both generated, according to Seligman, by Pavlovian situations. The responses elicited by these mechanisms differ from each other on a variety of experimental variables in a consistent way (e.g. in their sensitivity to the precise timing of events, in their resistance to extinction, and so on), thus strengthening the idea that conditioned behavior has a dual nature (e.g. Konorski, 1967; Holland, 1980; for an overview, see Perruchet, 1984).

The dual nature of conditioned responses makes it possible to encompass all the available data within a mentalistic framework. When the knowledge of the stimulus relationships is consciously represented, conditioned responses may be of one or the other form. When explicit knowledge is no longer available, however, there is no need to invoke an unconscious analog to our conscious mode of reasoning. Responses may be due to a change in the intrinsic representation of S1. In this case, there are only successive conscious experiences, with S1, initially neutral, acquiring the negative valence initially induced by S2, through the action of unconscious associative processes. Most of the conditioning literature is consistent with this interpretation. It appears, in particular, that those conditioned responses that are endowed with characteristics typical of the responses due to the formation of knowledge about the stimulus relationships, are closely linked to the conscious knowledge of these relationships (for detailed arguments, see Perruchet, 1984) [Note 2].

In this example, it is easy to understand how the very same observed behavior (a conditioned response without concurrent awareness of stimulus contingencies) can be explained either in a standard cognitive view relying on the cognitive unconscious, or alternatively in a mentalistic framework which eliminates this postulate. The subsequent sections are devoted to the objective of assessing whether very complex adaptive behavior, commonly taken as indicative of unconscious rule abstraction or other unconscious computations on cognitive representations, can also be accounted for in another way, without introducing much more than the principles set out above for the conditioning data.
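The contrast between the two readings of the conditioning example can be sketched as a toy simulation in Python. This is an illustration under our own simplifying assumptions, not a model proposed in the conditioning literature: the class names, the learning rate, the response threshold, and the error-correcting update rule are all hypothetical. The first learner stores the S1-S2 relation and responds by consulting it; the second stores no relation at all and only lets the valence attached to S1 drift toward that of S2.

```python
from dataclasses import dataclass, field


@dataclass
class InferenceLearner:
    """Reading 1: keeps explicit knowledge of the S1-S2 relation and infers 'if S1 then S2'."""
    known_links: set = field(default_factory=set)

    def pair(self, s1, s2):
        self.known_links.add((s1, s2))

    def forget(self):
        # Loss of explicit memory for the earlier pairings.
        self.known_links.clear()

    def avoids(self, s1, aversive="S2"):
        return (s1, aversive) in self.known_links


@dataclass
class ValenceLearner:
    """Reading 2: no stored relation; only the representation of S1 itself changes."""
    valence: dict = field(default_factory=dict)
    rate: float = 0.3  # hypothetical learning rate

    def pair(self, s1, s2_valence=-1.0):
        current = self.valence.get(s1, 0.0)
        # Assumed error-correcting update: S1 drifts toward the valence of S2.
        self.valence[s1] = current + self.rate * (s2_valence - current)

    def forget(self):
        # Nothing about the pairings was stored, so there is nothing to lose.
        pass

    def avoids(self, s1, threshold=-0.5):
        return self.valence.get(s1, 0.0) < threshold


def run(n_pairings=10):
    """Train both learners, then compare avoidance before and after forgetting."""
    by_inference, by_valence = InferenceLearner(), ValenceLearner()
    for _ in range(n_pairings):
        by_inference.pair("S1", "S2")
        by_valence.pair("S1")
    before = (by_inference.avoids("S1"), by_valence.avoids("S1"))
    by_inference.forget()
    by_valence.forget()
    after = (by_inference.avoids("S1"), by_valence.avoids("S1"))
    return before, after
```

With the default settings, both learners avoid S1 after training, but only the valence-based learner still does so once the pairings are forgotten. The sketch thus shows that a change in the representation of S1 is sufficient for the response to persist, without any stored S1-S2 relation, explicit or implicit.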

1.3.4. A hopeless project? At first glance, the weight of the empirical evidence runs against the view presented above as the behavior under examination becomes more and more complex. The main supporting argument is that most current psychological models accounting for complex behavioral phenomena rely, with indisputable success, on the existence of unconscious representations and computations.

The fact that the models based on a cognitive unconscious work might seem to negate the potential interest of an alternative model. However, the argument is not as straightforward as it might seem. Indeed, computational algorithms are so powerful that they can simulate virtually any phenomenon, without proving anything about the computational nature of the actual mechanisms underlying that phenomenon. Computational algorithms generate a perfect description of the rotation of the planets around the sun, although the solar system does not compute in any way. In order to be considered as providing a model of the mechanisms actually involved, and not only a simulation of the end-product of mechanisms acting at a different level, computational models have to perform better than alternative, noncomputational explanations. The point is that the comparison needed to reach such a conclusion has never been conducted. As asserted above, the possibility of a powerful cognitive unconscious has been embedded within the principles of the information processing tradition from its very beginning, without being clearly articulated and hence without being directly challenged. Given these conditions, the current focus on the notion of cognitive unconscious appears to be simply the consequence of making earlier tacit postulates explicit. To summarize, although the pervasiveness of the concept of a cognitive unconscious and its overall success can hardly be disputed, the demonstrative power of these arguments is undermined by a hidden circularity.

1.4. The Objectives of the Paper

It is worth pointing out from the outset that our project does not consist in showing that the prevalent computational framework is unwarranted, for any logical or empirical reasons. This objective would entail demonstrating that consciousness is necessary for any form of representation and computation. But there are major obstacles facing any such demonstration. There is no theoretical reason for claiming that representations and computation need to be conscious. Moreover, it is difficult to conceive of any form of empirical demonstration. Indeed, addressing the question of the necessity of consciousness for any mental construct requires us to demonstrate that unconscious representations and computations do not exist, and demonstrating non-existence is beyond the reach of any empirical investigation. Our aim is to assess the viability of a mentalistic view, instead of directly questioning the prevalent framework. This leads us to address a different issue, presented below.

1.4.1. Necessity versus sufficiency. Let us start from a twofold consideration. On the one hand, we know that at least some mental events are conscious, because we have direct and personal evidence of their existence. Even those who argue that consciousness is epiphenomenal cannot reject this assessment (although a few philosophers have questioned the very existence of consciousness; see Rey, 1991, and the refutation of Rey’s position by Velmans, 1991). On the other hand, the existence of an unconscious mental life is a postulate or a presupposition. This presupposition is so deeply ingrained in our modern culture that it is taken for granted by most people. But the fact remains that we have, by definition, no direct proof of an unconscious counterpart to our conscious mental life. It emerges from these two premises that the mentalistic framework is more parsimonious than the prevalent view, because it exclusively relies on the representations and the mental operations we are aware of, whereas the prevalent view postulates, in addition, a parallel cognitive apparatus [Note 3].

In this context, questions about consciousness, in striking contrast with the overwhelming practice, may be framed in terms of sufficiency, rather than necessity. As a consequence, the question we address is: "Is it sufficient to rely on the transient and labile representations that form one's momentary phenomenal experiences, when the conventional framework commonly assumes that a large number of representations are stored in mind and manipulated in various unconscious operations?"

The example in Section 1.3.3 illustrates the point. We do not argue that subjects are unable to build and use unconscious knowledge about the S1-S2 contingencies on the grounds that consciousness should be necessary for these operations. What we do show is that this hypothesis is only one among several possible interpretations of the fact that conditioned reactions persist beyond the forgetting of the S1-S2 contingencies. Positing that the affective reaction elicited by the occurrence of S1 has evolved during training due to unconscious associative processes is sufficient to account for the data.

1.4.2. A major objective and some additional issues. The major part of this paper, namely Sections 2 to 7, will be devoted to the presentation of a new model, called the SOC Model, with SOC standing for Self-Organizing Consciousness. This expression is a short-cut, and as such, it is potentially misleading. It might suggest that we intend to address the hard issues commonly linked to the notion of consciousness, such as the problem of knowing how neural events generate conscious mental states. In fact, this paper focuses more modestly on the contents of consciousness, such as they can be described at an informational level [Note 4]. We propose that conscious contents are endowed with self-organizing properties, which make it possible to account for a wide range of adaptive phenomena that are commonly considered to be mediated by the cognitive unconscious. Our objective is to suggest that most of the phenomena of interest for cognitive scientists can be accounted for by this model, which avoids any recourse to the concepts of unconscious representations and computation.

The last section (Section 8) will deal with somewhat different issues. For quite obvious reasons, the SOC model is not devised to account for data that we consider to lack a justifiable empirical basis. However, such data may constitute an a priori reason for some readers to reject our approach. Section 8 addresses such phenomena, and notably the data allegedly demonstrating the possibility of unconscious processing of semantic information. We also briefly discuss, in this section, the apparent dissociation between performance and consciousness observed in a few neuropsychological syndromes, such as blindsight.

2- The Notion of Self-Organizing Consciousness (SOC)

In the first section, we presented an outline of how a mentalistic framework could account for a response apparently based on unconscious memory and inference, taking as example a specific finding from the conditioning area. We now have to address a far more difficult challenge, namely to account for the most complex aspects of behavior on which contemporary cognitive science focuses. Our approach comprises two steps. The first step consists in showing that a large number of phenomena that seemingly require unconscious rule abstraction processes, inferences, analyses, and other complex implicit operations, can be accounted for by the formation of conscious representations that are isomorphic to the world structure. The second step concerns the formation of these representations, and more precisely the causes of their isomorphism to the world structure. We suggest that this isomorphism is the end-product of a self-organizing process. The general ideas underpinning these two steps will be briefly outlined in turn in this section, then developed at length in the following sections.

2.1. Complex Conscious Representations Account for Seemingly Rule-Governed Behavior

2.1.1 Trading representation against computation. Complex and integrative representations, we argue, make rule knowledge objectless. Here, our thesis relies heavily on the idea that neural systems "trade representation against computation", to borrow the expression used by Clark and Thornton (1997). The above discussion concerning certain findings in Pavlovian conditioning (Section 1.4.) provides a first insight into the meaning of this claim. As shown above, the change in the intrinsic representation of S1, and notably the fact that this representation, initially neutral, becomes affectively valenced during the training phase, may replace, at a functional level, the formation of the knowledge of the S1-S2 contingency and the logical inference "if S1 then S2".

Although often indirect, supporting evidence for a representation/computation trade-off can be found in various areas of psychology. Examples include the instance-based model of categorization (e.g. Brooks, 1978), the so-called episodic (e.g. Neal & Hesketh, 1997) or fragmentary (e.g. Perruchet, 1994) accounts of implicit learning, the notion of mental models in problem solving (e.g. Johnson-Laird, 1983), and the memory-based theory of automatism (Logan, 1988). Although they evolved in at least partial independence, these avenues of research share the same general distrust with regard to the notions of abstract computation and rule-based processing, and stress the adaptive advantage of building complex representations. However, they subscribe to metatheoretical assumptions that are somewhat different from those of the mentalistic framework, notably with regard to the way they handle the notions of representation and consciousness. In keeping with the mentalistic framework, we assume that the representations involved in each case are conscious.

This position, we argue, increases the a priori plausibility of the representation-based views, and expands their explanatory power, for at least two reasons. Firstly, if the momentary phenomenal experience is the only mental event, the whole power of the neural system may be recruited for its construction. Secondly, the construction of a representation can profit from the presence of the momentary sensory input, instead of relying exclusively on the internal memory capacity of the brain. The growing literature on change blindness and other related phenomena (e.g. see review in Noë, Pessoa & Thompson, 2000) leads us to emphasize the importance of this factor, on the grounds that perceptual experience may be more dependent on the real world than previously thought. If, for instance, a visual scene is changed in such a way that the perception of a movement is prevented (e.g. changes occur during an eye blink, or an ocular saccade, or if a blank mask is inserted between the two displays), changes are surprisingly difficult to notice. Such phenomena indicate that the world could play the role of an "outside memory" (O’Regan, 1992) in the formation of the perceptual experience, hence dispensing the brain from the need to retain a detailed representation of the world. These factors make the task of constructing the representations composing the current phenomenal experience considerably easier than the task of forming the permanent and ready-to-use internal model of the world required in the prevalent view of mind.

2.1.2 The isomorphism between the actual and the represented world. In order to solve problems that, at first glance, require rule abstraction and complex computation, a representation has to be isomorphic to the world structure. And indeed, by and large, phenomenal experience provides an internal representation of the world that is isomorphic to its structure. We generally perceive continuous speech as a meaningful sequence of words, the visual environment as composed of persons and objects, and so on. In some sense, the adapted nature of conscious representations is not a speculative and optional proposal, but derives from the most fundamental principle of evolutionary biology: as pointed out by Velmans, "if the experienced world did not correspond reasonably well to the actual one, our survival would be threatened" (Velmans, 1998, p. 51). If one adheres to the views outlined above, the structural isomorphism between our conscious representations and the world is the major phenomenon we have to explain. However, some preliminary comments are warranted to make it clear that this isomorphism is not perfect, and does not need to be so.

First, the representations we create are limited by sensory constraints. For instance, we do not have any perception of sounds outside the 20-20000 Hz range, and our eyes are able to detect only a very small bandwidth of the electromagnetic spectrum, from around 370 nm to around 730 nm. Likewise, phenomenal experience does not provide us with any direct representation of the structure of the physical world at other scales, such as atomic microstructure or galactic organization.

Second, even the parts of the world available to our sensory equipment may be represented only partially, or even erroneously. The fact that our representation of the surrounding world does not include the whole scene currently available to our sensory equipment, but instead is limited to a narrow focus, has been recently documented in the visual domain by the studies on change blindness alluded to above. Examples of misrepresentation are also plentiful. The sun's rays at the day’s end seemingly diverge in all directions whereas they are in fact (nearly) parallel, and star constellations at night have no physical reality due to the varying distances of their elements from the earth. In addition, there are innumerable cases in which our representations are biased by our interests, motivations, and their relevance for survival. The phenomenal experience of the world may even be maladaptive, as in the case of perceptual illusions in which perceptual processes which are generally well-suited in natural situations cease doing their job reliably when faced with highly specific patterns. Such phenomena illustrate that percepts and representations are isomorphic to the world structure only in a limited way. For the sake of brevity, we continue to refer to the isomorphism between subjects' representations and world structure throughout this paper, even though the very phenomenon we are attempting to account for cannot be described as a simple term-to-term matching.

2.2. Conscious Representations Self-Organize.

The main question we have to address at this point is: How to account for the fact that the content of the phenomenal experience is, even in a limited sense, isomorphic to the world 5, if this content is not the product of a powerful unconscious processor manipulating unconscious representations? Our answer consists in considering consciousness within a dynamic perspective, that is to say a perspective centered on learning principles. The key point is that each conscious experience triggers associative learning mechanisms that take the components of this experience as the "stuff" on which they operate. Thanks to this phenomenon, consciousness does not only serve an immediate adaptive function, but also participates in its own development, each conscious experience allowing us to improve the content of subsequent conscious experiences. We summarize this thesis in the proposal that phenomenal experience is self-organizing.

Psychological textbooks routinely point out that there are multiple forms of learning. But they also mention that associative learning is the most fundamental and primitive, maybe the form to which all other forms are reducible in fine. Because our framework is primarily motivated by the search for maximal parsimony, we rely exclusively on conventional associative mechanisms in the following. Relying on associative principles --reminiscent of the old-fashioned behaviorist psychology for many-- within a mentalistic framework centered on the concept of consciousness may appear anachronistic. However, the paradox is one of appearance only. Although behaviorism was grounded on associative principles, the reverse is not true: Associative principles can serve equally well in other frameworks. The mentalistic view provides a highly relevant integrative framework, for at least two reasons that will be considered in turn. First, there is a natural relation between associative learning and consciousness, mediated by the concept of attention (2.2.1.). Second, the assumption that learning associates conscious contents implies that associations involve complex representations, a property that considerably improves the power of an association-based view (2.2.2).

2.2.1. Associative learning and consciousness. The issues of learning and consciousness are generally considered separately. As a case in point, "learning" is nearly absent from the indexes of the numerous recently published volumes on consciousness. However, reasons for considering the two issues jointly arise from the close link between learning and attention, on the one hand, and attention and consciousness on the other.

Attentional processes are sufficient for associative memory and learning to occur. This means that no superimposed operations - such as some forms of intentional orientation towards learning - are required. This phenomenon is known from the conditioning and skill learning experiments run during the behaviorist era. It was subsequently "rediscovered" in the context of the level-of-processing framework in the seventies (e.g. Craik & Lockhart, 1972), and more recently in the context of the studies on implicit learning (e.g. Whittlesea & Dorken, 1993). The resulting picture is that many authors, using different terminologies, have proposed a view compatible with the claim that associative learning is an automatic process that associates all the components that are present in the attentional focus at a given point (French & Miner, 1994; Jimenez & Mendez, 1999; Logan & Etherton, 1994; Stadler, 1995; Treisman & Gelade, 1980; Wagner, 1981). Associative learning and memory are nothing other than the by-products of attentional processing (see Section 8.1. for a reappraisal of some contradictory evidence).

Now, there is a close relation between attention and consciousness. It must be acknowledged that the psychological literature offers a somewhat fuzzy picture of this relation. Across and even within domain and epoch, one term is often preferred to the other. But this preference lacks any clear justification. For instance, the methods devised to investigate perception without attention differ from the methods devised to investigate perception without consciousness. In the former, the stimuli are supraliminal but maintained outside the current focus of attention as a result of the task demands, whereas in the latter, attention is directed toward the target but stimulus quality is degraded. However, these terminological differences are linked more to historical contingencies than to theoretically rooted reasons. At the empirical level, it turns out that both kinds of manipulations lead to analogous findings (Merikle & Joordens, 1997). A more general argument for dissociating the two concepts is that attention is selective whereas "consciousness incorporates both a central focus, and a rich polymodal periphery", to borrow the expression used by O'Brien and Opie (1999b, p.191). This argument amounts to defining attention as the conceptually driven attentional mechanisms that are directed towards a specific source of information in response to task instructions. This view defines what Schmidt and Dark (1998) call the intention-equals-attention view, according to which participants' intention to attend exclusively to a target is sufficient to restrict attentional processing to this target. All proposals for a dissociation (e.g. Baars, 1997; Velmans, 1999) amount to such a confusion. However, the fact that the instructions ask participants to pay attention to a target does not prevent them from making quick attentional shifts toward non-attended information. Therefore, unless one endorses a highly restrictive definition of attended information as the informational content on which subjects are asked to focus, we see no reason to dissociate between attention and consciousness on the basis of their relative selectivity.

However, the fact that attention and consciousness refer to the same phenomenon does not mean that they are one and the same concept. Attention is generally located on the side of the processes, and consciousness on the side of the mental states resulting from these processes. As Pribram (1980) says: "'Consciousness' refers to states which have contents; 'attention' refers to processes which organize these contents into one or another conscious state". What constitutes the content of the phenomenal experience at a given moment is what is attended to at this moment, and vice versa (e.g. Cowan, 1995; Mandler, 1975; Miller, 1962; Posner & Boies, 1971).

2.2.2. Associative learning and complex representations. At first glance, associative mechanisms appear to be underpowered for the function that we assign to them. Essential to our claim is the idea that the oft-mentioned limitations of associative learning principles are overcome whenever complex representations are conceived of as the stuff on which associative processes operate. The fact that complex representations can enter into associative links, and the high explanatory power of this mode of functioning, has been pointed out in the modern literature on conditioning and learning. The following quote, borrowed from one of the leading theoreticians of animal learning, illustrates the point:

"Properly understood... associative learning theory is remarkably powerful. Of course, such a theory must... reject the restrictive assumption of S-R theory, which allowed associations to be formed only between a stimulus and a response, and should assume that a representation of any event, be it an external stimulus or an action, can be associated with the representation of any other event, whether another external stimulus, a reinforcer, the affective reaction elicited by the reinforcer, or an animal's own actions. Equally important, however, it must allow that the representation of external events that can enter into such associations may be quite complex. They need not be confined to a faithful copy of an elementary sensation such as a patch of red light; they may be representations of combinations or configurations of such elementary stimuli; they may even include information about certain relationships between elementary stimuli. But once we have allowed associative learning theory these new assumptions, we have a powerful account, capable of explaining quite complex behavior -including behavior that many have been happy to label cognitive and to attribute to processes assumed to lie beyond the scope of any theory of learning" (Mackintosh, 1997, 883-884; italics are ours).

However, by and large, the fact that associative principles apply to complex representations has not been exploited, and hence the power of associative learning theory has not been fully appreciated. The symbolic framework assigns a minimal role, if any, to associative processes, and most of the connectionist models, although rooted in associative principles, only consider associations between the input units of the network, which code the material piecemeal (note that the so-called constructive methods overcome this limitation, e.g. Fahlman & Lebiere, 1990).

To summarize, we propose that basic principles of associative learning and memory allow conscious representations to reach their high degree of organization and adaptiveness, provided that we consider that associations occur between the rich content of conscious experiences. The notion of self-organization excludes any organizing cognitive systems or principles that would be superimposed on phenomenal consciousness 6. The phenomenal consciousness itself ensures its own improvement in representational power, thanks to the propensity of conscious representations to evolve in accordance with basic associative learning principles. Because consciousness is an unavoidable companion of our daily life, this means that every life episode has a learning function. There are no separate phases for learning and for performance: Each phenomenal experience contributes to improving people's ability to perceive and represent the genuine structure of the world in subsequent interactions.

2.3. Overview of the Sections 3 to 7

Thus two main ideas are embedded in the notion of Self-Organizing Consciousness (SOC). The first is that conscious representations that are isomorphic to the world structure, due to their ability to integrate various elements in a cohesive picture, can account for adaptive behaviors commonly attributed to rule-governed thought. The second is that ubiquitous principles of associative memory and learning are sufficient to account for the formation of these representations. The subsequent sections deal with these two aspects, although, in order to begin the demonstration at its logical starting point, we begin with the second one.

We start by demonstrating the self-organizing nature of phenomenal experience in the language domain. This domain is especially relevant to our position, because it is the domain in which the notion of the cognitive unconscious may be the most deeply rooted as a result of the Chomskyan tradition. In the next section (Section 3), we show that the ability to extract the words forming an artificial language presented as an unsegmented speech flow may be accounted for as an autonomous change in the phenomenal experience of the materials, due to the action of elementary associative mechanisms. This interpretation has been supported by a computational model, the details of which are presented elsewhere (Perruchet & Vinter, 1998b). Section 4 proposes a generalization of this model to word extraction in natural language, to the formation of objects, and to the word-object mapping issue.

Sections 5 and 6 introduce a generalization of the SOC framework to other dimensions. While Sections 3 and 4 concern the formation of conscious representations of elements that are generally construed as the actual world units (words and objects), Section 5 applies the same principles to more complex aspects of the world structure. We show how the formation of complex representations that are isomorphic with the world structure can account for some forms of behavior seemingly based on the unconscious knowledge of the syntactical structure of the surrounding environment. Section 6 deals with the fact that human behavior may be sensitive to structural aspects of the world that transcend its surface features. This problem, reminiscent of the criticisms Chomsky levelled at the once prevalent current of behaviorism, is obviously crucial for the validity of our view. We show how the SOC framework readily accounts for transfer between event patterns cutting across their sensory content. Section 7 shows how the SOC framework may find some echoes in the literature on problem solving, incubation, decision making, automaticity, and implicit memory.

To sum up, these sections provide, we hope, a model of how organisms deprived of a powerful cognitive unconscious can behave adaptively when faced with complex world-size situations thanks to the formation of structurally relevant conscious representations of these situations.

3. The Case of Word Extraction

3.1. The Word-Extraction Issue

Language acquisition initially proceeds from auditory input, and linguistic utterances usually consist of sentences linking several words without clear physical boundaries. The question thus arises: How do infants become able to segment a continuous speech stream into words? Recent psycholinguistic research has identified a number of potentially relevant factors. Analyses of the statistical structure of different languages have shown that a number of features are correlated with the presence of word boundaries, and could therefore be used as cues for segmenting the speech signal into words (see review in Jusczyk, 1997; McDonald, 1997). However, the question remains of how infants abstract the statistical regularities that they seemingly exploit. It cannot be claimed that these regularities are learned inductively from word exposure without falling into circular reasoning, with word knowledge being simultaneously the prerequisite and the consequence of knowledge of statistical regularities. In addition to the difficulties inherent in their exploitation, prosodic and phonological cues in any case provide only probabilistic information.

The importance of prosodic and phonological cues in word discovery is further questioned by recent experimental studies showing that these cues are not necessary. For instance, Saffran, Newport, & Aslin (1996b) used an artificial language consisting of six trisyllabic words, such as babupu and bupada. The words were read by a speech synthesizer in random order in immediate succession, without pauses or any other prosodic cues. Thus the participants heard a continuous series of syllables without any word boundary cues. In the following phase, they were asked to perform a forced choice test in which they had to indicate which of two items sounded more like a word from the artificial language. One of the items was a word from the artificial language, whereas the other was a new combination of three syllables belonging to the language. Participants performed significantly better than would be expected by chance.

The participants in the study conducted by Saffran et al. (1996b) were told before the training session began that the artificial language contained words, and they were asked to figure out where the words started and ended. The processes used in these conditions may be different from those involved in natural language acquisition. Two subsequent papers from the same laboratory (Saffran, Aslin, & Newport, 1996a; Saffran, Newport, Aslin, Tunick, & Barrueco, 1997) partially respond to this objection. In Saffran et al. (1997), the participants' primary task was to create an illustration using a coloring program. They were not told that the continuous series of syllables, which were presented as a sound background, consisted of a language, nor that they would be tested later in any way. In the subsequent forced choice test, participants still performed significantly better than chance (although performance is comparatively impaired in these conditions, see Ludden & Gupta, 2000). A still more direct indication of the relevance of these data with regard to infants acquiring their mother tongue was provided by Saffran et al. (1996a), who reported studies carried out with 8-month-old infants. The infants were tested with the familiarization-preference procedure used by Jusczyk and Aslin (1995), in which infants controlled the exposure duration of the stimuli by their visual fixation on a light. The infants showed longer fixation (and hence listening) times for nonwords than for words, thus demonstrating that they were sensitive to word structure after a brief exposure to an artificial language. Overall, the studies conducted by Saffran and co-workers offer impressive support for the hypothesis that people are able to learn the words forming a continuous speech stream without any prosodic or phonological cues for word boundaries.

3.2. PARSER: The Principles of the Model

Our aim here is to show that word extraction can be explained by the action of elementary, associative-like processes acting on the initial conscious percepts, the result of which is to modify the conscious experience we have of the linguistic input.

What is the phenomenal experience of the listener of a new language such as the one used in the Saffran et al. experiments, at the beginning and end of training respectively? When people are confronted with material consisting of a succession of elements, each of them matching some of their processing primitives, they segment this material into small and disjunctive parts comprising a small number of primitives. As adults, we have direct evidence of the phenomenon. For instance, when asked to read nonsense consonant strings, we read the material not on a regular rhythmic, letter-by-letter basis, but rather by chunking a few letters together. In a more experimental vein, when adults are asked to write down this kind of material, they frequently reproduce the strings as separate groups of 2, 3, or 4 letters (Servan-Schreiber & Anderson, 1990). The same phenomenon presumably occurs when a listener is faced with an unknown spoken language, with the syllables or other phonological units forming the subjective processing primitives instead of the letters. Certainly, when hearing an unknown language at a normal locution rate, the processing of the material is usually not exhaustive. Rather, subjects pick up a chunk of a few syllables from time to time. But this difference does not alter the basic phenomenon of chunking. Chunking, we contend, is a ubiquitous phenomenon, due to the intrinsic constraints of attentional processing, with each chunk corresponding to one attentional focus.

This initial segmentation is assumed to depend on a large variety of factors. Some factors are linked to the participants. For instance, prior experience of another language may endow participants with different processing primitives. Also, the current state of attention and vigilance may partly determine the chunk size. Other factors are associated with the situation, such as the signal/noise ratio, the time parameters of the speech signal, and the relative perceptual saliency of the components of the signal. The mixture of these factors is very likely to mean that a listener's initial conscious experience consists of a succession of chunks which are different in length and content from the words of the language.

After extensive exposure to the language, the listener's phenomenal experience is presumably the experience each of us has of our mother tongue, that is the experience of perceiving a sequence of words. Our proposal is that the final phenomenal experience of perceiving words emerges through the progressive transformation of the primitives guiding the initial perception of the language, and that this transformation is due to the self-organizing property of the content of phenomenal experience. The basic principle is fairly simple. The primitives forming a chunk, that is those that are perceived within one attentional focus as a consequence of their experienced temporal proximity, tend to pool together and form a new primitive for the system. As a consequence, they can enter as a unitary component into a new chunk in a further processing step 7. This explains why the phenomenal experience changes with practice. But why do the initial primitives evolve into a small number of words instead of innumerable irrelevant processing units?

The reason lies in the combined consideration of two phenomena. The first depends on the properties of the human processing system. The future of the chunk which forms a conscious episode depends on ubiquitous laws of associative learning and memory. If the same experience does not re-occur within some temporal lag, the possibility of a chunk acting as a processing primitive rapidly vanishes, as a consequence of both natural decay and interference with the processing of similar material. The chunks evolve into primitives only if they are repeated. Thus some primitives emerge through a natural selection process, because forgetting and interference lead the human processing system to select the repeated parts from all of those generated by the initial, presumably mostly irrelevant, chunking of the material. The relevance of this phenomenon becomes clear when viewed in relation to a property inherent to any language. If the speech signal is segmented into small parts on a random basis, these parts have more chance of being repeated if they match a word, or a part of a word, than if they straddle word boundaries. In consequence, the primitives that emerge from the natural selection due to forgetting and interference are more likely to match a word, or a part of a word, than a between-word segment.
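The language property invoked above can be checked on a toy stream. The following Python sketch is only an illustration under stated assumptions: "babupu" and "bupada" are the two words quoted in Section 3.1, whereas the four other words, the stream length, and all function names are hypothetical choices made here. It counts how often each two-syllable segment recurs, separately for segments lying inside a word and for segments straddling a word boundary.

```python
import random
from collections import Counter

# "babupu" and "bupada" are quoted in the text; the other four words are
# illustrative assumptions, used only to obtain a six-word language.
WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]

def syllables_of(word):
    """Split a trisyllabic CVCVCV word into its three two-letter syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(n_words, rng):
    """Return (syllable, index_in_word) pairs for words drawn in random order."""
    stream = []
    for _ in range(n_words):
        word = rng.choice(WORDS)
        stream.extend((syl, k) for k, syl in enumerate(syllables_of(word)))
    return stream

def pair_counts(stream):
    """Count each two-syllable segment, keeping within-word segments apart
    from segments that straddle a word boundary."""
    within, straddling = Counter(), Counter()
    for (s1, k1), (s2, _) in zip(stream, stream[1:]):
        # A pair starting on the third syllable of a word crosses a boundary.
        target = straddling if k1 == 2 else within
        target[s1 + s2] += 1
    return within, straddling

def mean_repetitions(counter):
    """Average number of occurrences per distinct segment."""
    return sum(counter.values()) / len(counter)

rng = random.Random(0)
within, straddling = pair_counts(make_stream(600, rng))
print("within-word segments:", round(mean_repetitions(within), 1))
print("straddling segments: ", round(mean_repetitions(straddling), 1))
```

In such a stream, each within-word segment recurs every time its word is drawn, whereas a straddling segment recurs only when one particular word happens to follow another, so the first average should come out well above the second. Only these repetition statistics are illustrated here, not the learning process itself.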

This account has been implemented in a computer program, PARSER. Technical details about PARSER are provided in Appendix A, and an on-line presentation of the model is available at the URL (http://www.u-bourgogne.fr/LEAD/francais/personnel/perruche/SOC.html). Simulations have revealed that PARSER extracts the words of the language well before exhausting the material presented to adults in the Saffran et al. (1996b) experiments, and the material presented to 8-month-old infants 8 in the Saffran et al. (1996a) experiments. These results were obtained with an exhaustive chunking of the input. When a more realistic fragmentary processing of the material was simulated, performance was impaired, but remained fairly good. PARSER was able to reproduce the performance of actual subjects while processing only 3 to 5 percent (according to experiments) of the sequences presented to participants. This finding suggests that PARSER was able to simulate the results obtained under attention-disturbing conditions (Saffran et al., 1997), where inattentional gaps were presumably more frequent than under standard conditions. Finally, the good performance of PARSER was not limited to the trisyllabic words used by Saffran et al., but also extended to a language consisting of one- to five-syllable words (Perruchet & Vinter, 1998b).

To summarize, we suggest that parsing results from the interaction between one property of language --essentially that the probability of repeatedly selecting the same group of syllables by chance is higher if these syllables form intra-word rather than between-words components-- and the properties of the processing systems --essentially that repeated perceptual chunks evolve into processing primitives which in turn determine the way further material is perceived. Note that our solution to the word extraction issue does not involve any new and specialized learning devices. The fact that complex material is processed as a succession of chunks each comprising a few primitives is supported by a large amount of literature (e.g. Cowan, 1999). The unitization of these primitives due to their processing within the same attentional focus is one of the basic tenets of associative learning (e.g., Mackintosh, 1975). Likewise, the laws of forgetting and the effects of repetition are ubiquitous phenomena. Moreover, the interdependence of processing units and incoming information --the nature of the processing primitives determines how the material is perceived and the nature of the material determines the transformation of the processing primitives, and so on recursively-- is consistent with a developmental principle initially described by Piaget's concepts of assimilation and accommodation (e.g., Piaget, 1985). Most current theories of development, although they use different terminology, also rely on the constructive interplay between assimilation-like and accommodation-like processes (e.g. Case, 1993; Fischer & Granott, 1995; Karmiloff-Smith, 1992).
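The interplay summarized above (chunking within one attentional focus, strengthening through repetition, and forgetting) can be sketched as a short Python loop. This is a deliberately simplified illustration and not the PARSER program itself: the function names, the range of 1 to 3 units per percept, and every parameter value (threshold, new_weight, gain, decay) are assumptions made here for the sake of the example, and interference between similar units is left out for brevity.

```python
import random

def perceive(stream, pos, lexicon, threshold, rng):
    """Form one percept: 1 to 3 consecutive units starting at `pos`.
    A unit is the longest sufficiently strong lexicon entry matching the
    input, or else a single syllable (the initial primitives)."""
    units = []
    for _ in range(rng.randint(1, 3)):          # hypothetical span of attention
        if pos >= len(stream):
            break
        best = 1                                # unit length, in syllables
        for unit, weight in lexicon.items():
            n = len(unit)
            if weight >= threshold and n > best and tuple(stream[pos:pos + n]) == unit:
                best = n
        units.append(tuple(stream[pos:pos + best]))
        pos += best
    return units, pos

def learn(stream, rng, threshold=1.0, new_weight=1.0, gain=0.5, decay=0.05):
    """Return a dict mapping multi-syllable units to their current weight.
    All parameter values are placeholders chosen for illustration."""
    lexicon, pos = {}, 0
    while pos < len(stream):
        units, pos = perceive(stream, pos, lexicon, threshold, rng)
        chunk = tuple(syl for unit in units for syl in unit)
        # Forgetting: every stored unit loses some weight at each step.
        for unit in lexicon:
            lexicon[unit] -= decay
        # Repetition: the chunk just perceived, and the stored units that
        # composed it, are created or strengthened.
        for unit in {chunk, *units}:
            if len(unit) > 1:
                if unit in lexicon:
                    lexicon[unit] += gain
                else:
                    lexicon[unit] = new_weight
        # Units whose weight has vanished no longer exist for the system.
        lexicon = {u: w for u, w in lexicon.items() if w > 0}
    return lexicon

# Usage: six trisyllabic words (only the first two are quoted in the text;
# the others are illustrative assumptions), presented without boundaries.
rng = random.Random(0)
words = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]
stream = [w[i:i + 2] for w in (rng.choice(words) for _ in range(1000))
          for i in range(0, 6, 2)]
lexicon = learn(stream, rng)
for unit, weight in sorted(lexicon.items(), key=lambda kv: -kv[1])[:6]:
    print("".join(unit), round(weight, 2))
```

A unit created once falls below the threshold at the next step and disappears unless it is perceived again, which is the selection-by-forgetting idea described in the text. Whether the surviving units coincide with the words of the language in this reduced sketch depends on the placeholder parameters; the simulations reported in the paper rely on the full model described in Appendix A.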

3.3. PARSER and the Issue of Consciousness

The functioning of PARSER, like the functioning of any other computational model, does not depend in any way on the conscious/unconscious status we ascribe to its components. As a consequence, PARSER does not demonstrate that consciousness is necessary for word extraction. Its objective lies elsewhere. As set out in Section 1.4.1, the aim of this paper is not to demonstrate the necessity of consciousness, but instead to assess whether conscious thought, although endowed with severe capacity limitations, is sufficient to account for performance. We pointed out that devising a model to simulate conscious states while respecting the properties of conscious thought introduces considerable constraints. The point we wish to emphasize here is that PARSER meets many of these constraints. Crucially, the only representations included in the model closely match the conscious representations subjects may have when performing the task. The early coding of the material as a set of short and disjunctive units, as well as the final coding of the input as a sequence of words, are assumed to closely match the phenomenal perceptual experience of the listeners. This correspondence also extends to the entire training phase, thus permitting our model to perform word segmentation while mimicking the on-line conscious processing of incoming information. By doing so, PARSER demonstrates that the transient and labile representations composing the momentary phenomenal experiences are sufficient for word extraction, provided that simple and ubiquitous associative processes are allowed to operate on these representations. There is no need for unconscious representations, nor for any form of unconscious computation on these representations.

It is worthy of note that the constraints inherent to conscious thought cannot be conceived of as limitations to the model. PARSER works well, not despite these constraints, but thanks to them. For instance, the fact that attention is limited to the simultaneous perception of a few primitives --a property of the conscious/attentional system usually thought of as a serious handicap-- is the very property that offers the system a set of candidate units. If humans perceived a complex scene as a single unit, PARSER's principles would not work. Likewise, forgetting is essential to the functioning of the model because, if it did not forget, PARSER would fail to extract the relevant units from the multiple candidate units processed by the system. This aspect of the model makes it especially relevant for a rational analysis of cognition, such as initiated by Anderson and Milson (1989). This approach contrasts with the common mechanistic explanation, in which the cognitive system is described "as an assortment of apparently arbitrary mechanisms, subject to equally capricious limitations, with no apparent rationale or purpose", to borrow Chater and Oaksford's (1999) characterization. The rational analysis of cognition shows how apparent limitations actually serve adaptive functions, due to the characteristics of the surrounding environment. For instance, the fact that memory decays gradually over time is viewed as adaptive, because it turns out that the probability that any memory component will be needed to deal with a subsequent situation also decays over time. In this way, the efficiency of the retrieval of information from memory parallels the probability of this information being recruited for adaptive goals.
Although focusing on another function, our analysis follows the same approach: Memory breakdown, considered in conjunction with the preventing effect of repetitions, is adaptive, because it turns out that, in any language, a given segment has more chance of being repeated if it matches a word than if it straddles word boundaries. In this context, forgetting allows the selective disappearance of structurally irrelevant units9.

3.4. PARSER and Alternative Computational Models

As mentioned above, the primary objective of this paper is to highlight the internal consistency of a framework grounded on a set of premises which are strikingly different from those of the standard cognitive approach. This objective prevents a detailed and exhaustive comparison with alternative models. However, pointing out some differences may help to illustrate some specificities of the SOC framework, of which PARSER provides the instantiation for the word segmentation issue. To this end, we briefly compare PARSER with two other models of word segmentation, respectively based on a symbolic and a connectionist architecture. The comparison concerns only the basic principles of the models, given that empirical comparative analyses are not yet available.

One recent symbolic model of word segmentation has been developed by Brent and Cartwright (1996). The authors construe segmentation as an optimization problem. The principle of the method is akin to establishing a list of all the possible segmentations of a given utterance (although the authors used computational tools which prevented the program from proceeding in this way). The choice between possible segmentations is then made in order to fulfill a number of criteria. These criteria are threefold (according to the somewhat simplified presentation by Brent, 1996): minimize the number of novel words, minimize the sum of the lengths of the novel words, and maximize the product of the relative frequencies of all the words. The process of optimization is performed thanks to a statistical inference method, called the "minimum representation (or description) length" method. When units have been created by the system, they help to choose among different possible segmentations of the utterances. In addition, the choice between possible segmentations takes account of certain phonotactic constraints on the form of English words. This method has been applied with some success for parsing phonetic transcripts of child-directed speech into words.

Most of the connectionist models which address the word segmentation issue rely on the simple recurrent network, or SRN, initially proposed by Elman (e.g. 1990; see also Cleeremans, 1993). An SRN is a network which is designed to learn to predict the next event of a sequence. To this end, at each time step, the activations of the hidden units are stored in a layer of context units, and these activations are fed back to the hidden units on the next time step (hence the term "recurrent"). In this way, at each step, the hidden layer processes both the current input and the results of the processing of the immediately preceding step, and so on recursively. With the exception of this feature, an SRN works as many networks do, using the back propagation of errors as a learning algorithm. The comparison between the predicted event and the next actual event of the sequence is used to adjust the weights in the network at each time step, in such a way as to decrease the discrepancy between the two events. Elman (1990) presented such a network with a continuous stream of phonemes one phoneme at a time, the task being to predict the next phoneme in the sequence. The accuracy of prediction was assessed through the root mean square error for predicting individual phonemes. After training, the error curve had a strikingly marked saw-tooth shape. As a rule, the beginning of any word coincided with the tip of the teeth. This means that after a word, the network was unable to predict the next phoneme. However, as the identity of more and more of the phonemes in a word was revealed, the accuracy of prediction increased up to the last phoneme of the word, and the error curve therefore fell progressively. The start of the next tooth indexed the beginning of the next word. Therefore, an SRN appears able to parse a continuous speech flow into words (for more recent models, see Aslin, Woodward, LaMendola, & Bever, 1996; Christiansen, Allen, & Seidenberg, 1998)
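The recurrence described above can be made concrete with a minimal sketch of a single SRN time step. This is an illustration under stated assumptions, not Elman's implementation: the names `make_srn`, `srn_step`, and `prediction_error` are hypothetical, the weights are random and untrained, the tanh and softmax functions are choices made here for simplicity, and the back-propagation step that adjusts the weights is omitted. The sketch shows only how the hidden activations of one step become the context of the next, and how the error between predicted and actual next event is measured.

```python
import math
import random

def make_srn(n_in, n_hidden, seed=0):
    """Build small random weight matrices (untrained, for illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in": matrix(n_hidden, n_in),       # input   -> hidden
        "w_ctx": matrix(n_hidden, n_hidden),  # context -> hidden
        "w_out": matrix(n_in, n_hidden),      # hidden  -> predicted next event
    }

def srn_step(net, x, context):
    """One time step: returns (prediction for the next event, new context)."""
    hidden = []
    for row_in, row_ctx in zip(net["w_in"], net["w_ctx"]):
        total = sum(w * v for w, v in zip(row_in, x))
        total += sum(w * c for w, c in zip(row_ctx, context))
        hidden.append(math.tanh(total))
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in net["w_out"]]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    prediction = [e / norm for e in exps]
    # The hidden activations are copied to serve as context on the next step.
    return prediction, hidden

def prediction_error(prediction, target):
    """Root mean square error between predicted and actual next event."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target))
```

In a trained network, plotting `prediction_error` phoneme by phoneme would yield the saw-tooth curve described in the text; with the random weights used here no such pattern should be expected.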

Needless to say, nothing in those models matches the conscious experience of the learner of a new language. The operations involved in the Brent and Cartwright model, such as the computation of all the possible segmentations of an utterance in order to choose the one responding to pre-specified criteria, far exceed the level of complexity that can be achieved by a conscious operator, whether complexity is assessed in terms of computational sophistication or memory capacity. The consequence is that the Brent and Cartwright model is grounded on the postulate of a powerful cognitive unconscious, even if there is no explicit mention of this postulate in their paper. By contrast, an SRN relies on mechanisms that, although lacking direct support (there is no evidence of a neural implementation of the error backpropagation algorithm underpinning SRN functioning, as acknowledged by Elman et al., 1997), are a little more realistic at the neurobiological level. However, the model's contents are even more distant from the learner's experience. Even the final state, namely the representation of the input as a set of words, is not directly provided by the network: Words can only be inferred from the graded distribution of errors after learning is completed.

These remarks on alternative models can hardly be thought of as criticisms by themselves, given that these models were not devised to account for conscious experience. However, they illustrate the specificity of the SOC framework. PARSER, which implements the SOC framework in the word segmentation issue, accounts for the formation of words while closely mimicking the subjective experience of the learner, and without calling on other principles or mechanisms than the ubiquitous principles of associative learning and memory. By contrast, the alternative models rely on various postulates about states and operations we have no evidence of, while giving strictly no function to the representations of which we have direct and immediate evidence through conscious experience. The end-result is that, in the alternative models of word segmentation considered here, costly assumptions are made about unconscious operations while the content of phenomenal experience is left both unexplained and objectless.

4. Learning the World Units

The achievement of PARSER in simulating experimental data on artificial, over-simplified languages supports the idea that conscious representations, far from being a phenomenal by-product of complex analytical processes, are capable of self-organization. We now intend to show that our model provides a reasonable account of word extraction in natural language (4.1.), and also extends to the formation of object representations and word-object mapping (4.2.).

The general position taken in this section is as follows. On the one hand, natural conditions are far more complex than the experimental conditions considered so far, and this leads one to expect our model to perform worse under natural conditions than under experimental ones. In particular, it appears likely that relevant units represent a very restricted proportion of the potential units that may be initially perceived, and that the process of natural selection on which our model is based will not be sufficiently efficient. However, on the other hand, the complexity of natural conditions may paradoxically help to build the relevant units. To understand the reasons, we have to go back to the basic principles of the SOC framework, and notably to the role of attentional factors in unit formation. A new unit associates the processing primitives that are attended to simultaneously. With the simple artificial languages considered so far, the primitives embedded within a single attentional focus at the beginning of training are randomly selected on the basis of their temporal contiguity, because there are no other guides to constrain chunking. However, natural conditions often provide clues, which are generally excluded in experimental conditions in order to achieve better control. These clues, we will show, guide the formation of the initial chunks by orienting people's attention, and allow us to deal with the problem of the unmanageable number of possible units.

4.1. Word Extraction in Natural Language

Natural language acquisition does not consist in identifying six words used again and again in a few minutes, but many thousands of words distributed over years. Are the principles underlying PARSER general enough to be easily applied to such different complexity and time scales? As we have mentioned, PARSER works thanks to the interaction between one property of the language and a few properties of the human processing system. There is no reason to believe that this interaction occurs only with the simplistic language used by Saffran and co-workers. The target property of the language, namely that the probability of repeatedly selecting the same group of syllables by chance is higher if these syllables form intra-word rather than between-words components, is obviously shared by Saffran et al.'s artificial material and by any natural language. Likewise, the properties of the processing system on which PARSER relies are very general. For instance, one fundamental assumption of the model is that a cognitive unit is forgotten when not repeated and strengthened with repetition. This assumption may be taken for granted irrespective of whether the process occurs in the few minutes of an experimental session or across larger time scales, in keeping with a long-standing tradition of research into the laws of memory and associative learning. In consequence, PARSER's principles seem to be relevant to natural as well as to artificial language. Briefly stated, the generality of PARSER is ensured by both the generality of the behavioral laws (e.g., only repeated units shape long-lasting representations) and the generality of the language property (the most repeated units are the words) on which it relies.

However, beyond the theoretical relevance of the principles, it is possible that the complexity of the situation may give rise to an insoluble difficulty. This could be the case if natural language really consisted of a continuous, uninterrupted speech flow. But natural language includes pauses. These provide natural cues for segmenting the speech flow from its very onset. Although the information is insufficient for full segmentation, it may be quite useful for children given that child-directed language is characterized by very short utterances separated by clear pauses. Incorporating the information provided by the pauses into PARSER is straightforward: we simply need to constrain selection of the number of primitives perceived in one attentional focus in such a way that the content of an attentional focus does not straddle pauses. It is worth stressing that this change is not an ad-hoc, poorly motivated addition to the model. Indeed, this change is fully consonant with the SOC framework, and notably with the importance of attentional factors. Pauses, in fact, partly determine the content of the attentional focus, because attention naturally gathers events in close temporal proximity. Furthermore, pauses are only one among many prosodic and phonological cues capable of orienting attention in natural language processing. Overall, although we acknowledge that PARSER is certainly underpowered to deal with natural language, the principles that it implements are general enough for us to be optimistic about achieving an improved version exploiting the multiple cues which are likely to constrain the selection of the primitives embedded in each attentional focus.
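The constraint on the attentional focus described above can be sketched as follows. This is a hedged illustration rather than the authors' modification of PARSER: the pause marker `PAUSE`, the function names `next_focus` and `all_foci`, and the limit of three primitives per focus are assumptions introduced here. For simplicity each token of the input stands for one primitive. The only point shown is that a focus stops growing when it reaches a pause, so that no candidate unit straddles two utterances.

```python
import random

PAUSE = "#"  # hypothetical marker for a pause in the input stream

def next_focus(stream, pos, rng, max_primitives=3):
    """Return (chunk, new_pos): a chunk of primitives that never straddles a pause."""
    # Skip any pauses separating the previous focus from the next one.
    while pos < len(stream) and stream[pos] == PAUSE:
        pos += 1
    size = rng.randint(1, max_primitives)
    chunk = []
    # Stop at the drawn size, at the end of the input, or at the next pause.
    while pos < len(stream) and len(chunk) < size and stream[pos] != PAUSE:
        chunk.append(stream[pos])
        pos += 1
    return chunk, pos

def all_foci(stream, seed=0):
    """Split the whole stream into successive attentional foci."""
    rng = random.Random(seed)
    pos, foci = 0, []
    while pos < len(stream):
        chunk, pos = next_focus(stream, pos, rng)
        if chunk:
            foci.append(chunk)
    return foci
```

With an input such as `["look", "#", "a", "cat"]`, a focus may contain "a cat" but never "look a". Short, pause-delimited utterances therefore reduce the number of spurious candidate units from the outset.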

4.2. The Representation of Objects and the Word-Object Mapping Issue

PARSER was initially built to account for the segmentation of a continuous speech flow observed in the experiments by Saffran and her co-workers. Saffran, Johnson, Aslin, and Newport (1999) recently showed that both adults and 8-month-old infants succeeded equally well at segmenting non-linguistic auditory sequences. Of course, there is no reason to restrict the applicability of the principles underpinning PARSER to the language area and PARSER should therefore be a priori able to simulate the Saffran et al. (1999) data. Generalizing from this example, there is no reason not to apply PARSER's principles to non-sequential material, such as objects. Our objective now is to show that the model of word extraction described above is able to account for the formation of object representations. The idea that learning is crucial for object representation has been proposed earlier in the literature, especially by Schyns and co-workers (e.g. Schyns, Goldstone, & Thibaut, 1998). These authors show cogently that low-level object features can change with experience, thus altering the immediate appearance of objects. These views suggest an account of object perception strikingly different from the prevalent ones. Indeed, most developmental psychologists postulate that children's ability to parse continuous sensory input into discrete objects is made possible because there are some innate constraints and certain domain-specific knowledge (Bower, 1979; Karmiloff-Smith, 1992), assumptions (Markman, 1990), presuppositions or intuitive theories (Spelke et al., 1992) about the structure of the world, a position that naturally follows from the standard cognitive view outlined in Section 1.

Some adaptations are warranted if we are to achieve our objective. Accounting for the formation of object representations implies a change in the primitives of the system, which will no longer be the syllables or other phonological units, but, for instance, spatially oriented features. Likewise, the natural principles guiding the initial chunking of primitives will no longer be temporal proximity, but spatial contiguity. However, instantiating these adaptations confronts us with a problem, which arises from the fact that the number of initial units is much greater than with linguistic material. Indeed, in the auditory speech flow, the number of possible units is limited by the sequential nature of the speech signal. For instance, a 3-syllable message can be composed of three one-syllable words, two words consisting of one and two syllables (in either order), or one 3-syllable word. This results in only four possibilities. By contrast, a visual display can be decomposed into a virtually unlimited set of different parts, even if each part includes only spatially contiguous elements. Under these conditions, the formation of relevant units would appear to be an intractable problem.
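The count of four follows from the fact that each of the two boundaries between adjacent syllables is either a word boundary or not, so that a message of n syllables admits 2^(n-1) segmentations. The short sketch below enumerates them; the function name `segmentations` and the syllable labels are chosen here for illustration only.

```python
from itertools import product

def segmentations(syllables):
    """List every way of cutting a syllable sequence into contiguous words."""
    if not syllables:
        return [[]]
    results = []
    # Each of the n-1 internal boundaries is independently a cut or not.
    for cuts in product([False, True], repeat=len(syllables) - 1):
        words, current = [], [syllables[0]]
        for syllable, cut in zip(syllables[1:], cuts):
            if cut:
                words.append(tuple(current))
                current = [syllable]
            else:
                current.append(syllable)
        words.append(tuple(current))
        results.append(words)
    return results

print(len(segmentations(["ba", "bi", "bu"])))  # prints 4
```

The number of candidates thus grows quickly with length but remains bounded by the sequential order of the signal, whereas no comparable ordering restricts the parts into which a visual display may be decomposed.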

This problem, again, finds a solution in the idea that units are formed by the concurrent attentional processing of a small number of primitives. The point is that infants' attention is captured by an array of stimuli sharing specific properties. One of these properties, for instance, is novelty (e.g. Kagan, 1971). If, at a given moment, several primitives are new for the infants, it is highly probable that these primitives are processed conjointly in the attentional focus, hence forming a new unit. Now, if several primitives are new for a subject, there is also a good chance that they will be the components of one and the same meaningful unit, such as an actual object. The same line of reasoning may be followed with movement. It has been established that infants' attention is attracted by a moving display (e.g. Haith, 1978; Bronson, 1982; Vinter, 1986). If several elementary features move concurrently, they have a high probability both of being attentionally processed by infants and of belonging to the same real object (of course, many objects do not move; however, it is imaginable that the perceived movement generated by eye displacement in a 3-D visual field makes it possible to generalize this phenomenon to motionless objects).

The logic applied to the segmentation of the linguistic input into words and to the segmentation of the world into objects may be extended to word/object mapping. Note that the potential problem raised by the number of candidate units is exacerbated here. In real life, infants may capture within a single attentional focus unrelated componential aspects of the environment, such as a sound frequency together with the orientation of a segment of a visual display. To illustrate the latter issue, let us consider an example inspired by a question raised by Karmiloff-Smith (1992, p.40). When an adult points toward a cat and says "look, a cat", how can the child pair the word "cat" with the whole animal, rather than, say, with the cat's whiskers, the color of the cat's fur, or the background context? A solution based on the selective role of attention still works. What is likely to become associated is what captures the infant's attention, that is, essentially, what is new and/or moving. Presumably, considering the auditory input first, "cat" is newer than "look", because "look" has been associated with many contexts before. As a consequence, it is highly probable that "cat", rather than "look", enters into the momentary attentional focus. On the other hand, it is also highly probable that the infant's attention is focused on the animal, which moves as a whole, rather than on one of its parts, or on the other elements of the context, which are presumably both more familiar and motionless.

Of course, the process of mapping as described above may sometimes fail. The infant may be quite familiar with cats, and surprised by the russet color of the fur of this specific cat. We predict that, in this case, the infant would mismap the word "cat" to the color russet. It is worth noting first that, in real world settings, this situation may be infrequent, because adults would tend to spell out what is presumably the most novel for the infants, and more generally, what they infer to be their present object of attention. On the other hand, errors of mapping do in fact occur during language development. What is needed is not a theory predicting a perfect mapping from the outset, but a theory able to predict the final achievement. Our model of learning is precisely adapted to extracting signal from noise. In general, the correct mapping will be the final outcome, because the infants will hear "cat" for animals which are not russet, and will hear "russet" for animals which are not cats.
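The way repetition and forgetting jointly eliminate an early mismapping can be illustrated with a toy sketch. It is an assumption-laden illustration and not a component of PARSER: the function name `learn_mapping`, the representation of an episode as a pair (word heard, aspect attended to), and the values of `gain` and `decay` are all hypothetical.

```python
def learn_mapping(episodes, gain=1.0, decay=0.25):
    """Strengthen each attended (word, aspect) pair; let all pairs decay.

    `episodes` is a sequence of (word, attended_aspect) pairs. Returns a dict
    of surviving associations and their strengths.
    """
    strengths = {}
    for word, aspect in episodes:
        # Forgetting: every stored association weakens and may disappear.
        for pair in list(strengths):
            strengths[pair] -= decay
            if strengths[pair] <= 0:
                del strengths[pair]
        # Repetition: the pair currently in the attentional focus is reinforced.
        strengths[(word, aspect)] = strengths.get((word, aspect), 0.0) + gain
    return strengths

# One initial mismapping followed by five episodes with the whole animal attended.
episodes = [("cat", "russet")] + [("cat", "whole animal")] * 5
print(learn_mapping(episodes))  # prints {('cat', 'whole animal'): 4.0}
```

Under these assumed values the single pairing of "cat" with russet fades away because it is never repeated, while the pairing with the whole animal accumulates strength across episodes.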

To summarize, our model of learning, initially applied to the word extraction issue, suggests a new account of infants' basic ability to parse the physical world into objects, and to map words and objects. The apparent problem posed by the unmanageable number of potential units that can be initially perceived finds a simple solution thanks to the fact that attention is naturally captured by a tightly defined set of events. Of course, this account, in its present form, is just a first draft of a more complete developmental model. Such a model should address many other points. For instance, as a rule, a word does not designate a specific object or animal, but a category of objects or animals. It is easy to imagine how the phenomenon may be encompassed in a framework based on the laws of associative learning and memory. Differences between specific instances of, say, cats, can be viewed as noise for the system, whereas the common features are located in the to-be-detected signal. When the word cat is associated with different instances of cats, idiosyncratic features of the animals, because they are not repeated, disappear from the representation while common features are reinforced.

5. From Lexicon to Syntax

Up to now, we have proposed an interpretation for the formation of conscious representations of parts of the world, such as words and objects. However, the existence of linguistically or physically relevant representations is not commonly considered as sufficient for accounting for human behavior. Representations are generally construed as the elementary bricks of thought, and complex human behavior is assumed to rely on the formation of some kind of abstract knowledge, in which the bricks are combined on the basis of some organizing (e.g. logical) principles. In the language domain for instance, there is a conventional distinction between the lexicon and the syntax. Both of them are assumed to be mediated by different neural mechanisms, and the role of language exposure in the acquisition process is conceived of as very different: although some impact of learning in word acquisition is acknowledged even by strong nativists, the acquisition of grammar is attributed to innate and specialized modules. Needless to say, we do not deny that adult humans are able to abstract rules. The very existence of sciences such as logic, physics, and linguistics, testifies to the human ability to abstract the structure of complex environments. Since this section is devoted to language, it is important to point out from the outset that we agree with the contention that humans can achieve genuine knowledge of the syntax of their language. However, in the mentalistic framework, the formation and manipulation of abstract knowledge is restricted to conscious activities.

Our proposal is that the notion of self-organizing consciousness offers a way of thinking about rule-governed behavior in cases where no conscious rule analysis is performed, without having recourse to the notion of unconscious rule abstraction. The idea is that the separation between basic units on the one hand, and rules governing those units on the other, or between lexicon and syntax in linguistic terminology, is warranted in a scientific approach (i.e. from the observer's viewpoint) but has no relevance for the processing system. The purpose of the processing system is to generate a representation of the world which integrates all the momentary input (internal and external) into a coherent and meaningful scene. This complex and integrative representation, we will argue, makes rule knowledge objectless. The mentalistic framework is especially well-suited to pushing the representation/computation trade-off (Clark and Thornton, 1997) to its ultimate end. Indeed, claiming that representations exist only in the momentary phenomenal experience is primarily restrictive when contrasted with the conventional cognitive approach, which postulates that innumerable representations are stored and processed in parallel in the cognitive unconscious. But there is a positive counterpart. If there is no cognitive unconscious, the full power of the neural system may be mobilized for the formation of the current phenomenal experience. This opens up the possibility of generating a multifaceted and highly complex representation of the world.

Our demonstration of the ability of conscious representations to account for improved performances in rule-governed situations starts in the context of artificial languages generated by a simple finite-state grammar (5.1). Then we turn to natural language. Section 5.2 is an attempt to generalize, on a speculative basis, the principles whose efficiency has been demonstrated in connection with finite-state grammars. Section 5.3 indicates a few directions in contemporary psycholinguistic research that also exploit the ability of lexical representations to explain apparent rule-based, syntactical abilities. We then turn away from the field of language and examine the studies in implicit learning that exploit non-linguistic material (5.4).

5.1. Studies Involving Artificial Grammars

In the artificial language considered in Section 3, which was used by Saffran and co-workers (e.g. Saffran et al., 1997), the subject's task is to discover the lexicon. There are no syntactical constraints, insofar as the words of the lexicon are displayed in random order. By contrast, in the situations considered in the literature on artificial grammar learning, the discovery of the lexicon raises no particular problems, because the units of the language match some of the subject's processing primitives. However, the combination of these units is governed by syntactical rules, which are the to-be-learned components of the situation. In most cases, the situation involves a set of consonants, the order of which is governed by a finite-state grammar, such as that initially introduced by Miller (1958). The finite-state grammars have been extensively used by Reber (e.g. Reber, 1967) and many other researchers (e.g. Dulany et al., 1984; Shanks et al., 1997) working in the implicit learning field (for reviews, see Cleeremans, Destrebecqz, & Boyer, 1998; Reber, 1993).

In a conventional situation, participants are first exposed to a set of consonant strings following a finite state grammar such as that represented in Figure 1, without being asked to learn the rules or even being informed of the structured nature of the material. A subsequent test is performed in order to reveal whether participants have learned about the grammar. This test generally consists in asking them to judge the grammaticality of new strings. The usual outcome is that participants are able to classify the new strings as grammatical or ungrammatical with better-than-chance accuracy, whereas they lack conscious knowledge about the grammar. The initial conclusion of these studies was that the mind is endowed with an unconscious information processing device able to abstract the rules governing the experimental material, and then apply these rules in other contexts (e.g. Reber, 1967). Because the conclusions of these early studies accorded well with the prevalent zeitgeist, this interpretation has gone unchallenged for many years.

Figure 1. Schematic diagram of the grammar used in several earlier studies (e.g. Dulany, Carlson, & Dewey, 1984).

However, further studies, initiated by the seminal papers by Brooks (1978) and Dulany, Carlson, and Dewey (1984), made it clear that these conclusions were premature. To borrow the distinction proposed by Smith, Langston, and Nisbett (1992), the early studies failed to distinguish a system that follows rules from one that simply conforms to rules. A ball falling on the ground conforms to the law of gravity, but does not follow this law. Experimental evidence in implicit learning situations shows that the participants conform to the rules underlying the situations, but there is no proof that the rules have been learned in any way. Several alternative interpretations have been proposed. Because this literature has been reviewed extensively elsewhere (e.g. Berry & Dienes, 1993; see also the Handbook of Implicit Learning edited by Stadler and Frensch, 1998), we will focus on our own interpretation.

In keeping with the SOC framework, our re-interpretation (e.g. Perruchet & Vinter, 1998a; Perruchet, Vinter, & Gallego, 1997) of the phenomenon is that the training phase modifies the way the data are consciously coded and perceived. Assuming, for the sake of illustration, that XRX is a frequent recursion in the finite state grammar, participants no longer perceive X and R as two familiar but separate entities, but perceive XRX as an increasingly familiar unit. One possible explanation for the above-chance grammaticality judgments of a new string including XRX is that participants interpret, more or less automatically, the level of perceptual fluency as an indicator of grammaticality. Strings that can be easily read because chunks of letters are directly perceived as familiar units would tend to be judged as grammatical. In short, in our re-appraisal, the formation of the conscious unit XRX replaces the unconscious extraction, retention, and use of a rule such as: If XR, then X.

It might seem, at first glance, that any fragment of a grammatical utterance is itself grammatical, and can be recombined with another fragment to form a new grammatical string. Given this logic, the initial chunking of the material would not matter. And indeed the notion of "fragmentary knowledge" conveys the tacit implication that it is a quite impoverished form of knowledge. This view is faulty, as may be illustrated using the example of natural language. For instance, in the preceding sentence, "this view", or "natural language" form structurally relevant sequences, in the sense that they can be recombined with a large number of other sequences, whereas "faulty as may" cannot be easily integrated as a component in another linguistic context, although it is a component of a legal sentence. It is obvious that it is preferable to become familiar with the former sequences than with the latter. Likewise, in the letter strings generated by a finite-state grammar, it is preferable to become familiar with a subset of sequences --for instance those that are generated by a recursive loop-- than with other, randomly selected, sequences. We (Perruchet, Vinter, Pacteau, & Gallego, in press) have shown that participants in an artificial grammar learning setting indeed formed the structurally relevant units. They were asked to read each string generated by a finite state grammar and, immediately after reading, to mark with a slash bar the natural segmentation positions. The participants repeated this task after a phase of familiarization with the material which consisted either of learning items by rote, performing a short-term matching task, or searching for rules. The same number of total units was observed before and after the training phase, thus indicating that participants did not tend to form increasingly larger units. However, the number of different units reliably decreased, whatever the task during training. 
This result was taken as evidence that participants' processing units became increasingly relevant as training progressed (see also Servan-Schreiber & Anderson, 1990). Perruchet et al. (in press) also showed that PARSER, the computer model which was used previously to account for the discovery of words in an unsegmented speech flow (Perruchet & Vinter, 1998b; see Section 3), also accounted for participants' actual performance. Thus the principles that make it possible to discover the lexical units of an artificial language built from the random concatenation of words also proved to be efficient in the discovery of the syntactically relevant units of an artificial language built from a finite-state grammar.

It is worth examining why such simple principles work well in a situation that was once thought of as involving grammatical rule abstraction. It is because first-order and second-order dependency rules capture virtually all the structural constraints of the standard finite-state grammars. For instance, Perruchet and Gallego (1997) have demonstrated that consideration of only the first-order dependency rules is sufficient to account for the performance of the participants in the Reber (1976) experiments and many others which use the same material. Indeed, assuming that participants classify test items as grammatical if they consist only of permissible bigrams (whatever their location in the strings) would result in the production of 90% correct responses, a success level that greatly exceeds observed performance. The same demonstration may be repeated for other standard situations of implicit learning, such as the repeated sequence tasks (Perruchet & Gallego, 1997).
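The bigram-based account can be illustrated with a minimal sketch. The training and test strings below are invented for illustration and are not taken from the Reber material, and the function names are hypothetical. The sketch only shows the decision rule described above: a test string is judged grammatical if every pair of adjacent letters it contains was seen somewhere, at any location, during training.

```python
def bigrams(string):
    """Return the set of adjacent letter pairs occurring in a string."""
    return {string[i:i + 2] for i in range(len(string) - 1)}


def learn_permissible_bigrams(training_strings):
    """Pool the bigrams of all training strings, ignoring their location."""
    permissible = set()
    for s in training_strings:
        permissible |= bigrams(s)
    return permissible


def judged_grammatical(string, permissible):
    """Judge a string grammatical if it contains only permissible bigrams."""
    return bigrams(string) <= permissible


# Hypothetical training strings (assumption: not the original material).
TRAINING_STRINGS = ["TXS", "TXXVPS", "PVV", "PTVPS"]
permissible = learn_permissible_bigrams(TRAINING_STRINGS)

print(judged_grammatical("TXXS", permissible))  # True: TX, XX and XS were all seen
print(judged_grammatical("TSX", permissible))   # False: TS was never seen
```

No rule is represented anywhere in this sketch; the classification relies only on a stored set of familiar letter pairs.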

Note that we have dealt separately with the lexical level (in Section 3) and the syntactical level (in this section), while language acquisition implies the simultaneous acquisition of lexicon and syntax. This does not constitute a problem. The starting point for PARSER is the idea that each attentional chunk includes a small number of primitives, and that the primitives which are processed together form a new internal primitive, as a by-product of their joint attentional processing. After having discovered the words forming the artificial language used in the Saffran et al. (1996a, 1996b, 1997) experiments, PARSER obviously goes on creating new units. These units, which are the concatenation of a few words, rapidly vanish. Indeed, because word order is random in Saffran et al.'s material, the repetition of the same word sequence is not frequent enough to allow the strengthening of any word sequence. Let us now suppose that, instead of being randomly ordered, the words are subjected to some syntactic constraints. The constraints would make some sequences grammatical and the other sequences ungrammatical. In this case, PARSER forms long-lived units consisting of the grammatical sequences. Moreover, as shown in current studies run in collaboration with Axel Cleeremans, PARSER discovers the most frequent multi-word sequences, which are likely to be the most syntactically relevant. If we transpose the results from the computational model to the level of the phenomenal consciousness of actual people, it appears that the very same process that permitted word formation during the initial stage of learning is able to generate the phenomenal experience of well-formedness for syntactically correct word sequences. This phenomenal experience can be the source of various overt behaviors, such as grammaticality judgments or verbal productions.
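The interplay between chunk formation, strengthening, and forgetting can be sketched as follows. This is a deliberately simplified toy, not the PARSER program itself. The fixed chunk size, the integer strength values, the threshold, and the function name are all assumptions made for illustration. The sketch keeps only three ideas from the description above: primitives perceived together form a unit, a unit that is perceived again is strengthened, and a unit that is not repeated decays and vanishes.

```python
def chunking_sketch(stream, chunk_size=2, gain=20, decay=1, threshold=20):
    """Toy chunk formation: strengthen what is perceived together, forget the rest.

    Returns a dict mapping each surviving unit to its (integer) strength.
    All parameter values are arbitrary illustrative choices.
    """
    weights = {}
    pos = 0
    while pos < len(stream):
        # Build one percept out of a few primitives. A primitive is the
        # longest sufficiently strong known unit starting here, or else
        # a single symbol of the input.
        percept = []
        for _ in range(chunk_size):
            if pos >= len(stream):
                break
            strong = [u for u, w in weights.items()
                      if w >= threshold and stream.startswith(u, pos)]
            primitive = max(strong, key=len) if strong else stream[pos]
            percept.append(primitive)
            pos += len(primitive)
        # Forgetting: every stored unit loses some strength and is
        # dropped once its strength reaches zero.
        for unit in list(weights):
            weights[unit] -= decay
            if weights[unit] <= 0:
                del weights[unit]
        # The primitives processed together form (or strengthen) a unit.
        unit = "".join(percept)
        weights[unit] = weights.get(unit, 0) + gain
    return weights


print(chunking_sketch("abab"))  # {'ab': 39}
```

In this toy, a sequence that recurs while its unit is still strong is perceived as a whole and gains strength, whereas a sequence encountered only once loses one point per percept and disappears after twenty further percepts.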

5.2. Learning Syntax in Natural Language

Of course, it is premature to claim that the above outline is directly relevant to natural languages. First, it may be argued that any approach relying on associative learning mechanisms can in principle provide only a statistical approximation to genuine syntactic knowledge, whereas people make no errors. We believe that this objection amounts to both underestimating a priori the power of associative mechanisms, and exaggerating the actual accuracy of people's performance. For instance, we mentioned above (Section 3) that PARSER, although relying only on associative learning mechanisms, was able to extract the words in Saffran et al.'s (e.g. 1996) language without any errors. Admittedly, this language is oversimplified, but, at the same time, a very limited amount of exposure to the material is sufficient to learn it. The level of performance that can be reached when a more complex language is studied over a more extended period is currently a matter of speculation. On the other hand, people's ability to master the syntax of a natural language may have been overemphasized in the Chomskyan tradition. For instance, even simple spontaneous oral productions are rarely error-free, and it is fairly difficult to capture the syntactical structure of a complex sentence whenever semantics cannot help. To conclude, assessing the ultimate explanatory power of associative mechanisms is a matter for further empirical investigations and computational studies.

However, there is a second category of objections, stemming from the fact that the finite-state grammars used in the laboratory studies provide a poor analog for the grammars of natural languages. The finite-state grammars used in the implicit learning literature mainly involve first-order and second-order dependency rules between contiguous elements. By contrast, natural languages involve higher-order dependency rules and remote dependencies. At a more qualitative level, it has long been known that the grammars of natural languages cannot be conceived of in terms of a finite-state grammar. Also, it remains unclear how our claims account for other aspects of syntactic knowledge, and especially the abstraction of syntactic classes such as nouns and verbs.

The part of the argument based on the consideration that our account works well only with first and second-order dependency rules is not as problematic as it might seem. Indeed, in PARSER, the dependency rules are captured through the formation of new processing primitives, which can themselves become the components of subsequent primitives. Thanks to this possibility of hierarchical processing, we can speculate that PARSER should become at least partially sensitive to higher-order dependency rules. However, the order of the dependency rules is only one aspect. Many other aspects of natural language have no counterpart in artificial languages governed by a finite-state grammar. We acknowledge that a model designed to deal with artificial languages cannot deal with natural languages without undergoing substantial changes. But the essential question is: Beyond the limitations of PARSER in its current implementation, are the fundamental principles underlying the SOC model able to account for the acquisition of syntax in natural language? Although we have no definitive response, we believe that there are arguments allowing us to answer this question in the positive.

As an example, let us consider the dependencies between remote elements, and more precisely, the case of a sequence AXB in which A and B are associated irrespective of the length and nature of X. There are many occurrences of such a structure in natural language. For instance, in the sentence: "The window of my office is open", "The window" (A) is associated with "is open" (B) irrespective of the determinant: "of my office" (X), that may be deleted or replaced by an infinite number of subordinate propositions. PARSER is a priori unable to capture the relation, because the model posits that new units can only be formed between contiguous elements. However, the general principle that PARSER instantiates is that new units result from the processing of a few primitives within the same attentional focus. When people encounter sequential material, the simplest assumption is that each attentional focus embraces a small number of contiguous elements. In artificial, meaningless languages, there is no obvious reason to expect a different type of chunking. However, there are clearly no functional or structural constraints here. Each of us commonly mixes present and past events in his/her current phenomenal experience. It is in keeping with our general approach to assume that a new unit may be composed of spatially or temporally remote events, provided that there is some reason for those events to become associated in phenomenal experience. It is easy to imagine several developmental sketches accounting for how two remote events can be joined in a unitary experience. For instance, a link between A and B may emerge in situations where both events are contiguous (a case which, in our example, corresponds to the simplest utterance: "The window is open"). Then the occurrence of A without its usual successor may result in the retention of A in working memory until B occurs in order to complete the percept AB.
At this moment, A and B will be simultaneously held in the attentional focus despite their objective separation, thus providing conditions favoring both the strengthening of their association and the understanding of the sentence. This is again consonant with the SOC framework, which relies on the assumption that perception is shaped by earlier representations.

5.3. Converging Lines of Evidence from Psycholinguistic Research

Although they developed completely independently of our own framework, there are a number of directions in psycholinguistic research that are able to help us consider the question of language learning within the SOC framework. As an example of such work, the re-emergent distributional approaches to language have recently shown that abstract classes and categories are often associated with simple statistical properties that make them tractable by all-purpose statistical learning mechanisms. Interestingly, even simple properties such as co-occurrence statistics turn out to be informative about syntactic classes. For instance, Redington, Chater and Finch (1998) studied a large natural language corpus taken from the CHILDES database (MacWhinney, 1995), comprising over 2.5 million words of adult speech. They measured the information that the context of a given word provided about the syntactic category of this word (among 12 possible categories). Context was defined by the two words to either side of the target word. The authors showed that "highly local contexts are the most informative concerning syntactic category and that the amount of information they provide is considerable" (Redington et al., 1998, p. 452; see also Gasser & Smith, 1998). Distributional approaches have also proven to be able to account for other aspects of language, such as the development of word meaning (McDonald & Ramscar, 2001).
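The kind of local context information at issue can be illustrated with a small sketch. The four-sentence corpus is invented, and the use of cosine similarity to compare context profiles is an assumption chosen for brevity; it is not presented here as the measure used by Redington et al. The sketch simply counts, for each word, which words occur up to two positions to its left and right, and then compares these counts across words.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for each word, the words found at each offset within the window."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, target in enumerate(words):
            vec = vectors.setdefault(target, Counter())
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(words):
                    vec[(offset, words[j])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus (assumption: far smaller than any real corpus).
CORPUS = [
    "the cat sees the dog",
    "the dog sees the cat",
    "a cat likes a dog",
    "a dog likes a cat",
]
vectors = context_vectors(CORPUS)

print(cosine(vectors["cat"], vectors["dog"]))     # two nouns: identical contexts
print(cosine(vectors["cat"], vectors["sees"]))    # noun vs. verb: no shared context
print(cosine(vectors["sees"], vectors["likes"]))  # two verbs: partly shared contexts
```

In this toy corpus, words of the same category end up with more similar context profiles than words of different categories, although no category label is ever provided.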

Converging lines of evidence have evolved in other contexts. For instance, careful scrutiny of the linguistic productions of young children shows that these productions are organized around particular words and phrases, instead of operating with abstract linguistic categories and schemas. This finding of the item-based learning and use of language appears fairly general (for a review, see Tomasello, 2000). Of course, "item-based", or "memory-based" (McKoon & Ratcliff, 1998) approaches to grammar have not gone unchallenged. Some authors go on to argue that there is a modular dissociation between syntax and lexicon (e.g. Grodzinsky, in press). We are not familiar enough with the domain to offer new arguments in either direction. Our intention was simply to point out that distinguished figures in the psycholinguistic literature have been prepared to reject the idea that language processing necessarily involves syntactical rules. Such a view confers a high degree of probability on one of the main propositions of this paper, namely that it may be possible to explain the apparent use of abstract rules in terms of the formation of complex representations.

5.4. Unconscious Rule Processing Outside of the Language Area

Thus far, we have focused on studies on artificial or natural languages in order to illustrate the idea that apparent rule processing may be reducible to the formation of complex representations. The same idea can be illustrated in other fields. In particular, this idea finds strong support in the literature on implicit learning that is not based on linguistic material. Outside of the artificial grammar settings, studies on implicit learning have primarily involved two situations: the so-called serial reaction time (SRT) situations, and the control of complex systems. Most of the SRT studies have been designed on the basis of Nissen and Bullemer's paradigm. A target stimulus appears on successive trials at one of three or four possible positions, and participants are asked to react to the appearance of the target by pressing a key on the keyboard that spatially matches the location of the target. Unknown to the participants, the same sequence of trials is repeated throughout the sessions. Under these conditions, participants usually exhibit a reliable improvement in performance when compared with a control group presented with randomly generated series. The tasks involving the control of complex and interactive systems have their origin in Broadbent's studies (e.g. Broadbent, 1977). Participants are placed in front of a computer simulating a complex system, such as a city transport system. Unknown to them, the parameters of the system are governed by a linear equation. The task consists of regulating the system; that is, participants have to manipulate a number of parameters in order to reach and maintain a pre-set target state of the system.
Several studies have shown that the initial abstractionist account of performance improvement involved unnecessary assumptions, because alternative interpretations based on simpler memory processes proved to be sufficient (see for instance Cleeremans & McClelland, 1991; Marescaux, Dejean, & Karnas, 1990; Perruchet & Amorim, 1992; Perruchet, Pacteau, & Gallego, 1997; Shanks & St.John, 1994; Stadler, 1992; Whittlesea & Dorken, 1993).

Rather than examining in detail the findings resulting from these conventional situations, we focus below on a specific paradigm initially designed by Lewicki, Hill, and Bizot (1988). Like almost all other studies in the field, this paradigm serves our primary objective, which is to show that what is initially interpreted as compelling evidence of unconscious rule abstraction can also be explained in terms of the formation of conscious percepts and representations which are isomorphic with the structure of the material. However, this specific paradigm was also chosen because it allows us to illustrate another point, namely that our interpretation can work even in cases where there are no obvious relationships between the actual rules generating the structure of the material and the participants' conscious processing units. The point is that we may be sensitive to surface regularities that are a remote by-product of the rules, so remote in fact that the logical link between the rules and their by-products may be quite difficult to discover. This subsection is dedicated to those skeptical readers who doubt the power of our approach because of their failure to understand how it can apply after a cursory examination of certain complex situations.

In the Lewicki et al. (1988) paradigm, participants were asked to perform a four-choice reaction time task, with the targets appearing in one of four quadrants on a computer screen. They were simply asked to track the targets on the numeric keypad of the computer as fast as possible. The sequence looked like a long and continuous series of randomly located targets. However, this sequence was organized on the basis of subtle, non-salient rules. Indeed, unbeknown to participants, the sequence was divided into a succession of "logical" blocks of five trials each. In each block, the first two target locations were random, while the last three were determined by rules of the form: "If the target describes a movement m while it moves from location n-2 to n-1, then it describes a movement m' from location n-1 to n". Depending on whether n is the third, fourth, or fifth trial of the logical block, if m is horizontal (or vertical, or diagonal), m' is vertical or diagonal (or horizontal or diagonal, or horizontal or vertical, respectively). It should be noted that to discover these second-order dependency rules, participants must inevitably segment the whole sequence into a succession of 5-trial subsequences. That is to say, any trial within the long displayed sequence must be identified as the first, second, ..., fifth trial within the logical 5-trial block to which it belongs.

The results obtained by Lewicki et al. were clear. The participants were unable to verbalize the nature of the manipulation and, in particular, they had no explicit knowledge of the subdivision into logical blocks of five trials, which was a precondition for grasping the other rules. However, performance on the final trials of each block, the locations of which were predictable from the rules, improved at a faster rate and was better overall than performance on the first, random, trials. Lewicki et al. (1988) accounted for these results by postulating that the structuring rules were discovered by a powerful, multipurpose unconscious algorithm abstractor.

Perruchet, Gallego, and Savy (1990) provided the basis for a radically different interpretation (for an alternative interpretation based on connectionist modeling, see Cleeremans and Jimenez, 1998). Perruchet et al. demonstrated that participants learned the task without ever performing the segmentation of the sequence into logical blocks. Instead, they were sensitive to the relative frequency of small units, comprising 2 or 3 successive locations. Some of the possible sequences of 2 or 3 locations were more frequent than others, because the rules determining the last 3 trials within each 5-trial block prohibited certain transitions from occurring. In particular, an examination of the rules shows that they never generated back and forth movements (i.e., m' is never the inverse movement of m). As a consequence, the back and forth transitions were less frequent on the whole sequence than the other possible movements. The crucial point is that these less frequent events, which presumably elicit longer reaction times, were exclusively located on the random trials. This stems not from an unfortunate bias in randomization, but from a logical principle: The rules determined both the relative frequency of certain events within the entire sequence and the selective occurrence of these events in specific trials. The validity of this interpretation was tested by deriving predictions concerning specific features of fine-grained performance from an abstractionist model, on the one hand, and from our alternative model on the other. The empirical data clearly supported our re-analysis.
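The logical principle involved can be checked with a short simulation. The specific mapping in RULES below is hypothetical, since the exact rules are not reproduced here; the only property carried over from the description is that m' always differs in type from m. The sketch further assumes that the second location of a block always differs from the first, and it codes the quadrants as (column, row) pairs. Under these assumptions, a back and forth movement can never end on a rule-determined trial, so all such movements fall on the random trials.

```python
import random

H, V, D = "horizontal", "vertical", "diagonal"
QUADRANTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (column, row) coding, an assumption

# Hypothetical rules: for trials 3, 4 and 5, map movement m onto movement m'.
# The only constraint taken from the text is that m' is never of the same type as m.
RULES = {
    3: {H: V, V: H, D: H},
    4: {H: D, V: D, D: V},
    5: {H: V, V: H, D: V},
}


def movement(a, b):
    """Type of movement from quadrant a to quadrant b (None if a == b)."""
    dx, dy = a[0] != b[0], a[1] != b[1]
    if dx and dy:
        return D
    return H if dx else (V if dy else None)


def move(a, m):
    """Quadrant reached from a by a movement of type m."""
    col, row = a
    if m == H:
        return (1 - col, row)
    if m == V:
        return (col, 1 - row)
    return (1 - col, 1 - row)


def generate_sequence(n_blocks, rng):
    """Return a list of (location, trial position within its 5-trial block)."""
    seq = []
    for _ in range(n_blocks):
        first = rng.choice(QUADRANTS)
        second = rng.choice([q for q in QUADRANTS if q != first])
        block = [first, second]
        for pos in (3, 4, 5):
            m = movement(block[-2], block[-1])
            block.append(move(block[-1], RULES[pos][m]))
        seq.extend((loc, i + 1) for i, loc in enumerate(block))
    return seq


def back_and_forth_by_position(seq):
    """Count returns to the location occupied two trials earlier, per block position."""
    counts = {pos: 0 for pos in range(1, 6)}
    for i in range(2, len(seq)):
        a, b, (c, pos) = seq[i - 2][0], seq[i - 1][0], seq[i]
        if a == c and b != c:
            counts[pos] += 1
    return counts


counts = back_and_forth_by_position(generate_sequence(200, random.Random(0)))
print(counts)  # positions 3, 4 and 5 always show zero back and forth movements
```

A learner who is merely slower on rare transitions would therefore be slower on the first two trials of each block without ever segmenting the sequence into blocks.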

It should be noted that the subsequences of 2 or 3 successive locations considered by Perruchet et al. (1990) are presumably the events on which the su