Published in Behavioral and Brain Sciences
Volume 25, Number 6: 727-750

© 2002 Cambridge University Press


Below is the unedited, uncorrected, unquotable final draft preprint of a BBS target article that was accepted for publication. To order the final published version of this target article, commentaries and author's response, please visit the BBS Homepage at Cambridge Journals Online.


Are developmental disorders like cases of adult brain damage?

Implications from connectionist modelling

Michael Thomas and Annette Karmiloff-Smith

 

Neurocognitive Development Unit,

Institute of Child Health, London.

Dr. Michael Thomas (from 1/10/2002)

School of Psychology

Birkbeck College, University of London

Malet Street

London WC1E 7HX, UK

Email: m.thomas@psychology.bbk.ac.uk

Tel.: +44 (0)20 7631 6207

Fax: +44 (0)20 7631 6312

 

Professor Annette Karmiloff-Smith,

Neurocognitive Development Unit,

Institute of Child Health,

30, Guilford Street,

London WC1N 1EH, UK.

Email: a.karmiloff-smith@ich.ucl.ac.uk

Tel.: +44 (0)20 7905 2754

Fax: +44 (0)20 7242 7177

http://www.ich.ucl.ac.uk/units/ncdu/NDU_homepage.htm

 

Word counts

Abstract: Short - 118 words, Long - 262 words

Main text (including figure captions): 16,323 + 2,025 simulation details

References: 2,863

Entire text (excluding abstracts, including notes, figure captions, acknowledgements): 22,639

 

Short Abstract

It is often assumed that similar domain-specific behavioural impairments in adult brain damage and developmental disorders correspond to similar underlying causes. We argue that this correspondence is contingent on an unsupported assumption that atypical development can produce cognitive systems with selective deficits while the rest of the system develops normally. We explore the computational viability of this assumption based on a review of connectionist models of acquired and developmental disorders in the domains of reading and past tense, as well as using new simulations. We conclude that in developmental disorders, inferences from behavioural deficits to underlying structure crucially depend on developmental conditions, and that the process of ontogenetic development cannot be ignored in constructing models of developmental disorders.

 

Keywords: Acquired and developmental disorders, connectionist models, past tense, reading, modularity.

Abbreviations: RN – Residual Normality; WS – Williams syndrome; SLI – Specific Language Impairment

 

 

Long Abstract

It is often assumed that similar domain-specific behavioural impairments found in cases of adult brain damage and developmental disorders correspond to similar underlying causes, and can serve as convergent evidence for the modular structure of the normal adult cognitive system. We argue that this correspondence is contingent on an unsupported assumption that atypical development can produce selective deficits while the rest of the system develops normally (Residual Normality), and that this assumption tends to bias data collection in the field. Based on a review of connectionist models of acquired and developmental disorders in the domains of reading and past tense, as well as on new simulations, we explore the computational viability of Residual Normality and the potential role of development in producing behavioural deficits. Simulations demonstrate that damage to a developmental model can produce very different effects depending on whether it occurs prior to or following the training process. Since developmental disorders typically involve damage prior to learning, we conclude that the developmental process is a key component of the explanation of endstate impairments in such disorders. Further simulations demonstrate that in simple connectionist learning systems, the assumption of Residual Normality is undermined by processes of compensation / alteration elsewhere in the system. We outline the precise computational conditions required for Residual Normality to hold in development, and suggest that in many cases it is an unlikely hypothesis. We conclude that in developmental disorders, inferences from behavioural deficits to underlying structure crucially depend on developmental conditions, and that the process of ontogenetic development cannot be ignored in constructing models of developmental disorders.

1. Introduction

Behavioural impairments found in developmental disorders and in cases of acquired brain damage provide a source of information about the structure of the cognitive system. Historically, the logic of deriving implications about cognitive structure from behavioural impairments was formulated in the domain of acquired disorders in adults (see e.g., Shallice, 1988). It was argued that under some circumstances, highly selective patterns of impairment after damage could demonstrate the relative independence of different cognitive processes, predicated on an a priori assumption of modular structure within the cognitive system. Ultimately, this was thought to lead to the identification of the components of cognition.

Latterly, behavioural impairments found in developmental disorders have often been interpreted within the same cognitive neuropsychology framework (see e.g., Baron-Cohen, 1998; Leslie, 1989; Temple, 1997). In this case, there is an inference that selective behavioural impairments reveal discrete components of the cognitive system that have not developed properly, for example, the purported defective ‘theory of mind’ processor in autism (Leslie, 1992), or the defective phonological processor in dyslexia (Frith, 1995). However, the extension of the cognitive neuropsychology framework to interpret developmental disorders has proved controversial. Indeed, some researchers (Bishop, 1997; Karmiloff-Smith, 1997, 1998) have argued that the process of development itself violates key assumptions of the static cognitive neuropsychology model, and thus invalidates the direct inference from impairment to cognitive structure.

Our aim in this article is to evaluate this debate from the perspective of connectionist modelling of cognitive processes. This is a useful perspective because such models have been employed to capture both acquired deficits (when models of adult performance are damaged) and developmental deficits (when initial computational constraints are altered in models of typical development). Connectionist models therefore provide a concrete computational basis on which to anchor a debate on the potential causes of each type of deficit.
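The two modelling procedures just described can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not a reproduction of any simulation reported in this article: the network size, the XOR training set, the learning rate, the choice of lesioned units, and all names (`TinyNet`, `acquired_damage`, `developmental_damage`) are hypothetical. The only point it makes is procedural: an acquired deficit is modelled by training first and damaging afterwards, a developmental deficit by damaging first and training afterwards.

```python
import math
import random

# Hypothetical toy task (XOR); not a task used in the article.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """2-input, n-hidden, 1-output network with a mask marking lesioned hidden units."""

    def __init__(self, n_hidden=4, seed=0):
        rng = random.Random(seed)
        # Each hidden unit has two input weights plus a bias weight.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
        # One weight per hidden unit plus an output bias (last element).
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
        self.mask = [1.0] * n_hidden  # 1.0 = intact, 0.0 = lesioned

    def lesion(self, units):
        for u in units:
            self.mask[u] = 0.0

    def forward(self, x):
        h = [m * sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
             for w, m in zip(self.w_in, self.mask)]
        y = sigmoid(sum(wo * hj for wo, hj in zip(self.w_out, h)) + self.w_out[-1])
        return h, y

    def train(self, data, epochs=2000, lr=0.5):
        # Plain online backpropagation on squared error.
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                d_out = (y - t) * y * (1.0 - y)
                for j, hj in enumerate(h):
                    # For a lesioned unit hj is 0, so its gradients vanish.
                    d_hid = d_out * self.w_out[j] * hj * (1.0 - hj)
                    self.w_out[j] -= lr * d_out * hj
                    self.w_in[j][0] -= lr * d_hid * x[0]
                    self.w_in[j][1] -= lr * d_hid * x[1]
                    self.w_in[j][2] -= lr * d_hid
                self.w_out[-1] -= lr * d_out

    def error(self, data):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

def acquired_damage(lesioned=(0, 1), seed=0):
    """Train an intact network, then remove hidden units (adult damage)."""
    net = TinyNet(seed=seed)
    net.train(DATA)
    net.lesion(lesioned)
    return net.error(DATA)

def developmental_damage(lesioned=(0, 1), seed=0):
    """Remove hidden units first, then train (altered initial constraints)."""
    net = TinyNet(seed=seed)
    net.lesion(lesioned)
    net.train(DATA)
    return net.error(DATA)
```

Calling `acquired_damage()` and `developmental_damage()` with the same seed and the same lesioned units returns the endstate error under each procedure, so the two can be compared directly. What the comparison shows in any real model depends on the architecture, the task, and the type of damage.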

From the stance of the behavioural outcome, the impairments found in cases of acquired and developmental disorders can look very similar. For instance, in relation to types of dysgraphia, dyslexia, and dyscalculia, Temple (1997, p. 324) comments that ‘were one to give the data from adult and child cases to a cognitive neuropsychologist and ask the question, which are the adults and which the children, there are no apparent criteria by which to distinguish them’.

One might take this similarity in behaviour as an indication that the two types of disorder are linked at a deeper level, namely that they share a similar underlying cause. For example, where one can appeal to a static information-processing model of the adult system, one might characterise an impairment in the adult case as corresponding to selective damage to one (or more) processing components, and an impairment in the developmental case as a failure of one (or more) components to be properly acquired. Temple (1997) offers just such a characterisation for cases of developmental prosopagnosia (p. 139 and p. 141), as well as two sub-types of developmental dyslexia (p. 192 and p. 206), developmental disorders of spelling (p. 238 and p. 244), and developmental dyscalculia (pp. 285-6).

In this target article, we argue that such a causal link between acquired and developmental disorders can only occur if, for a given domain, a very particular kind of developmental account holds true. In most cases where researchers have linked acquired and developmental disorders, the required developmental account has not been argued for, but merely assumed.

Our aim in the computational section of this article is to characterise the conditions that must hold in a developing cognitive system for acquired and developmental disorders to be linked at a causal level. We demonstrate by simulation that, in the absence of a precise developmental account of a cognitive system, behavioural data alone may be insufficient to infer underlying functional structure from a pattern of impairments. As a result, we argue that researchers working with developmental disorders must compare their data against developmental rather than static models, even though those static models may be appropriate for explaining patterns of acquired deficits in normal adults.

First, however, we introduce two concrete examples of domains in which explicit links have been drawn between acquired and developmental impairments. These are the domains of dyslexia and English past tense formation. For current purposes, these areas are important not only because they illustrate how explicit the claims have been about the relation between acquired and developmental impairments, but also because both areas have been the focus of substantial computational modelling work exploring the possible underlying causes of those impairments.

2. Comparisons of acquired and developmental deficits in two domains

2.1 Dyslexia

When adults experience difficulty in reading following brain damage, their patterns of behavioural impairments can be described according to several sub-types. Two sub-types are of particular relevance. In acquired phonological dyslexia, patients demonstrate particular difficulty in reading nonwords. In acquired surface dyslexia, they show a deficit in reading ‘exception’ words, where the pronunciation cannot be predicted from the usual letter-to-sound correspondence. For these exception words, patients tend to display errors of regularisation, e.g., gauge pronounced as "gorge", trough as "truff", come as "kome", and quay as "kway" (Shallice, Warrington, & McCarthy, 1983).

Cognitive neuropsychologists have interpreted these two patterns as reflecting specific damage to independent sub-components of the skilled reading system. The traditional information-processing model of the skilled reading system proposes that three processing routes link print to sound (see e.g., Patterson & Shewell, 1987; Temple, 1997). One route decomposes written words into their component graphemes and constructs a pronunciation via a system of grapheme-to-phoneme correspondences. This is called the ‘nonlexical’ or ‘phonological’ route. A second ‘direct’ or ‘lexical’ route recognises the whole written word form and uses this representation to recover the whole-word pronunciation. A third ‘semantic’ route uses the written word form to recover the word’s meaning, and this semantic representation is then used to recover the word’s pronunciation.

Assuming this model, acquired phonological dyslexia can be interpreted as a normal adult reading system that has experienced damage to the grapheme-to-phoneme processing route; existing words can be read via the whole-word recognition routes, but reading of nonwords is impaired. Acquired surface dyslexia, on the other hand, can be interpreted as an adult system that has experienced damage to both whole-word recognition routes. Words can only be read via decomposition into component graphemes and the application of grapheme-phoneme correspondences, resulting in the regularisation of exception words.
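The logic of this interpretation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the one-entry lexicon, the pronunciation strings, the single letter-to-sound rule, and the function name `read_aloud` are hypothetical, the semantic route is omitted, and the two whole-word routes are collapsed into one lookup. It is not an implementation of any published reading model.

```python
# Toy dual-route reader (illustrative assumptions throughout).
# LEXICON stands in for the whole-word routes; GPC_RULES for the nonlexical route.
LEXICON = {"come": "kum"}   # hypothetical whole-word pronunciation
GPC_RULES = {"c": "k"}      # letters not listed are pronounced as written

def read_aloud(word, lexical_intact=True, gpc_intact=True):
    """Return a pronunciation string, or None if no intact route can read the word."""
    if lexical_intact and word in LEXICON:
        return LEXICON[word]
    if gpc_intact:
        # Assemble the pronunciation letter by letter from the correspondence rules.
        return "".join(GPC_RULES.get(letter, letter) for letter in word)
    return None

print(read_aloud("come"))                        # intact system: "kum"
print(read_aloud("come", lexical_intact=False))  # surface pattern: "kome"
print(read_aloud("bint", gpc_intact=False))      # phonological pattern: None
```

With the lexical lookup disabled, the exception word is regularised, as in the surface dyslexia pattern. With the rule route disabled, known words are still read but the nonword cannot be read at all, as in the phonological dyslexia pattern.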

The patterns of errors defining these two sub-types of dyslexia have been reported in children, both in single case studies and in group studies (see e.g., Castles & Coltheart, 1993; Manis, Seidenberg, Doi, McBride-Chang, & Petersen, 1996). Researchers have employed similar explanations in the developmental case – once more appealing to the structure of the adult reading system, but now replacing the notion of ‘specific damage’ with the notion of ‘a specific failure to develop’. Thus, in developmental phonological dyslexia, children may be ‘having difficulty with [acquiring] one or more components of the nonlexical route’ (Coltheart, Rastle, Perry, Langdon, & Ziegler, in press), or may exhibit an overall system with ‘relatively normal development of semantic, lexical, and direct reading systems but with impairment in the acquisition of the phonological reading route’ (Temple, 1997, p. 206). In developmental surface dyslexia, children may be having difficulty acquiring ‘one or more components of the lexical route’ (Coltheart et al., in press), or exhibit a reading system in which the ‘direct and semantic reading routes have failed to become established properly’ (Temple, 1997, p. 192).

2.2 Past tense formation

Our second example comes from the domain of inflectional morphology, and in particular the formation of the English past tense. Once again, a model has been proposed for the functional structure of the adult system in which separate sub-components tackle different aspects of the task (Pinker, 1991, 1999). One component is claimed to be responsible for forming the majority of past tenses that conform to a rule ("add –ed") and for generating past tenses for novel verbs (wug-wugged). A second component memorises individual past tense forms, particularly those that are exceptions to the rule (e.g., go-went, sleep-slept, hit-hit, etc.).

In cases of acquired aphasia and in neurodegenerative diseases, adults can exhibit dissociations between performance on regular and exception past tense formation. Patients with non-fluent aphasia can be worse at producing and reading regular past tense forms than exception forms, while patients with fluent aphasia can be worse at producing and reading exception forms than regular forms (e.g., Tyler, Randall, & Marslen-Wilson, 2002; Tyler, de Mornay Davies, Anokhina, Longworth, Randall, & Marslen-Wilson, 2002; Ullman, Corkin, Coppola, Hickok, Growdon, Koroshetz & Pinker, 1997; Ullman, Izvorski, Love, Yee, Swinney & Hickok, in press; though see Bird, Lambon Ralph, Seidenberg, McClelland, & Patterson, 2002). Similarly, patients with Parkinson’s disease can make more errors producing regular and novel +ed forms than exception forms, while patients with Alzheimer’s disease can make more errors producing exception past tense forms than regular past tense forms (Ullman, in press; Ullman et al., 1997). Assuming the dual-mechanism model of the adult system, these patterns of acquired deficit are taken to reflect selective damage to either the rule-processing component or the exception memorisation component.
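The dual-mechanism reading of these dissociations can be sketched in the same toy fashion. The function name `past_tense` and its two flags are hypothetical, the exception list contains only the examples given above, and the rule is reduced to bare suffixation (it ignores spelling changes such as the consonant doubling in wug-wugged). The sketch assumes, for illustration only, that memorised forms take precedence over the rule.

```python
# Toy dual-mechanism past tense system (illustrative assumptions throughout).
EXCEPTIONS = {"go": "went", "sleep": "slept", "hit": "hit"}  # memorised exception forms

def past_tense(stem, rule_intact=True, memory_intact=True):
    """Return a past tense form, or None if no intact component can supply one."""
    if memory_intact and stem in EXCEPTIONS:
        return EXCEPTIONS[stem]
    if rule_intact:
        return stem + "ed"  # simplified "add -ed" rule; no spelling adjustments
    return None

print(past_tense("go"))                        # intact system: "went"
print(past_tense("go", memory_intact=False))   # memory impaired: "goed"
print(past_tense("walk", rule_intact=False))   # rule impaired: None
```

Disabling the memory component yields over-regularised exception forms, while disabling the rule component leaves regular and novel stems without a past tense. This mirrors the two directions of dissociation described above, under the strong assumption that each component can be removed without affecting the other.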

Once again, parallel impairments have been reported in the developmental domain, in this case in two developmental disorders with a genetic origin, Specific Language Impairment (SLI) and Williams syndrome (WS). Ullman and Gopnik (1999) and van der Lely and Ullman (2001) reported that children with SLI perform poorly on past tense formation tasks and show a much smaller advantage of regular past tense formation over exception past tense formation – interpreted as a relative impairment in regular past tense formation. On the other hand, Clahsen and Almazan (1998) reported that children with WS exhibit a specific difficulty with generating exception past tense forms [Note 1]. Pinker (1999) offered an interpretation of these respective findings in terms of the adult model: SLI represents a case where the mutation of certain genes interferes with the development of the ability to inflect new and uncommon regular verbs. WS represents a case where the rule-based computational mechanism is intact but the memory mechanism for storing exception verbs is specifically impaired. Together these disorders are argued to represent a ‘genetic double dissociation … the first group of children rarely generalise the regular pattern; the second group of children generalise it freely’ (Pinker, 1999, p. 262).

Both the examples of dyslexia and past tense formation illustrate the way in which developmental impairments are often interpreted by appealing to the structure of adult models. They show, too, how the central double dissociation logic of adult cognitive neuropsychology has been extended to developmental cases. Dissociable behavioural impairments are taken as evidence of independent underlying mechanisms, by virtue of the claimed independent failure of those mechanisms to develop properly. We turn now to consider why the validity of this extension is questionable, before examining specific computational implementations of deficits in these target domains.

3. Is the cognitive neuropsychology framework appropriate for the interpretation of developmental disorders?

3.1 Development in a ‘static’ framework

When acquired damage causes selective cognitive deficits in normal adults, these deficits occur against a background of hitherto normal function. (This is also the case for acquired deficits in children, at least at the time of insult.) Such cognitive systems are hence discussed in terms of the cognitive mechanisms or processes that have become impaired compared to those that have remained intact.

When selective behavioural deficits are identified in developmental disorders, they are frequently characterised in the same way, in terms of developmental impairments against a background of normal development. We will refer to the second half of this characterisation as the assumption of Residual Normality. This is the assumption that, in the face of a selective developmental deficit, the rest of the system can nevertheless develop normally and independently of the deficit. It is this developmental assumption that allows researchers to relate patterns of deficits in developmental disorders to static models of the normal cognitive system. Because patterns of deficits are usually identified in older children, adolescents, or adults with the developmental disorder, static models of the normal adult system are often deemed an appropriate point of reference. In principle, however, deficits identified in the younger child could be compared against a static model of the normal system for the appropriate stage in development, were such a model to exist. In either case, the essential point here is that the assumption of Residual Normality permits developmental deficits to be compared against functional models that themselves have no developmental component.

The assumption of Residual Normality has been widely deployed in the study of atypical development, including in the case of disorders such as autism, WS, SLI, dyslexia, dyscalculia, Gilles de la Tourette syndrome, and developmental prosopagnosia. The following quotes illustrate three explicit renditions of the claim for Residual Normality in developmental disorders (italics added):

I suggest that the study of mental retardation would profit from the application of the framework of cognitive neuropsychology (e.g., McCarthy & Warrington, 1990; Shallice, 1988). In cognitive neuropsychology, one key question running through the investigator’s mind is "Is this process or mechanism intact or impaired in this person?" … In fact researchers in mental retardation have been searching for intact versus impaired cognitive processes for quite some time without discussing this in terms of modularity. (Baron-Cohen, 1998, p. 335 and Footnote 1)

The analysis of the developmental dyslexias offered by the dual-route model is that, just as each of the two routes can be selectively affected by brain damage with the other remaining intact, it is possible for a child to have difficulty acquiring one of the routes, with the other being acquired at a normal rate. (Coltheart et al., 1993, p. 591)

Within modular theories, the linguistic performance of subjects with [developmental] language impairments may reflect the architecture of the normal system but with selective components of this system under- or over-developed. (Clahsen & Temple, in press)

Interestingly, Residual Normality (henceforth RN) is less frequently deployed as a developmental hypothesis in paediatric neuropsychology. For children with acquired brain damage, the clinically driven focus is usually on recovery. Researchers tend to eschew static models and explore the effect of cerebral insult on the potentially plastic process of development. Because structural damage is seen in the context of a dynamic and interactive developmental process, there is recognition of the possible influences of compensation within the cognitive system and of disruption to the acquisition of further cognitive skills, as well as family and social factors (Anderson, Northam, Hendy, & Wrennall, 2001; see Thomas, in press, for discussion). If the undamaged part of the cognitive system compensates / alters across development in response to the part that has suffered a selective deficit, the undamaged part may not follow the normal path of development, in which case RN would not hold. [Note 2]

On the other hand, researchers in developmental disorders of a genetic origin routinely deploy RN in their explanations, as we have seen in the cases of SLI, WS, and dyslexia. This is probably due to the fact that such disorders are used (in part) for theoretical purposes within the cognitive neuropsychology framework, as a source of evidence about the structure of the normal cognitive system (and because of the genetic origins, about the potential innateness of that structure). Residual Normality is an assumption (often implicit) about how development takes place. But is it likely to be correct? In the next section, we consider two opposite claims. First, we examine the claim that no answer to the preceding question is necessary: from the perspective of cognitive neuropsychology, development can be ignored in the study of behavioural deficits in developmental disorders. Second, we consider the claim that not only must development be incorporated, but that when it is, the assumptions required to use the cognitive neuropsychology framework are fatally undermined. We then propose a resolution of these two opposing positions.

3.2 Development and modularity

Jackson and Coltheart (2001) have recently defended the use of the cognitive neuropsychology framework for studying developmental disorders. They have argued that the process of development is not relevant to identifying intact and impaired processes in a cognitive system, so long as modularity can be assumed for that system. In their view, the framework is equally suitable in both acquired and developmental cases for establishing what they call the proximal cause of the behavioural impairment. By this they mean ‘what is wrong with the cognitive system right now’, irrespective of whether the original cause was brain damage, atypical development due to a genetic abnormality, or even poor schooling. These latter causes are what Jackson and Coltheart term distal. They maintain that the distal causes of an impairment are potentially independent of the common proximal cause, allowing one to consider acquired and developmental deficits within the same framework. Although Jackson and Coltheart agree that a full explanation will involve both proximal and distal causes, they defend the cognitive neuropsychology framework as the appropriate way to reveal the proximal cause of any behavioural impairment, independent of distal causes.

By way of example, Jackson and Coltheart discuss the case of phonological dyslexia which, as we saw earlier, is defined by a difficulty in reading novel words. They argue that, in relation to the traditional cognitive model of reading, both acquired and developmental phonological dyslexia can be assigned the same proximal cause, namely a problem with the processing route that maps graphemes to their respective phonemes (the ‘GPC’ route). What differs in the acquired and developmental cases is the distal cause, respectively brain damage and some developmental (perhaps genetic) anomaly. In short, these authors argue strongly that synchronic similarities in behavioural deficits between acquired and developmental cases can be linked by a common underlying cause at a cognitive level of description, and that this cognitive cause can be established by the methods of cognitive neuropsychology.

The extension of the cognitive neuropsychology framework to developmental disorders has, however, been criticised on three main grounds (Bishop, 1997; Karmiloff-Smith, 1997, 1998). The first criticism is that the framework unnecessarily warps the type of data that is collected in the developmental case by focusing on the search for specific deficits, only superficially examining areas of presumed intactness. The second criticism is that the framework is unable to comment on one of the key contributory causes of the patterns of behavioural impairments found in developmental disorders, namely the process of development itself. Where different developmental hypotheses exist for a given impairment, the cognitive neuropsychology framework cannot distinguish between them. The third criticism is that the assumption of a universal modular structure in the cognitive system on which the framework relies may not hold in the developmental case. We briefly look at each claim in turn.

First, Bishop (1997) has argued that developmental and acquired disorders require empirical approaches with different emphases. While researchers in adult cognitive neuropsychology look for single cases showing dissociations between cognitive abilities (as existence proofs of their dissociability), developmental disorders are likely to show patterns of associated impairments as a consequence of cascading effects of early deficits on subsequent development. Particular developmental disorders will be best identified by seeking consistent patterns of associated impairments in group studies. Karmiloff-Smith (1997, 1998) argues that, particularly for disorders of a genetic origin, behavioural impairments at the end of development are likely to be the outcome of an extended atypical developmental trajectory, determined in part by the initial structural anomalies in the cognitive system, and in part by the interactions of that system with its environment. She argues that, methodologically, researchers should not search exclusively for selective deficits at the end of the developmental process but should also seek differences in infancy, where the origins of the atypical trajectory may be revealed.

Second, both Bishop and Karmiloff-Smith claim that the cognitive neuropsychology framework is impoverished in the developmental domain by its exclusion of the developmental process as an explanation of patterns of behavioural impairments in the adult state. For Bishop (1997), these processes include top-down as well as bottom-up interactions between cognitive sub-systems during development, compensatory processes, and timing differences that may lead to changes in the patterns of impairment over time. As an example of the latter, she points to the hypothesis that in children with SLI, early problems in auditory discrimination which occur at a crucial stage in language development cause a lasting legacy of language impairment, even if the auditory problems subsequently resolve themselves and are undetectable. Karmiloff-Smith (1997, 1998) argues that the causes of behavioural impairments in developmental disorders are likely to be found in the low-level computational properties of the neonate brain, such as atypical neuronal firing levels or local connectivity. Such low-level differences can only lead to behavioural impairments via the developmental process, a process that may exaggerate some initial computational differences but attenuate others, depending on the nature of the domain. Where initial differences are exaggerated, the developmental process itself must be considered a key cause of the subsequent impairments.

In response to this criticism, it is worth noting Jackson and Coltheart’s (2001) position that the cognitive neuropsychology framework is not designed to comment on ‘distal’ causes of deficits such as development, merely on the ‘proximal’ causes, that is, the functional deficits shown in the current state. As such, the cognitive neuropsychology framework simply does not have the power to address the questions of concern to Bishop and Karmiloff-Smith.

The third criticism of the extension of the cognitive neuropsychology framework to developmental disorders is potentially the most serious. The only necessary a priori assumption required to employ that framework is that in many domains the cognitive system is modular, so that selective deficits in behaviour may be traced to independent functional components. Bishop (1997) has suggested that two of the defining properties of modules – innateness and imperviousness to top-down feedback – are clearly challenged by processes of development. Some cognitive abilities acquired by children are not innate, and top-down processing is used a great deal by children when they are performing cognitive tasks. There are problems with this criticism that concern the precise definition of what constitutes a module. Fodor (1983) identified several possible characteristics of modules (that they be domain specific, innately specified, informationally encapsulated, fast, hardwired, autonomous, and not assembled). However, none of these was stipulated as a necessary property; rather, they were properties likely to be associated with modular processing (Coltheart, 1999). There has been significant disagreement concerning the key properties of a module, if indeed modularity is to remain a single explanatory concept (Thomas & Karmiloff-Smith, 1999). For instance, for Fodor (2000), the most important property is encapsulation; for Coltheart, it is domain specificity. Until there is agreement on what constitutes a functional module, it will prove difficult to demonstrate whether development violates the necessary conditions and so clearly undermines the use of the cognitive neuropsychology framework to explain developmental disorders.

However, a more grave criticism lies in wait. Even if we accept a (loose) notion of modularity, it may be that in some types of developmental disorders, individuals do not share the same set of functional modules as in the normal cognitive system. Karmiloff-Smith (1998) argues that neurobiological evidence of the development of neocortex in infants strongly suggests that genes do not code directly for high-level cognitive modules, but that processing structure is emergent and experience-dependent, the outcome of a developmental process. The implication is twofold. First, if modular structure is the product of development (Karmiloff-Smith, 1992), in cases of atypical development, the resultant modular structure may not be the same as in the normal adult case. Second, even if early damage is limited to a specific cognitive component, if the modular structure of the cognitive system is sensitive to experience, compensation may occur elsewhere in the system, altering the function of the initially intact components.

We can illustrate this idea with reference to Jackson and Coltheart’s own example, phonological dyslexia. Recall that these authors attribute the impairment in nonword reading in the developmental and acquired cases of this disorder to a common proximal cause, an impairment to the GPC route in the traditional model of adult performance. However, in the pure case of the disorder, this common proximal cause is actually shorthand for "the GPC route is impaired and the lexical routes are functioning normally". In cases of adult brain damage, this seems possible. But in the developmental case, Bishop and Karmiloff-Smith’s position is that problems with the GPC route may also lead to differences in the way in which the lexical routes themselves develop. Under this view, the developmentally disordered system could comprise a GPC route and two lexical routes [Note 3], all of which are functioning atypically. Together, however, these routes would then manifest a behavioural impairment in nonword reading.

The extent to which modular structure can vary in cases of atypical development is currently an open question (see Tager-Flusberg, 2000, for discussion). Indeed, the degree of plasticity across different cognitive systems and their underlying neural substrates is an area of active investigation (Thomas, in press). For instance, it remains to be seen whether limits to plasticity are different in cases of acquired damage in early childhood than in individuals with genetic developmental disorders. Nevertheless, if a common modular structure cannot be assumed, it is evident that developmental disorders cannot be straightforwardly related to static models of the normal cognitive system. It is therefore of key importance to understand how modular structure emerges and to what extent this process can be disrupted.

3.3 A reconciliation

Jackson and Coltheart’s (2001) claim that it is possible to study independently the endstate of a developmental disorder (‘proximal’ cause) and the developmental process by which it was reached (‘distal’ cause) must be considered carefully. Although it may be possible to study them independently, they are not in fact independent but mutually constraining. It would be unwise to characterise the endstate of a cognitive system in a form that could not be reached by a feasible developmental process (Piaget, 1971). This point does not just apply to the study of developmental disorders. In the same way, theories within normal cognitive psychology and normal developmental psychology must be mutually constraining, despite existing as separate fields of inquiry.

Is Residual Normality a feasible type of developmental account? Is it realistic to expect developmental patterns of specific deficits to stand against a background of normal modular function? Karmiloff-Smith and colleagues have argued that a priori, the effects of genetic abnormalities are likely to be widespread throughout the brain and unlikely to be isolated to single high-level cognitive modules (Karmiloff-Smith, 1998; Karmiloff-Smith, Scerif, & Thomas, 2002). When marked behavioural deficits arise in a single domain, it is likely that the cognitive processes underlying apparently intact performance in other domains are also atypical in subtle ways – which may go undetected without the sensitive testing of abilities outside of the main behavioural impairment. Such investigations are prompted only by a realistic developmental hypothesis. In support, Karmiloff-Smith cites examples such as Williams syndrome, where ostensibly ‘intact’ face recognition was subsequently shown to be achieved by atypical cognitive processes (see below), and developmental dyslexia, where motor deficits have been found in children previously thought only to have a highly selective problem in reading (e.g., Bishop, 1990; Fawcett, Nicolson, & Dean, 1996; Hill, 1998).

Despite a priori leanings, one might view the issue as one to be determined merely on empirical grounds. Are there or are there not selective cognitive deficits in developmental disorders? Unlike modularity, Residual Normality is not an assumption that is a priori required in order to employ the cognitive neuropsychology framework. Instead, it is a hypothesis invoked to explain a particular set of empirical data. In this sense, Jackson and Coltheart are correct that the characterisation of the current deficit in a disorder could be blind to the causes of that disorder. Should RN, therefore, be seen as a neutral hypothesis, simply ‘calling the data’? The answer is no. The reason is that theory and data collection are clearly not independent. Disorders are typically first investigated by application of a range of standardised tests to establish which areas show behavioural deficits and which show behaviour in the normal range. If one has a predilection to believe that RN is true, the risk is that scores in the normal range will be accepted as final evidence of normal underlying processes, and data collection cut short prematurely. If, however, one is more suspicious of RN, as developmentalists usually are given the interactive nature of the developmental process, then there is a motivation to perform more fine-grained analyses to establish whether apparently normal behaviour is actually being achieved by atypical underlying processes. If so, then deficits are not specific.

An example illustrates the point. Despite deficits in visuo-spatial processing, face recognition in Williams syndrome was initially reported as a ‘spared’ ability, on the basis that scores on standardised tests fell within the normal range (Bellugi, Wang, & Jernigan, 1994; Udwin & Yule, 1991). This prompted claims that the development of systems underlying spatial reasoning is disrupted in WS, but the systems underlying face perception develop normally (Pinker, 1999). If one were happy to invoke RN, one would stop at this point, and perhaps use WS in combination with developmental prosopagnosia as a double dissociation implying the independence of face processing structures from general visuo-spatial processing.

However, suspicion of RN in genetic developmental disorders actually led to further investigation of this apparently intact ability. Closer examination of the items within the standardised tests on which individuals with WS performed well, and those on which they performed poorly, suggested that their recognition of faces proceeded atypically. Specifically, individuals with WS were better at recognising faces which could be identified by single features than those which required computation of configurations of features; control participants showed no such distinction (Karmiloff-Smith, 1997).

Subsequent research with specially designed face stimuli and geometrical patterns supported the hypothesis that face processing follows an abnormal developmental course in WS (Deruelle, Mancini, Livet, Cassé-Perrot, & de Schonen, 1999; Humphreys, Ewing & Karmiloff-Smith, 2002). Electrophysiological brain imaging studies also indicate anomalous underlying processing in WS, including reduced sensitivity to inverted faces compared to normal faces, and an absence of the progressive developmental pattern of right hemisphere localisation found in typically developing controls (Grice et al., 2001; Mills et al., 2000). In short, when examined in detail, a superficially intact ability turned out to be associated with quite atypical cognitive and brain processes.

In reconciling the two opposing positions, a subtler picture emerges. The static cognitive neuropsychology framework is not in principle inappropriate for the study of developmental disorders. For instance, there is nothing intrinsic to the non-developmental approach espoused by Jackson, Coltheart and others that would prevent it from verifying whether RN is true in a particular child, domain, or disorder. Empirical data will eventually reveal if there are selective cognitive deficits in a given case. If there are, it will be necessary to construct and test a developmental account in which RN holds. However, a tendency simply to assume RN – conditioned by research in adult cognitive neuropsychology where a background of normal function can often indeed be assumed – leads to inadequate data collection. These data are then insufficient to establish beyond reasonable doubt that normal processes underlie behavioural scores that fall within the normal range. This bias has impeded progress in the study of developmental disorders, and particularly in building links to developmental cognitive neuroscience, to developmental neurobiology, and ultimately to the genetic anomalies that underlie many disorders.

This state of affairs has arisen because insufficient attention has been paid to the process of development itself in the study of developmental disorders. In particular, because RN is typically assumed implicitly, the conditions under which it would actually hold have not been elucidated.

4. Under what developmental conditions would we expect to see similarities between developmental and acquired disorders?

Acquired deficits and developmental deficits can be related to the same model of the normal cognitive system if a component that can be damaged in the endstate can also fail to develop in isolation from the rest of the developing cognitive system. This will occur under the following conditions:

By contrast, similarities between developmental and acquired disorders will not occur under the following conditions:

Similarities may occur between developmental and acquired disorders that cannot be related to the same model of the normal cognitive system under the following conditions:

What conditions actually hold in cognitive development? To address this issue, we need to know the answer to several further questions. First, how does the process of development interact with damage to a cognitive system to produce endstate behavioural impairments? Second, does the process of development always play a central role in producing the impairments – and if so, does development tend to produce a different pattern of endstate impairments to acquired damage in the endstate? Third, what is the origin of the specialised functional components stipulated in adult models? If components are not innately specified, how do they emerge through a process of development, as in Karmiloff-Smith’s theoretical notion of emergent modularisation (1992)? Finally, how can such a process of emergent specialisation be affected or unaffected by disruption to the computational conditions existing in the early cognitive system?

These are difficult questions, and the field of developmental cognitive neuroscience is some way from having answers to all of them. In what follows, we explore potential answers to these questions by examining computational models of cognitive development. To retain a focus, we restrict our investigation to computational models applied to our two target domains, reading and past tense formation, and to research based on a single, influential class of computational learning systems, connectionist networks.

5. Computer modelling

The modelling section comprises three parts. In the first, we compare the methods that researchers have used to extend connectionist models of normal development and adult function to cases of developmental and acquired impairments in reading and past tense formation. In the second, we introduce new simulations to gauge the contribution of the developmental process to producing patterns of endstate impairments, within the framework offered by those models. Specifically, we investigate the extent to which the process of development itself is a causal factor in the specific pattern of impairments shown in a developmental disorder. In the third, we use new simulations to examine the assumption of Residual Normality. In a system with emergent specialisation of function (i.e., one exhibiting modularisation), how viable is Residual Normality? Specifically, when one component of the system is prevented from developing normally, does the rest of the system nevertheless develop independently and normally? If not, what are the conditions under which learning systems would show RN, so that developmental impairments could be interpreted in terms of selective deficits to an adult model? From a behavioural perspective, how does assuming RN affect the inferences we can make from dissociations in behaviour to underlying structures?

6. Connectionist models of acquired and developmental deficits

Connectionist networks have been widely used in recent years to construct models of cognitive processing in adults (see e.g., McLeod, Plunkett, & Rolls, 1998; Rumelhart & McClelland, 1986a). Since one of the main strengths of these networks is their ability to learn input-output functions, they have increasingly been used to model the development of cognitive processes (Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996; Thomas & Karmiloff-Smith, 2002) [Note 4].

When there exists a working model of a normal adult system, the validity of the model can be further tested by investigating its ability to capture patterns of acquired deficits when the model is damaged in various ways. Connectionist models have been used to capture deficits in a number of acquired disorders, including dyslexia, aphasia, alexia, prosopagnosia, epilepsy, phantom limbs, stroke, frontal lobe damage, unipolar depression, Parkinson’s disease, Alzheimer’s disease, and schizophrenia (see e.g., Reggia, Ruppin, & Berndt, 1996; Stein & Ludik, 1998). Where knowledge is encoded in the connectionist network through a training process, acquired deficits are modelled by damaging the network after that training process is complete.

Where connectionist networks have been used to model phenomena within cognitive development, this has permitted the investigation of developmental disorders when development is made to follow an atypical trajectory (Oliver, Johnson, Karmiloff-Smith & Pennington, 2000; Thomas & Karmiloff-Smith, 2002). Although work in this area is relatively new, models have already been put forward attempting to capture behavioural deficits in developmental dyslexia, SLI, WS, and autism (see Thomas & Karmiloff-Smith, 2002, for a review). In developmental models, knowledge is encoded in network systems via a training process, whereby the model aims to simulate both the developmental trajectory and the endstate abilities of the system. In contrast to models of acquired deficits, changes in models of atypical development are made to the network or to the way it learns prior to the training process. These changes in the computational constraints of the learning system lead to atypical trajectories of development, and an endstate performance that may exhibit behavioural impairments.

The contrast between connectionist models of acquired and developmental disorders is a fairly clear one. In the acquired case, damage of some sort is applied to the model at the end of a training process. In the developmental case, it is applied prior to the training process [Note 5]. Immediately one might ask, do modellers use the same kind of damage in each case, and does this damage cause the same behavioural impairment? If the answer to both is yes, one might conclude that the developmental process [Note 6] plays a limited causal role in generating the pattern of behavioural impairments. On the other hand, if the answer is no and different impairments result from the same damage in the two cases, then the implication would be that the developmental process is an important component in determining the pattern of impairments in developmental disorders. In the next two sections, we consider this question in relation to connectionist models of reading and past tense formation.

6.1 Connectionist models of Reading

Connectionist models of reading assume that the computational problem in this domain is to learn to map between representational codes of the written form of a word, the spoken form of a word, and the word’s meaning (Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989). Typically, this involves three connectionist networks: one to map from orthography to phonology, one to map from orthography to semantics, and one to map from semantics to phonology (although in many models only the first of these networks is implemented; but see Harm & Seidenberg, 2001, for an exception). Each network has a three-layered structure, comprising an input layer, an output layer, and a layer of hidden units in between. Some models employ recurrent connections that allow cycling activation patterns, so that the model settles into a stable output state. Sometimes a layer of ‘clean-up’ units is connected to the output layer to aid this settling process (see e.g., Harm & Seidenberg, 1999, 2001).
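The three-layered structure described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited models: the layer sizes, the random weights, the sigmoid activation function, and the names `make_layer`, `forward`, and `three_layer` are all assumptions chosen for illustration, and recurrent connections and clean-up units are omitted.

```python
import math
import random


def sigmoid(x):
    """Logistic activation function (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng, scale=0.5):
    """Random weight matrix: weights[j][i] connects input unit i to unit j."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]


def forward(weights, activations):
    """Activation of one layer given the activations of the layer below."""
    return [sigmoid(sum(w * a for w, a in zip(row, activations)))
            for row in weights]


def three_layer(orthography, w_input_hidden, w_hidden_output):
    """Input layer -> hidden layer -> output layer."""
    hidden = forward(w_input_hidden, orthography)
    phonology = forward(w_hidden_output, hidden)
    return hidden, phonology


# Hypothetical sizes: 6 orthographic units, 4 hidden units, 5 phonological units.
rng = random.Random(0)
w_ih = make_layer(6, 4, rng)
w_ho = make_layer(4, 5, rng)
hidden, phon = three_layer([1, 0, 1, 0, 0, 1], w_ih, w_ho)
print(len(hidden), len(phon))
```

In a full model the weights would be set by a training process rather than left at their random initial values; the sketch shows only the flow of activation through the three layers.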

Within these models, acquired dyslexia is produced by different kinds of damage to the trained model. Acquired surface dyslexia, a deficit in reading exception words, has been modelled by damaging the network that maps orthography to phonology, via the removal of hidden units or the severing of connections (Patterson, 1990; Patterson, Seidenberg, & McClelland, 1989). Such damage produces a greater impairment on reading exception words than regular words. However, failures of this approach to fit more extreme patterns of surface dyslexia subsequently led to the claim that exception word reading might be achieved via an indirect semantic route, particularly in the case of low frequency words. Acquired surface dyslexia might represent damage to this indirect route, so that naming must proceed via the orthography-to-phonology route alone, a route that has not learnt to name low frequency exception words and, as a result, regularises them (Patterson, Plaut, McClelland, Seidenberg, Behrmann, & Hodges, 1996; Plaut et al., 1996; see Coltheart et al., in press, for discussion).
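The two lesioning methods mentioned above, the removal of hidden units and the severing of connections, can be sketched as operations on the weight matrices of an already trained network. This is an illustrative sketch only: the function names, the list-of-rows weight layout, and the example weights are assumptions, not details taken from the cited models.

```python
import random


def remove_hidden_units(w_input_hidden, w_hidden_output, removed):
    """Return copies of both weight matrices with the given hidden units deleted.

    w_input_hidden[j] holds the incoming weights of hidden unit j;
    w_hidden_output[k][j] is the weight from hidden unit j to output unit k.
    """
    keep = [j for j in range(len(w_input_hidden)) if j not in removed]
    new_ih = [list(w_input_hidden[j]) for j in keep]
    new_ho = [[row[j] for j in keep] for row in w_hidden_output]
    return new_ih, new_ho


def sever_connections(weights, proportion, rng):
    """Return a copy in which each connection is zeroed with probability `proportion`."""
    return [[0.0 if rng.random() < proportion else w for w in row]
            for row in weights]


# Hypothetical "trained" weights: 3 inputs, 4 hidden units, 2 outputs.
w_ih = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
w_ho = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

lesioned_ih, lesioned_ho = remove_hidden_units(w_ih, w_ho, {1, 3})
severed = sever_connections(w_ho, 0.5, random.Random(0))
print(lesioned_ih, lesioned_ho)
```

Both functions return new matrices and leave the originals untouched, so the intact and lesioned networks can be tested on the same word lists.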

Previously, acquired phonological dyslexia was not explicitly simulated within these learning models because, in theoretical terms, it corresponded to selective damage to the entire orthography-to-phonology network; since most models only implemented the orthography-to-phonology network itself, such a lesion was outside their scope. Theoretically, lesioning the direct orthography-phonology route would mean that reading must be accomplished primarily or exclusively via the semantic route, so that novel words without a stored meaning would be severely impaired. However, recently Harm and Seidenberg (2001) have implemented the full connectionist reading model, including pathways between phonology, orthography, and semantics. The authors report a manipulation intended to simulate acquired phonological dyslexia, whereby noise is added to processing within the phonological component of the model (i.e., the phonological output units and associated clean-up units). Harm and Seidenberg demonstrate how this impairs the nonword reading of the model much more severely than its reading of words in the training set (Harm, pers. comm., June, 2001). It also accounts for several effects found in the nonword reading of acquired phonological dyslexics that were previously taken as support for the traditional model of reading.

Doubts have been raised as to the full developmental validity of several of these connectionist reading models. Nevertheless, we consider them here for their insight into systems that acquire representations appropriate for a cognitive domain through a learning process. Patterns of developmental dyslexia have been simulated by applying the relevant damage prior to this learning process. Thus surface dyslexia has been simulated in a number of models by removing units from the hidden layer in the orthography-to-phonology network prior to training (Harm & Seidenberg, 1999; Plaut et al., 1996; Seidenberg & McClelland, 1989). This damage produces a greater impairment in exception word reading than regular word reading or nonword reading, particularly for low frequency exception words. Because exception word performance generally lags behind regular word performance during training, some authors have in addition simulated poorer performance in reading exception words simply by giving the network less training, or less efficient training (Bullinaria, 1997; Harm & Seidenberg, 1999).

Developmental phonological dyslexia has been simulated in two main ways. The first approach reflects a prior claim that developmental phonological dyslexia may correspond to phonological representations (and perhaps orthographic representations as well) that have insufficient componentiality (Manis et al., 1993; Plaut et al., 1996). Harm and Seidenberg (1999) implemented this proposal by restricting the computational properties of the phonological component of their model (the phonological output layer, its recurrent connections, and its clean-up units). Their manipulations included the removal of the clean-up units and severing half the recurrent connections between the phonological units, or restricting the size of the weights in the recurrent connections, or making computations within the phonological component more noisy. All of these resulted in poorer nonword reading, and some impacted on exception word reading as well. Brown (1997) also demonstrated that when both orthographic representations and phonological representations are deliberately constructed with reduced componentiality, reduced nonword reading is found at the end of training.
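Two of the manipulations listed above, restricting the size of the weights and making computations noisier, can be sketched in a few lines. The sketch assumes that weight restriction amounts to clipping each weight to a maximum magnitude and that noise is Gaussian and added to unit activations; neither assumption is confirmed by the source, and the function names are hypothetical.

```python
import random


def clip_weights(weights, max_size):
    """Restrict every weight to the range [-max_size, +max_size] (assumed form)."""
    return [[max(-max_size, min(max_size, w)) for w in row] for row in weights]


def add_noise(activations, sd, rng):
    """Add zero-mean Gaussian noise with standard deviation `sd` (assumed form)."""
    return [a + rng.gauss(0.0, sd) for a in activations]


# Hypothetical recurrent weights and phonological activations.
recurrent = [[2.0, -3.0, 0.5], [0.0, 1.5, -0.2]]
restricted = clip_weights(recurrent, 1.0)
noisy = add_noise([0.9, 0.1, 0.8], 0.2, random.Random(0))
print(restricted)
```

In a developmental simulation such a restriction would be in force throughout training (for example, reapplied after every weight update), whereas in an acquired simulation it would be applied once to the trained network.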

The second approach seeks to constrain the nature of the computational function that can be learnt between orthography and phonology. For instance, Zorzi, Houghton, and Butterworth (1998) have argued that the reading system is better conceived of as including a network in which orthographic representations are directly connected to phonological representations. Lack of these direct connections in the initial architecture, and the presence only of a route mediated by hidden units, prevented their network from learning a simple function relating orthography and phonology, and so generalisation was reduced. Brown (1997) used another constraint on the computational function by employing several three-layer networks with progressively reduced numbers of hidden units and comparing them when their performance on regular and exception words was matched. (Necessarily, this meant that the networks with fewer hidden units had experienced more training.) Networks with fewer hidden units were unable to learn a robust function linking orthography and phonology, and so showed poorer nonword reading.

In summary, both surface and phonological dyslexia permit a direct comparison between simulations of the acquired and developmental forms. In the case of surface dyslexia, initial approaches used the same method, the removal of hidden units and/or connections, to simulate the same impairment, namely a deficit in exception word reading. More recently, accounts of the acquired form have appealed to the complete lesioning of routes that were not implemented in the models. In the case of phonological dyslexia, several methods have been used to simulate the developmental impairment, including either altering phonological and/or orthographic representations, or constraining the computational function that the network can use to link orthographic and phonological codes. One method used by Harm and Seidenberg (1999), the addition of noise to processing within the phonological component of the model during training, was also used by Harm and Seidenberg (2001) to simulate acquired phonological dyslexia, where such noise was added to a normally trained model. Indeed, Harm and Seidenberg (2001) specifically comment that the ‘form of impairment is identical’ in the two cases (p. 80). In short, on the basis of connectionist models of reading, one might conclude that the same form of damage before and after training creates the same behavioural deficit, as if the developmental process itself contributed nothing to the nature of that behavioural deficit.

6.2 Connectionist models of Past tense formation

Connectionist models of past tense formation assume that the computational problem in this domain is to learn to map between representational codes of the phonological form of the stem of a verb and a phonological form of that verb’s past tense (Plunkett & Marchman, 1993, 1996; Rumelhart & McClelland, 1986b), sometimes in the presence of the verb’s semantic representation (Hoeffner, 1992; Joanisse & Seidenberg, 1999), sometimes in the presence of more restricted semantic information (MacWhinney & Leinbach, 1991; Plunkett & Juola, 1999).

Joanisse and Seidenberg (1999) sought to simulate two kinds of acquired deficits in their model of past tense formation, either an impairment in producing regular past tense forms (found in cases of non-fluent aphasia and Parkinson’s disease), or an impairment in producing exception past tense forms (found in cases of fluent aphasia and Alzheimer’s disease). Impairment to the formation of regular past tenses was achieved by randomly severing connections between the phonological output layer and that layer’s bank of clean-up units in the trained network. However, the fit to patient data was not ideal here, since the model showed a much larger decrement in extending the past tense rule to novel items (wug-wugged) than on the formation of existing regular past tenses (talk-talked). Patients, on the other hand, can show similar decrements to both (e.g., Ullman et al., 1997) [Note 7]. It is possible that formation of existing regular past tenses was driven too much by word-specific information because the model was trained only on a single inflectional paradigm. In a larger model in which individual words can be inflected in several different ways, regularities may be pushed further into the phonological part of the network, so that regular verbs would also be amenable to selective damage. Impairment to the formation of exception past tenses, on the other hand, was achieved by randomly severing connections between the semantic representations and their clean-up units in the trained model, while adding noise to the semantic activation level. This gave a good fit to patient data (though see Tyler, de Mornay Davies et al., 2002).

Developmental problems with regular past tense formation have been reported in SLI, although latterly Ullman and collaborators have argued that the deficit is relative, in that the normal advantage for regular verbs over exception verbs is much reduced and most past tense forms are uninflected (Ullman & Gopnik, 1999; van der Lely & Ullman, 2001). Hoeffner and McClelland (1993) sought to simulate the developmental regular verb deficit by altering the phonological representations of their model prior to training. The phonological representations were changed in line with a hypothesis that individuals with SLI have difficulty processing fast changing auditory signals, which particularly impairs perception of phonemes such as /t/ and /d/ (e.g., Tallal & Stark, 1981; though see Bishop, Carlyon, Deeks & Bishop, 1999; Joanisse & Seidenberg, 1998). Both these phonemes are involved in marking the regular past tense form in English. In the model, word final stops and fricatives were given weaker representations in the normal case to reflect their lower salience. In the impaired model, the overall strength of the phonological representations was weakened, exaggerating the disadvantage of word final stops and fricatives. When the model was trained with these altered representations, the result was poorer performance on past tense formation, such that regular past tenses showed a greater impairment than exceptions, and the predominant error pattern was a failure to inflect the verb stem. Moreover, just as in SLI, the model showed an impairment on morphemic phonemes (e.g. the final /d/ in cared) but not phonologically identical phonemes which were non-morphemic (e.g. the final /d/ in card). The model was able to produce a differential impairment for regular verbs in its trained state, but did not successfully simulate the very low and equal performance on both regular and exception verbs (see Ullman & Gopnik, 1999, for further discussion of the model).

Joanisse (2000) attempted to simulate the pattern of SLI data by applying processing noise to the phonological representations of his past tense model throughout the training process. Here, the model’s level of correct performance on regular, exception, and novel verbs was closer to that shown in recent empirical data (van der Lely & Ullman, 2001), with low scores on all types. However, the model did not reproduce the predominant error pattern of uninflected stems found in SLI, suggesting that a future model needs to incorporate aspects of both the Joanisse and Hoeffner & McClelland models.

In Williams syndrome, it was initially reported that there was a selective deficit in forming the past tense of exception verbs (Clahsen & Almazan, 1998). However, a larger study suggested that this apparent deficit was actually a consequence of language delay, since performance on exception verbs lags behind that on regular verbs in normal development, and language is typically delayed in WS (Thomas et al., 2001). When language delay was controlled for in this latter study, the greater deficit on exception verbs in WS disappeared. The Thomas et al. study did, however, reveal reduced generalisation of the past tense rule to novel forms in WS, a pattern which persisted even when language delay was controlled for. Using a past tense network that mapped from verb stem to past tense form in the presence of semantic information, Thomas and Karmiloff-Smith (in press a) explored the manipulations to the normal model that could reproduce this pattern of developmental data. Various claims have been made proposing that there are subtle deficits to the language system in Williams syndrome. These include the proposals that language development may be ‘hyper-phonological’, relying to a greater extent on phonological than lexical-semantic information (Grant, Karmiloff-Smith, Gathercole, Paterson, Howlin, Davies, & Udwin, 1997; Vicari, Brizzolara, Carlesimo, Pezzini, & Volterra, 1996; Vicari, Carlesimo, Brizzolara, & Pezzini, 1996; Volterra, Capirci, & Caselli, 2001), that the phonological representations themselves may be atypical and perhaps rely on sensitive auditory processing (Karmiloff-Smith, Grant, Berthoud, Davies, Howlin, & Udwin, 1997; Majerus, Palmisano, van der Linden, Barisnikov & Poncelet, 2001; Neville, Mills, & Bellugi, 1994), that lexical-semantic representations may be atypical (Clahsen & Almazan, 1998; Rossen, Klima, Bellugi, Bihrle, & Jones, 1996; Temple, Almazan, & Sherwood, in press), or that lexical-semantics may be poorly integrated with phonology (Frawley, 2002; Karmiloff-Smith et al., 1998).

Thomas and Karmiloff-Smith found that a manipulation of the phonological representations that reduced their similarity and redundancy was sufficient to reproduce the delay for regular and exception past tense forms, as well as the reduction in generalisation. However, the pattern could also be reproduced when noise was added to the information coming from the lexical-semantic system. By contrast, slowed learning failed to produce a reduction in generalisation, suggesting that delay alone was insufficient to explain the data. While elimination or weakening of the lexical-semantic contribution produced a selective delay (but no final impairment) for exception verbs, it also failed to show the reduction in generalisation. In short, manipulations to phonology or to the integration of phonology and lexical-semantics could simulate the WS data, but a manipulation to lexical-semantics alone could not.

What if the WS data had shown a selective deficit in exceptions as initially reported in the syndrome – could the model have shown this pattern? Performance on exception verbs could be preferentially delayed under at least two conditions: by attenuating lexical-semantic input, or by restricting the computational complexity of the representations the system could learn (e.g., by employing a two-layer network or by reducing the numbers of hidden units by a certain calibrated amount). However, in both cases, the delay was not associated with an endstate impairment. The only way to achieve such a final deficit in exception verbs was to combine manipulations (for instance, attenuating lexical-semantic information and slowing down learning / terminating training at a point where regulars had reached ceiling but exceptions had not; or attenuating lexical-semantic input while restricting computational complexity).

In sum, for inflectional morphology, we have direct comparisons of attempts to simulate acquired and developmental deficits to both regular and exception verbs. Impairments to regular verbs were simulated by damage to phonology either prior to or following training. It is worth noting that in the Hoeffner and McClelland model, a specific regular deficit in the developmental case was achieved by effectively targeting information that encoded the regular rule. On the other hand, a deficit to regular performance in the acquired case was achieved with more general damage (putting aside, for a moment, the fact that acquired damage impaired generalisation of the rule rather than its application to existing verbs). For exception verbs, an acquired impairment was simulated by damaging the input from semantics. Similar damage in a developmental model delayed the learning of exception past tense forms but, importantly, failed to produce an impairment at the end of training. Broadly, then, phonological damage targeted regular inflection / generalisation while semantic damage targeted exception inflections.

6.3 Summary

What can we conclude from the detailed comparison of models of acquired and developmental deficits in these two domains? The results are somewhat contradictory. For both surface and phonological dyslexia, acquired and developmental approaches employed the same kind of damage to produce the same impairment – the intervention of the developmental process did not appear to contribute to the pattern of impairments. For impairments to regular past tense formation, however, more specific damage was required prior to training than at the end of training to generate a specific impairment in regular past tense formation – as if the developmental process risked changing the nature of the impairment. And indeed, for the impairment of exception past tense forms, damage to semantic input only produced a delay in acquiring these forms, while damage at the end of training produced a marked behavioural deficit. In other words, in this case the developmental process overcame initial damage to produce a successful outcome via an altered developmental trajectory.

Our ability to gauge the contribution of the developmental process to the final impairments is compromised by the fact that in each of the preceding cases, the comparisons have involved separate models whose implementations have differed in detail. No models have afforded a direct comparison of the outcome of the same damage carried out prior to versus following the training process. For this reason, in the next section, we describe a simulation designed specifically to make such a direct comparison, something never hitherto undertaken in the literature.

7. Simulation One: Comparing startstate and endstate damage

7.1 Introduction

The design of the following simulations is relatively straightforward. We take a given problem domain and model architecture, and train the model on the domain. This establishes its ‘normal performance’. We then run the model in two conditions. We either damage the model prior to its training process, or we damage it following its training process. Any difference in the pattern of impairments in the two cases must arise from the contribution of the training process, i.e., from development.

Three forms of damage are reported: (1) removal of a proportion of connections from the network; (2) addition of noise to the activation levels in the network; (3) alteration of the discriminability of the processing units, i.e., the ability of a unit to produce large changes in its activation state in response to small changes in the input it receives. All manipulations have been widely used in modelling both acquired and developmental deficits. For example, lesioning of network structure has been used to model dyslexia (e.g., Hinton & Shallice, 1991; Patterson et al., 1996; Plaut et al., 1996), alexia (Mayall & Humphreys, 1996), phantom limbs (Spitzer, 1996), stroke (Reggia, Goodall, Chen, Ruppin, & Whitney, 1996), Alzheimer’s disease (Ruppin, Horn, Levy & Reggia, 1996), prosopagnosia (Farah, O’Reilly, & Vecera, 1993), schizophrenia (Hoffman, 1996), and autism (Cohen, 1998). Addition of noise to processing has been used to model dyslexia (Harm & Seidenberg, 1999, 2001), SLI (Joanisse, 2000), language in Williams syndrome (Thomas & Karmiloff-Smith, in press a), and Alzheimer’s disease (Joanisse & Seidenberg, 1999). Alteration of unit discriminability has been used to model schizophrenia (Cohen & Servan-Schreiber, 1992), executive dysfunction (Levine, 1996) and the effects of ageing (Li & Lindenberger, 1999).

For comparison with the preceding discussion, we test the contribution of the training process in a domain analogous to past tense formation.

7.2 Simulation details

Architecture: A three-layered feedforward network was used, with the architecture as shown in Figure 1a).

 

Figure 1: Architectures of the models used in Simulations One and Two. a) Three-layer pattern associator. b) Dual-route pattern associator.


 

Training set: The training set was taken from Plunkett and Marchman (1993) and comprises an artificial language set constructed to reflect the most important structural features of English past tense formation. There were 500 monosyllabic verbs, constructed using consonant-vowel templates and the phoneme set of English. Phonemes were represented over 6 articulatory features, and separate banks of units were used to represent the initial, middle, and final phonemes of each monosyllable. The output layer incorporated an additional two features to represent the affix for regular verbs. This corresponds to a network with 18 input units and 20 output units. However, the current simulations involved removing connection weights, and Bullinaria and Chater (1995) have argued that when network models are lesioned, resulting patterns of impairments can be artefactual if very small networks are used. In an attempt to avoid this, the representational scheme was duplicated five times, with the addition of a small amount of noise (whereby the binary features in each duplication had a 20% chance of flipping their state). This preserved the nature of the computational problem faced by the network, but increased the network’s size to 90 input units and 100 output units. Fifty hidden units were used in the hidden layer.
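As a rough illustration of the duplication scheme described above, the following Python sketch expands an 18-unit input code into 90 units by concatenating five copies, with each bit of each copy flipping independently with probability 0.2. The function name, the example stem code, and the decision to subject all five copies to flipping (fixed once per pattern rather than re-sampled on every presentation) are assumptions made for illustration; the text does not specify these details.

```python
import random

def duplicate_with_noise(bits, copies=5, flip_prob=0.2, rng=None):
    """Concatenate `copies` noisy copies of a binary feature vector.

    Each bit of each copy flips independently with probability `flip_prob`.
    (Assumption: every copy, including the first, is subject to flipping.)
    """
    rng = rng or random.Random()
    expanded = []
    for _ in range(copies):
        for b in bits:
            flipped = rng.random() < flip_prob
            expanded.append(1 - b if flipped else b)
    return expanded

# Hypothetical 18-unit stem code: 3 phoneme slots x 6 articulatory features.
stem = [1, 0, 0, 1, 1, 0,  0, 1, 0, 0, 1, 1,  1, 1, 0, 0, 0, 1]
wide = duplicate_with_noise(stem, rng=random.Random(0))
print(len(stem), "->", len(wide))   # 18 -> 90
```

The same function applied to a 20-unit output code would yield the 100 output units mentioned above.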

There were four types of verbs in the training set: (1) regular verbs which formed their past tense by adding one of the three allomorphs of the +ed rule, conditioned by the final phoneme of the verb stem (e.g., tame-tamed, wrap-wrapped, chat-chatted); (2) exception verbs whose past tense form was identical to the verb stem (e.g., hit-hit); (3) exception verbs which formed their past tenses by changing an internal vowel (e.g., hide-hid); (4) exception verbs whose past tense form bore no relation to its verb stem (e.g., go-went). The token frequency of this last type of exception verb had to be higher for the network to learn them successfully (see Plunkett & Marchman, 1991), as is the case in real languages. As a result, this verb type experienced three times as much training as the other types. There were 410 regular verbs, and respectively 20, 68, and 2 of each exception verb type.

A separate set of novel verbs was constructed to evaluate the generalisation performance of the network. These verbs could differ depending on their similarity to items in the training set. For simplicity, 410 novel verbs were used, each of which shared two phonemes with one of the regular verbs in the training set. Generalisation was evaluated depending on the proportion of these novel verbs which were assigned the correct allomorph of the regular past tense rule.

Learning algorithm: The network was trained with the backpropagation learning algorithm, using cross-entropy between the output and target as the error signal. The learning rate was 0.01 and momentum was 0.00. The entire corpus was presented on each epoch and pattern update was used. Networks were trained for 5000 epochs.
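For readers unfamiliar with pattern (online) update, the sketch below shows one way a single backpropagation step with a cross-entropy error signal and no momentum might be written for a three-layer sigmoid network. It is a minimal illustration rather than the simulator actually used: the helper names, the tiny layer sizes in the usage lines, and the storage of the bias as the last weight in each row are all assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng, scale=0.5):
    # One row per receiving unit; the last entry in each row is the bias weight.
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(layer, inputs):
    ext = inputs + [1.0]  # append a constant input for the bias weight
    return [sigmoid(sum(w * a for w, a in zip(row, ext))) for row in layer]

def cross_entropy(outputs, targets, eps=1e-12):
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for o, t in zip(outputs, targets))

def pattern_update(w_hid, w_out, x, target, lr=0.01):
    """One per-pattern backpropagation step; weights are modified in place.

    Returns the cross-entropy error measured before the weights change.
    """
    h = forward(w_hid, x)
    y = forward(w_out, h)
    # With sigmoid outputs and cross-entropy error, the output delta is (t - y).
    d_out = [t - o for o, t in zip(y, target)]
    # Hidden deltas: back-propagated error times the sigmoid derivative.
    d_hid = [hj * (1 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(w_out):
        for j, a in enumerate(h + [1.0]):
            row[j] += lr * d_out[k] * a
    for j, row in enumerate(w_hid):
        for i, a in enumerate(x + [1.0]):
            row[i] += lr * d_hid[j] * a
    return cross_entropy(y, target)

# Toy usage: a 4-3-2 network trained repeatedly on one hypothetical pattern.
rng = random.Random(1)
w_hid, w_out = init_layer(4, 3, rng), init_layer(3, 2, rng)
x, t = [1.0, 0.0, 1.0, 0.0], [1.0, 0.0]
first = pattern_update(w_hid, w_out, x, t)
for _ in range(500):
    last = pattern_update(w_hid, w_out, x, t)
```

In a full run, `pattern_update` would be called once per verb in a freshly shuffled order on each epoch, which corresponds to random presentation without replacement.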

Performance measure: A nearest-neighbour method was used to evaluate network performance, using a Euclidean distance metric. For each position in the output, the phoneme that the set of activation values most resembled was taken as the intended output for that position. If the resulting output string was the target output, it was marked as correct. Scores were thus percentage correct for each verb type.
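The nearest-neighbour scoring procedure can be sketched as follows. The three-feature phoneme codes are hypothetical stand-ins for the six-feature articulatory codes, and the affix units are ignored for brevity, so this is an illustration of the method under those assumptions rather than the scoring code used in the simulations.

```python
import math

def nearest_phoneme(activations, phoneme_codes):
    """Return the phoneme whose feature code is closest (Euclidean) to `activations`."""
    return min(phoneme_codes,
               key=lambda p: math.dist(activations, phoneme_codes[p]))

def decode(output, phoneme_codes, n_features):
    """Split an output vector into phoneme slots and decode each slot separately."""
    slots = [output[i:i + n_features] for i in range(0, len(output), n_features)]
    return [nearest_phoneme(s, phoneme_codes) for s in slots]

def percent_correct(outputs, targets, phoneme_codes, n_features):
    """Percentage of output vectors whose decoded string equals the target string."""
    hits = sum(decode(o, phoneme_codes, n_features) == t
               for o, t in zip(outputs, targets))
    return 100.0 * hits / len(targets)

# Hypothetical phoneme codes and one graded output vector (three slots).
codes = {"p": [0, 0, 1], "a": [1, 1, 0], "t": [0, 1, 1]}
out = [0.1, 0.2, 0.9,  0.8, 0.7, 0.3,  0.2, 0.6, 0.8]
print(decode(out, codes, 3))   # ['p', 'a', 't']
```

A response counts as correct only if every slot decodes to the target phoneme, so a single misread phoneme marks the whole verb as an error.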

Implementation of damage: Lesioning: weights were probabilistically set to zero throughout the network. The probability level determined the severity of the lesion. A probability of 0.3 would on average lesion 30% of the connection weights. Due to differences in sensitivity, probability levels of .01, .025, .05, .1, .2 and .3 were used for lesions applied at the end of training (with no retraining after damage), while probability levels of .5, .6, .7, .8, .9 and .95 were used for lesions applied at the beginning of training. Noise: noise was added to the activation levels of the units in the hidden layer, with a gaussian distribution with mean zero and a standard deviation which determined the severity of the damage. Standard deviations of .025, .05, .0625, .075, .0875, .1, .2, .3, .4, .5, .6, and .7 were used. The baseline condition included no noise. Units had a maximum activation level of 1 and a minimum of 0, and noise could not take the activation state of a unit outside of these limits. Discriminability: the activation of each processing unit in the hidden and output layers was determined by the following equation

$$\text{activation} = \frac{1}{1 + e^{-\,\text{net input}/\text{Temperature}}}$$

where net input is the summed activation arriving at the unit including its bias, and where the Temperature parameter controls the steepness of this sigmoid function (see e.g., Hinton & Sejnowski, 1986). High temperatures correspond to low discriminability, while low temperatures correspond to high discriminability. Temperature values of 4 (reduced discriminability) and .25 (increased discriminability) were used.
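The three damage manipulations can be sketched in a few lines of Python. This is an illustrative reading of the description, not the original implementation: the function names are invented, clipping is assumed as the way of keeping noisy activations within 0 and 1, the logistic function with a temperature divisor is assumed as the activation function, and a default temperature of 1 is assumed for the undamaged case.

```python
import math
import random

def lesion(weights, p, rng):
    """Set each weight to zero independently with probability p."""
    return [[0.0 if rng.random() < p else w for w in row] for row in weights]

def add_noise(activations, sd, rng):
    """Add zero-mean Gaussian noise, clipping results to the range [0, 1]."""
    return [min(1.0, max(0.0, a + rng.gauss(0.0, sd))) for a in activations]

def sigmoid_t(net_input, temperature=1.0):
    """Logistic activation whose steepness is set by `temperature`."""
    return 1.0 / (1.0 + math.exp(-net_input / temperature))

# Toy usage with hypothetical weights and hidden-unit activations.
rng = random.Random(0)
w = [[0.3, -0.2, 0.5] for _ in range(4)]
damaged = lesion(w, 0.3, rng)                 # on average 30% of weights removed
noisy = add_noise([0.1, 0.5, 0.95], 0.2, rng)
flat = sigmoid_t(0.5, temperature=4.0)        # low discriminability
steep = sigmoid_t(0.5, temperature=0.25)      # high discriminability
```

For startstate damage the lesion would be applied before training begins, and the zeroed weights would presumably have to be held at zero throughout learning; the sketch does not implement that masking step.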

Replications: For the baseline model and for cases of damage prior to training, results were averaged over six runs of each network using different random seeds. Initial weights were randomised within the range ±0.5 and pattern presentation during training was random without replacement. For cases of damage at the end of training, results were averaged over damage to each of the six baseline networks. For the addition of noise and the probabilistic lesioning of connection weights at the end of training, results were averaged over 10 repetitions of the damage for each of the six baseline networks. Graphs include standard error bars across the network replications as an indication of variability.
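The averaging scheme might be computed along the following lines. The sketch assumes that repeated damage draws are first averaged within each baseline network, and that the standard error is then taken across the six networks using the sample standard deviation; the text does not state either detail, and the scores shown are hypothetical.

```python
import math
import statistics

def mean_and_se(scores):
    """Mean and standard error of the mean across network replications."""
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return m, se

def pooled_endstate_score(scores_by_network):
    """Average repeated damage draws within each network, then summarise across networks."""
    per_network = [statistics.mean(reps) for reps in scores_by_network]
    return mean_and_se(per_network)

# Hypothetical percentage-correct scores for six replications.
m, se = mean_and_se([90, 92, 94, 96, 98, 100])
```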

7.3 Results

The following graphs show performance on regular verbs, performance on the vowel-change exception verbs, and performance on generalisation of the regular rule. Results for the other two exception types were similar, and so are omitted. Baseline performance on the regular, exception, and rule pattern types was 100%, 100%, and 77% respectively. Figure 2a) shows the effect of lesioning weights either before training (‘startstate’ damage) or after training (‘endstate’ damage). For each pattern type, the figure shows the relative performance of the startstate and endstate conditions for increasing levels of damage.

Figure 2: a) The effect of removing connections from the network either prior to (startstate) or following (endstate) the training process for regular, exception, and novel patterns. The x-axis plots increasing levels of damage, with much greater startstate damage (S) required to produce an equivalent impairment than endstate (E). b) Direct comparison of impairments on regular and exception patterns following startstate and endstate damage.


The results here indicate a similar pattern of impairment for both startstate and endstate damage on regular and exception verbs. Figure 2b) demonstrates that in both cases, exception patterns suffer a greater impairment than regular patterns, echoing the surface dyslexia simulations. For novel items, startstate lesioning initially improves generalisation of the rule, while endstate lesioning is only deleterious. At higher levels of startstate lesioning, however, generalisation declines here also. There are two major points to note from this simulation. First, very much greater damage is required in the startstate than in the endstate to produce an equivalent amount of behavioural impairment. Thus a lesion of 2.5% of network connections in the endstate reduces performance on regular patterns to approximately 90%, whereas a lesion of 80% of the connections in the startstate is required to produce an equivalent deficit. Despite the fact that the same damage produces a similar behavioural impairment here, the training process creates a huge difference in sensitivity to damage between the startstate and endstate conditions. This is because the trained network is losing connections that have already stored specific knowledge, while the untrained network is reduced in its potential to learn, and may use the remaining potential to acquire the domain as best it can.

The second finding is that the relationship between regular, exception, and rule performance in the startstate and endstate differs. For example, for a given level of performance on regular patterns in the damaged networks, the startstate network will show lower exception performance and higher generalisation performance. Despite broadly equivalent behavioural impairments, in detail, the patterns of deficit are different in the acquired and developmental case.

Figure 3 illustrates the effect of adding noise to activation levels within the network, either once training is complete or throughout training. The results once more show a differential pattern of sensitivity, but now in the reverse direction to the lesioning condition. The network is much more sensitive to noise occurring in the startstate than it is to noise occurring in the endstate. In the endstate, the network has established its knowledge and, due to the non-linear processing units, is able to tolerate noise in processing. As a result, performance has not yet reached floor when noise is added with a standard deviation (SD) of .7. However, when damage occurs in the startstate, the network is never provided with a reliable rendition of the knowledge it must learn. When noise is added with an SD of as little as .2, no learning is possible at all. Although the acquired and developmental phonological dyslexia models of Harm and Seidenberg (1999, 2001) are not directly comparable to each other, it is interesting to note that the acquired impairment was simulated by the addition of noise an order of magnitude greater than that used to simulate the developmental impairment, in line with the current findings. And a similar indirect comparison of Joanisse’s (2000) model of SLI and Joanisse and Seidenberg’s (1999) model of aphasia in the past tense domain indicates a comparable requirement for greater noise in the endstate than the startstate to produce an equivalent level of deficit.

Noise added to the endstate led to a roughly uniform decrement across regular, exception, and novel patterns (e.g., from SD levels of .3 onwards). However, noise added to the startstate led to greater impairments to exception patterns than regular patterns. As an indication of this effect, when regular pattern performance was roughly comparable in the two conditions (startstate 90% with SD=.075, endstate 87% with SD=.25), exception patterns had fallen to 69% in the startstate against 79% in the endstate. The similarity of the mapping between regular patterns as well as their majority in the training set allows them to better overcome the addition of noise in training than the more unique and minority exception patterns. In contrast, the ability of the network to deal with noise in the trained state depends on the non-linear functions within the hidden units, units which, broadly speaking, are shared by all patterns. Based on the relation of these different components of performance, once more one must conclude that the detailed pattern of impairments in the acquired and developmental cases was different.

 

Figure 3: The effect of adding noise to activation levels within the network either prior to (startstate) or following (endstate) the training process for regular, exception, and novel patterns.

In sum, the addition of noise could produce effects that were uniform and global in effect (across regulars, exceptions, and generalisation) as in the endstate, or that were differential, as in the startstate; but, most clearly, effects that were much stronger when the damage occurred in the startstate than when it occurred in the endstate.

Figure 4 shows the effect of changing the discriminability of the processing units within the network. Reduced discriminability has little effect when applied either to startstate or endstate. The network can evidently compensate for it during training, or tolerate this disruption at the end of training. An increase in discriminability also produces little effect when applied to the startstate – once more, the network can evidently compensate during training. However, if an increase in discriminability is applied to the endstate, the result is a marked and selective deficit in performance on exception patterns, dropping in performance from 100% to 30%. Meanwhile, regular patterns only suffer a minor dip in performance and generalisation increases slightly. Evidently, exception patterns rely more on the discriminability of the processing units than regular patterns. On the whole, this type of damage produces an impairment that is both selective in the behaviour it impairs, and that only occurs when the damage is applied to the endstate.

 

Figure 4: The effect of altering unit discriminability within the network either prior to (startstate) or following (endstate) the training process.

 

7.4 Discussion

This simulation addressed two issues. First, does the process of development contribute to the pattern of deficits? Second, does the process of development produce patterns of deficits that are the same as those produced in acquired damage?

With regard to the first issue, direct comparison of the effects of identical damage at startstate and endstate demonstrated a complex relation between these two conditions, indicating a significant role for the process of training. Three different forms of damage produced three different possible relations. First, damage could produce a similar pattern of impairment for the startstate and endstate conditions, but the two conditions could vary in their sensitivity to the damage (removal of connections, addition of noise). Second, damage could produce impairments predominantly in the startstate (addition of noise) or predominantly in the endstate (increase in discriminability, removal of connections). Third, damage could produce impairments that were global (addition of noise, removal of connections) or selective (increase in discriminability). This complex relationship exists because the training process can sometimes play a crucial, compensatory role after damage occurs to the startstate of a learning system. However, it can play this role only to the extent that resources permit, and only to the extent that the representation of the domain remains reliable. To the extent this is a valid model of cognitive development, the suggestion here is that the process of development will indeed contribute to patterns of deficits in a single system, but that the exact contribution will depend on the type of damage and the structure of the problem domain. [Note 8].

With regard to the second issue, the results suggested that, at most, broad similarities were evident in the deficits caused by startstate and endstate damage, for instance in the cases of lesioning and adding noise. However, in both cases, detailed examination revealed that the patterns of deficits were different – the behavioural impairments across related measures (regular, exception, and rule) did not line up. Again, to the extent that this is a valid model of cognitive development, the results do not support the idea that developmental and acquired deficits will produce precisely the same patterns in a single system.

It is instructive to see why this was the case. Take the example of lesioning the model. In the case of endstate deficits, the decline of regular and exception patterns was more closely tied because both pattern types shared a representational space that was being damaged. In the case of startstate deficits, the potential representational space was reduced, but the training process allowed the regular patterns to dominate the space that was available. The result was a system in which exception patterns were eventually squeezed out. These two impaired systems did not share a common final deficit because there is a distinction between a process of deleting parts of a representational space that is already occupied and the outcome of a process of occupying a representational space in which the initial size has been reduced.

However, the results by no means rule out the possibility that learning systems can be damaged in different ways prior to and following training, such that they exhibit identical endstate behavioural impairments. Generally, one must be very cautious about assuming identical causes in the case of identical outcomes. It is certainly the case that in connectionist models of developmental disorders, different forms of startstate damage can produce similar endstate behavioural impairments, as we saw in the case of phonological dyslexia and past tense formation in Williams syndrome. On the other hand, we have not yet unearthed any convincing examples in our own work or in the literature that separately, startstate and endstate damage can produce identical endstate deficits in these networks. Time will tell on this point.

8. Simulation Two: Testing the assumption of Residual Normality in a simple connectionist learning system

8.1 Introduction

Many claims for RN relate to static adult models containing multiple independent, functionally specialised components. These components are supposed to fail separately under both acquired and developmental damage. Claims about developmental damage, however, are quite inappropriately applied to such models, because they are not models of development (nor do they pretend to be). In this simulation, we address the issue of RN in models with specialised components that are the product of a learning process.

How do specialised processing components arise in the cognitive system? Most connectionist models of cognitive processes have focused on single domains – in effect, they have been models of components within a modular system (see discussion in Karmiloff-Smith, 1992). Less work has examined how specialised components may actually emerge from an initially undifferentiated computational substrate. Evidence from the study of brain processes suggests that the neocortices of newborns are less structurally differentiated compared to those of adults, and that cognitive processes are less localised in this early substrate (e.g., Johnson, 1999). However, the key question regarding specialised structures has been whether their emergence during development reflects the unfolding of a maturational blueprint, whether the process depends on experience, or whether it reflects a gradual process of modularisation which lies somewhere between these two extremes (Karmiloff-Smith, 1992; Elman et al., 1996).

In a recent review, Jacobs (1999) argues that the evidence points to the experience-sensitive theory of specialisation. He discusses three computational approaches that have sought to model the experience-dependent emergence of structure. In the first approach, called mixture of experts, the initial computational system is assumed to be computationally heterogeneous. There are components that, while not dedicated to processing any particular content, have different computational properties. These components compete to perform the computations corresponding to a new cognitive domain. The component whose computational properties best fit the demands of the domain (known as a ‘structure-function correspondence’) will win the competition, and come to specialise in processing that domain in the future (see Jacobs, 1997; Jacobs, Jordan, Nowlan, & Hinton, 1991). In the second approach, called neural selectionism or parcellation, the initial computational system has a surplus of connections. However, during learning, many of these connections are weeded out, while others are stabilised depending on usage. In addition, a locality constraint favours the stabilisation of connections between nearby processing units. The result is that nearby units communicate with each other and come to perform the same functions, while those far apart do not communicate and come to specialise in different functions (Jacobs & Jordan, 1992; Johnson & Karmiloff-Smith, 1992; see Plaut, submitted, for a recent application to a cognitive model of naming and gesturing). In the final approach, called the wave of plasticity, the initial computational system experiences differential responsiveness to learning, both spatially and temporally. Conceived of as a sheet of computational units, plasticity is reduced over time with one side of the sheet losing its plasticity earlier than the other. The result is that the later maturing units can use the functions computed by earlier maturing units as input, and derive more complex and abstract computational functions from them – in essence the later maturing units specialise on the more abstract or high-level aspects of the problem domain (Shrager & Johnson, 1996).

Let us assume that the outcome of normal development is a set of specialised components in the endstate, which can be revealed by adult neuropsychology. For our purposes, the relevant question is, if the developmental process itself is pushed off course in a developmental disorder, could this also alter the nature of the specialised structures that are the outcome of development? Little computational work has explored this question (see Oliver et al., 2000, for some preliminary work). It seems possible that specialisation could be disrupted in any of the above computational approaches. An alteration in the initial set of computational primitives or in the competition process could disrupt specialisation in the mixture-of-experts approach. An alteration in the method or timing of pruning long connections could disrupt specialisation in the neural selectionism approach. An alteration in the timing of plasticity changes could disrupt specialisation in the wave of plasticity approach.

Decisive evidence has yet to be put forward demonstrating radical differences in specialisation in developmental disorders, although some hints have been made in this direction. For instance, Karmiloff-Smith (1998) has speculated that the cognitive processes of individuals with Down syndrome may be characterised by insufficient specialisation, perhaps due to a failure to prune long connections during development. However, conceptually, it is not yet clear to what extent one could compromise the emergence of specialised cognitive structures in a disordered state and still produce a viable cognitive system.

To extend the static adult damage model to developmental disorders is to make a more precise claim, however, that damage may be highly selective and thwart the development of a single specialised module. The question now becomes, if specialisation is not predetermined, under what conditions will the rest of the system develop normally despite this early selective damage? Fortunately, some existing models of reading and past tense formation allow us to explore this question. In these models, structure-function correspondences have been used to generate emergent specialisation in connectionist learning systems with multiple processing routes. Such networks include two processing routes in an initially content-free network [Note 9]. The routes have different computational properties, and these properties line up respectively with the computational requirements of learning regular and exception patterns (see below). The result in both a reading model (Zorzi, Houghton, & Butterworth, 1998) and a past tense model (Plunkett, Bandelow & Juola, 2001) was partial specialisation of the two routes to processing regular and exception patterns (see Westermann, 1998, for a related constructivist approach). We sought to evaluate the assumption of Residual Normality using this dual-route model, and in particular to answer the following question: Does disruption to one route prior to training alter the function that the initially intact route takes on at the end of training?

8.2 Simulation details

Architecture: The architecture of the dual route network is shown in Figure 1b). The feedforward network included an input layer and an output layer connected via two processing routes. The Direct processing route comprised a set of connections linking the input and output layer. The Indirect processing route connected these two layers via an intermediate layer of 20 hidden units.

Structure-function correspondences: When exception patterns must be learnt in the face of a majority of regular patterns, additional computational resources are necessary. Specifically, while a two-layer network can learn the mappings for a set of purely regular patterns, hidden units are necessary to mark out the inconsistency of the exception patterns, typically involving the use of the three-layer architecture. These effects are not all or nothing. A two-layer network can tolerate a small proportion of exception patterns in the training set; exception patterns themselves can be more or less inconsistent with the regular patterns and therefore more or less demanding of hidden units to mark out their inconsistency. (In computational terms, the role of hidden units in overcoming the inconsistency between regular and exception patterns is a question of linear inseparability – see Elman et al., 1996, chapter 2 for an introductory discussion). Furthermore, the disadvantageous effect of inconsistency can be mitigated by increasing the frequency of exception patterns in the training set. Broadly, then, in a network combining a two-layer architecture and a three-layer architecture in separate routes, the two-layer route will be best fitted to learn the regular patterns, and the three-layer route will be required to learn the exceptions, more so the greater the inconsistency of the exception patterns with the regular patterns. Structure-function correspondences can drive specialisation in error correction networks with multiple routes, because there is competition between each route to reduce the disparity between output activations and the training target. If one route succeeds in reducing the disparity, no error signal is left to change the weight strengths in the other route(s).
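The competition between routes can be made concrete with a small sketch. It assumes that the Direct and Indirect routes simply sum their contributions at the output units and that biases are omitted; the function names and the toy weights are hypothetical. The point it illustrates is that the weights feeding the output layer in both routes are driven by the same error term, so once the output matches the target neither route receives any further change.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dual_route_output(x, w_direct, w_in_hid, w_hid_out):
    """Output activations when Direct and Indirect routes sum at the output layer."""
    hidden = [sigmoid(sum(w * a for w, a in zip(row, x))) for row in w_in_hid]
    out = []
    for k in range(len(w_direct)):
        net = sum(w * a for w, a in zip(w_direct[k], x))          # Direct route
        net += sum(w * h for w, h in zip(w_hid_out[k], hidden))   # Indirect route
        out.append(sigmoid(net))
    return hidden, out

def output_weight_changes(x, hidden, out, target, lr=0.01):
    """Changes to the weights feeding the output layer in each route.

    Both routes are driven by the same error term (target - output).
    """
    delta = [t - o for o, t in zip(out, target)]
    d_direct = [[lr * d * a for a in x] for d in delta]
    d_indirect = [[lr * d * h for h in hidden] for d in delta]
    return d_direct, d_indirect

# Toy network: 2 inputs, 2 hidden units, 1 output (weights are hypothetical).
x = [1.0, 0.0]
w_direct = [[0.5, -0.5]]
w_in_hid = [[0.2, 0.1], [-0.3, 0.4]]
w_hid_out = [[0.1, 0.2]]
hidden, out = dual_route_output(x, w_direct, w_in_hid, w_hid_out)
with_error = output_weight_changes(x, hidden, out, [1.0])
no_error = output_weight_changes(x, hidden, out, out)  # target already met
```

Which route reduces the error first depends on how readily its structure can represent the mapping, which is the sense in which structure-function correspondences bias specialisation.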

Our training set includes three types of exception pattern. Those based on the No Change past tense paradigm (hit-hit) are the least inconsistent with regular verbs, since as with regulars, the verb stem is reproduced, while the affix is omitted. We term these exceptions EP1. Exception patterns based on the Vowel Change paradigm (hide-hid) are more inconsistent, since in addition to the omission of an affix, the central vowel of the verb stem must be transformed. These exceptions we term EP2. Finally, exception patterns based on the Arbitrary past tense paradigm (go-went) are the most inconsistent with the regular patterns, since the verb stem must be entirely transformed and the affix omitted. We might expect these patterns to be most dependent on the hidden units of the Indirect processing route. However, in the past tense domain, it is argued that arbitrary past tenses can only be retained in English if they are of very high token frequency. In the current training set, arbitrary mappings were presented three times as often as other forms. We term this most inconsistent exception type EP3f, to reflect the fact that high frequency may modulate patterns of specialisation.

Training set: The training set was identical to that in Simulation One, except that instead of two arbitrary patterns in the training set, there were 10 such patterns. This permitted a more sensitive evaluation of performance on this pattern type.

Learning algorithm: As in Simulation One.

‘Residual Normality’ condition: In addition to the normal training scheme, for comparison the model was also trained under a Residual Normality condition. This condition assumed ‘guided specialisation’ (see later). Here, the Direct route was trained on Regulars alone, and the Indirect route trained on Irregulars alone. Guided specialisation in a multi-component model requires an external control system to co-ordinate the subsequent function of the trained components (see e.g., the ‘Blocking’ device in Pinker’s (1991) dual-mechanism past tense model). For simplicity, the control system was assumed in the RN condition. Routes were trained and tested independently.

Performance measure: Performance was measured using a nearest-neighbour calculation based on output activations, and scores were recorded as percentage correct. Specialisation of a particular pattern type to a particular route was evaluated by selectively lesioning the network at the relevant point in training. The Direct route and the Indirect route were separately given a probabilistic lesion of their weights with p = .5 (50% of all weights in that route). If damage to the Direct route caused more impairment on a pattern type than damage to the Indirect route, it was assumed that the function for this pattern type was specialised to the Direct route, and vice versa. Note that removal of 50% of the connections may not have equivalent effects on the Direct and Indirect routes, since the latter has two layers of weights. However, we were concerned here with differential effects between pattern types rather than routes. We did check for interactions, i.e., the possibility that pattern types might show differential sensitivity to damage in each route whereby, for instance, a pattern type may appear to be specialised to one route at 50% damage but the other route at 10% damage. This possibility was explored by carrying out endstate lesions with probabilities of .025, .05, .1, .2, .25, .5, and .75. Although, overall, the Indirect route showed greater sensitivity to damage than the Direct route, there was very little modulation of relative specialisation levels of the pattern types across damage levels. However, the absolute level of specialisation was affected by whether the level of damage was so great / small that it produced floor / ceiling effects in regular or exception performance. A level of 50% lesioning was used to assess specialisation because this was in the mid-range of sensitivity for both regular and exception patterns.
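
A minimal sketch of the two operations described above is given here. It rests on assumptions not stated in the text: that a removed connection is represented by a weight of zero, and that the nearest-neighbour calculation uses squared Euclidean distance. All function names are hypothetical.

```python
import random

def lesion(weights, p, rng):
    """Probabilistic lesion: return a copy of a weight matrix in which each
    connection is removed (set to 0.0, an assumption) with probability p."""
    return [[0.0 if rng.random() < p else w for w in row] for row in weights]

def nearest_neighbour(output, candidates):
    """Return the label of the candidate target vector closest to the output
    activations (squared Euclidean distance, an assumption)."""
    def distance(target):
        return sum((o - t) ** 2 for o, t in zip(output, target))
    return min(candidates, key=lambda label: distance(candidates[label]))

def percent_correct(outputs, correct_labels, candidates):
    """Score a set of output vectors as percentage correct."""
    hits = sum(
        nearest_neighbour(out, candidates) == label
        for out, label in zip(outputs, correct_labels)
    )
    return 100.0 * hits / len(outputs)
```

Passing a seeded `random.Random` instance as `rng` keeps each lesion reproducible, which matters when lesions are repeated and averaged.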

Implementation of pre-training damage: This simulation sought to explore the implications of specific damage to either of the two routes prior to training. This was achieved by removing different proportions of the weights in each route. Probability levels of .6, .75, .9 and 1 (removal of entire route) were applied. In addition, the same levels of damage were applied to both routes simultaneously, as a control. After initial damage, training proceeded as normal, and levels of specialisation were then assessed in the endstate.

Replications: Results were averaged across 6 networks with different initial random seeds, for each level of startstate damage. Probabilistic lesions were carried out 10 times and the results averaged. Graphs include standard error bars as an indication of variability across the 6 network replications.
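
The two-level averaging described above might be sketched as follows. The function name `summarise` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) for the standard error is an assumption; the text does not say which estimator was used.

```python
import math
import statistics

def summarise(lesion_scores_per_network):
    """lesion_scores_per_network: one list of scores per network replication,
    each holding the scores from repeated probabilistic lesions.
    Returns (mean, standard_error) across network replications."""
    # First average over the repeated lesions within each network.
    per_network = [statistics.mean(runs) for runs in lesion_scores_per_network]
    # Then summarise across networks (sample standard deviation: an assumption).
    mean = statistics.mean(per_network)
    standard_error = statistics.stdev(per_network) / math.sqrt(len(per_network))
    return mean, standard_error
```

In the simulation this would receive 6 lists of 10 scores each, so the error bars reflect variability across networks rather than across individual lesions.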

8.3 Results

Figure 5 illustrates the trajectory of specialisation for each pattern type. Fig. 5a) shows how performance improves during training and reflects the usual advantage of regular over exception patterns. Specialisation at each point in training was assessed via a 50% lesion to either route, and the mean decrement in performance caused by these lesions is depicted in Fig. 5b). Once again, the results here reflect the greater impairment suffered by exception patterns after network damage, although the data demonstrate that this effect increases during training.

Figure 5: a) Performance on each pattern type during training in the dual-route network, at 10, 25, 50, 100, 250, 500, 1000, 2000, and 5000 epochs of training. See Simulation Details for description of each pattern type. b) Performance after a 50% lesion to each route, averaged over the two routes, carried out at each point in training. The 50% lesion was used to measure specialisation of function to each route. c) Specialisation of function for each pattern type during training, indexed by the differential impairment caused by damaging each route in isolation. Positive values indicate specialisation to the Direct route, negative values to the Indirect route. Error bars show standard errors across network replications.


Figure 5c) represents an index of specialisation, where positive values represent specialisation to the Direct route, and negative values represent specialisation to the Indirect route. This index corresponds to the differential impairment caused by lesioning a single route. The first point to note is that specialisation is only partial. Using this measurement technique, in Fig. 5c) damage to the Direct route can cause a maximum decrement in performance only 53% greater than damage to the Indirect route. Damage to the Indirect route can cause a maximum decrement 29% greater than damage to the Direct route. Secondly, most patterns show a shift towards using the Indirect route later in training.
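
One way to compute such an index is sketched below. The text does not give the exact formula, so the sketch assumes a simple difference, in percentage points, between the decrements caused by lesioning each route. The function name and the example scores are hypothetical and are not results from the simulation.

```python
def specialisation_index(intact, after_direct_lesion, after_indirect_lesion):
    """Differential impairment for one pattern type (all arguments are
    percentage-correct scores). Positive values indicate specialisation to
    the Direct route, negative values to the Indirect route."""
    direct_decrement = intact - after_direct_lesion
    indirect_decrement = intact - after_indirect_lesion
    return direct_decrement - indirect_decrement

# Hypothetical scores: a pattern type hurt more by Direct-route damage.
example = specialisation_index(100.0, 40.0, 90.0)
```

Under this definition the intact score cancels out, so the index reduces to the difference between the two lesioned scores.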

To understand this latter point, it is important to realise that the two routes of the network do not just differ in their computational properties, but also in their plasticity. By virtue of the learning algorithm, weights from the input layer to the hidden units change more slowly than the weights directly connecting input and output layers. In effect, this network comprises one relatively more ‘stupid’ but fast changing route, and one relatively more ‘clever’ but slow changing route. Early on in training, successful performance is largely due to the Direct route, and this performance is best on regular patterns, generalisation of the rule, and the EP1 exceptions – those that are least inconsistent with the regular patterns. Subsequently, the slower changing Indirect route increasingly contributes to performance, but very much more so for the exception patterns than for the regulars. However, by the end of training, both regular patterns and generalisation of the rule rely more on the Direct route, while all exception patterns rely more on the Indirect route. As expected, the more inconsistent EP2 patterns turn out to rely more on the Indirect route than the EP1 patterns. However, the higher frequency of the EP3f patterns means that their greater inconsistency does not lead to Indirect route specialisation any more than that shown by EP2; higher frequency allows these patterns to recruit more processing from the Direct route in the face of the dominance of regular patterns.
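
The difference in plasticity can be illustrated with a single learning step. The sketch assumes backpropagation with logistic units and a sum-squared error, which the text does not confirm here; the function name, network size, and weight values are hypothetical. The error signal reaching the input-to-hidden weights is scaled by the hidden-to-output weight and by the slope of the hidden unit (at most 0.25), so with small initial weights these updates are much smaller than those on the direct connections.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_sizes(x, target, w_direct, w_in_hid, w_hid_out):
    """For one output unit, return the largest absolute gradient on the
    direct input-to-output weights and on the input-to-hidden weights."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in_hid]
    net = (sum(w * xi for w, xi in zip(w_direct, x))
           + sum(w * h for w, h in zip(w_hid_out, hidden)))
    out = sigmoid(net)
    # Error signal at the output unit.
    delta_out = (target - out) * out * (1.0 - out)
    direct = [abs(delta_out * xi) for xi in x]
    # Error signal passed back to each hidden unit.
    delta_hid = [delta_out * w * h * (1.0 - h) for w, h in zip(w_hid_out, hidden)]
    in_hid = [abs(d * xi) for d in delta_hid for xi in x]
    return max(direct), max(in_hid)

direct_step, indirect_step = update_sizes(
    x=[1.0, 0.0, 1.0], target=1.0,
    w_direct=[0.1, 0.1, 0.1],
    w_in_hid=[[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
    w_hid_out=[0.1, 0.1],
)
```

With these small starting weights the direct-route update is the larger of the two by well over an order of magnitude, consistent with the description of a fast changing Direct route and a slow changing Indirect route.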

In sum, this network shows emergent specialisation of function of different pattern types to separate processing structures. While this specialisation is not complete, our concern here was to establish a baseline level, in order to explore the effect of initial, route-specific damage to this process. Figure 6 shows (from left to right) the result of startstate lesions to the whole network, to the Indirect route in isolation, and to the Direct route in isolation. Fig. 6a) illustrates the effect of these startstate lesions on endstate performance, while Fig. 6b) shows their effect on the profile of specialisation in the endstate network.

 

 

Figure 6: a) Performance at the end of training following initial damage to both routes, to the Indirect route only, or to the Direct route only. b) Patterns of endstate specialisation for each pattern type, after initial damage. Error bars show standard errors across network replications.


Figure 6a) demonstrates that when the entire Indirect route is removed, endstate performance on regular patterns and generalisation is only mildly impaired. However, performance on the exception patterns reveals a marked decrement, particularly for EP1 and EP2 which are not protected by increased frequency. When the entire Direct route is removed, regular patterns are impaired to a greater extent. Exception patterns are also impaired, but less than when the Indirect route was lost. Removal of the Direct route, however, produces a marked deficit in rule generalisation. In isolation, then, each of the two routes will attempt to acquire both regular and exception patterns, but each does so less efficiently than in the dual-route system. The routes in isolation both produce decrements in exception performance but, relatively speaking, the Indirect route is less able (but not unable) to learn the regular rule and the Direct route is less able (but again, not unable) to learn the exception patterns.

These results mark the maximum compensation that is available to the network. Fig. 6b) demonstrates the specialisation when there is residual processing capacity in the damaged route [Note 10]. Here the data are unambiguous. When there is initial damage to the Indirect route, specialisation of function increasingly moves over to the intact Direct route. When there is initial damage to the Direct route, specialisation of function increasingly moves over to the intact Indirect route. A crucial lesson is demonstrated by this simulation: the assumption of Residual Normality does not hold in this learning system. Damage one route and the other route will not develop normally. It will compensate, and take on part of the function of the damaged route, at the cost of poorer performance across all pattern types. It cannot be taken for granted that every learning system will show RN.

This simulation is interesting in two other respects. First, in the condition where both routes experienced initial damage, the overall outcome was reduced specialisation. A system with uniformly reduced computational resources does not have the luxury of allocating functions to different components. The effect of resources on specialisation is a point to which we return shortly. Second, the 50% lesion used to measure specialisation produced greater deficits in systems that had experienced startstate damage to either or both routes than those that had not. Importantly, even if systems that experience early damage achieve reasonable endstate performance, they remain more vulnerable to subsequent disruption [Note 11].

8.4 Discussion

Given the compensatory characteristics of the training process evident in Simulation One, it is perhaps unsurprising to find similar compensation here. When the two routes of a dual-route network show endstate specialisation, startstate damage to either route results in compensation by the intact route and poorer performance overall. If RN had held in this model, damaging a route in the startstate would have led to the (endstate) loss of the very same function for which that route was responsible in the endstate of the normal unimpaired model, while retaining normal function in the initially undamaged route. This is precisely the claim that is made (for example) in the case of phonological dyslexia, where a developmental problem in the GPC route of the reading system is assumed to lead to the same pattern of behavioural impairments as damage to the GPC route in the adult state (Coltheart et al., in press), with the lexical routes functioning normally in both cases.

The feedforward network presented in Simulation Two is similar to connectionist systems that have been used to successfully capture a wide range of developmental phenomena. These systems plainly do not demonstrate RN. Nevertheless, as we saw in the Introduction, RN is frequently postulated (albeit implicitly) in many explanations of developmental disorders. Perhaps, then, despite their success, current connectionist models are not the right sort of learning system to explain how the structure of the adult cognitive system comes about. Can other sorts of learning system show the emergence of specialised components while exhibiting RN after initial damage?

To date, artificial neural networks are the computational systems that have been most widely applied to the study of cognitive development. However, it is certainly possible that other approaches will come to the fore in the future, such as decision-tree learning, Bayesian methods, production systems, reinforcement learning, instance-based learning, genetic algorithms, or indeed other types of artificial neural networks. In the meantime, from the perspective of developmental disorders, it is vital to stipulate what additional constraints any such learning systems would need to incorporate in order to achieve RN. We will discuss five (somewhat overlapping) notions: (1) stronger structure-function correspondences; (2) stronger competition; (3) early commitment; (4) guided specialisation; and (5) restrictions on computational resources. Note that all of these notions are based on the assumption that endstate cognitive structure is experience-sensitive. RN can of course be stipulated, as per accounts which propose that modular structure in the cognitive system is pre-specified and that if components develop, they do so independently. (Those who stipulate innate modularity would then need to justify this claim with evidence from developmental cognitive neuroscience – evidence that we currently believe to be wanting.)

9. Ways to achieve Residual Normality in systems with emergent modularisation

(1) Stronger structure-function correspondences. Each of the routes in the dual-route network was able to show a fair degree of compensation for the functions of the other, suggesting that the correspondences between the functions of the two routes and the regular / exception structure of the learning problem were partially overlapping. One way to assure RN would be to have much stronger structure-function correspondences, whereby the computational properties of each route were entirely inappropriate for learning the patterns on which the other route specialised.

We might illustrate this idea by stepping outside our two example domains for a moment, and considering a connectionist model of the development of object-oriented behaviour proposed by Mareschal, Plunkett & Harris (1999). This model used an input retina to represent the trajectories of various sorts of moving objects. The system had to learn to reach for some of the objects it saw but not for others. To achieve this task, it was given two routes, one which processed spatio-temporal information about the position of each object (the ‘where’ channel), the other which processed featural information about the identity of each object, such as its colour and shape (the ‘what’ channel). A final layer of the system combined the two processing routes to achieve reaching behaviour. The relevance of this model is that the computational properties of each processing route were very different. The ‘what’ channel utilised competitive learning to achieve translation-invariant feature recognition across the entire retina. The ‘where’ channel employed recurrent circuits to encode time-varying information about position. If either of these channels were to be damaged prior to the training process, the remaining route simply would not have the appropriate computational primitives to compensate for the function of the damaged route: translation-invariant featural information contains no clues to location; spatial trajectory information contains no clues about object identity [Note 12]. The result would be a system with RN, where the remaining initially intact route would develop the only skill it had the capacity to learn and nothing else [Note 13].

(2) Stronger competition. In the mixture-of-experts approach to specialisation, separate components compete to represent a new domain. The component best able to represent the domain is given ‘sole rights’ to it, and the other components are inhibited. In the dual-route network, on the other hand, both routes worked in harness to learn the appropriate mappings – neither was prevented from adjusting its weights to improve performance on any pattern. The result was to encourage co-operation between the routes. Much stronger competitive processes might permit a component to claim ‘sole rights’ only to patterns that best suited its computational properties, and might inhibit it from acquiring patterns outside of that set. If the same component were to win the competition in both normal and atypical development (a big IF), then the component would exhibit RN and no compensation.

(3) Early commitment. If plasticity reduces rapidly over time, then early commitment may contribute to RN. Under this scenario, early commitment of separate components to functions must occur before the developmental disruption takes place. This can be either because the damage has not yet occurred, or because the underlying damage does not make itself apparent since it is not relevant for the cognitive processes appropriate for the current stage of development (see discussion in Thomas, in press). Early and irreversible (yet experience-dependent) specialisation could contribute to a modular system where RN holds (see, e.g., Miller & Erwin, 2001), provided an account exists of the required delay in the emergence of the developmental disruption.

(4) Guided specialisation. In the dual-route network, prior analysis of the computational properties of two- and three-layer networks suggested that the Direct route would be better suited to learning regular patterns and the Indirect route would be better suited to learning the exception patterns. In principle, we could have determined to label each pattern as Regular or Exception in advance, and then only allowed the Direct route to alter its weights in response to Regular patterns, and the Indirect route to alter its weights in response to Exception patterns. Unsurprisingly, the result of this form of guided training would be independent specialisation. Clearly, if one route were damaged prior to training without any change to the advance labelling system, this route would fail to learn the patterns assigned to it. The other route would be unaffected, and would hence show RN.

In such a case, we are of course left with the burning question of where the advance labelling information comes from. Some would argue it is innate. Alternatively, the labelling information could be the product of an earlier phase of learning, in which an analysis of the target domain identified the presence of regular and exception patterns before the dual-route system was engaged. It should be evident, here, that if one chooses to appeal to ‘guided’ specialisation to support RN, then claims about the availability of a priori knowledge need serious substantiation.

(5) Restrictions on computational resources. We saw in the results of Simulation Two that specialisation was eliminated under severe resource limitation. Resource limitations may also have implications for limits on compensation, and indirectly, for RN. This idea can be illustrated with an example from the past tense domain, and claims made for the (computationally unimplemented) traditional model. The traditional model comprises two mechanisms, one employing rule-based representations and nominally responsible for learning regular inflections, the other employing an associative memory and nominally responsible for learning exception inflections. If these two mechanisms are to appropriately specialise on their respective inflections (in the absence of external guidance), it is important that neither mechanism be too powerful. Given sufficient ‘rule’ resources, for example, all past tenses could be learnt in terms of a large set of rules (see e.g., Ling & Marinov, 1993; Taatgen & Anderson, 2001). In contrast, given sufficient ‘associative memory’ resources, all past tenses could be learnt as specific instances. Generalisation to novel exemplars could be achieved in either case by an analogy-based strategy, such as similarity-to-the-nearest-known-exemplar. Given two overly powerful mechanisms, the result would be duplicated processing systems, each able to perform the whole task. To achieve specialisation, therefore, one needs to restrict the resources of each mechanism. For example, the English past tense has one rule and about a hundred and fifty exceptions. If the relevant mechanisms were restricted to these limits, then they would show little compensation in the event of damage to the other mechanism.

The case of SLI illustrates the point. Children with SLI show low levels of inflection on both regular and exception verbs, and poor generalisation of the regular rule to novel strings. Appealing to the traditional two-mechanism model of past tense formation, Ullman and Gopnik (1999), Pinker (1999), and van der Lely and Ullman (2001) have all argued that there is a startstate deficit to the mechanism intended to learn the regular rule. All that remains is the exception mechanism, which learns the past tenses of some exception and some regular verbs, and (presumably by analogy) can struggle to offer a few correct generalisations to novel verbs. The implicit claim here is that there is a particular limit in computational resources in the exception mechanism that prevents it from learning more than a handful of regular past tenses by way of compensation. Specifically, the exception mechanism must be able to learn the couple of hundred exceptions to explain its performance in normal development. But, to explain the lack of compensation in SLI, it must be unable to learn a couple of thousand high-frequency regular past tenses that would be sufficient to get by in everyday language use and therefore mask the regular-mechanism deficit. The explanation of the endstate impairment crucially relies on this precise memory stipulation which constrains compensation. (Interestingly, no empirical support is offered for such a memory limitation in the above accounts of SLI.)

In short, resource limitations may be a necessary component of RN, but not a sufficient one.

10. Residual Normality and the inference from behaviour to structure

In Section 4, we identified two developmental conditions where behavioural similarities between acquired and developmental disorders should not lead to the inference that they reflect selective impairments to the same components of a static adult model. One condition was when features of the problem domain determine the pattern of breakdown rather than features of the processing structure. In the simulations, this situation is reflected in the greater vulnerability of exception patterns, irrespective of damage type. However, closer inspection revealed different patterns of deficit to regulars, exceptions and generalisation for different damage types, implying an effect of residual processing structures on performance.

The second condition was when similar patterns of behaviour in atypical development are generated by a different underlying structure of specialised components. Above, we have argued that the emergent pattern of functional specialisation depends on computational constraints operating during cognitive development. However, Jackson and Coltheart (2001) maintain that developmental disorders in their endstate are potentially no different from acquired disorders and can be used with reference to the normal adult model quite independently of the nature of the developmental process that produced them. One reading of their position is that, at a given moment in time, inferences can be made from behavioural impairments directly to underlying structure, irrespective of the developmental processes that produced the system.

Computational modelling allows us to explore this claim, since we can simultaneously generate patterns of behavioural deficits while knowing the underlying cause and the background developmental account in each case. Figure 7 illustrates a behavioural impairment generated from the dual-route model following selective damage to one of its routes. Two versions of the model were damaged, the one we have already studied in which RN does not hold, and a second version in which RN does hold (in this case, RN was achieved by guided specialisation of regulars and exceptions to the two routes; see Simulation Details). Figure 7 shows that both versions of the model can generate similar behavioural impairments after startstate and after endstate damage (although in the absence of RN, regular performance is not quite at ceiling). However, as summarised in Table 1, the inferences that one can make from intact behaviour to intact underlying process, or from impaired behaviour to impaired underlying process, crucially depend on the developmental constraints of the system. With regard to developmental deficits, where RN holds, intact behaviour implies intact underlying process and a dissociation of independent structures. Where RN does not hold, intact behaviour implies atypical underlying process, in a system that has experienced compensation during training. With regard to acquired deficits, on the other hand, the developmental