For information about subscribing or purchasing offprints of the published version, with commentaries and author's response, write to: journals_subscriptions@cup.org (North America) or journals_marketing@cup.cam.ac.uk (All other countries).
Cell assemblies; cerebral cortex; coordination; context; dynamic binding; epistemology; functional specialization; learning; neural coding; neural computation; neuropsychology; reading; object recognition; perception; self-organization; synaptic plasticity; synchronization.
This research concerns forms of coding, processing and learning that are common to many different cortical regions and cognitive functions. Local cortical processors may coordinate their activity by maximizing the transmission of information that is coherently related to the context in which it occurs, thereby forming synchronized population codes. In this coordination, contextual field (CF) connections link processors within and between cortical regions. The effects of CF connections are distinct from those mediating receptive field (RF) input. CFs can guide both learning and processing without becoming confused with RF information. Simulations explore the capabilities of networks built from local processors with both RF and CF connections. Physiological evidence for CFs, synchronization, and plasticity in RF and CF connections is described. Coordination via CFs is related to perceptual grouping, the effects of context on contrast sensitivity, amblyopia, implicit influences of color in achromatopsia, object and word perception, and the discovery of distal environmental variables and their interactions through self-organization. In cortical computation there may occur a flexible evaluation of relations between input signals by locally specialized but adaptive processors whose activity is dynamically associated and coordinated within and between regions through specialized contextual connections.
1. Introduction
2. Arguments for and against common foundations for cortical computation
3. Computational studies of the contextual guidance of learning and processing
4. Physiological evidence for contextual integration and synchronized population codes
5. Psychological implications and evidence
6. Issues arising
Notes
References
Figure Legends
The possibility of common foundations for cortical computation was first discussed by the authors (Phillips, a psychologist, and Singer, a neurophysiologist) in 1980. We had collaborated in the early 1970s, comparing single unit activity in cat lateral geniculate nucleus with the ability of humans to detect the appearances and disappearances of elements in random dot patterns (Phillips & Singer 1974; Singer & Phillips 1974). This background was important because it had helped convince us that psychophysics and neurophysiology could combine fruitfully and in detail. We had not met for some years after that collaboration, and on meeting again the neurophysiologist asked what the psychologist's current interests were. The ensuing conversation went roughly as follows:
Psychologist: Well, my main interest is in the fundamental differences between different cognitive domains. For example, what are the basic differences between sensori-motor systems and the higher conceptual systems? Then, within the conceptual systems, what are the basic differences between visuo-spatial processing and verbal processing?
Neurophysiologist: But why are you emphasizing differences? The cortical algorithm is everywhere the same.
Psychologist: Well, if that is so, it is very interesting, but from the psychological point of view there certainly seem to be some major differences. Consider learning and memory, for example. Information storage within the sensory systems is of very short duration, less than a second in visual sensory storage, whereas once it is put into a schematic conceptual form, information can be voluntarily maintained for many seconds in short-term memory (STM), and can be learned and stored indefinitely in long-term memory (LTM).
Neurophysiologist: But there is also long-term plasticity in sensory systems, both during development and later. The receptive fields of cells in primary sensory cortex depend upon the stimulation that they get during development, and these use-dependent modifications of synaptic transmission can also occur in adults.
Psychologist: Yes, of course, such effects are well established, but that is a quite different kind of learning.
Neurophysiologist: Well, is it? Why do you suppose that learning and processing in the sensory cortex are fundamentally different from learning and processing elsewhere in cortex? Perhaps they are very similar, and from the neurophysiological point of view that's how it seems.
We are still searching for answers to questions raised by this discussion. Are there information processing operations that are common to different cortical regions and different cognitive sub-systems, and if so, what are these operations, why are they useful, and how are they implemented by cortical processes? Different cognitive functions are of course performed by different cortical regions and at different levels of organization, but all regions of the neocortex share a common basic internal organization; because of this predominant homogeneity the neocortex is also called isocortex. Computational capabilities of general utility may therefore arise from this common design. This paper is concerned with what those capabilities might be and with how they arise from cortical structures and processes.
The organization of cognition into distinct sub-systems is even more firmly established now than it was twenty years ago. This does not in itself imply differences in the information processing operations that they perform, however, because sub-systems may differ in the information upon which they operate but not in the operations that they perform upon that information. Many cognitive sub-systems are distinguished from each other just in terms of the information on which they operate, but it is also likely that some cognitive functions require special information processing capabilities. These include: episodic memory and working memory; intentional representation, i.e. processes that distinguish between representation and referent; and the creative aspects of language and long-range strategic planning. Higher cognitive functions such as these are central to human mental life, and they depend to a large extent upon cortical activity. These functions may not arise in any simple way from basic capabilities that are common to cortex in general, however, because (i) intentional representation and language are not characteristic of mammals in general but are restricted to just one species or at most a few; (ii) in contrast to skills, episodic memories cannot be acquired in the absence of the hippocampus (Squire 1992), and may require special computational capabilities (McClelland et al. 1995); and (iii) the ability to dynamically create more than one level of grouping within the same set of units, such as ((AB)(CD)), may involve special computational problems (Fodor & Pylyshyn 1988; Hummel & Holyoak 1993). Thus our working assumption is that some cognitive functions require special capabilities in addition to those that are common to cortex in general. Furthermore, although we take the abilities that are provided by the common foundations for granted, they are crucial to the sensory, perceptual, and motor skills on which our daily lives depend.
The notion of functional specialization summarizes the vast body of findings showing that different cortical regions and different cells within regions transmit information about different things. Discussions of how the activity of these distinct processors can be coordinated have an equally long history, but this aspect remains much less well understood. The particular form of integration with which this paper is concerned is that which arises from a myriad of local coordinating interactions between pyramidal cells within and between cortical regions. This does not deny that the music of the hemispheres might be guided by some kind of conductor, but it does imply that integration can be achieved, at least in part, through local interactions between the players themselves. Musicians have two different sources of information which they normally use in two different ways. They have the score to tell them what to play, but they also watch and listen to each other to determine exactly when and how loudly to play it. The local processors that we postulate also have two classes of input. One is the receptive field input which tells them what features to signal, and the other is contextual input from the concurrent activity of other processors which is used to determine exactly when and how confidently to signal the features for which they have evidence.
A simple, general, and precise framework for describing functional specialization in neural systems is provided by the adaptive filter formulation (Carpenter 1989). The basic idea is that the strengths of the synapses that mediate receptive field input perform a selective filtering operation that can be adapted through experience to better suit the environment and tasks to which the system is exposed. Filtering is necessary for at least two reasons: (i) the amount of sensory data to be processed is so great that predictive relationships can only be found after dimensionality reduction; and (ii) different information is relevant to different purposes.
We now need to add contextual integration to such formulations. Filtering is useful because it contributes to the more general goal of making good predictions. Predictive relationships of varying degrees of complexity are richly embedded within the input to the cortex, across both space and time, and the discovery and use of these relationships is a major goal of cortical computation at all stages and levels of processing. The integrative interactions that we hypothesize can be thought of as using these predictive relationships to produce patterns of activity that are coherent both within and between various streams and levels of processing. There is evidence that this involves synchronizing the activity of dynamically specified sub-sets of cells, using special synchronizing contextual connections that influence the probability that the target cells fire at any moment (e.g. Singer 1990, 1993, 1994, 1995; Singer & Gray 1995; Engel et al. 1992). A crucial aspect of this form of integration is that context affects activity without corrupting the information that is transmitted by that activity about the cell's receptive field input. Here we summarize the evidence for synchronization and for contextual connections, and we study computational capabilities that arise when cortical processors receive local contextual inputs that can be used to guide both learning and processing.
In the remainder of Section 1 we outline the issues and hypotheses to be discussed. Section 1.2 gives an informal outline of the possibilities that arise when local cortical processors coordinate their activities by using specialized contextual inputs to form synchronized population codes and to guide learning. Section 1.3 relates the codes and processes that we propose to other aspects of cortical function. Section 1.4 reviews prior proposals using synchronized population codes and contextual guidance. Section 1.5 summarizes the main hypotheses that we expect to be controversial. Section 2 outlines arguments for and against the hypothesis of common foundations for cortical computation. Section 3 specifies the goals of contextual guidance formally using concepts provided by information theory and multivariate statistics, and outlines computational studies showing the basic capabilities of simple networks built from local processors with contextual guidance. Section 4 outlines evidence from neurobiology for contextual guidance and synchronized population codes. Section 5 discusses the relevance of these hypotheses to psychological issues, outlining evidence that is already available from behavioral studies as well as ways in which our suggestions can be further tested and developed by such studies. Finally, Section 6 discusses a few of the issues that arise from these hypotheses.
A starting point for our approach is the hypothesis that, although cortical circuits are constrained to operate just upon the information that is locally available to them, coordination of their activity with what is going on elsewhere is central to their computational role. This is possible because they receive locally specific contextual input from other processors (though directly from only a tiny fraction of the other processors in the cortex as a whole). The contextual input is used to selectively enhance the transmission of that information in the processor's receptive field (RF) input that is coherently related to the context. Networks of such processors therefore tend to transmit sets of signals that as far as possible maximize their mutual coherence. Because a signal transmits information only if it varies, we call this capability the maximization of coherent variation. Useful consequences that follow from it are discussed below, and in Section 1.2.3 in particular.
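One way to make "coherent variation" concrete, under assumptions that are ours rather than the text's, is to treat a processor's output Y and its context C as binary variables and to measure the coherent part of the output's variation as the mutual information I(Y;C). The sketch below compares three hypothetical joint distributions: a constant output, an output that varies independently of the context, and an output that varies coherently with it.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information_bits(joint):
    """I(Y;C) in bits, where joint[y][c] = P(Y = y, C = c)."""
    p_y = [sum(row) for row in joint]
    p_c = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (p_y[i] * p_c[j]))
    return total

# Hypothetical joint distributions: rows are output values, columns context values.
cases = {
    "constant output": [[0.5, 0.5], [0.0, 0.0]],
    "independent of context": [[0.25, 0.25], [0.25, 0.25]],
    "coherent with context": [[0.45, 0.05], [0.05, 0.45]],
}

results = {}
for name, joint in cases.items():
    h_y = entropy_bits([sum(row) for row in joint])  # how much the output varies
    i_yc = mutual_information_bits(joint)            # how much of that is shared with context
    results[name] = (h_y, i_yc)
    print(f"{name}: H(Y) = {h_y:.3f} bits, I(Y;C) = {i_yc:.3f} bits")
```

Only the third case combines variation (H(Y) = 1 bit) with variation that is shared with the context (about 0.53 bits). The second case varies just as much, but none of its variation is coherent with the context.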
The usefulness of the ability to organize distributed patterns of activity into coherent groups is widely acknowledged in discussions of the "binding problem". This problem would be solved if cells currently forming a coherent group synchronized their spike trains to within a few milliseconds. This possibility was proposed by Milner (1974) and has long been advocated on both theoretical and biophysical grounds (von der Malsburg 1981). Neurophysiological evidence to be outlined in Section 4 now suggests that the spiking activity of cortical neurons can be synchronized to within a few milliseconds in a way that is appropriate to the prevailing context, and which includes synchronization between neurons in different streams of processing and between neurons at different stages of processing.
Synchronization would be an effective signal for grouping because inputs to pyramidal neurons are summed much more effectively if they are synchronized (Abeles 1982, 1991; Bernander et al. 1991). It is an inherently relational signal because it depends upon temporal relations between inputs from separate sources. Thus, unlike the more commonly studied rate and place codes, it is not defined upon the signals produced by individual cells, and will not be revealed by studies of single cell activity.
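The claim that synchronized inputs are summed more effectively can be illustrated with a deliberately simplified passive leaky integrator (an assumption of this sketch; real pyramidal cells have active dendrites and richer dynamics). The same ten input spikes arrive either together, within about 2 ms, or spread over 90 ms. The 10 ms time constant and the firing threshold are hypothetical values chosen for illustration.

```python
import math

def peak_potential(spike_times_ms, weight=1.0, tau_ms=10.0):
    """Peak depolarization of a passive leaky integrator.

    Each input spike adds `weight`; between inputs the potential decays
    exponentially with time constant tau_ms. Because the potential only
    decays between inputs, its maximum occurs at an input arrival time.
    """
    return max(
        sum(weight * math.exp(-(t - s) / tau_ms) for s in spike_times_ms if s <= t)
        for t in spike_times_ms
    )

THRESHOLD = 5.0  # hypothetical firing threshold, in units of one input's weight

synchronous = [5.0] * 10                                # all ten inputs coincide
near_synchronous = [5.0 + 0.2 * k for k in range(10)]   # spread over 1.8 ms
dispersed = [10.0 * k for k in range(10)]               # one input every 10 ms

for label, spikes in [("synchronous", synchronous),
                      ("near-synchronous", near_synchronous),
                      ("dispersed", dispersed)]:
    v = peak_potential(spikes)
    print(f"{label}: peak = {v:.2f}, reaches threshold = {v >= THRESHOLD}")
```

The total input is identical in all three cases; only its timing differs. This is why synchrony can act as a relational signal that is invisible in the firing rate of any single input.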
A major feature of the work on synchronization is that it suggests the existence of specialized cortico-cortical synchronizing connections that modulate post-synaptic activity without corrupting the information that is transmitted about the receptive field features to which the cell is selectively sensitive (Engel et al. 1991b; Munk et al. 1992; Lowel & Singer 1992; Konig et al. 1993). That is, they help determine exactly when a cell fires, but they do not change the feature that is signaled by that activity. It is often suggested that this is achieved by using the synchronizing connections to influence the phase but not the amplitude of oscillatory outputs produced by the local processors (e.g. Hummel & Biederman 1992; Shastri & Ajjanagadde 1993; Schillen & Konig 1994). Though useful, this may not be a general solution to the problem of combining both feature and grouping information within the same signal (e.g. see Nelson 1995). First, although oscillations are likely to play a major role in synchronization, they are not necessary, because even single impulses can be synchronized. Second, there is evidence that synchronization can occur without oscillations (Konig et al. 1995). Third, there is doubt as to the generality of oscillations in the normal functioning of cortex (e.g. Tovee & Rolls 1992; Young et al. 1992; Bair et al. 1994). The computational studies described in Section 3 show how contextual inputs can guide processing without corrupting the transmission of RF information, and they do this in a way that does not require oscillations even though it is compatible with them.
Another major focus of this paper is on the possibilities that arise for learning when processors receive local contextual inputs. The computational studies outlined in Section 3 show that idealized local processors with contextual inputs can discover those receptive field features that are predictably related to the context within which they occur, and can discover the predictive relations between them at the same time. That is, in addition to learning the associations between features, the local processors can preferentially discover those features that are associated.
Local processors with contextual guidance receive a set of receptive field (RF) inputs and a set of contextual field (CF) inputs (Figure 1a). These processors are intended to be loosely analogous to local cortical circuits. In relation to the RF input they act as filters that transmit information about the RF features to which they are selectively sensitive. This selective sensitivity is specified by the strengths, W, of the synaptic connections that mediate the RF input. In addition, the probability that they transmit information about any RF feature at any moment is increased if that feature is as predicted by the context, and is decreased if it is incompatible with that prediction. The predictions are specified by the CF inputs as mediated by the strengths, V, of their synaptic connections. A crucial aspect of this form of processing is that the predictions are not confused with the RF evidence. The role of context is not to impose its predictions upon the processor, but to emphasize those outputs for which there is RF evidence and which are coherently related to the context. This can be done by using context to influence the confidence with which decisions are made on the basis of the RF evidence and to synchronize coherent outputs.
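The difference between RF drive and contextual modulation can be made concrete with a small sketch. The integrated RF input r = W·x and the integrated CF input c = V·z follow the text; the particular activation function A(r, c) = ½ r (1 + exp(rc)) and the logistic output are assumptions of this sketch, chosen only because they have the properties described above: with no context A = r; with no RF evidence A = 0 whatever the context; context never reverses the sign of the RF evidence; and agreement between r and c increases the output magnitude while disagreement reduces it (toward r/2) without abolishing it.

```python
import math

def integrate(weights, inputs):
    """Weighted sum of inputs: r = W.x for RF input, c = V.z for CF input."""
    return sum(w * x for w, x in zip(weights, inputs))

def modulated_activation(r, c):
    """Illustrative modulatory activation A(r, c) = 0.5 * r * (1 + exp(r * c)).

    A(r, 0) == r, A(0, c) == 0, and A always has the sign of r, so context
    changes how strongly the RF evidence is expressed but never what it says.
    """
    return 0.5 * r * (1.0 + math.exp(r * c))

def output_probability(W, x, V, z):
    """Probability that a binary output unit signals its RF feature."""
    r = integrate(W, x)  # RF evidence
    c = integrate(V, z)  # contextual prediction
    return 1.0 / (1.0 + math.exp(-modulated_activation(r, c)))

W, V = [1.0, 0.5], [1.0]   # hypothetical RF and CF weights
x = [1.0, 0.0]             # moderate RF evidence for the feature (r = 1)
p_agree = output_probability(W, x, V, [1.0])            # context predicts the feature
p_neutral = output_probability(W, x, V, [0.0])          # no contextual input
p_conflict = output_probability(W, x, V, [-1.0])        # context predicts its absence
p_no_rf = output_probability(W, [0.0, 0.0], V, [1.0])   # context without RF evidence
print(round(p_agree, 3), round(p_neutral, 3), round(p_conflict, 3), p_no_rf)
```

The probabilities are ordered p_agree > p_neutral > p_conflict > p_no_rf = 0.5: context raises or lowers the confidence of a decision that the RF evidence has already made, and on its own it produces nothing more than chance output.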
The strengths of the synapses, W and V, are not permanently fixed, but can change so as to better adapt the detailed operations performed by the processors to the statistical structure of the inputs that they receive. We hypothesize that providing local processors with contextual input enhances the learning of which they are capable. Major issues to be discussed are therefore what the goals of this learning could be, and by what synaptic modifications they may be achieved.
As illustrated by the width of the arrows in Figure 1, processors are assumed to have fewer outputs than RF inputs. This reflects the long-standing hypothesis that a major goal of sensory and perceptual processes is recoding to reduce redundancy (Attneave 1954; Barlow 1959, 1961, 1972, 1989; Linsker 1988; Barlow & Foldiak 1989; Foldiak 1990; Baddeley & Hancock 1991; Atick 1992; Atick & Redlich 1990, 1993; Redlich 1993; Li & Atick 1994). The underlying idea is that the flood of data to be processed can be reduced to more manageable amounts by using the statistical structure within the data to recode the information that it contains, with frequent input patterns being translated into codes that contain much less data than the patterns themselves. If the computational goals can be clearly specified, for example in information-theoretic terms, then learning rules can be derived from those goals (Intrator & Cooper 1995a). An important limitation of recoding as a goal for cortical computation is that it is ultimately subordinate to the goal of associative learning. There would be no point in recoding information about variables that have no relation to anything else known to the system. Proponents of recoding to reduce redundancy therefore usually see it as being preparatory to associative learning (e.g. Barlow 1993). A distinctive advantage of the processors that we propose here is that they are not forced to transmit just whatever RF variables carry the most information, but can selectively discover those that are associatively related to the context within which the processor operates.
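As one standard example of a learning rule derived from an explicitly specified goal, the sketch below uses Oja's rule, which drives a single linear unit's weights toward the first principal component of its inputs and so recodes two largely redundant inputs into one output. This is a swapped-in textbook rule, not the learning rule proposed in this paper; the learning rate, initial weights, and synthetic input statistics are assumptions chosen for illustration.

```python
import math
import random

def make_redundant_inputs(n, seed=0):
    """Two inputs that mostly carry the same signal (a hypothetical data set)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common source: most of the variance
        diff = rng.gauss(0.0, 0.1)    # small independent difference
        samples.append((shared + diff, shared - diff))
    return samples

def oja_rule(samples, eta=0.01, w_init=(0.3, 0.1)):
    """Oja's Hebbian rule: w += eta * y * (x - y * w), with y = w . x."""
    w = list(w_init)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

w = oja_rule(make_redundant_inputs(5000))
norm = math.hypot(w[0], w[1])
alignment = abs(w[0] + w[1]) / (math.sqrt(2) * norm)  # |cos| with (1, 1)/sqrt(2)
print(f"w = ({w[0]:.3f}, {w[1]:.3f}), |w| = {norm:.3f}, alignment = {alignment:.3f}")
```

The weight vector settles near unit length along the shared (1, 1) direction, so one output retains almost all of the input variation. Such a rule is indifferent to whether that variation is related to anything else known to the system, which is the limitation of recoding raised in the paragraph above.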
Many different network architectures can be built from such processors. Felleman and Van Essen (1991) review a great deal of evidence concerning the overall system architecture within which local cortical processors operate. They distinguish three broad classes of connections between cortical regions: ascending feedforward connections, descending feedback connections, and lateral connections between regions that are at approximately the same stage of processing. The ascending projections from one stage to the next are localized such that neurons receive their primary ascending inputs from a small sub-set of neurons at the preceding stage, with different local groups of neurons receiving from different sub-sets. Thus, many distinct streams of processing project through a few stages in converging and diverging ways with primary feedforward connections being distinguishable from lateral and descending connections.
The local processors hypothesized here are broadly compatible with such an architecture. The ascending connections could provide much of the RF input, and both the descending and the lateral connections could include CF input. Furthermore, within cortical regions there are many distinct streams of processing and these are linked by long-range horizontal collaterals that could transmit synchronizing contextual information. Mutual contextual guidance of this sort is shown in Figure 1b and it could link distinct streams of processing both within and across cortical regions. Another possibility is that contextual inputs could be received from processors to which the processor concerned contributes RF input. This is shown in Figure 1c, and it would enable activity to be coordinated across different stages of processing, as well as within stages. Finally, Figure 1d shows that one set of RF signals can be transmitted to separate processors with different contextual fields. Each processor will emphasize the RF information that is relevant to its context, thus enabling different processors to extract different aspects of the same RF activity.
The patterns of connectivity shown in Figure 1 are not mutually exclusive, and can be combined in various ways. Furthermore, separate modules with recurrent internal RF connectivity could be linked by CFs that coordinate their activities. We assume that genetic constraints play a major role in determining patterns of RF and CF connectivity, and that the CF inputs are specific to the role of each local processor, just as are the receptive field inputs. For example, at early stages of the analysis of a visual scene other parts of the scene might provide a useful context, whereas at later stages of processing information from other modalities might provide a more appropriate context. If local cortical processors do receive specialized contextual inputs as proposed, then principles or heuristics for determining where they should come from will be an important issue. Tononi et al. (1994, 1996) have shown that the overall pattern of cortical connectivity balances functional integration, produced here by the contextual inputs, against functional segregation, produced here by the RF filter functions, so as to produce a system with high complexity, high computing power, and the ability to use context to "go beyond the sensory information given" in an appropriate way.
1. The most important capability that arises in relation to processing is that the effects of context will be such as to increase the probability that mutually coherent sub-sets of units will be active at any moment. That is, they will tend to produce synchronized population codes. As has already been argued in detail elsewhere (Singer 1990; 1993; 1994) such codes have important advantages: they are flexible because they are created dynamically; they are fast because in the limit all that needs to be synchronized are single spikes from each of the cells to be grouped; they can signal many different patterns because each cell can at different times be part of many different groups; they do not compromise the meaning of the signals to be grouped; and finally, they transmit appropriately structured patterns of activity rather than just arbitrary or unstructured labels (Phillips 1996; Phillips et al. 1995a).
A layer built from local processors with contextual guidance therefore produces patterns of activity where the RF filter functions ensure that the individual signals are justified by the RF input and the contextual connections maximize their coherence as a group. This implies that in the case of perceptual grouping, for example, the Gestalt criteria are embodied in the CF connections between the entities that are grouped. This predicts that synchronization of active cells in the visual system should reflect Gestalt principles of grouping, and evidence that this is so will be outlined in Section 4.
One way to see what is implied by dynamic flexibility is to note that if these processes are common to cortex, then the inputs to each area will themselves be organized by the grouping processes operating in the areas generating those inputs. The RF filter functions will therefore not operate upon rigidly fixed data bases but upon ones that are already organized so as to emphasize coherent sub-sets of data within the RF. As those coherent sub-sets can occur within each RF in a very large number of different ways, this enables the receiving cells to respond appropriately to many more inputs than they could do without the dynamic grouping. In addition, there is evidence that cortex receives inputs that are dynamically grouped in the thalamus (Sillito et al. 1994) and retina (Neuenschwander et al. 1996). (see Note 1)
2. In addition to grouping related features that are clearly evident in the RF input, local contextual information could improve the perception of features that are weak or ambiguous.
3. Important capabilities that arise in relation to learning are that contextual input enables local processors to become selectively sensitive to those variables within their RF input that are predictably related to that context, and to do so together with learning the predictions. (see Note 2)
The possibility of using relations between separate data-sets as a basis for self-organization is illustrated in Figure 2. As an example of this approach Becker and Hinton (1992) show how stereo-depth can be discovered by using the mutual information between separate streams of processing that receive inputs from neighboring patches of the image that are independent except for having the same disparity. Stone and Bray (1995) and Stone (1996) have also shown how coherence across time can be used to learn invariances and other salient visual parameters. (see Note 3) A general epistemological argument for this approach is that predictive relationships between diverse data-sets must depend in some way upon their distal origins. Discovering those relationships will therefore reveal distal variables and interactions within the proximal data.
4. Processors with contextual inputs from different sources, as shown in Figure 1d, will become selectively sensitive to just those aspects of their inputs that are relevant to those contexts. This will thus help create appropriate functional specializations, and it will entail appropriate generalizations because the outputs of each processor will generalize across irrelevant dimensions of RF variation. To recognize facial expressions we must do so despite variations in personal identity; to recognize individual faces we must do so despite variations in facial expressions. Variables that are crucial to one goal may be irrelevant sources of noise to another. This problem would be solved if different cortical regions have a selective sensitivity to just those variables that are relevant to their role, and the evidence suggests that this is how face perception is organized (Bruce 1988); but how can local processors know what is relevant? Genetic specification cannot be the whole answer because some functional specialization is established through learning. Context could contribute to such learning by guiding RF selectivity to the relevant variables. For example, regions receiving RF input from visually perceived faces and CF input from regions concerned with evaluating emotional states could then learn to become selectively sensitive to just those variables in face images that are predictably related to emotional expression.
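Items 3 and 4 can be illustrated with a toy example under stated assumptions. Two hypothetical distal variables s1 and s2 each drive one RF input, a third RF input has the largest variance but is unrelated to either, and two processors receive the same RF input but different contexts: context A tracks s1 and context B tracks s2. The learning rule here, a context-gated Hebbian update followed by renormalization (which moves the weights toward the covariance between RF input and context), is a simple stand-in chosen for clarity, not the rules derived in Section 3. The contextual signals are fixed rather than learned, mutual information is estimated under a Gaussian assumption, and all numbers are illustrative.

```python
import math
import random

def make_streams(n, seed=1):
    """RF input x = (x1, x2, x3) plus two contextual signals (hypothetical data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s1, s2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # distal variables
        x = (s1 + rng.gauss(0.0, 0.5),   # RF component driven by s1
             s2 + rng.gauss(0.0, 0.5),   # RF component driven by s2
             rng.gauss(0.0, 2.0))        # largest-variance component, unrelated to context
        c_a = s1 + rng.gauss(0.0, 0.5)   # context A predicts s1
        c_b = s2 + rng.gauss(0.0, 0.5)   # context B predicts s2
        rows.append((x, c_a, c_b))
    return rows

def context_gated_hebb(pairs, eta=0.0005):
    """w += eta * c * x, then renormalize w to unit length."""
    w = [1.0 / math.sqrt(3)] * 3
    for x, c in pairs:
        w = [wi + eta * c * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w

def gaussian_mi_nats(ys, cs):
    """Mutual information assuming joint Gaussianity: -0.5 * ln(1 - rho^2)."""
    n = len(ys)
    my, mc = sum(ys) / n, sum(cs) / n
    cov = sum((y - my) * (c - mc) for y, c in zip(ys, cs)) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    vc = sum((c - mc) ** 2 for c in cs) / n
    return -0.5 * math.log(1.0 - cov * cov / (vy * vc))

rows = make_streams(30000)
w_a = context_gated_hebb((x, c_a) for x, c_a, _ in rows)  # processor with context A
w_b = context_gated_hebb((x, c_b) for x, _, c_b in rows)  # processor with context B

y_a = [sum(wi * xi for wi, xi in zip(w_a, x)) for x, _, _ in rows]
mi_a = gaussian_mi_nats(y_a, [c_a for _, c_a, _ in rows])
print("processor A weights:", [round(v, 2) for v in w_a])
print("processor B weights:", [round(v, 2) for v in w_b])
print(f"I(y_A; c_A) ~ {mi_a:.2f} nats")
```

Both processors see exactly the same RF input, yet each becomes selective for a different component, and neither picks up x3 even though its variance (4) is more than three times that of x1 or x2 (1.25 each). A rule that simply maximized output variance would favour x3 instead.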
The CF inputs that we postulate are not equivalent to inputs from "beyond the classical receptive field" in general. Many investigations show that the response of cortical cells to their preferred stimulus is suppressed by the presence of similar stimuli in the surround (e.g. Blakemore & Tobin 1972; Nelson & Frost 1978; Allman et al. 1985; Knierim & Van Essen 1992). We do not attribute these effects to the CF inputs that we propose, because they are concerned not with producing coherent patterns of activity across multiple feature detectors, but with using information about the surround to suppress responses that do not differ from that surround. This is quite different from the use of contextual predictions to increase the probability of signals that agree with those predictions. It is more appropriate to view the subtraction of activity that is summed over some surrounding region as being included in the mechanisms that determine RF selectivity, rather than as being part of the mechanism for coordinating the activity of many simultaneously active feature detectors.
Receptive fields that emphasize contrast with the surround show how the maximization of coherence is compatible with evidence for processes that emphasize the unexpected. If a single element differs from the others in an array on some simple variable, such as color or orientation, then the odd one out is very noticeable. This is evidence for processes that emphasize what is not predicted by the surround, and we account for that evidence by noting that RFs usually develop so as to detect such differences. Enhanced transmission of the unexpected is also proposed by some theories to be a major role for the descending feedback projections from higher to lower stages of processing (e.g. Mumford 1992; Pece 1992). In contrast to these theories, Sillito et al. (1994) provide evidence that feedback from V1 to the LGN synchronizes the activity of those LGN cells that agree with the interpretation at the higher level.
Furthermore, psychological experiments show that context often supports what is consistent with that context (e.g. Biederman 1972; Palmer 1975; McClelland 1978; McClelland, Rumelhart & Hinton 1986).
We also need to distinguish the population codes proposed here from the population vector codes proposed by Georgopoulos (1990). The latter is a proposal about how a single vector could be signaled by the activity of a group of cells. Synchronization specifies which cells are in the same group. Synchronized population codes are compatible with but do not require population vector coding.
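To make the distinction concrete, the sketch below shows population vector decoding in its simplest textbook form: each cell is assumed to have cosine tuning around a preferred direction, and the encoded direction is read out by summing the cells' preferred-direction unit vectors, each weighted by its rate relative to baseline. The eight evenly spaced preferred directions and the baseline and gain values are assumptions chosen for illustration. Nothing in this readout says which cells belong together; that separate question is the one that synchronization is proposed to answer.

```python
import math

BASELINE, GAIN = 10.0, 8.0  # hypothetical rates (spikes/s)
PREFERRED = [2 * math.pi * k / 8 for k in range(8)]  # preferred directions (rad)

def tuned_rates(direction):
    """Cosine-tuned firing rates of the population for one movement direction."""
    return [BASELINE + GAIN * math.cos(direction - p) for p in PREFERRED]

def population_vector(rates):
    """Sum of preferred-direction unit vectors weighted by rate above baseline."""
    vx = sum((r - BASELINE) * math.cos(p) for r, p in zip(rates, PREFERRED))
    vy = sum((r - BASELINE) * math.sin(p) for r, p in zip(rates, PREFERRED))
    return math.atan2(vy, vx)

for true_dir in (0.3, 1.0, 2.5):
    decoded = population_vector(tuned_rates(true_dir))
    print(f"true = {true_dir:.2f} rad, decoded = {decoded:.2f} rad")
```

With evenly spaced cosine tuning and no noise the readout recovers the encoded direction exactly; the point is only that a population vector specifies one value from many cells' rates.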
Finally, we need to relate the local contextual guidance that we hypothesize to spatial attention, arousal, and other strategic control processes. Local contextual guidance arises automatically from the interactions of local processors and does not require specialized circuitry such as that hypothesized to be involved in spatial attention (e.g. Van Essen et al. 1994; Posner & Rothbart 1994). Furthermore, the contextual inputs that we postulate are highly specific in relation to both timing and the features that they predict. Spatial attention seems to operate on a longer time scale, and simply to enhance the processing of whatever features are present at the locations and spatial scales attended, rather than to enhance some features and to oppose others on a locally specific basis (Nakayama & Mackeben 1989; Krose & Julesz 1989). Nevertheless, if there are local contextual interactions of the kind that we postulate then attentional control processes will operate upon the synchronized population codes that those interactions produce, possibly themselves using mechanisms that increase the synchronicity of attended items (see Tiitinen et al. 1993 for empirical evidence on this from EEG recordings, and see Goebel 1993 for computational studies).
The idea of population coding has a long history, with Hebb's (1949) notion of the cell-assembly serving as the leading representative. The further possibility that synchronized activity on a fine time scale specifies which subset of neurons is grouped to form the population code at any moment also has a long history (e.g. von der Malsburg 1981; Wang et al. 1990; Crick & Koch 1990; Eckhorn et al. 1991a,b; Engel et al. 1992; Tononi et al. 1992a,b; Abeles et al. 1993a,b; Bienenstock 1995; Yamaguchi & Hiroshi 1994; Goebel 1993). It has also been shown how synchronization can play a major role in neural network architectures based upon adaptive resonance (ARTMAPS) and upon the boundary contour system (BCS) (Grossberg & Somers 1991; Grossberg 1993).
Various versions of the distinction between RFs and CFs also occur in prior theories. The coupling connections referred to in previous discussions of the substrate of synchronization (e.g. Singer 1990; 1993; 1994) are an example of what are referred to here as CFs. Further examples are the linking connections proposed by Eckhorn et al. (1991a), and the fast enabling links proposed by Hummel and Biederman (1992). Some aspects of the distinction also occur in theories that do not rely upon synchronization. Ullman (1994), for example, proposes that there is a bi-directional, bottom-up and top-down, flow of information in which activity in one stream "primes" activity in the equivalent units in the reverse stream. This priming is such as to increase the probability of transmission of any RF information with which it agrees. It uses mechanisms that differ from those of CFs but has similar effects in that it modulates the probability of signals being transmitted in very locally specific ways, and it does so without corrupting the transmission of RF information.
One of the earliest theories with a distinction that is analogous to that between RFs and CFs is that of Edelman (1978, 1989). This approach has been developed using highly detailed synthetic modeling (Reeke et al. 1990), which uses large simulations (from about 1 to about 3.5 million connections), with many biological features, from intracellular processing to the overall principles of connectivity, being built in. The models developed achieve perceptual grouping (Sporns et al. 1991), form recognition that is independent of color and position (Tononi et al. 1992b), and also account for several other perceptual phenomena. Detailed analysis of their fine-grained internal temporal dynamics fits well with that observed in cortex (Sporns et al. 1989; Tononi et al. 1992b). Phasic reentrant signaling is crucial to the success of these simulations. Its functional role and mechanisms are closely analogous to those proposed here for contextual input. Our approach differs from theirs only in minor points of emphasis, however. For example, we put more stress upon simplifying computational studies, upon formal descriptions of the underlying computational goals, and upon the possibility that information supplied by the contextual connections could guide RF learning, thereby helping to establish some of the functional specialization that studies of reentrance have so far built in.
The use of predictive relationships between separate streams of processing to guide learning within streams has also been studied previously (e.g. Becker & Hinton 1992; Becker 1996; Schmidhuber & Prelinger 1993; de Sa 1994a,b; Stone & Bray 1995; Stone 1996). Although there is basic agreement between our work and that of Becker and Hinton (1992), there are some important differences. One difference is that we emphasize the use of context to coordinate ongoing activity and to form synchronized population codes, whereas Becker and Hinton communicate information between streams of processing for the purposes of learning only. The reason they did this was to ensure that distinct streams of processing could not increase the mutual information in their outputs simply by driving each other. Thus in the approach of Becker and Hinton (1992) local processors receive inputs that are used to change synaptic strength without directly affecting post-synaptic activity. We know of no biological evidence for such a process. A second major difference, which follows from the first, is that in their approach there are no cross-stream predictions to learn, whereas in our approach the CF predictions play a major role because they embody the knowledge that is used to integrate ongoing activity and form synchronized population codes. A third difference is that in our approach a single parameter specifies the balance between maximizing information transmission within streams and maximizing coherence between streams. No such parameter exists within their approach.
The above considerations lead to hypotheses that we expect to be controversial. The first is that there are basic computational capabilities that are common to many different cortical regions and to many different species. Second, in relation to the general functional role of any such capabilities, our hypothesis is that they include processes that gradually adapt those computations to the general statistical structure of the world in which the cortex finds itself, and they do so by maximizing the transmission of information that is predictably related to the context within which it occurs. Third, in relation to coding, we argue for synchronized population codes. Such codes contrast with single-cell codes in that they convey information about internal structure, and they contrast with the more usual form of distributed code in that stored knowledge is used to group the elements into coherent sub-sets. Fourth, in relation to the short-term processing dynamics, we hypothesize that local processors use contextual predictions to guide processing but without confounding those predictions with the information that they transmit about their receptive field inputs. This contrasts with the assumption, common to many connectionist theories of cognition, that local processors treat all of their specific informative inputs in essentially the same kind of way. Fifth, in relation to learning, we propose that RF features that are predictably related to the context within which they occur can be discovered, and that this occurs together with discovery of the predictive relationships between them. This contrasts with the common assumption that feature discovery is independent of associative learning. Finally, in relation to epistemology, we suggest that by discovering latent variables within diverse data-sets and the relations between them the local processors are in effect discovering distal variables and relationships. 
As a consequence they lay foundations for representation and meaning. Nevertheless, we will argue that these foundations do not constitute intentional representation proper because such local processors do not distinguish between the signals that they receive and the distal causes from which those signals arise.
It is well established that cortex contains many specialized regions but our central concern is with their internal organization. In what ways is it common, and in what ways does it vary? Although differences exist there is a widespread belief in commonalities: "It is easy to recognize a histological (e.g. Golgi) preparation as being cortex rather than cerebellum or tectum. It is much more difficult to tell whether it is human or bovine, motor, sensory, or associative cortex.", (Braitenberg 1978, page 444); "Laminations and vertical connections between laminae are hallmarks of all cortical systems, the morphological and physiological characteristics of cortical neurons are equivalent in different species, as are the kinds of synaptic interactions involving cortical neurons. This similarity in the organization of the cerebral cortex extends even to the specific details of cortical circuitry.", (White 1989, page 179); "Despite the many detailed properties that can be used to differentiate among the various cortical areas, the common properties of all the cortical areas are overwhelming. The same cell types, the same types of connections, and the same distributions of cells and connections across the cortical depth are found in all parts of the isocortex. These properties of the cortex are markedly different from those found in the other parts of the brain.", (Abeles 1991, page 33). If there are commonalities then it is crucial to find out what they are. For extensive reviews of this issue see Edelman and Mountcastle (1978), Rakic and Singer (1988), Martin (1988), White (1989), Shepherd (1990), Braitenberg and Schuz (1991), and Abeles (1991). Commonalities may exist at a number of different levels of organization and with respect to various aspects of function. 
Some may arise from small populations of pyramidal cells and their associated local circuit neurons, such as proposed for the "canonical circuit" of Douglas and Martin (1990), or the "basic circuit" of Shepherd and Koch (1990). Others may arise at lower levels such as the morphology and physiology that is common to pyramidal cells. Note, therefore, that a common multi-cellular circuit is not necessarily implied by the hypothesis of common foundations for cortical computation, because some of them may arise at other levels of organization.
The basic homogeneity of the neocortex is widely thought to imply common information processing operations: "The typical wiring of the cortex, which is invariant irrespective of local functional specialization, must be the substrate of a special kind of operation which is typical for the cortical level.", (Braitenberg 1978, page 444); "It is taken as an article of faith that there is an information processing algorithm unique to cortex that is a product of the regularities of its architecture." (Stryker et al. 1988, page 133). "For many anatomists, it seems perverse to regard the visual cortex as an ad hoc collection of specialist circuits, rather than a set of basic circuits adapted to perform many different tasks. ..... For the neocortex, an unconventional class of models needs to be developed - models that are neural networks, but based directly on the biology; derived from visual cortex, but not designed to solve a particular problem in visual processing.", (Douglas & Martin 1991, pages 291-292). Such views have a long history (e.g. Lorente de Nó 1949; Edelman & Mountcastle 1978; Rockel et al. 1980).
The belief in commonalities is supported by evidence that cortex contains some generalized learning algorithm that adapts each region to the input that it receives. For example, it has been shown that sensitivity to visual features can be induced in the primary auditory cortex of neonatal ferrets by replacing its normal auditory input with visual projections (Sur et al. 1988). Similarly, it has been shown that visual cortex has the potential to develop an array of functional units that is appropriate to the somatosensory input (Schlaggar & O'Leary 1991).
Although these arguments for commonalities have force, they are not conclusive. Noting similarities will not be convincing until we can clearly see how they provide capabilities that are of common utility. Differences that are critical from a computational point of view may not be obvious from an anatomical or physiological point of view. Furthermore, the suggestion that some form of columnar organization is common to the whole of cortex can be criticized (e.g. Swindale 1990; Purves et al. 1992). It is therefore important to note that although "cortical columns" are not central to the hypotheses developed here, criticism of this idea suggests limitations upon anatomical arguments for commonalities.
Functional specialization is also a major feature of cognitive organization. This has been established by studies of both normal and brain damaged subjects, with cognitive neuropsychology providing a rich source of data and theory (Ellis & Young 1988; Shallice 1988; McCarthy & Warrington 1990). Functional specialization is most firmly established for perceptual and motor functions. Its existence and nature within the highest level functions such as strategic control is less firmly established but there is some evidence for it even there (Shallice 1988; 1991).
The inferences drawn from cognitive and neuropsychological investigations are often shown in diagrams of functional specialization and information flow. Our primary concern here is not with this system level of organization, however, but with the operations that are performed by the various cognitive sub-systems. What are these operations, and which, if any, are common to different sub-systems? Mapping the cognitive architecture is a complex and important task, but adding or deleting sub-systems and routes between them will only be crucial to the search for commonalities to the extent that this changes the set of basic computational capabilities required. What those capabilities are is not obvious, and this issue needs wider discussion.
One simple aspect of cognitive neuropsychological theory that suggests common operations is that sub-systems are often distinguished by the content of the information with which they are concerned. This suggests that they differ primarily in what they operate upon, rather than in the operations that they are required to perform upon that information.
In contrast to our emphasis upon commonalities, studies of basic cognitive processes can give rise to skepticism about the value of a search for general principles. Crick reports the view to which Rama Ramachandran has been led by his elegant and ingenious psychophysical studies as follows: "It may not be too farfetched to suggest that the visual system uses a bewildering array of special-purpose tailor-made tricks and rules-of-thumb to solve its problems. If this pessimistic view of perception is correct, then the task of vision researchers ought to be to uncover these rules rather than to attribute to the system a degree of sophistication that it simply doesn't possess. Seeking overarching principles may be an exercise in futility." (Crick 1988, page 156). Even with respect to Ramachandran's argument, however, Crick then adds: "It is, of course, possible that underlying all the various tricks there are just a few basic learning algorithms that, building on the crude structures produced by genetics, produces this complicated variety of mechanisms." (Crick 1988, page 156).
The cerebral neocortex evolved as an add-on to pre-existing neural systems, and has expanded rapidly at various stages of mammalian evolution (Jerison 1973). The speed of this evolution has been used to support the view that it embodies a multi-purpose form of computation: "Neocortex has expanded rapidly in phylogeny by creating multiple new areas. While mammals with very small cortices have behavioral capacities no more impressive than noncorticate animals, the capacity for rapid phylogenetic change may be the most important feature of cortex.", (Stryker 1988, page 133); "There is a separate and important evolutionary function that a generic principle for the development of a perceptual network layer - whether it be infomax or some other principle - can serve. Suppose that an evolutionary mutation produces a modified eye, or merges the auditory signals into the visual pathway at some new point. If there were no generic principle for layer development, we might imagine that mutations would have to occur simultaneously in the processing function of several layers, for those layers to be able to use the novel input properly. But if there is such a generic principle - one that applies to each layer regardless of what type of input reaches it - then the novel input will automatically be processed in accordance with that principle. This suggests that the existence of a generic principle may greatly increase the likelihood of a mutation being adaptive.", (Linsker 1988, pages 116-117).
Another evolutionary argument for commonalities arises from the comparative study of learning. After an extensive search for basic differences in learning abilities across various species Macphail (1987) concluded that all of the problem solving abilities of non-human animals arise directly from a common basic associative process. Furthermore, the common process that he inferred from these comparisons is one that learns the causal links between events, i.e. one that learns what predicts what.
Evolutionary arguments can also be used to oppose the hypothesis of commonalities, however. Tooby and Cosmides (1995) say that the evolutionary perspective entails the functional analysis of niche-differentiated cognitive and neural machinery that is unique to the species: "the human cognitive architecture is far more likely to resemble a confederation of hundreds or thousands of functionally dedicated computers, designed to solve problems endemic to the Pleistocene, than it is to resemble a single general-purpose computer equipped with a small number of general-purpose procedures such as association formation, categorization, or production-rule formation", (Tooby and Cosmides 1995, page 1189). The denial of a small number of general-purpose learning procedures is a crucial part of this perspective. Gallistel (1995) concludes that the catalog of special-purpose learning procedures, such as the ability of birds to learn the position of the celestial pole, could be enlarged indefinitely. (see Note 4)
These arguments do not settle the issue, however. Neither the presence of highly specific abilities nor the absence of a single all-powerful ability implies that there are no abilities that are common to many different species and to many different cortical regions. To rebut the view that classical and operant conditioning are general purpose procedures Gallistel (1995) proposes that they are specialized for the solution of problems in multivariate, non-stationary time series analysis. This enables them to figure out what predicts what. Such a capability may not be all-purpose but it is far less specialized than an ability that can learn only the position of the celestial pole.
Impressive advances in the theory and technology of neural computation since 1980 have greatly encouraged our search for commonalities. This is because they show that powerful multi-purpose capabilities can be implemented in neural systems (e.g. Rumelhart & McClelland 1986; Gluck & Rumelhart 1990; Amit 1989). They suggest that these capabilities are likely to contrast with those of conventional von Neumann computation, and it has often been proposed that this contrast depends upon the use of distributed representations or population codes: "Distributed representations give rise to some powerful and unexpected emergent properties. ... For example, distributed representations are good for content-addressable memory, automatic generalization, and the selection of the rule that best fits the current situation. ... Thus, the contribution that an analysis of distributed representations can make to these higher-level formalisms is to legitimize certain powerful, primitive operations which would otherwise appear to be an appeal to magic", (Hinton et al. 1986, page 79). This viewpoint is important because it suggests that cognition may be based upon computational primitives that are not obvious a priori, but which are of general utility.
This section is concerned with what Marr (1982) calls computational theory. If we are ever to understand how the cortex works then we must understand the work that it does. What that work might be at the level of local cortical circuits is far from obvious. The hypothesis being examined here is that it includes the maximization of coherent variation, i.e. transmitting as much information as possible while keeping it coherently related to what is going on elsewhere, and thus keeping it "meaningful". These studies are designed as simplifying abstractions, not as detailed models of biological systems. Their goal is to make the underlying computational task and strategy clear (Marr 1982; Sejnowski et al. 1988; Phillips 1996). This will make it easier to build and interpret detailed models that embody that strategy, and to design experimental paradigms to determine whether it is used by real biological systems.
If the role of context is to modulate transmission through local processors so as to emphasize coherent outputs, but without corrupting the information that is transmitted about the RF input, then we need a transfer function with the following properties. If there is no RF input then output should remain at the neutral level; if there is no CF input then the output should be monotonically related to RF input in some standard nonlinear and biologically plausible way; if RF and CF inputs agree then the gain of the function relating output to RF input should be increased; if RF and CF inputs disagree then the gain of the function relating output to RF input should be decreased; CF input should affect the confidence with which decisions are made, but only the RF input should determine what decisions are made. Physiological studies to be outlined in Section 4.2 show that neurons do indeed receive two classes of input that differ in approximately the way that this suggests. In addition to the classical forms of excitatory and inhibitory input they also receive inputs, such as those mediated by NMDA receptors, whose effects depend upon the prevailing state of activation and which could therefore fulfill the gain-controlling role of CFs (e.g. Fox et al. 1990). A function, A(r,c), giving the internal activation of probabilistic bipolar (-1,1) units has been derived from these computational and physiological considerations (Kay & Phillips 1994, 1996; Phillips et al. 1995b), such that
A(r,c) = 0.5 r (1 + exp(2rc))
where r = summed weighted RF inputs including any bias input, c = summed weighted CF inputs including any bias input. An equivalent activation function can be given for processors that produce binary (0,1) outputs. To compute the output probability in the simulations we apply the standard logistic squashing function to the internal activation, so the transfer function as a whole is composed of the activation function followed by the squashing function. The neutral level is given by an output probability of 0.5. The continuous value transmitted between units is the expected value of outputs with this probability, which ranges from -1 to 1. As Figure 3 shows, this transfer function has the properties required. It is not unique, but it is a clear and simple representative of the limited class of functions with these properties (Kay & Phillips 1994, 1996). A natural interpretation of the output given by this transfer function is that it gives the probability of a discrete event such as an action potential.
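The transfer function just described can be sketched in a few lines of Python. This is a minimal illustration, not the original simulation code: it assumes that the logistic squashing gives the probability of a +1 output as 1/(1 + exp(-A)), and the function names are hypothetical. It checks the required properties directly: zero RF input gives the neutral probability 0.5 whatever the CF, zero CF input leaves the activation equal to r, and CF input changes the gain but never the sign of the output.

```python
import math


def activation(r: float, c: float) -> float:
    """Internal activation A(r, c) = 0.5 * r * (1 + exp(2 * r * c)).

    r: summed weighted RF input (including bias); c: summed weighted CF input.
    Assumption: inputs are kept modest, since exp(2rc) overflows for huge r * c.
    """
    return 0.5 * r * (1.0 + math.exp(2.0 * r * c))


def _logistic(a: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def output_probability(r: float, c: float) -> float:
    """Probability that the bipolar unit emits +1 (assumed logistic squashing)."""
    return _logistic(activation(r, c))


def expected_output(r: float, c: float) -> float:
    """Expected value of a (-1, +1) output with that probability, in [-1, 1]."""
    return 2.0 * output_probability(r, c) - 1.0


if __name__ == "__main__":
    # No RF input -> neutral; agreeing CF raises gain; opposing CF lowers it
    # without reversing the sign that the RF input determines.
    for r, c in [(0.0, 2.0), (1.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0)]:
        print(f"r={r:+.1f} c={c:+.1f}  A={activation(r, c):+.3f}  "
              f"p={output_probability(r, c):.3f}  E={expected_output(r, c):+.3f}")
```

For r = 1 the activation is 1 with no context, about 4.19 when the context agrees (c = 1), and about 0.57 when it disagrees (c = -1): the output probability moves toward or away from certainty, but stays on the side of 0.5 chosen by the RF input.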
Our computational and empirical studies both emphasize three closely inter-related but distinct forms of neural signaling: relative timing, place, and firing rate. The possibility of using synchronization, or relative timing, to signal grouping using the transfer function just defined follows from the way in which the CFs influence output probability. They increase the probability that outputs from different processors will be produced at the same time if they are mutually predictive, and they reduce this probability if the outputs are opposed. Section 3.4.3 shows that this produces coherent groupings.
Place coding is the transmission of information about different features or variables by different cells. It allows for the possibility that a number of different cells could all transmit information about the same feature. This form of signaling is preserved in the computational studies through the use of outputs from different units to signal different variables. Our emphasis upon self-organization implies that this coding is not fully pre-specified, but may change as the system adapts to its inputs through learning.
The classical form assumed for rate coding is the transmission of information through the firing rate of single cells measured over a time period that is long relative to the duration of individual action potentials. This is one way to transmit information about continuous variables such as the output probabilities that are generated by the above transfer function. It is not the only way, however. Imagine a set of cells that produces action potentials with a probability that is essentially the same for all cells, e.g. some form of neuronal group as proposed by Edelman (1978, 1989). The crudest estimate of that probability is given by sampling a single cell for a single brief interval that is long enough for just one binary output, e.g. 1 or 2 msec. A better estimate can be obtained by sampling many of the cells for this brief interval. If the output probability remains approximately constant for a time of more than 1 or 2 msec then an even better estimate can be obtained by sampling many of the cells over that longer time. Thus in this simple case these measures give different estimates, with varying amounts of precision and bias, of the same underlying quantity. This suggests that much of classical single-unit neurophysiology has been developed so as to exploit situations in which the relevant underlying quantity remains constant for long enough to allow adequate estimates to be obtained by sampling a single cell over longer intervals and by averaging across trials. The success of this enterprise does not imply the absence of sets of cells signaling the same underlying quantity in those situations, nor does it imply the absence of situations where that underlying quantity changes rapidly. The only way to obtain an accurate estimate in the latter case would be by averaging the outputs of a set of cells over a brief interval.
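The sampling argument above can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the original: every cell in the set fires independently with the same probability p in each brief bin, and p = 0.3 is an arbitrary, hypothetical value. The binomial standard error shows why one cell in one bin gives only a crude 0-or-1 estimate, while many cells over one bin, or many cells over several bins, give progressively better ones.

```python
import math
import random


def sample_estimate(p: float, n_cells: int, n_bins: int, rng: random.Random) -> float:
    """Estimate firing probability p from n_cells cells, each observed for n_bins
    brief bins, with one binary output (spike / no spike) per cell per bin.
    Assumes all cells and bins are independent and share the same p."""
    n = n_cells * n_bins
    spikes = sum(rng.random() < p for _ in range(n))
    return spikes / n


def standard_error(p: float, n_samples: int) -> float:
    """Binomial standard error of an estimate based on n_samples independent bins."""
    return math.sqrt(p * (1.0 - p) / n_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.3  # hypothetical shared firing probability per 1-2 ms bin
    for cells, bins in [(1, 1), (100, 1), (100, 10)]:
        est = sample_estimate(p, cells, bins, rng)
        print(f"{cells:>3} cells x {bins:>2} bins: estimate={est:.3f}, "
              f"SE={standard_error(p, cells * bins):.3f}")
```

If p changed between bins, only the single-bin population estimate would track it; pooling over bins would then trade temporal resolution for precision.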
Continuous values were transmitted between processors in most of the simulations summarized below. A few simulations have been run in which only binary values were transmitted, however, i.e. single units were used for each output probability and at each iteration of the computation of the short-term dynamics each unit transmitted just a single binary output with that probability. Performance did not seem to be very sensitive to this change, so our working assumption is that high precision in transmitting the output probabilities is not a necessary requirement of the computational approach being developed here.
The goal of maximizing the transmission of coherent information can be specified in a precise but general way by using the concepts of Shannon entropy, mutual information, and conditional information (Kay & Phillips 1994, 1996; Phillips et al. 1995a,b). The Shannon entropy, H(X), is the average amount of information in any variable X with a given probability distribution. Mutual information, I(X;Y), is a measure of the average amount of information that is shared by the probability distributions of two variables, e.g. X and Y. It is a measure of the extent to which uncertainty about one variable is reduced by observing the other, and it is a commonly used measure of information transmission. If two variables are independent then their mutual information will be zero. Conditional information, H(X|Y), is a measure of how much uncertainty is left about one variable, e.g. X, given that we already know another variable, e.g. Y. From these definitions it follows that H(X) = I(X;Y) + H(X|Y). For a lucid introduction to these concepts see Hamming (1980).
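For discrete variables these quantities can be computed directly from a joint probability table. The Python sketch below uses the standard identities I(X;Y) = H(X) + H(Y) - H(X,Y) and H(X|Y) = H(X,Y) - H(Y); the example distribution (two binary variables that agree 90% of the time) is purely illustrative and hypothetical.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy, in bits, of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` of a joint {(x, y): probability}."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[axis]] += p
    return dict(m)


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y): uncertainty left about X once Y is known."""
    return entropy(joint) - entropy(marginal(joint, 1))


# Hypothetical example: binary X and Y that agree 90% of the time.
EXAMPLE_JOINT = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}

if __name__ == "__main__":
    h_x = entropy(marginal(EXAMPLE_JOINT, 0))
    i_xy = mutual_information(EXAMPLE_JOINT)
    h_x_given_y = conditional_entropy(EXAMPLE_JOINT)
    # The identity H(X) = I(X;Y) + H(X|Y) holds up to rounding error.
    print(f"H(X)={h_x:.3f}  I(X;Y)={i_xy:.3f}  H(X|Y)={h_x_given_y:.3f}")
```

Here H(X) is 1 bit, of which about 0.53 bits are shared with Y and about 0.47 bits remain uncertain once Y is known.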
Consider a local processor to have input vectors R and C constituting the RF and CF inputs respectively, and to produce an output vector X. The Shannon entropy in X can be decomposed as follows
H(X) = I(X;R;C) + I(X;R|C) + I(X;C|R) + H(X|R,C)
where the first term on the right is a measure of the information that is common to X, R and C; the second is that common to X and R but not to C; the third is that common to X and C but not to R; and the fourth is information in X that is in neither R nor C. Figure 4 illustrates this decomposition in the case where all components are positive.
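The four-way decomposition can be checked numerically for discrete X, R and C. In the Python sketch below the three-way term is computed as the co-information I(X;R;C) = I(X;R) - I(X;R|C), which can be negative in general. The example distribution is hypothetical: R is a fair bipolar variable, C agrees with R 80% of the time, and X copies R 90% of the time while ignoring C, so I(X;C|R) should come out as zero.

```python
import math
from collections import defaultdict
from itertools import product

X, R, C = 0, 1, 2  # positions of x, r and c in each outcome tuple


def entropy_of(joint, axes):
    """Entropy in bits of the marginal of `joint` over the given axes."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in axes)] += p
    return -sum(p * math.log2(p) for p in m.values() if p > 0)


def decompose(joint):
    """Split H(X) into I(X;R;C), I(X;R|C), I(X;C|R) and H(X|R,C)."""
    def h(*axes):
        return entropy_of(joint, axes)

    i_xr = h(X) + h(R) - h(X, R)
    i_xr_given_c = h(X, C) + h(R, C) - h(X, R, C) - h(C)
    i_xc_given_r = h(X, R) + h(R, C) - h(X, R, C) - h(R)
    return {
        "H(X)": h(X),
        "I(X;R;C)": i_xr - i_xr_given_c,  # co-information; may be negative in general
        "I(X;R|C)": i_xr_given_c,
        "I(X;C|R)": i_xc_given_r,
        "H(X|R,C)": h(X, R, C) - h(R, C),
    }


# Hypothetical joint p(x, r, c) over bipolar values.
EXAMPLE_JOINT = {
    (x, r, c): 0.5 * (0.8 if c == r else 0.2) * (0.9 if x == r else 0.1)
    for x, r, c in product((-1, 1), repeat=3)
}

if __name__ == "__main__":
    for name, value in decompose(EXAMPLE_JOINT).items():
        print(f"{name:>9} = {value:.3f} bits")
```

For this example the components are roughly 0.17, 0.36, 0 and 0.47 bits, and they sum to H(X) = 1 bit.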
A goal for any local processor, X, can now be specified in terms of these four components. Each processor must adapt on the basis of just the information that is locally accessible to it. We therefore specify how X should adapt taking R and C as givens, but allowing for the possibility that any connections upon which R and C depend may themselves be adapting in the same way. We require X to convey information about major sources of variation in R, and in particular those that are predictably related to C. Data compression, as argued for in Sections 1.1 and 1.2.1, will be ensured by constraining processors to have fewer outputs than inputs.
Discovering major sources of variation in R requires the maximization of the mutual information between the output and the RF input, I(X; R), which consists of two components, i.e. I(X; R; C) + I(X; R | C). Consider first the transmitted information that is common to the RFs and CFs, i.e. I(X; R; C). This is the RF information that is coherently related to the context, and we require the local processor to transmit as much of it as possible. If the RFs and CFs arise from separate data-sets then any information that they share must reflect some common distal influences upon those data-sets, and the more diverse the data-sets the more distal those common influences are likely to be. Maximizing this component is therefore likely to transmit information about variables with relevance to the environment within which the system operates. Variables in the RF input that are unrelated to the context, i.e. I(X; R | C), may also be useful at some later layer of processing or stage of learning, however, so this component could also be increased, though with a lower priority than the information that is meaningfully related to its local context. Information that is shared by X and C but not by R, i.e. I(X; C | R), should be decreased because the role of X is to transmit information about R, but not about C. Finally, H(X | R, C) denotes variation in X that is due to neither R nor C, i.e. intrinsic noise that is added by the processor to its output. We would normally wish this component to be reduced.
We can now formulate the general class of objective functions
F = a0 I(X;R;C) + a1 I(X;R|C) + a2 I(X;C|R) + a3 H(X|R,C)
where F is the objective to be maximized, and a0 to a3 are parameters in the range -1 to 1. These parameters weight the various components of the transmitted information, H(X), with positive values for the components that we wish to increase and negative values for components that we wish to actively decrease. Different objectives can therefore be expressed as different values of these parameters. The goal of maximizing information transmission within streams requires an objective function F = I(X;R), which is given by setting a0 = a1 = 1 and a2 = a3 = 0. This goal has been studied by Linsker (1988), who calls it Infomax, and by many others. The goal of maximizing the transmission of information that is predictably related to the context requires an objective function F = I(X;R;C). This is given by setting a0 = 1 and a1 = a2 = a3 = 0. We call this goal Coherent Infomax. It is equivalent to that intended by Becker and Hinton (1992) but has a different form because we make explicit the requirement to maximize the transmission of RF information. (see Note 5)
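Because F is just a weighted sum of the four components, switching between goals amounts to switching weight vectors. The Python sketch below encodes the two settings named in the text; the component values in bits are arbitrary, hypothetical numbers chosen only to show the arithmetic, and the "mixed" weighting (a lower positive weight on I(X;R|C), negative weights on I(X;C|R) and H(X|R,C)) is an illustrative assumption rather than a setting reported here.

```python
from typing import Dict, Sequence

COMPONENTS = ("I(X;R;C)", "I(X;R|C)", "I(X;C|R)", "H(X|R,C)")

# Weight settings (a0, a1, a2, a3) named in the text.
PRESETS: Dict[str, Sequence[float]] = {
    "Infomax": (1.0, 1.0, 0.0, 0.0),           # F = I(X;R)
    "Coherent Infomax": (1.0, 0.0, 0.0, 0.0),  # F = I(X;R;C)
}


def objective(weights: Sequence[float], components: Dict[str, float]) -> float:
    """F = a0 I(X;R;C) + a1 I(X;R|C) + a2 I(X;C|R) + a3 H(X|R,C)."""
    if len(weights) != 4 or any(not -1.0 <= a <= 1.0 for a in weights):
        raise ValueError("expected four weights, each in the range -1 to 1")
    return sum(a * components[name] for a, name in zip(weights, COMPONENTS))


# Arbitrary illustrative component values in bits (not results from the simulations).
EXAMPLE_COMPONENTS = {"I(X;R;C)": 0.4, "I(X;R|C)": 0.3, "I(X;C|R)": 0.1, "H(X|R,C)": 0.2}

if __name__ == "__main__":
    for name, weights in PRESETS.items():
        print(f"{name}: F = {objective(weights, EXAMPLE_COMPONENTS):.2f}")
    # Hypothetical mixed weighting: keep context-free RF information at lower
    # priority, and actively penalize CF-only information and intrinsic noise.
    print(f"Mixed: F = {objective((1.0, 0.5, -1.0, -1.0), EXAMPLE_COMPONENTS):.2f}")
```

With these numbers Infomax scores 0.7 bits (the whole of I(X;R)), Coherent Infomax 0.4 bits, and the mixed weighting 0.25 bits.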
Learning rules can be derived by performing gradient ascent on the objective function, F, relative to the strengths of the RF and CF connections (Kay & Phillips 1994, 1996; Phillips et al. 1995b). The dependence of change in connection strength upon post-synaptic activity as specified by these rules is shown in Figure 5 (from Smyth 1994). The learning rules have this same general form for the RFs and for the CFs. The change in synaptic strength is proportional to pre-synaptic activity, but it is non-monotonically related to post-synaptic activity. The non-monotonicity required is similar to the computationally powerful BCM learning rule proposed by Bienenstock et al. (1982), and to a simpler version, the ABS rule, that has been shown to have biological plausibility (Artola et al. 1990; Hancock et al. 1991a,b). Other learning rules have also been developed within this general approach, including one that maximizes the covariance between the integrated RF and CF inputs (Smyth & Der 1995; Der & Smyth 1996). This latter rule is easier to implement than that shown in Figure 5, but has the same overall form of dependence on the level of post-synaptic activity, which it shares with the BCM and ABS rules.
A major feature of the learning rule shown in Figure 5 is the threshold of post-synaptic activity below which connection strengths are decreased and above which they are increased. In the rules derived by Kay and Phillips this threshold depends upon three dynamically updated conditional averages of prior activity: the average prior output probability of the unit taken over all RF and CF inputs; the average output probability for the current RF input taken over all CF inputs; and the average output probability for the current CF input taken over all RF inputs (see Note 6). Only the first of these three is used by the BCM rule, where its role is to make it harder to increase the strengths of connections to units that have already been too frequently active, and easier for units that have been too rarely active.
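The BCM form of this non-monotonic dependence, which uses only the first (overall) average as its sliding threshold, can be sketched as follows. This is the standard BCM rule, not the Kay and Phillips rule with its two additional conditional averages; the linear activation, learning rate and time constant are illustrative assumptions.

```python
import numpy as np

def phi(y, theta):
    """Non-monotonic dependence on post-synaptic activity y (BCM form).

    Zero at y = 0, negative for 0 < y < theta, positive for y > theta.
    """
    return y * (y - theta)

def bcm_step(w, x, theta, eta=0.01, tau=100.0):
    """One BCM-style update; the weight change is proportional to pre-synaptic x.

    theta slides toward the recent mean of y**2, so a unit that has been very
    active needs still more activity before its inputs are strengthened.
    """
    y = float(np.dot(w, x))              # linear post-synaptic activity
    w = w + eta * x * phi(y, theta)      # depress below theta, potentiate above
    theta = theta + (y ** 2 - theta) / tau
    return w, theta

w, theta = bcm_step(np.ones(2), np.array([1.0, 0.0]), theta=0.5)
print(w, theta)
```

Only the active input (the first component of x) changes, and the threshold rises after the high post-synaptic response, which is the stabilizing role attributed to this average in the text.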
A simulation performed by Darragh Smyth at Stirling shows how context can guide RF learning. Two streams of single-unit processors were linked by CFs, and as a consequence they were able to discover variables that were correlated across streams (Figure 6). On each trial a pair of bars was presented, one to each stream; the two bars were both vertical on a randomly chosen 70% of trials and both horizontal on the remaining 30%. Each bar was bright on dark or dark on bright at random. The signs of the vertical bars were uncorrelated across streams, but the signs of the horizontal bars were perfectly correlated across streams. The sign of the vertical bar therefore carries more information within streams, but the sign of the horizontal bar is more relevant to the correlation across streams.
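The statistical structure of this task can be made concrete with a small generator. The encoding below (a shared orientation flag plus one contrast sign per stream) is a hypothetical simplification; the actual simulation presented bar images to each processor's RF.

```python
import numpy as np

def make_trials(n, p_vertical=0.7, seed=0):
    """Hypothetical encoding of the two-stream bar task.

    Returns a boolean array marking vertical trials and the contrast sign
    (+1 bright on dark, -1 dark on bright) seen by each stream.
    """
    rng = np.random.default_rng(seed)
    vertical = rng.random(n) < p_vertical        # both bars share orientation
    sign_a = rng.choice([-1, 1], size=n)
    independent_b = rng.choice([-1, 1], size=n)
    # Vertical signs are independent across streams; horizontal ones are shared.
    sign_b = np.where(vertical, independent_b, sign_a)
    return vertical, sign_a, sign_b

vertical, sign_a, sign_b = make_trials(20000)
print("fraction vertical:", vertical.mean())
print("sign correlation, vertical:", np.mean(sign_a[vertical] * sign_b[vertical]))
print("sign correlation, horizontal:", np.mean(sign_a[~vertical] * sign_b[~vertical]))
```

The vertical sign varies on most trials within each stream, but only the horizontal sign is predictable from the other stream, which is why the two goals lead to different RF selectivities.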
The course of learning and the receptive fields found after learning are shown in Figure 7. When the goal of learning was specified to be the maximization of information transmission within streams, both local processors became sensitive to the sign of the vertical bar. When the goal of learning was specified to be the maximization of coherence across streams, both local processors became sensitive to the sign of the horizontal bar. The CF connection strengths were then also learned correctly, thus embodying the cross-stream predictions.
Other simulations show that these networks have a rich array of possible behaviors depending upon the goal specified, the activation function used, the learning rate, the starting weights, and the correlations within and between the RF and CF inputs. When the goal is to maximize transmission of the RF information that is predictably related to the context, this is done irrespective of whether it maximizes transmission of information about the RF. When the goal is to maximize transmission of information about the RF, that is achieved irrespective of contextual predictability. The transition between these two goals is specified by varying a single parameter, a1, from 0 to 1.
The ability to discover the relevant RF variables and to ignore the irrelevant ones has been shown when i) the relevant variables are the most informative within streams; ii) they are not the most informative within streams; and iii) there is no evidence at all within streams as to the existence of these particular variables (Kay & Phillips 1994, 1996; Smyth, Kay & Phillips 1994; Smyth 1994; Phillips et al. 1995b). These abilities have been shown within a variety of network architectures, including i) networks with multiple streams and with contextual connections between streams; ii) multi-stream networks with two layers of processing and with contextual connections within streams and from higher to lower layers; iii) multi-stream networks in which the RFs of different streams overlap with each other; and iv) multi-stream networks with contextual connections between neighboring streams only (Kay & Phillips 1994, 1996; Smyth et al. 1994; Smyth 1994; Phillips et al. 1995b). In all cases the correct contextual predictions are learned together with the discovery of the RF features to which they relate. The networks learn faster with more streams, and they are sensitive to small correlations between streams (Phillips et al. 1995b).
In the example shown in Figures 6 and 7 the features that were correlated across streams were identical, i.e. the sign of a horizontal bar, and in the case of visual input the different streams are most easily thought of as arising from different places in the image. Neither of these aspects is crucial, however. The mutually predictable features could be different features in different streams, the same cues to a common underlying variable at different places in the image, or different cues to a common underlying variable at the same place in the image, and they can also include cross-modal contextual input.
The approach has been developed to include local processors with multiple output units, as shown in Figure 8 (Kay et al. 1996; Floreano et al. 1995). Different units within a processor adapt their RF weights so as to transmit information about different variables, thus increasing the amount of information that each processor can transmit. These networks thus have four levels of organization: units, processors, streams, and layers. Local codes were produced for the relevant variables in the simulation shown in Figures 6 and 7, where there was little room for any other form of coding. When more than one variable is relevant within streams, and when multi-unit processors are used, a greater range of possible codings exists (Kay et al. 1996; Floreano et al. 1995). The codes produced vary and are not reliably related to the input variables in any simple way. Simple input variables are sometimes signaled by single units, and sometimes they are not. In short, single units within multi-unit processors do develop selective tuning functions, but these are not in general related to the input in an intuitively obvious way, and they vary across streams and across different instances of the same network architecture and input pattern set. What matters is that the relevant information be transmitted; how that information is distributed across the available output signals is not crucial. For the study of receptive field selectivity in cortical neurons, this suggests that the information conveyed by a population of cells may be more important than the exact way in which it is distributed across the individual cells, which is consistent with the rich variation in detailed selectivity that is often observed in cortical neurons.
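The claim that only the transmitted information matters, not its distribution across units, can be illustrated with two binary input variables and a two-unit processor. The three codes below are hypothetical examples: a local code, a mixed code in which no single unit signals the second variable on its own, and a lossy code.

```python
import math
from collections import Counter

def mutual_information(samples):
    """I(S;Y) in bits from a list of equally likely (s, y) pairs."""
    n = len(samples)
    joint = Counter(samples)
    ps = Counter(s for s, _ in samples)
    py = Counter(y for _, y in samples)
    return sum((c / n) * math.log2((c / n) / ((ps[s] / n) * (py[y] / n)))
               for (s, y), c in joint.items())

inputs = [(a, b) for a in (0, 1) for b in (0, 1)]      # four equiprobable inputs
local = [(s, (s[0], s[1])) for s in inputs]            # one unit per variable
mixed = [(s, (s[0] ^ s[1], s[0])) for s in inputs]     # b recoverable only jointly
lossy = [(s, (s[0], s[0])) for s in inputs]            # b is discarded

for name, code in (("local", local), ("mixed", mixed), ("lossy", lossy)):
    print(name, mutual_information(code))
```

The local and mixed codes transmit the same 2 bits even though their single-unit tuning differs, whereas the lossy code transmits only 1 bit.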
To show how contextual connections can produce coherent grouping, Dario Floreano has simulated large arrays of multi-unit processors and studied the effects of the CF input on the short-term processing dynamics. In the first simulation, 25 streams of four-unit processors were arranged as a 5 x 5 array, with each stream receiving RF input from a 3 x 3 array of input units. All four units in each stream received contextual input from all units in their neighboring streams (Figure 8). The training input consisted of collinear horizontal, vertical and diagonal bars displayed upon the 15 x 15 input array. The learning algorithm was set to maximize coherence across streams (i.e. Coherent Infomax, Section 3.2). Learning was found to scale up successfully to this case, with all streams tending to discover the relevant input variables at around the same time (Floreano et al. 1995).
The influence of the CFs after learning can be seen in Figure 9, which shows the effects that are produced on one stream of processing by supportive or opposing activity in other streams. The output probabilities for the four units of the processing stream at the centre of the 5 x 5 array when presented with a horizontal bar on its 3 x 3 receptive field are shown for four cases: i. when the RF input is strong; ii. when the RF input is weak and there is no contextual input; iii. when the RF input is weak and there is supportive contextual input; and iv. when the RF input is weak and there is opposing contextual input. Note that the only way that the context can influence output in this architecture is via the contextual connections. The results show that these contextual inputs increase the probability of outputs that are coherently related to that context and decrease the probability of opposing outputs. These effects are produced rapidly, within just one or two iterations of the computations that update the outputs.
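The four cases can be mimicked with a single unit whose context changes the gain of its RF-driven activation but not its sign. The activation function A(r, c) = 0.5 r (1 + exp(rc)) used below is an assumption chosen to have the required properties (A = r when c = 0, and A = 0 when r = 0 whatever c is); it is similar in spirit to the functions studied by Kay and Phillips, but its exact form and the input values are illustrative.

```python
import math

def activation(r, c):
    """Hypothetical context-modulated activation.

    r: integrated RF input; c: integrated CF input.  With c = 0, A = r.
    Agreeing context (r*c > 0) amplifies A; opposing context shrinks it
    toward r/2 but never reverses its sign; r = 0 gives A = 0 for any c.
    """
    return 0.5 * r * (1.0 + math.exp(r * c))

def output_prob(r, c):
    """Probability that the unit outputs +1 (logistic of the activation)."""
    return 1.0 / (1.0 + math.exp(-activation(r, c)))

cases = {
    "strong RF, no context": (2.0, 0.0),
    "weak RF, no context": (0.5, 0.0),
    "weak RF, supportive context": (0.5, 2.0),
    "weak RF, opposing context": (0.5, -2.0),
}
for name, (r, c) in cases.items():
    print(f"{name:30s} p = {output_prob(r, c):.3f}")
```

Because context can only rescale the RF-driven activation, the output still carries unambiguous information about the RF input, and context alone cannot create a response for which there is no RF evidence.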
The second simulation is analogous to demonstrations such as the Rubin vase and the Necker cube, where two different perceptual organizations are possible but only one occurs at a time. Such phenomena might reflect the effects of mutual contextual guidance when the input provides evidence for both of two internally coherent but mutually incompatible feature sets. To study the short-term processing dynamics in such a case, a net with 100 single-unit streams was simulated. For the sake of this demonstration, two sub-sets of nine units were specified such that each unit had positive CF input from all other units within the same sub-set and negative CF input from all of the units in the other sub-set. All streams received inputs that varied randomly across iterations, with the constraint that inputs to both sub-sets of nine units were positive.
The output probabilities produced at each iteration of the computations updating the activity of the network are shown in Figure 10. As each iteration involves just one synaptic delay, it corresponds to only a few msec. A coherent sub-set of features emerges from the background within 3 or 4 iterations, but only one of the two alternative organizations emerges at any one instant. Within a cooperative sub-set all outputs emerge from the background simultaneously, and they are less affected by random variation in their inputs than are the responses to the background. These effects are similar to the retrieval of memories in recurrent auto-associative attractor networks (e.g. Hopfield 1982; Amit 1989), but with the important difference that contextual connections just organize the input data into coherent sub-sets without adding features for which there is no evidence in the input. The short-term dynamics of nets with contextual guidance and feedforward RF connectivity are therefore constrained to remain close to the input (see Note 7).
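A minimal reconstruction of this kind of competition is sketched below. The CF weight value, the input ranges, the number of iterations and the gain-modulating activation function are all assumptions, so it reproduces the qualitative winner-take-one behavior rather than the exact time course in Figure 10. Context is computed from the previous iteration's outputs, mirroring the one-synaptic-delay update described above.

```python
import numpy as np

def activation(r, c):
    # Hypothetical modulatory activation: context changes gain, never sign.
    return 0.5 * r * (1.0 + np.exp(r * c))

def simulate(n_units=100, k=9, w=1.0, n_iter=30, seed=1):
    """Two mutually incompatible sub-sets competing through CFs only."""
    rng = np.random.default_rng(seed)
    A = np.arange(k)                      # first coherent sub-set
    B = np.arange(k, 2 * k)               # rival sub-set
    W = np.zeros((n_units, n_units))      # CF weights; background units have none
    for S, T in ((A, B), (B, A)):
        W[np.ix_(S, S)] = w               # support within a sub-set
        W[np.ix_(S, T)] = -w              # opposition between sub-sets
    np.fill_diagonal(W, 0.0)              # no self-context
    p = np.full(n_units, 0.5)
    history = []
    for _ in range(n_iter):
        r = rng.uniform(-1.5, 1.5, n_units)          # fresh random RF input
        r[:2 * k] = rng.uniform(0.5, 1.5, 2 * k)     # both sub-sets get positive evidence
        c = W @ (2 * p - 1)                          # context from previous outputs
        p = 1.0 / (1.0 + np.exp(-activation(r, c)))
        history.append((p[A].mean(), p[B].mean()))
    return np.array(history)

h = simulate()
for t in range(0, len(h), 5):
    print(t, np.round(h[t], 3))
```

The losing sub-set is attenuated but, because context only modulates gain, its outputs stay on the side indicated by their positive RF input; context selects between organizations without inventing or reversing features.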
The simulations described above used very simple nets, but models of large and complex nets that combine contextual integration with many biological details show that they can preserve the capabilities required of the short-term dynamics. (see Note 8) As far as the contextual guidance of learning is concerned, theoretical considerations and our simulations suggest that this may become easier rather than harder in larger systems, because more streams can provide better guidance.
We have so far had only limited success in using the learning rule outlined in Section 3.3 to discover arbitrary non-linear functions, such as exclusive-or, in nets with two layers of feedforward RF weights. Nets with feedback contextual guidance from higher to lower layers can sometimes discover such functions, and they do so more often with more streams of processing (Phillips et al. 1995b). They do not solve this problem reliably, however, and one reason is that when units in higher layers compute non-linear functions, the feedback predictions conflict with what the units in the intermediate layers are able to compute. Others have shown, however, that learning by maximizing coherence across streams can discover useful higher-order functions when applied to real-world problems (de Sa 1994a,b; Stone & Bray 1995; Stone 1996; see Becker 1996 for a review and further applications). These algorithms do seem more limited in what they can learn than supervised algorithms such as error back-propagation, but, as we will argue in Section 6.4, that does not necessarily make them less plausible as analogies to self-organization in the cortex.
Important outcomes of these computational studies are as follows. (i) The goal of maximizing the transmission of contextually relevant information can be specified precisely within the framework of information theory. This can be done despite the antithesis between information and meaning that is described so clearly by Hamming (1980, p. 103), and which has limited the usefulness of information theory to psychology and neurobiology (Horgan 1995). The approach being developed here may therefore help extend the application of information theory to brain function beyond the sensory systems. (ii) Feature discovery and associative learning can cooperate so as to discover variables that are predictably related across diverse data sets, without needing a supervisor that already knows about those variables. (iii) The output of a local processor can be affected by contextual input while still transmitting unambiguous information about the RF input. (iv) The form of learning derived analytically from the information-theoretic goals adds further support to the hypothesis that changes in synaptic strength depend non-monotonically upon post-synaptic activity in approximately the way proposed for the BCM and ABS rules (Bienenstock et al. 1982; Artola et al. 1990; Hancock et al. 1991a,b).
Here we outline evidence for context-dependent synchronization of activity in the cortex, for cortico-cortical contextual connections that are involved in this synchronization, and for plasticity of the receptive field and contextual field connections. For more detailed reviews see Singer (1990, 1993, 1994, 1995) and Singer and Gray (1995).
Intracolumnar interactions are shown by simultaneously recording the activity of cells within a small region of cortex. Synchronization of neighboring cells (< 200 µm apart) has been observed in many different species and cortical regions of awake and anaesthetized animals, and can be observed in the local field potential (LFP) as well as in multi-unit and paired single-unit recordings (e.g. Toyama et al. 1981; Michalski et al. 1983; Ts'o et al. 1986; Gray & Singer 1989; Kreiter & Singer 1992; Gray & Viana Di Prisco 1993). Synchronization of neighboring cells with overlapping RFs and feature selectivity sometimes reflects common thalamic input, but more often it has dynamic properties that can only be accounted for by reciprocal interactions via local intracortical connections. Overall, the evidence suggests that the activity of local groups of neurons is often closely synchronized.
Intercolumnar interactions are shown by simultaneously recording the activity of cells in different parts of the cortex, and synchronization has been observed between cells that are far apart (e.g. > 2 mm). In that case it occurs predominantly between cells with similar receptive field selectivity, and it declines with distance (e.g. Michalski et al. 1983; Ts'o et al. 1986; Gray et al. 1989; Schwartz & Bolz 1991). Its occurrence within and between visual areas depends upon whether the cells being observed are stimulated by a single object or by separate objects. For example, synchronization is strong when two cells in V1 with non-overlapping RFs and collinear preferred orientations are stimulated by a single long bar moving across their RFs (Gray et al. 1989). It is weaker when they are stimulated by two short bars moving in the same direction, and it is abolished altogether when the two short bars move in opposite directions. These and many other results support the view that the synchronization of distributed activity in the visual system implements the well-established Gestalt principles of perceptual grouping.
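Synchronization of this kind is commonly assessed with a cross-correlogram, in which synchronized firing shows up as a peak at zero lag. The sketch below uses synthetic binary spike trains; the bin count, the firing probabilities and the shared-event model are assumptions for illustration, not parameters from the studies cited.

```python
import numpy as np

def cross_correlogram(a, b, max_lag):
    """Coincidence counts between binary spike trains a and b.

    The count at lag k is the number of bins t in which a spikes at t and
    b spikes at t + k.
    """
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    counts = []
    for k in lags:
        if k >= 0:
            counts.append(int(np.sum(a[: n - k] * b[k:])))
        else:
            counts.append(int(np.sum(a[-k:] * b[: n + k])))
    return lags, np.array(counts)

rng = np.random.default_rng(0)
n_bins = 20000                                # e.g. 1-ms bins (an assumption)
shared = rng.random(n_bins) < 0.02            # common synchronizing events
cell_1 = shared | (rng.random(n_bins) < 0.02)
cell_2 = shared | (rng.random(n_bins) < 0.02)
cell_3 = rng.random(n_bins) < 0.04            # independent cell, similar rate

lags, sync = cross_correlogram(cell_1, cell_2, max_lag=10)
_, indep = cross_correlogram(cell_1, cell_3, max_lag=10)
print("peak lag (synchronized pair):", lags[np.argmax(sync)])
print("zero-lag counts:", sync[10], "vs", indep[10])
```

The independent pair has a flat correlogram at roughly the chance level set by the two firing rates, whereas the synchronized pair has a sharp central peak.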
The prediction that cells can be part of different groupings at different times, depending upon the stimulating conditions, has been tested in the primary visual cortex of the cat (Engel et al. 1991) and of the awake behaving monkey (Kreiter & Singer 1994). These experiments show that when two cells with different orientation and direction preferences are stimulated by a single moving bar that is sub-optimal for both, they synchronize, but when they are stimulated by two separate bars, each optimal for one of the cells, they do not. Synchronization occurs within the extrastriate visual area MT of the awake behaving monkey, and depends upon whether the cells are activated by a single common stimulus or by two different stimuli (Kreiter & Singer 1996). Synchronization has also been observed within and between a variety of other cortical regions, including olfactory, somatosensory, and motor regions, as well as across hemispheres (Singer & Gray 1995).
The specific thalamic afferents to the primary visual cortex, V1, provide examples of RF inputs, and the excitatory long-range horizontal collaterals connecting pyramidal cells in V1 with non-overlapping RFs provide an anatomical basis for the CF inputs (Gilbert 1995). Long-range horizontal connections are common in V1 (Rockland & Lund 1983; Gilbert & Wiesel 1989; McGuire et al. 1991) and in other cortical regions (Gilbert 1992), and these connections have a synchronizing action (Lowel & Singer 1992; Konig et al. 1993). It has also been shown that interhemispheric connections have a specific role in synchronizing activity (Engel et al. 1991a). The descending connections from higher stages may also carry signals that have a synchronizing role. Such connections are ubiquitous but do not seem to play a primary role in driving the cells to which they project. In accordance with this suggestion, it has been found that the activity of cells at different stages of visual processing can be synchronized (Engel et al. 1991b), and that cells at later stages of processing in the visual system can synchronize the activity of relevant sub-sets of cells at earlier stages (Bullier et al. 1992; Sillito et al. 1994).
The anatomical and physiological evidence therefore suggests that the contextual connections within and between regions of the visual cortex are organized as shown in Figure 11. These connections are distinguished not just by their source, but also by their effect on the processors to which they project, because they have a modulatory rather than a primary driving role. One way in which they could fulfill this role is through voltage-dependent receptors. NMDA receptors, which are widely distributed on pyramidal cells throughout the cortex, are both ligand-gated and voltage-dependent. Their voltage dependence arises because their channels are blocked by magnesium, and this block is reduced by depolarizing the cell (e.g. Ascher et al. 1988). These channels therefore contribute more effectively to further depolarization when the cell is already partially depolarized, and so they provide a mechanism for gain control. Fox et al. (1990) show that cells in cat visual cortex have one class of receptor channel that provides the primary drive and summates linearly, and a second class that provides amplifying gain control; see Fox and Daw (1992) for computational studies of possible mechanisms for these effects. If the long-range horizontal collaterals do provide synchronizing contextual input as hypothesized here, then their synaptic inputs should be predominantly modulatory rather than driving. The available evidence suggests that this is so (e.g. Hirsch & Gilbert 1991). Furthermore, if these long-range intra-regional connections did contribute to the structure of the receptive field proper, then the receptive fields of cortical neurons would be much larger and more broadly tuned than they actually are.
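The voltage dependence that lets such channels act as a gain control can be illustrated with a commonly quoted sigmoidal approximation of the magnesium block. The constants (3.57 mM and 0.062 per mV), the 1 mM magnesium concentration and the 0 mV reversal potential are assumptions for this sketch, not values taken from the studies cited above.

```python
import math

def mg_unblock(v_mv, mg_mm=1.0):
    """Fraction of NMDA-type conductance free of Mg2+ block at v_mv (mV).

    Empirical sigmoidal fit; the constants are assumptions, not values
    from the studies cited in the text.
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))

def nmda_current(g, v_mv, e_rev_mv=0.0):
    """Current through a ligand-bound, voltage-gated channel (negative = inward)."""
    return g * mg_unblock(v_mv) * (v_mv - e_rev_mv)

for v in (-70, -55, -40):
    print(f"V = {v} mV: unblocked fraction {mg_unblock(v):.3f}, "
          f"current {nmda_current(1.0, v):.2f}")
```

Over this range the same ligand-bound conductance passes more inward current as the cell is depolarized, so input through such channels amplifies drive that is already present rather than providing drive on its own.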
Note that the hypotheses developed here do not imply that all voltage-dependent channels mediate CF rather than RF inputs. If any such distinction is relevant to cortex, it is more likely that RF inputs produce strong activation of both voltage-dependent and non-voltage-dependent channels (Armstrong-James et al. 1993), whereas CF inputs produce strong activation only of voltage-dependent channels. Note also that the absolute division of inputs into either RFs or CFs is a simplifying idealization. In cortex, individual inputs may contribute to both roles but to varying degrees. Furthermore, there is evidence that although the long-range horizontal input is usually modulatory, it can become more effective in generating spiking activity itself when the primary RF input is removed for many weeks (Das & Gilbert 1995).
It is now well established that the activity-dependent self-organization of synaptic connections could provide a substrate for learning in the cortex (Singer 1987, 1990). This is likely to involve long-term potentiation (LTP) and long-term depression (LTD), as well as control by global gating systems, and it applies to mature as well as to developing cortex (Singer & Artola 1994).
The learning rules formally derived from the information-theoretic objectives in Section 3 require synaptic strength on active inputs to remain unchanged when post-synaptic activity is very low, to decrease when it is at intermediate levels, and to increase when it is high. The plasticity observed in slices of adult rat neocortex by Artola et al. (1990) supports these three specific predictions. Furthermore, it has been shown that much of the data on activity-dependent self-organization in the visual cortex can be explained by the BCM learning rule (Clothiaux et al. 1991), which makes the same three predictions.
Given the contextual input to local processors, the possibility arises for this input to affect RF learning. Indeed, it is unlikely to have no effect, and we have argued above that it could have effects with far-reaching computational consequences. We know of no empirical studies explicitly designed to explore these possibilities, but results reported by Gilbert and Wiesel (1990) may be relevant. Their studies were mostly concerned with the effects of concurrent context upon the response to RF stimulation. Such effects could be due to context modulating post-synaptic activity without having any effect upon the strengths of the synapses that carry RF input. However, it was also observed that, under some conditions, prior contextual stimulation altered the orientation tuning function that was later obtained in response to RF stimulation alone. This suggests that the prior contextual stimulation played a role in changing the strengths of RF synapses, but as the effect could also have been due to other forms of adaptation, studies using a modification of their paradigm to address this issue more directly would be worthwhile.
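The three regimes predicted for active inputs (no change at very low post-synaptic activity, depression at intermediate levels, potentiation at high levels) can be written as a simple piecewise rule in the spirit of the ABS proposal. The two thresholds and the step-like magnitudes below are hypothetical; they only illustrate the qualitative shape, not fitted values from Artola et al. (1990).

```python
def abs_rule(pre, post, theta_minus=0.3, theta_plus=0.6, rate=0.1):
    """Three-regime weight change for an input (thresholds hypothetical).

    post below theta_minus: no change; between the thresholds: depression;
    above theta_plus: potentiation.  Inactive inputs (pre == 0) never change.
    """
    if pre == 0 or post < theta_minus:
        return 0.0
    if post < theta_plus:
        return -rate * pre
    return rate * pre

for post in (0.1, 0.45, 0.9):
    print(post, abs_rule(pre=1.0, post=post))
```

A BCM-style rule produces the same three regimes with a smooth function and a single sliding threshold, which is why both rules make the same qualitative predictions.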
Although the basic organization of the CF connections could be genetically specified, shaping by experience is also necessary because RF feature selectivity depends upon experience. In keeping with this it has been shown that long-range horizontal collaterals undergo activity-dependent changes in synaptic strength (Lowel & Singer 1992; Hirsch & Gilbert 1993). Furthermore, there is also evidence that the selection of these connections follows a correlation rule establishing preferential coupling between cells exhibiting correlated activity (Lowel & Singer 1992), in agreement with the predictive role proposed here for the CF connections. These ensemble-forming connections remain susceptible to use-dependent modifications in the adult (Singer & Artola 1994). Indeed, most of the experiments demonstrating LTP and LTD in neocortical synapses have been performed on cortico-cortical connections terminating on pyramidal cells in layers II/III or V (Artola & Singer 1993; Singer 1995), so they could predominantly reflect the plasticity of CF connections.
This section shows how our hypotheses concerning local contextual integration can be tested and developed by behavioral methods, including cases where these are combined with physiological methods. We will argue: i) that studies of the detection and grouping of simple stimulus elements provide behavioral evidence for contextual integration of the kind that we propose; ii) that it is reasonable to search for such processes at the higher levels of cognition, such as the perception of objects and words, because they are implemented by mechanisms that are widely distributed and of general utility; and iii) that there is already theoretical and empirical support for the view that contextual integration at the level of local processors is relevant to these higher levels of cognition.
Our focus is on the visual perception of simple line element displays and on the perception of words. Theories using contextual integration and synchronization have been applied to a wide variety of other tasks with psychological relevance, however; e.g. the cocktail-party problem (von der Malsburg & Schneider 1986); perceptual grouping within and between multiple visual feature domains (Wang et al. 1990; Eckhorn et al. 1991a; Schillen & Konig 1994; Tononi et al. 1992b); form from motion and motion capture (Tononi et al. 1992b); object recognition (Tononi et al. 1992b; Neven & Aertsen 1992; Hummel & Biederman 1992); selective attention and scene perception (Goebel 1993); the binding of events across widely distributed cortical zones (Damasio 1989); reasoning (Shastri & Ajjanagadde 1993); and consciousness (Crick & Koch 1990). Although these theories differ from each other in detail, they all suggest ways in which contextual integration at the level of local circuits can produce useful cognitive capabilities.
If synchronization is relevant to behavior, then stimuli should be more perceptible if they produce synchronized activity, and conditions that reduce synchronization should impair perception. These predictions have been tested by comparing the effects of induced strabismus, i.e. squint, on both synchronization and behavior in cats. In strabismic cats, neurons driven by different eyes lose the long-range horizontal intracortical connections that initially connect them (Lowel & Singer 1992). As a consequence these neurons do not synchronize, and the cats cannot fuse images from the two eyes (Konig et al. 1993). Furthermore, strabismus often leads to impaired perception in one eye, i.e. amblyopia, yet the cortical activity evoked by input to that eye had previously seemed to be normal. The discrimination of gratings by cats using either their normal or their amblyopic eye has now been shown to be closely related to the extent to which the gratings produce synchronized activity (Roelfsema et al. 1994a,b). Both discrimination and synchronization are reduced in the amblyopic eye, and in both cases the reduction is greater at higher spatial frequencies. Recordings from the primary visual cortex of awake strabismic cats show that the amount of synchronization is directly related to perception and motor control. Under conditions of stimulation that lead to binocular rivalry, neurons connected to the eye that dominates perception and the oculomotor response show increased synchronization, and neurons connected to the suppressed eye show decreased synchronization (Fries et al. submitted). Changes in perceptual dominance are, however, unrelated to changes in firing rate. These results show that response selection is closely related to synchronization, and so support the view that internal grouping through synchronization on a fine time scale is important for the selection of perceptually or behaviorally relevant signals.
Psychophysical evidence for dynamic grouping through a network of local linking connections between feature detectors is reported by Field et al. (1993). Subjects were shown arrays of 256 oriented band-pass line elements (Gabor patches) and had to detect a path of 12 elements with gradually changing orientation that was embedded within the random background. Performance was impaired by increases in the distance between the line elements and in the deviation of their orientations from collinearity, but it did not depend upon their relative "phases" (i.e. black on white or vice versa). Field et al. conclude that the ability to detect such paths is due to local "association" fields that link feature detectors in an organized way that depends upon their relative RF selectivities (Figure 12). Their results show that these connections link feature detectors over distances that are large compared with the sizes of their receptive fields, and do so in such a way as to implement the Gestalt grouping principles of proximity and continuity.
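A toy linking strength with the qualitative properties reported by Field et al. (1993) can be sketched as follows: it falls with distance and with deviation from collinearity and ignores contrast phase. The Gaussian forms, the widths and the element encoding (x, y, orientation in degrees, phase) are assumptions for illustration, not the association-field model fitted by those authors.

```python
import math

def orient_diff(a, b):
    """Smallest difference between two orientations in degrees (period 180)."""
    d = (a - b) % 180.0
    return min(d, 180.0 - d)

def link_strength(e1, e2, sigma_d=3.0, sigma_dev=30.0):
    """Hypothetical association-field weight between two line elements.

    Each element is (x, y, orientation_deg, phase); phase is ignored,
    mirroring the finding that grouping did not depend on it.
    """
    x1, y1, th1, _phase1 = e1
    x2, y2, th2, _phase2 = e2
    dist = math.hypot(x2 - x1, y2 - y1)
    axis = math.degrees(math.atan2(y2 - y1, x2 - x1))   # line joining the elements
    dev = orient_diff(th1, axis) + orient_diff(th2, axis)  # departure from collinearity
    return (math.exp(-dist ** 2 / (2 * sigma_d ** 2))
            * math.exp(-dev ** 2 / (2 * sigma_dev ** 2)))

a = (0, 0, 0, +1)
print("collinear, near:", link_strength(a, (2, 0, 0, +1)))
print("collinear, far: ", link_strength(a, (6, 0, 0, +1)))
print("oblique, near:  ", link_strength(a, (2, 0, 45, +1)))
print("phase reversed: ", link_strength(a, (2, 0, 0, -1)))
```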
The dynamic grouping observed by Field et al. (1993) supports the hypothesis of locally specific processes of integration, and the "association" fields that they infer from their findings are much the same as the CFs that we propose. Field et al. (1993) note the similarities between the conditions under which they find good perceptual grouping and the conditions producing synchronization in V1. They also note that these conditions seem well matched to those determining the extent to which pyramidal cells are linked by long-range horizontal collaterals. These similarities are further strengthened by detailed comparisons between perceptual grouping criteria and the anatomy of the tangential intracortical connections, which show them to be closely matched (Schmidt et al. 1996).
Finally, Field et al. (1993) argue that detection of the path in their studies must have been mediated by the grouped activity of the set of detectors activated, rather than by the activity of a single high-level detector, because a new path was formed randomly on each trial, and because the band-pass nature of the stimuli precluded their detection by cells with classically defined RFs covering the whole of the path. This argument is relevant to the issue of the relative roles of local and distributed codes, to which we will return below.
The psychophysical evidence just discussed provides evidence for local contextual fields in the grouping of easily detected stimuli. If such fields exist then they could also mediate locally specific effects of context on the detection of faint or ambiguous stimuli. Several psychophysical experiments indicate that this is so (Polat & Sagi 1993, 1994a,b). Such studies show that targets are surrounded by a small region within which additional stimuli suppress target detection, and then by a larger region within which they facilitate detection provided that they are coherently related to the target such as by being collinear or near collinear. They thus suggest that within local streams of processing inhibitory mechanisms force a choice between alternative features, whereas between streams contextual interactions facilitate the detection of coherent features.
A detailed comparison of psychophysical and physiological evidence for such facilitatory effects of context on the detection of target elements is reported by Kapadia et al. (1995). They show that human visual contrast sensitivity is improved by a neighboring suprathreshold line element in a way that is reduced by increases in their spatial separation and with the deviation of their orientation selectivities from collinearity. Using equivalent stimuli in electrophysiological studies Kapadia et al. (1995) also show that the response of superficial layer complex cells to low contrast stimuli in V1 of awake attending Rhesus monkeys, as measured by summing spikes over 200 msec, depends upon the local relations between target and context in a way that is very similar to that seen in the psychophysical experiments. The physiological experiments also showed that these effects were not due to the context encroaching within the RF of the recorded cell, but were due to modulatory interactions between cells with non-overlapping RFs.
The contextual conditions producing enhanced detection of low contrast stimuli in both the psychophysical and the electrophysiological studies of Kapadia et al. (1995) are very similar to those producing grouping of high contrast stimuli in the psychophysical studies of Field et al. (1993) and synchronization in electrophysiological studies (e.g. Singer & Gray 1995). All four sets of findings show similar effects of spatial separation, spatial frequency, orientation, and collinearity. They may therefore reflect common underlying mechanisms for local contextual integration, which would make them of great importance. This view is further strengthened by their close match to the anatomy of long-range horizontal collaterals (e.g. Schmidt et al. 1996). If such mechanisms do indeed exist, then they will provide an obvious candidate for explaining many other locally specific effects of context. The hypothesis that grouping and contextual facilitation of element perception involve common processes will be discussed further below in relation to theoretical and experimental studies of contextual integration in word perception.
One of our main hypotheses is that context can modulate the transmission of information about something other than the context. Thus, if some input variable is used only for contextual guidance then no explicit information will be transmitted about that variable in its own right.
A dramatic demonstration of this possibility comes from two neuropsychological patients who no longer see the world in color but only in black and white. The first, HJA, has been studied for many years (Humphreys & Riddoch 1987), but it has now been discovered that color does have implicit influences on his detection of luminance contrasts (Humphreys et al. 1992). In one task he had to say whether the top and bottom halves of a rectangular display differed in the first or the second of two intervals. Sometimes the two halves differed only in luminance, sometimes only in color, and sometimes in both. When the halves differed in color but not in luminance, performance was at chance, confirming his achromatopsia. When they differed only in luminance, performance improved as luminance contrast increased. When a color difference was added, performance improved more rapidly with increasing luminance contrast than it did with luminance differences alone. The second patient, WM, was also tested on these tasks and was compared in detail with HJA on a variety of other tasks. He has a different form of achromatopsia but shows the same kind of implicit modulatory effect of color differences on the detection of luminance differences (Troscianko et al. 1993; Troscianko et al. 1996).
A simple interpretation of these results is that, in these patients, color streams continue to modulate luminance streams, but with their own feedforward outputs no longer functioning properly. Color differences can therefore still influence the detection of luminance differences, but without themselves being perceived. Color differences may influence the detection of luminance differences because they are highly correlated in natural visual images, and this is reflected in the connectivity between color and luminance channels. The correlations are not used to conflate the two variables, however, but to provide contextual guidance. Further evidence for facilitatory effects of color can also be found in psychophysical studies of normal subjects (Gur & Akri 1992; Troscianko 1994). The above findings therefore illustrate a key difference between RFs and CFs. Outputs are "seen" by later stages of processing not as conveying information about the CF input, in this case color, but as conveying information about the RF input, in this case luminance. The introspective reports of the two patients described above suggest that in this case those later stages involve conscious awareness.
To show how analogous effects can be sought in normal subjects, we outline experiments currently being run at Stirling by Craven et al. on the interaction of target and contextual cues to texture segregation. A 20 x 20 array of small line elements, divided into two halves differing in mean line length, is displayed for 1 sec, and subjects decide whether the display is divided into long and short elements by a vertical or a horizontal boundary. Contextual input is provided by also dividing the array into two halves that differ in the mean orientation of the elements. This orientation boundary coincides with the length boundary on 70% of trials and is orthogonal to it on 30%. If modulatory interactions are occurring, then: i) the effect of context will increase as target strength increases; ii) this will occur only over a low range of target strengths; and iii) this range will lie at higher values of target strength for weak than for strong contexts (Smyth et al. 1996; see Note 9). We also predict that perception of the target boundary will be facilitated by a context with which it is coherent. Results obtained so far support these predictions, and include modulatory effects of weak contextual cues that have no direct influence on responses themselves.
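The logic behind predictions i) to iii) can be illustrated with a minimal sketch. It assumes a hypothetical form of modulation in which context multiplies the gain applied to the target drive, followed by a logistic decision stage; the function and parameter names (`gain`, `threshold`, `slope`) and their values are illustrative assumptions, not the activation function used by Smyth et al. (1996). Because context only rescales the target drive, it has no effect when the target is absent, its effect grows as the target strengthens, it fades again as responses saturate, and a stronger context reaches its peak effect at a weaker target.

```python
import math


def response_probability(target: float, context: float, gain: float = 0.5,
                         threshold: float = 2.0, slope: float = 4.0) -> float:
    """Probability of reporting the target boundary (hypothetical model).

    Context multiplies the target drive rather than adding to it, so when the
    target drive is zero the context has no effect on the response.
    """
    drive = target * (1.0 + gain * context)
    return 1.0 / (1.0 + math.exp(-slope * (drive - threshold)))


def context_effect(target: float, context: float) -> float:
    """Increase in response probability produced by a coherent context."""
    return response_probability(target, context) - response_probability(target, 0.0)


def peak_target(context: float, step: float = 0.01, max_target: float = 4.0) -> float:
    """Target strength at which the context effect is largest (grid search)."""
    n_steps = round(max_target / step)
    targets = [i * step for i in range(n_steps + 1)]
    return max(targets, key=lambda t: context_effect(t, context))


if __name__ == "__main__":
    for label, context in (("weak", 0.5), ("strong", 1.0)):
        print(f"{label} context: effect peaks near target strength "
              f"{peak_target(context):.2f}")
```

With these illustrative values the peak for the weak context falls at a higher target strength than for the strong one, which is the pattern stated in prediction iii); other parameter choices would shift the numbers but not the qualitative shape.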
Until the early 1980s plasticity in sensory systems was widely thought to be restricted to a critical period during development. There is now abundant anatomical, physiological, and psychophysical evidence for such plasticity in adults. Plasticity has been shown in visual, auditory, somesthetic and motor systems. It is particularly dramatic at the cortical level, and has been shown by studies of reorganization following deafferentation, nerve section, and cortical lesions, and from studies of the effects of more subtle changes in the patterns of sensory input received (Kaas 1995; Gilbert 1995). These effects occur on various time scales, and include psychophysical evidence for fast perceptual learning in the visual sensory system of adult humans (e.g. Karni & Sagi 1991; Poggio et al. 1992; Polat & Sagi 1994b). These psychophysical effects are interpreted as being due to changes at sensory stages of processing because they are specific for eye, orientation, and spatial frequency, as well as for spatial position.
Some of these findings can be interpreted as changes in feedforward RF selectivity, such as when major reorganizations of sensory or motor maps occur. In other cases they are more likely to involve changes in the connections mediating intracortical contextual integration (Gilbert 1995). Consider the psychophysical findings reported by Polat and Sagi (1994b). Prior to practice the spatial range within which contextual stimuli facilitate target detection is up to about six times the spatial period of the target (Polat & Sagi 1993). After a few hours of practice, covering the whole range of separations to be spanned, the range of facilitation was increased by at least a factor of three (Polat & Sagi 1994b), probably by strengthening chains of local facilitatory interactions between filters with nearby but non-overlapping RFs.
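The chaining idea can be illustrated with a toy calculation. The sketch below is purely hypothetical and is not a model proposed by Polat and Sagi: it assumes that facilitation spreads along a chain of filters spaced `hop_length` spatial periods apart, is attenuated by a factor `link_weight` at each hop, and remains effective only while it stays above a fixed `criterion`. The parameter values were chosen so that the baseline range is six periods; the point is only that strengthening each local link can extend the effective range several-fold.

```python
import math


def facilitation_range(link_weight: float, hop_length: float = 2.0,
                       criterion: float = 0.1) -> float:
    """Largest separation (in target spatial periods) at which chained
    facilitation still reaches the criterion (hypothetical toy model).

    Facilitation is multiplied by `link_weight` at each hop between
    neighbouring filters spaced `hop_length` periods apart.
    """
    if not 0.0 < link_weight < 1.0:
        raise ValueError("link_weight must lie strictly between 0 and 1")
    # Number of hops before link_weight ** hops drops below the criterion.
    max_hops = math.floor(math.log(criterion) / math.log(link_weight))
    return max_hops * hop_length


if __name__ == "__main__":
    before = facilitation_range(0.5)  # weaker local links before practice
    after = facilitation_range(0.8)   # strengthened local links after practice
    print(f"range before: {before}, after: {after}, ratio: {after / before:.2f}")
```

Because the range depends on the logarithm of the per-hop attenuation, modest changes in each local link translate into large changes in the distance over which a chain can carry facilitation.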
There are also other ways in which psychophysical experiments could study the learning of contextual predictions. Consider, for example, the techniques for studying local contextual integration described in Sections 5.2, 5.3, and 5.5.2. Each of these could be used in paradigms where the predictive relationships between context and target are manipulated to see whether the effects of local context depend upon experience. Some findings already suggest that such studies would be worthwhile: i) the effects of strabismus outlined above show that experience affects both the CF connections and the probability of synchronization; ii) further studies of the achromatopsic patient WM suggest that the color-luminance interactions adapt to correlations that are experimentally induced between them (Troscianko et al. 1995); and iii) there are large practice effects in the texture segregation task outlined in Section 5.5.2.
An example of the effect of learning on cross-modal integration is provided by Durgin (1995), who presented random-dot patterns with a greater dot density on either the left or the right. A tone was presented simultaneously, and its pitch was perfectly correlated with the side of greater density. After 180 such pairings, a staircase procedure was used to measure the point of perceived equivalence between left and right at an intermediate density. Presenting a tone during matching biased the match: the side that had been denser when that tone was presented during training was seen as denser than it should have been for an accurate match. The extent to which such cross-modal contextual learning affects discrimination within modalities when later tested in the absence of the cross-modal stimulus is not yet clear, but it is amenable to further psychophysical study. Physiological evidence for effects of auditory stimulation on activity in the visual cortex (e.g. Spinelli et al. 1968; Fishman & Michael 1973) encourages such experiments. Psychophysical studies of learned cross-modal effects on grouping would be of particular interest, given the evidence that contextual interactions play a major role in grouping.
In sum, there are large effects of learning in both mature and immature sensory systems. These effects often include changes in RF connectivity, but changes in CF connectivity are also likely, as expected on the grounds that the contextual predictions would otherwise be invalidated by RF changes. Finally, psychophysical experiments designed specifically to study changes in contextual integration due to learning are both possible and worthwhile.
Ways in which binding through synchronization could help produce object recognition performance similar to that of humans have already been discussed in detail elsewhere (e.g. Hummel & Biederman 1992; Mozer et al. 1992), so we restrict ourselves to a brief outline. One central idea is that shape recognition could generalize well across irrelevant dimensions, such as position, if shape descriptors are insensitive to those dimensions. Synchronization is then used to bind shapes to positions, signaling which shapes are where. The Hummel and Biederman (1992) model has seven layers through which image features are combined into parts, and then into structural descriptions of objects in terms of their parts and relationships. Synchronization is used to group image features into volumetric parts and to bind parts to relationships, and is achieved through a network of fast enabling links that are similar to our CFs. Crucial aspects of human performance displayed by the model include recognition that generalizes well across position, size, left-right reflection, and rotation in depth, but poorly across rotation in the picture plane. Goebel (1993) developed a similar model that has more flexible synchronizing connections and also incorporates mechanisms that produce performance consistent with psychophysical evidence for selective spatial attention. As in human perception, the performance of such systems depends heavily on internal grouping processes, so that the same image grouped in different ways can give rise to very different outputs. These demonstrations of computational feasibility and similarity to human performance are encouraging, but direct tests of the hypothesis that timing on a fast time scale is important for grouping in object perception are also needed, and this is a major goal for further research.
Word perception is a major focus for studies of contextual integration within cognitive psychology, so here we discuss several ways in which research on word perception can be related to the approach outlined above. We note similarities between context effects in word perception and in the perception of simple line elements, and conclude that, as the latter can be studied by both psychophysical and physiological methods, this may bring a rich new source of evidence to bear on cognitive conceptions of contextual integration. We relate this to dynamic grouping and to synchronized population codes, and outline neuropsychological evidence for such codes in word perception. Finally, we note a possible role for contextual guidance in learning to perceive words.
Various paradigms have shown that both letter and phoneme perception depend upon local context. Cattell (1886) showed that letters are recognized more accurately in the context of a familiar word, and the many subsequent studies of such phenomena include demonstrations that forced-choice discrimination between a pair of prespecified letters is better if the test letter appears within the context of a familiar word or pronounceable non-word (e.g. Reicher 1969; Johnston & McClelland 1973; Rumelhart & McClelland 1982). Strong effects of local context also occur within speech perception. For example, Massaro and Cohen (1983) presented computer-generated syllables such as /sli/, /tri/, /sri/, and /tli/ to subjects and asked them to classify the middle phoneme as /l/ or /r/. This phoneme was presented at seven levels on a continuum from very /l/-like to very /r/-like, by varying the frequency at which the third formant (F3) began at the phoneme's onset. Each level was factorially combined with each of four leading consonants: /s/, /t/, /p/, and /v/.
Perception of the central phoneme was predominantly determined by the direct acoustic cue to that phoneme as given by F3 frequency, but it was also affected by the preceding consonant, particularly when the direct cue was most ambiguous. Similar effects have been shown in studies of reading. For example, when an ambiguous lower-case letter that can be read as either an e or a c is placed in contexts that support one or the other alternative then identification is biased towards the contextually appropriate alternative (Massaro 1979).
These effects resemble those in the perception of simple line-element displays (e.g. Polat & Sagi 1993, 1994a,b; Kapadia et al. 1995) in several ways. Although they occur at a higher level of analysis, context effects in word perception also occur rapidly and automatically. The effects are locally specific in that they depend upon the particular properties of the entities that are interacting. For example, the context /s/?/i/ supports some target phonemes and not others, just as the context of two collinear oriented line elements supports some intervening oriented line elements and not others. Furthermore, in both cases the interactions emphasize things that are expected in that context rather than things that are unexpected. In both domains the effects of context on the perception of individual elements seem to be greatest when the direct stimulus evidence is most ambiguous. Yet another similarity is that contextual interactions in both domains are affected by learning. Finally, there are effects of object-specific knowledge on the detection of line segments (e.g. Weisstein & Harris 1974; McClelland 1978) that may be analogous to the effects of word-specific knowledge on forced-choice discrimination between letters. At present, therefore, there seems to be enough similarity to justify the search for a common explanation for these various effects of context.
Theories of contextual interaction in word perception have been undergoing vigorous development for many years and can now account for many details of performance with impressive precision (e.g. McClelland & Rumelhart 1981; Rumelhart & McClelland 1982; McClelland & Elman 1986; Richman & Simon 1989; Massaro 1989a). The problem is not that no explanation is available but that each of several diverse theories can fit the data so well that it is difficult to know what inferences to draw. Thus we need to find common elements, or fundamental mechanisms, that may appear in different forms in the different theories, and which are crucial to the cognitive functions with which they are concerned (Richman & Simon 1989). As the theories all aim at generality one way to do this is to widen the range of phenomena to which they are applied, including psychological tasks that can be related to known physiological mechanisms if possible. It is particularly appropriate to relate Interactive Activation and Competition (IAC) theories (e.g. McClelland & Rumelhart 1981) to anatomy and physiology because they explicitly use a neural style of computation, and played a major role in promoting the use of that style in cognitive theory. Basic aspects of these models that are broadly compatible with what is known about cortical physiology include having just a few different levels of analysis; having many replications of processors spanning feature space at different input positions within a level; and having local inhibitory relations to force a choice between incompatible alternatives within levels. For further in-depth discussions of these theories and related issues see McClelland (1991), Massaro and Cohen (1991), and Movellan and McClelland (1995).
We now discuss five major unresolved issues from the perspective of our general approach and of the analogy between context effects in the perception of words and of simple line elements: 1. What is the architecture of information flow, and in particular to what extent do the effects of context depend upon the feedback of information from higher to lower levels of analysis? 2. How does contextual information affect processing, and in particular does it do so in essentially the same way as direct evidence from the target? 3. Do the mechanisms producing the effects of context on the perception of ambiguous or just detectable stimulus elements also play a role in dynamically grouping those elements? 4. To what extent does each level use local as opposed to population codes for those entities with which it is concerned? 5. Is the goal of maximizing coherence between distinct streams of processing relevant to learning within streams? (see Note 10)
1. Where does the contextual information received by local processors come from, and in particular does it include feedback from higher levels of analysis? The detailed properties of context effects in the perception of line elements strongly suggest that they are due in part to long-range horizontal collaterals that directly link distinct entities within the same level of analysis (e.g. Singer 1995; Kapadia et al. 1995). If there are connections that are specialized to mediate contextual interactions within the segmental (i.e. phonemic or letter) levels of word perception, for example, then this could contribute to the superiority of regular over irregular nonwords, and may help explain some of the more rapidly occurring components of contextual interaction. Contextual connections within segmental levels may not be the best way to explain the effects of word-specific knowledge, however, and these are often assumed to involve activity at a higher level of analysis that is specialized to distinguish words. The question that then arises is whether activity at this level influences processing at the segmental levels. This issue is unresolved but an analogous issue can be studied at lower levels of visual processing using physiological techniques. Studies of the role of feedback from V1 to LGN provide evidence that it synchronizes the firing of those LGN cells that combine to form higher level entities in V1 (Sillito et al. 1994). This suggests that higher levels do influence processing at lower levels but in a way that is distinct from the ascending RF input and more like the process of contextual integration within levels. 
As feedback connections are ubiquitous within the cortex, this could be a general feature of cortical processing. It would therefore be worthwhile to obtain further evidence on it by combining psychophysical measures with electrophysiological measures in V1, as in Kapadia et al. (1995), but measuring both rate and relative timing in multiple single-unit recordings and using stimulus elements that either do or do not combine to form a single familiar entity at a higher level. The possibility that a major role of feedback is to group activity at lower levels will be examined further under point 3 below, which discusses grouping processes in word perception.
2. Is contextual information used in essentially the same way as target information in word perception? If it is, then contextual integration is not a distinct fundamental issue within that domain. Note that, as contextual interactions may be reciprocal, the question is not whether some parts of a word are processed only as targets and others only as contexts, but whether words are composed of distinct but interacting parts such that the processes by which the parts interact differ from those by which they are kept distinct.
One of the clearest ways of distinguishing between contextual and direct stimulus effects in word perception is that it is the direct stimulus input and not the context which determines the alternatives between which choice is made; i.e. context influences the choice between those competing alternatives for which there is direct but ambiguous stimulus support. (see Note 11) As evidence for this view Massaro (1989b) notes that context does not by itself produce the phoneme restoration effect (Warren 1970) because some bottom-up support for the presence of a missing phoneme is required, even if only in the form of a brief noise burst. This possible asymmetry in the roles of context and target is not apparent in the experiment of Massaro and Cohen (1983) because target support was always available. It should therefore be possible to make it more apparent by presenting stimuli such as /li/, /ri/, /si/, /ti/, /sli/, and /tri/ with various levels of ambiguity for the /l/ or /r/ phoneme and asking subjects to decide whether an /l/ or /r/ or neither is present. The distinction being proposed here predicts that context will have a large effect in the presence of an ambiguous target while having little or no effect in its absence. In contrast, the direct stimulus cues should have large effects whether the context is present or not. Such a result would not be surprising but it would show how the asymmetry in the roles of RFs and CFs can be reflected in performance. Furthermore, this asymmetry in dependence would contrast with the various ways in which information from different sources can be combined such that they all influence decision in essentially the same way (e.g. Massaro & Friedman 1990), as do different sources of information from within the RF. An important aspect of this latter form of combination is that, although the different sources may be independent prior to combination, their individual contributions are not kept distinct in the output decisions produced. 
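For contrast, the kind of combination in which all sources influence the decision in the same way can be sketched with the two-alternative form of Massaro's fuzzy logical model of perception (FLMP), in which the support from each source for one alternative is multiplied and then normalized. The function names below are ours, and the sketch covers only the two-alternative case. Because target and context enter the formula symmetrically, exchanging their values leaves the response probability unchanged, and once combined their separate contributions cannot be recovered from the output. Under this rule the shift produced by a fixed context is largest when the target is most ambiguous (support 0.5) and shrinks as target support increases.

```python
def flmp_two_alternatives(target_support: float, context_support: float) -> float:
    """Probability of choosing /l/ over /r/ under the two-alternative FLMP rule.

    Each argument is a truth value strictly between 0 and 1 giving the degree
    to which that source of information supports /l/; one minus the value is
    its support for /r/.
    """
    for value in (target_support, context_support):
        if not 0.0 < value < 1.0:
            raise ValueError("support values must lie strictly between 0 and 1")
    support_l = target_support * context_support
    support_r = (1.0 - target_support) * (1.0 - context_support)
    return support_l / (support_l + support_r)


def flmp_context_effect(target_support: float, context_support: float) -> float:
    """Shift in P(/l/) produced by the context, relative to a neutral context (0.5)."""
    return (flmp_two_alternatives(target_support, context_support)
            - flmp_two_alternatives(target_support, 0.5))


if __name__ == "__main__":
    # Target and context play interchangeable roles in this combination rule.
    print(flmp_two_alternatives(0.7, 0.6) == flmp_two_alternatives(0.6, 0.7))
    # The same context shifts responses less as target support grows.
    for target in (0.5, 0.7, 0.9):
        print(target, round(flmp_context_effect(target, 0.7), 3))
```

The symmetry is what the asymmetry proposed in the text would violate: under the RF/CF distinction, exchanging the roles of target and context should not leave performance unchanged.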
Thus, in contrast with the effects of CF inputs, all RF inputs help determine the meaning of the decisions to which they contribute.
The distinction between the roles of RFs and CFs may also be relevant to the long-running debate between theories that emphasize the use of internal knowledge to go beyond the input from external stimuli and theories that emphasize remaining faithful to that input so as to avoid hallucination. The latter danger is often noted in discussions of the effects of context in word perception (e.g. Massaro 1989a; Massaro & Cohen 1991), and can be used as an argument for assuming that context and target do not interact. If context has distinct effects upon processing, however, then instead of having to choose between avoiding hallucination and allowing contextual interaction we can have both, including direct contextual interactions within levels, which might otherwise overwhelm stimulus processing with hallucination.
3. Sections 5.2 and 5.3 outlined evidence suggesting that, for simple line-element displays, the grouping of elements into coherent wholes depends upon the same knowledge, embodied in the same mechanisms, as do the effects of context on the perception of the individual elements. If this is so, and if contextual integration is achieved in the same way in different regions, then the same should hold for word perception. We take it for granted that grouping processes are a crucial part of word perception at both lexical and sub-lexical levels, and this is easily demonstrated. At the lexical level, for example: thismustbegroupeddynamicallyusingknowledgeofspecificwords. Internal grouping processes also occur at sub-lexical levels. PIGHAM, for example, will be pronounced either with or without consonants in the middle depending upon whether or not IGH is grouped to form one grapheme. If grouping is computed dynamically, as it must be if it is signaled by relative timing, then such groupings could change rapidly from moment to moment, making various alternative groupings successively available. Studies of a neurological patient, who will be discussed further below, illustrate the relevance of such dynamic grouping processes to word perception. She was quite unable to read PIGHAM as a single unfamiliar nonword, but could read it easily when she saw it as two familiar words (Goodall & Phillips 1994). She made it clear in various ways, e.g. by drawing a pencil line between the appropriate letters, that this involved feedback of grouping information from a lexical level to a level containing a precise topographic map of the individual letters.
There is also evidence from normal subjects that the effects of word familiarity on perception involve internal grouping processes. For example, familiarity reduces asymmetrical left-to-right letter position effects that can be explained as being due to processing letters separately in unfamiliar stimuli (Phillips 1971). Many other effects can also be explained as being due to processing familiar items as a single coherent whole, or `chunk', but processing unfamiliar items as a number of separate chunks (Richman & Simon 1989). One implication of these considerations is that in order to understand the role of feedback in word perception it may be worthwhile emphasizing tasks that reveal the effects of grouping processes. For example, if grouping and disambiguating involve a common mechanism then disambiguating interactions between elements should depend upon whether those elements are grouped together or not. Consider the disambiguating effects of local context in the experiments of Massaro and Cohen (1983), for example. In those experiments the phonemes that interacted were always part of a single word. If more than one word were presented then it would be possible to test whether or not the interaction between neighboring phonemes depends upon their being perceived as parts of the same word or phrase.
4. Are familiar words signaled by local or by population codes? A fundamental difference between these two possibilities is that population codes can transmit information about inner structure but local codes cannot. This difference was used to obtain evidence on this issue in studies of two neuropsychological patients whose ability to read and write is very largely restricted to words with which they are familiar (Goodall 1994; Goodall & Phillips 1994; Phillips & Goodall 1994). Their reading and writing therefore provide a direct window on the contribution of lexical knowledge when isolated from sub-lexical processes that treat the input as a string of separate letters or phonemes. Several experiments were run in which the patients were first given visual discrimination training on a set of nonwords and then tested on their ability to write those nonwords to dictation, compared with matched but unfamiliar nonwords (see Note 12). These experiments show that the patients can write visually familiar nonwords to dictation accurately and fluently on first hearing them spoken aloud, even though they have never written them before and cannot write any of the unfamiliar nonwords. This is a surprising result for theories proposing that the role of word recognition is to produce a local code indicating which word has been recognized, because such a code cannot explain how familiarity makes a description of internal structure available for matching with input from other modalities and for generating appropriately structured output. These results do not show that local codes do not exist for familiar words, but they cannot be explained by local codes alone (see Note 13). Furthermore, if the output of word recognition systems is a structured description, then this could help explain how output lexicons could obtain knowledge of word structure.
If it is assumed that output lexicons just receive local codes from input lexicons and then generate the appropriately structured output, then it is not obvious how they acquire knowledge of those structures. These findings therefore support theories that use distributed or population codes for familiar words, but other evidence outlined above suggests that familiar words are processed as single whole items or chunks. How can these two conclusions be reconciled? If familiar words are given a local code, then we have difficulty explaining the transmission of information about their structure. If we assume that they are given a classical distributed code, then we have difficulty explaining the effects of familiarity adequately, because such codes can transmit descriptions of novel items as well as of familiar items (see Note 14). Theories using distributed codes that are formed by internal grouping processes may be able to overcome both difficulties, because the way in which elements of a distributed pattern of activity are combined then depends upon internal grouping processes that embody knowledge of familiar or coherent combinations. This assumes a highly distributed input to all stages of cortical processing, and emphasizes the importance of being able to separate that input into distinct subsets on the basis of their internal coherence. It also suggests that information conveyed as a number of separate groupings at one level or stage of processing may be conveyed as a single coherent grouping at later levels or stages.
5. The effects of context on disambiguation and grouping in word perception imply that the contextual predictions are learned, but do contextual inputs from other streams also influence feature discovery within streams? Possible examples of such an influence are the effect of learning to read on phonological awareness (Bentin et al. 1991), and of phonological awareness on learning to read (Bentin 1992).
Another example is provided by Kanevsky (1989), who gave subjects with impaired hearing supplementary tactile input generated from the speech signals that they were lip-reading. Their performance when lip-reading alone was significantly improved by several months of training with the coherent cross-modal inputs. Thus processing of features within one stream was affected by their correlations with features in another stream.
The possibility that coherence between distinct streams could be relevant to speech processing has been studied computationally by Becker (1996) and de Sa (1994a,b). Both emphasize the need for an algorithm that can discover the features that distinguish words in the absence of any external supervisor telling the network what those features are, and both propose that such an algorithm could use the statistical dependency between distinct streams of processing. Becker applied the Imax algorithm to the Peterson-Barney data, consisting of the first and second formant frequencies of 10 vowels spoken by many different speakers. She showed that from these two low-level acoustic features alone the Imax algorithm could discover higher-order features that could be used to classify the vowels with better than 75% accuracy (see Note 15). Cross-modal interactions have been studied by de Sa and Ballard using the task of learning to categorize consonant-vowel syllables that were both heard and seen as a pattern of mouth movements. Tightly synchronized audio and video recordings of 5 speakers saying 5 vowels were fed into two separate streams of processing, which learned to classify their inputs by minimizing disagreements between the two streams. When subsequently tested on each stream of input alone, syllable categorization accuracy was better than 90% for the audio input and better than 75% for the video input (see Note 16). These demonstrations are valuable because they show that learning based on statistical dependencies between separate streams of processing can discover useful features within streams when applied to real-world speech processing tasks.
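The core quantity behind Imax can be sketched under the assumption, made in the original Imax formulation, that the outputs of the two streams are jointly Gaussian: the information they share about a common underlying signal is then estimated as 0.5 log(Var(a + b) / Var(a - b)). The sketch below only evaluates this objective on synthetic data (a hypothetical shared latent signal plus independent noise in each stream); it does not train any feature detectors, and the variant Becker applied to the vowel data may differ. Learning would adjust each stream's weights to increase this quantity, which rewards features that both streams can predict.

```python
import numpy as np


def imax_objective(a: np.ndarray, b: np.ndarray) -> float:
    """Gaussian estimate of the information two module outputs share about a
    common underlying signal: 0.5 * log(Var(a + b) / Var(a - b))."""
    return 0.5 * float(np.log(np.var(a + b) / np.var(a - b)))


def demo(n_samples: int = 10_000, noise_sd: float = 0.3, seed: int = 0):
    """Compare two streams driven by a shared signal with two independent ones."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=n_samples)  # hypothetical common cause of both streams
    a = shared + noise_sd * rng.normal(size=n_samples)
    b = shared + noise_sd * rng.normal(size=n_samples)
    c = rng.normal(size=n_samples)       # streams with no common cause
    d = rng.normal(size=n_samples)
    return imax_objective(a, b), imax_objective(c, d)


if __name__ == "__main__":
    coherent, unrelated = demo()
    print(f"coherent streams: {coherent:.2f}, unrelated streams: {unrelated:.2f}")
```

Dividing by Var(a - b) penalizes disagreement between the streams, so the objective is high only when both outputs track something they have in common, which is the sense in which coherence between streams can act as a teacher without an external supervisor.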
If there are common foundations for cortical computation then a central goal for neurobiology must be to discover what they are and how they are embodied in cortical structures and processes. The primary issues that arise therefore concern the justification of the search for common foundations, and the promise of the various means by which common cortical organization, common computational primitives, and common requirements of cognitive sub-systems may be revealed. We hope to encourage discussion of these more general issues, as well as of our more specific findings and hypotheses. Many issues arise in regard to the latter, such as ways in which contextual integration and synchronization can be related to attention (e.g. Tiitinen et al. 1993), imagery (e.g. Ishai & Sagi 1995), and consciousness (e.g. Crick & Koch 1990). Here we note just those most closely related to the aspects that we have emphasized above.
The possibility of distinguishing between RF and CF inputs raises a number of specific questions. 1. Can RF and CF inputs be distinguished in the cortex on the basis of the synaptic receptor channels that they activate? One possibility is that CF inputs depend more upon voltage-dependent channels than do RF inputs, but more research is required on this issue. 2. Can RF and CF inputs be distinguished on the basis of the anatomical distribution of their input sites? For example, at least one source of contextual input, the long-range intra-regional tangential connections, seems preferentially to contact the apical dendrites of its target pyramidal cells (Gilbert & Wiesel 1983; Kisvarday et al. 1986). 3. If there are contextual connections, do they affect only timing, without any effect upon the total number of spikes produced by the cell or local group of cells, or can they affect both (Konig et al. 1995)? One possibility is that the answer depends on the circumstances, with effects upon the number of spikes being more likely, for example, when RF input is weak.
Section 5 suggested several paradigms through which this distinction might be reflected in behavior. Some of these ask whether the effects of appropriately selected contextual variables depend upon the target to a greater extent than the effects of the target depend upon them. From an information-theoretic perspective, they seek conditions under which behavior transmits information about the target rather than about the context, even though it is influenced by context. One specific test is to see how the effects of context depend upon the strength of the target evidence. If the putative contextual variable does not modulate transmission of information about the target but simply contributes directly to the response decision itself, then the effects of context will decrease as the strength of the target variable increases (Massaro 1989a,b; Massaro & Friedman 1990). If it does have a modulatory influence, however, then the effects of context will increase as the strength of the target variable increases from low to medium values (Smyth et al. 1996). We have also suggested that some aspects of contextual integration in higher cognitive functions, such as word perception, might be related to cortical neurobiology by analogy with studies of the perception of simple line-element displays, which can combine psychological and physiological techniques. Whether the research that this suggests will repay the investment it requires remains to be seen. However this turns out, it is hard to see how the distinction between RFs and CFs could fail to have psychological relevance if it is biologically valid.
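The contrasting predictions can be illustrated with a toy psychometric model. The sketch below assumes, purely for illustration, a logistic function for the probability of a correct two-alternative forced-choice response; the parameter values and function names are hypothetical, and this is not the analysis used by Smyth et al. (1996). In the additive model, context shifts the decision variable directly; in the modulatory model, context only scales the gain applied to the target evidence.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def p_correct_additive(s, ctx, c=1.0):
    # Hypothetical model: context (+1 supporting, -1 non-supporting)
    # adds directly to the decision variable.
    return logistic(s + c * ctx)


def p_correct_modulatory(s, ctx, m=0.5):
    # Hypothetical model: context only changes the gain on target evidence s,
    # so with no target evidence it has no effect at all.
    return logistic(s * (1.0 + m * ctx))


def context_effect(model, s):
    """Accuracy with supporting context minus accuracy with non-supporting context."""
    return model(s, +1) - model(s, -1)


for s in (0.0, 0.5, 1.0, 2.0):
    print(s,
          round(context_effect(p_correct_additive, s), 3),
          round(context_effect(p_correct_modulatory, s), 3))
```

Across these target strengths the additive effect shrinks steadily, whereas the modulatory effect starts at zero and grows; in this toy model it would eventually fall again as accuracy approaches ceiling, which is why the prediction is stated for low to medium strengths.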
Various doubts have been raised concerning the functional role of synchronization, but biophysical and theoretical arguments suggesting that it has a major role have already been noted (e.g. Abeles 1982, 1991; von der Malsburg 1981), and Sections 4 and 5 outlined further support for this, including evidence that synchronization and behavior are closely related, and that the stimulus conditions producing synchronization have detailed similarities to those producing perceptual grouping. Further evidence on this issue could be obtained by experimentally manipulating the synchronization of spikes on a fine time scale to see whether this has consequences for behavior. This is technically difficult, but one way in which it might be done is by studying how the modulatory effects of context depend upon the precise temporal relationships between target and context. For example, a paradigm for distinguishing modulatory from direct effects, such as that described by Smyth et al. (1996), could be used in a texture segregation task in which there are both target and contextual cues to segregation. The stimulus parameters specifying the target and contextual boundaries could both change rapidly, and with various phase relations. The central question would then be whether the context preferentially modulates the detection of target boundaries with which it is synchronized to within a few msec. To improve the chances that the external timing relations imposed upon the cues give adequate control over the timing of internal activity, the stimuli should be presented under conditions that maximize the use of fast pathways with high temporal resolution (e.g. low spatial frequencies, dark-adapted subjects). Such studies are encouraged by results showing that in texture segmentation tasks, temporal differences in stimulus onset as small as 10 msec can be used to segregate figure from ground (Leonards et al. 1996).
As noted in Section 3.5, algorithms that learn by maximizing coherence across streams can discover useful higher-order functions (Becker 1996), but they may nevertheless be more limited in what they can learn than supervised algorithms such as error backpropagation (Rumelhart, Hinton & Williams 1986). This does not necessarily make them less biologically plausible, however; it would make them more plausible if their limitations were shared with the processes of cortical self-organization. Unfortunately, those limitations are not yet clear. Some studies suggest that the human visual system has great difficulty learning arbitrary non-linear functions such as XOR (Thorpe et al. 1989), but more evidence on this is required. Furthermore, where such difficult learning problems are solved, this might be because the input is transformed in a way that makes the problem easier (Clark & Thornton 1996), so we need to know how input data are recoded internally in order to know which internal learning problems have actually been solved. Major tasks for future research are therefore to determine which functions can and cannot be learned by cortical self-organization, and to compare those with the capabilities and limitations of algorithms derived from computational theory.
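The point about recoding can be illustrated with the textbook case of XOR. In the plain-Python sketch below, the perceptron and the hand-chosen conjunction feature x1*x2 are illustrative assumptions, not a model of cortical recoding: a single-layer perceptron cannot learn XOR from the raw inputs, because no linear boundary separates the two classes, but learns it readily once the conjunction is added to the input.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Classic perceptron rule; returns (weights, bias, converged)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if net > 0 else 0
            if y != target:
                errors += 1
                step = lr * (target - y)
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
        if errors == 0:          # a full pass with no mistakes: problem solved
            return w, b, True
    return w, b, False


xor_raw = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
# Hypothetical recoding: append the conjunction x1*x2 as a third input.
xor_recoded = [((x1, x2, x1 * x2), t) for (x1, x2), t in xor_raw]

print("raw inputs solved:", train_perceptron(xor_raw)[2])
print("recoded inputs solved:", train_perceptron(xor_recoded)[2])
```

The learning rule is identical in both runs; only the input code differs, which is why the internal recoding must be known before one can say which learning problem was actually solved.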
Four well-specified learning rules that have been proposed as a basis for cortical self-organization turn out to have a close family resemblance to each other, even though they were derived in different ways (namely those of Bienenstock et al. 1982; Hancock et al. 1991; Kay & Phillips 1994, 1996; Der & Smyth 1996). They all relate the change in synaptic strength to post-synaptic activity in approximately the non-monotonic way shown in Figure 5. Furthermore, there is direct physiological evidence for a dependence of this general form (Artola et al. 1990; Dudek & Bear 1992; Singer & Artola 1994; Kirkwood et al. 1996). Intrator and Cooper (1995) provide detailed arguments for the computational value and biological plausibility of such rules, and for the view that they are common to the hippocampus as well as to both mature and immature cortex. A key feature of all of these rules is the activity threshold above which connections are strengthened, and the rules differ in how this threshold is specified. Some keep it fixed; others move it dynamically as a function of prior activity, and these differ in which aspects of prior activity are used and in how they are used. Various versions of this rule may exist, with simpler and less computationally powerful ways of specifying the threshold being found in some species and/or neural sub-systems, and more complex and powerful ways in others. There is physiological evidence for a dynamically moving threshold (Huang et al. 1992; Kirkwood et al. 1996), and investigations of its possible molecular bases have already begun (Mayford et al. 1995). The detailed properties of this threshold and their relation to computational theory are clearly a major issue for the biology of learning.
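The non-monotonic dependence and the moving threshold can be sketched with a BCM-style rule, Δw = η·x·y·(y − θ), with θ tracking a running average of y² as in later formulations of the Bienenstock et al. (1982) theory. This is one simple member of the family, not the specific rules of Hancock et al., Kay and Phillips, or Der and Smyth; the learning rates, the two orthogonal input patterns and the initial weights are arbitrary illustrative choices. A linear unit that starts with a slight preference for one pattern ends up responding selectively to it.

```python
def bcm_phi(y, theta):
    """Plasticity function: depression for 0 < y < theta, potentiation for y > theta."""
    return y * (y - theta)


def train_bcm(patterns, w, eta=0.005, theta_rate=0.05, steps=20000):
    """Present patterns in alternation; theta slides toward a running mean of y**2."""
    theta = 0.0
    for t in range(steps):
        x = patterns[t % len(patterns)]
        y = sum(wi * xi for wi, xi in zip(w, x))          # linear post-synaptic activity
        w = [wi + eta * xi * bcm_phi(y, theta) for wi, xi in zip(w, x)]
        theta += theta_rate * (y * y - theta)             # threshold moves with prior activity
    return w, theta


patterns = [(1.0, 0.0), (0.0, 1.0)]
w, theta = train_bcm(patterns, w=[0.5, 0.4])
print("responses:", [sum(wi * xi for wi, xi in zip(w, x)) for x in patterns])
print("threshold:", theta)
```

The threshold is deliberately made to adapt faster than the weights (theta_rate is ten times eta); with a fixed threshold, by contrast, the weights would either grow without bound or decay, depending on where the threshold was set.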
Does context affect RF learning? Gilbert and Wiesel's (1990) observation that prior contextual stimulation altered the orientation tuning function that was later obtained in response to receptive field stimulation alone suggests that it can, but these effects were not studied systematically, and their causes are uncertain. More direct study of this issue is therefore required, and ways in which this can be done were suggested in Sections 4.3, 5.6, and 5.7.2 (point 5).
The ability to form representations of the external world is often thought to be fundamental to cortical computation. We have argued that networks of local processors with contextual guidance can in effect discover distal variables and relationships by discovering mutual information in diverse data sets. This may not be the same as forming explicitly intentional representations of the external world, however (Phillips et al. 1995a). "Representation" is commonly used to describe what sensory and perceptual systems do, and in that general use it is synonymous with "the transmission of information". It also has a more specific meaning, however, in which the distinction between representation and referent is critical. Intentional representation is using one thing, the representation, in place of another thing, the referent. It implies a user that knows about and distinguishes both the representation and the referent. The relation between representation and referent can be iconic, symbolic, or both, and it is asymmetrical. This asymmetry does not apply to information transmission, however, which is defined as the mutual information shared between input and output. Representation proper plays an important role in human cognition, but much cortical function may proceed without it. Phylogenetic studies of such representational abilities show them to be rare or absent outside the primate line and to emerge progressively within it (e.g. Chevalier-Skolnikoff 1983; Byrne & Whiten 1992). In children, different aspects of this skill develop between the ages of 1 and 6, gradually becoming more flexible, internalized, and differentiated (e.g. Piaget 1954; Wimmer & Perner 1983; De Loach 1987; Zaitchik 1990; Campbell & Olson 1990). We know of no physiological observations showing that local cortical processors treat their inputs as standing for something other than themselves.
Furthermore, we know of no neural network model that puts intentional representation into the dynamics of the network, rather than just leaving it in the mind of the designer. Such models nevertheless often have considerable computational power, even though they incorporate no proper intentional representations of an external world.
We conclude that representation proper is not a common feature of cortical computation, but arises late in both phylogenetic and ontogenetic development, and perhaps upon foundations such as the implicit form of realism that has been hypothesized here to result from the maximization of coherent variation by local cortical processors. If so, a better understanding of the capabilities and limitations of those foundations will help us understand how higher cognitive functions are possible, and why they are needed.
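The symmetry of information transmission noted above follows directly from the definition of mutual information, I(X;Y) = Σ p(x,y) log[p(x,y)/(p(x)p(y))], which is unchanged when the roles of X and Y are swapped. The short plain-Python sketch below (the joint distribution is an arbitrary made-up example) computes it in bits from a joint probability table and checks that transposing the table, i.e. treating output as input and input as output, leaves the value unchanged.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def transpose(joint):
    return [list(col) for col in zip(*joint)]


# Arbitrary example: rows index an input variable, columns an output variable.
joint = [[0.3, 0.1, 0.1],
         [0.0, 0.2, 0.3]]
print(mutual_information(joint), mutual_information(transpose(joint)))
```

Nothing in the computation distinguishes a "representation" from its "referent"; that asymmetry would have to come from somewhere other than the quantity being maximized.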
1. For further discussions of the advantages of such "constructive" effects of contextual guidance, or reentrance, upon RF selectivity see Finkel and Edelman (1989) and Tononi et al. (1992b; 1996).
2. From a statistical point of view these learning abilities combine the descriptive aims of techniques such as principal component analysis with the predictive aims of multiple regression. They are therefore related to techniques of latent structure analysis, such as canonical correlation (Hotelling 1936; Giffins 1985), which seek functions defined upon separate data-sets such that the correlation between them is as large as possible. Canonical correlation can in principle be implemented within a neural network (Kay 1992), but is concerned with linear functions of just two data-sets. The computational studies in Section 3 extend it to the multistage analysis of multiple data-sets.
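The classical linear case can be computed directly. The sketch below finds the first canonical correlation between two data-sets (Hotelling 1936) with NumPy: each set is whitened via a Cholesky factor of its covariance, and the leading singular value of the whitened cross-covariance is taken. This is the standard batch computation, not the neural-network implementation of Kay (1992) nor the multistage extension of Section 3; the small ridge term `reg` and the synthetic data are assumptions added for numerical safety and illustration.

```python
import numpy as np


def first_canonical_pair(x, y, reg=1e-8):
    """Return (rho, wx, wy): the largest canonical correlation and its weight vectors."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / n + reg * np.eye(x.shape[1])
    cyy = y.T @ y / n + reg * np.eye(y.shape[1])
    cxy = x.T @ y / n
    lx = np.linalg.cholesky(cxx)              # cxx = lx @ lx.T
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy)              # lx^-1 @ cxy
    m = np.linalg.solve(ly, m.T).T            # ... @ ly^-T
    u, s, vt = np.linalg.svd(m)
    wx = np.linalg.solve(lx.T, u[:, 0])       # map back to the original coordinates
    wy = np.linalg.solve(ly.T, vt[0])
    return s[0], wx, wy


rng = np.random.default_rng(0)
z = rng.standard_normal(1000)                 # latent variable shared by both data-sets
x = np.column_stack([z + 0.3 * rng.standard_normal(1000), rng.standard_normal(1000)])
y = np.column_stack([rng.standard_normal(1000), -z + 0.3 * rng.standard_normal(1000)])
rho, wx, wy = first_canonical_pair(x, y)
print(rho, np.corrcoef(x @ wx, y @ wy)[0, 1])
```

The two printed values should agree, since the returned weights give unit-variance projections whose covariance equals the leading singular value. Both functions are linear and only two data-sets are involved, which are exactly the restrictions this note describes.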
3. Further examples of this approach are provided by Zemel and Hinton (1991), who show how it can discover viewpoint-invariant relationships that characterize objects, and by Becker (1993) who shows how it can learn to compute translation-invariant object categories using just the continuity of an object's identity across time.
4. One of the many examples that Gallistel cites is that migratory birds learn the position of the celestial pole while they are fledglings (Able & Bingham 1987). He argues that this requires a procedure that is specially designed to compute and store the values of variables that specify the position of the celestial pole in the pattern of dots defined by the circumpolar stars. He concludes: "We should no more expect to find a general-purpose learning mechanism than we should expect to find a general-purpose sensory organ" (Gallistel 1995, p. 1266).
5. The goals of Infomax and of Coherent Infomax differ only in the value given to φ1, which suggests an important role for this parameter. It specifies the balance between increasing information transmission within streams of processing and increasing predictability across streams. It is analogous to the parameter "eta" in the algorithm for discovering predictable classifications developed by Schmidhuber and Prelinger (1993). An advantage of specifying the relative priority of these two goals by a single parameter is that it provides for a continuous transition between them. Appropriate values for this parameter might then vary with the layer of processing and the stage of learning.
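One way to see how a single weighting parameter (φ1 in this note) can give a continuous transition between the two goals uses the identity I(X;R) = I(X;R;C) + I(X;R|C), where X is the output, R the RF input and C the CF input. Assuming, for illustration only, that all other terms of the objective are set to zero, an objective of the form F = I(X;R;C) + φ1·I(X;R|C) equals the RF information I(X;R) when φ1 = 1 and the information shared with context, I(X;R;C), when φ1 = 0. The plain-Python sketch below computes these terms from a discrete joint distribution p(x, r, c); the dictionary format and the function names are hypothetical.

```python
import math
from collections import defaultdict

X, R, C = 0, 1, 2  # positions of output, RF input and CF input in each outcome tuple


def entropy(joint, axes):
    """Entropy in bits of the marginal of `joint` over the given axes."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[a] for a in axes)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def info_terms(joint):
    """Return I(X;R), I(X;R|C) and the three-way term I(X;R;C) = I(X;R) - I(X;R|C)."""
    def h(*axes):
        return entropy(joint, axes)
    i_xr = h(X) + h(R) - h(X, R)
    i_xr_given_c = h(X, C) + h(R, C) - h(X, R, C) - h(C)
    return i_xr, i_xr_given_c, i_xr - i_xr_given_c


def objective(joint, phi1):
    """Hypothetical objective F = I(X;R;C) + phi1 * I(X;R|C)."""
    _, i_xr_given_c, i_xrc = info_terms(joint)
    return i_xrc + phi1 * i_xr_given_c


# Output copies the RF input, but the context is unrelated to either.
rf_only = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}
# Output, RF input and context all agree.
coherent = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
for phi1 in (0.0, 0.5, 1.0):
    print(phi1, objective(rf_only, phi1), objective(coherent, phi1))
```

With φ1 = 0, transmitting RF information that the context does not share earns no credit, whereas the fully coherent case scores the same at every setting; intermediate values interpolate smoothly between these extremes.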
6. How these two conditional probabilities could be estimated in a biological system is not yet clear, so a simpler rule such as that proposed by Der and Smyth (1996) or by de Sa (1994a,b) may be more plausible. On the other hand, it can be argued that we often underestimate the computational power of individual neurons, so an approximation to the more complex rule may be feasible.
7. These demonstrations show how CF input increases the probability that at any moment outputs will be produced that are mutually supportive of each other. They thus produce grouping through synchronization. These effects are compatible with temporal structure in the outputs arising from any of a variety of sources, including periodicity arising at the level of single cells, local circuits, or larger populations. Any temporal variations in the outputs will be transmitted through the CF connections in such a way as to increase the probability that coherent outputs will be active simultaneously.
8. Examples of such nets are those incorporating reentrant connections (e.g. Sporns et al. 1991; Tononi et al. 1992b).
9. Tests for modulatory effects can be specified in terms of the mutual and conditional information between target, context and response (Smyth et al. 1996). These tests can be used in cases where context and target variables are correlated, and where the context either does or does not also have a direct influence on the response. One distinctive sign of modulatory effects is an increase in I(X;C|R) as target strength increases from low to medium values. I(X;C|R) is the information that is transmitted about the context in addition to any information that is transmitted about the target, and it would intuitively be expected to decrease as target strength increases. To test for an increase in I(X;C|R), we estimate the probability of a correct response in a two-alternative forced-choice task in the presence of either supporting or non-supporting context at two different levels of target strength. If the effect of context increases with target strength, then the context is modulating the transmission of information about the target.
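Where such quantities are estimated from trial data, the simplest option is a plug-in estimate from observed frequencies, I(X;C|R) = Σ p(x,c,r) log[p(x,c,r)p(r)/(p(x,r)p(c,r))]. The plain-Python sketch below assumes each trial has been recorded as a tuple (x, c, r) of discrete values, using the variable labels of this note; it is not the estimation procedure of Smyth et al. (1996), and plug-in estimates of this kind are biased upward for small samples, so in practice they would need a bias correction or resampling.

```python
import math
from collections import Counter


def conditional_mi(trials):
    """Plug-in estimate, in bits, of I(X;C|R) from a list of (x, c, r) tuples."""
    n = len(trials)
    n_xcr = Counter(trials)
    n_xr = Counter((x, r) for x, _, r in trials)
    n_cr = Counter((c, r) for _, c, r in trials)
    n_r = Counter(r for _, _, r in trials)
    total = 0.0
    for (x, c, r), count in n_xcr.items():
        # p(x,c,r) p(r) / (p(x,r) p(c,r)); the factors of n cancel in this ratio
        ratio = (count * n_r[r]) / (n_xr[(x, r)] * n_cr[(c, r)])
        total += (count / n) * math.log2(ratio)
    return total


# X copies C on every trial, regardless of R: 1 bit of conditional information.
linked = [(v, v, r) for v in (0, 1) for r in (0, 1)] * 10
# X, C and R vary independently: no conditional information.
independent = [(x, c, r) for x in (0, 1) for c in (0, 1) for r in (0, 1)] * 10
print(conditional_mi(linked), conditional_mi(independent))
```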
10. Other major issues that need to be resolved in order to provide an adequate account of context effects in word perception include the following. 1. What are the effects of noise in the input and of noise generated within the system itself (Movellan & McClelland 1995)? 2. How is overt responding related to the different levels of analysis, and in particular, to what extent do segmental decisions (i.e. decisions concerning letters or phonemes) reflect activity at segmental levels and to what extent activity at word or morphemic levels? 3. What decision processes are used to generate overt responses from the information made available by the speech perception system (Massaro & Friedman 1990)? 4. What is the time course of context effects upon processing, and in particular, how soon do these effects become apparent after stimulus onset?
11. Another possible way of distinguishing contextual influences is to propose that target information sets limits upon discriminability whereas context just adds bias. This is not identical to our emphasis upon contextual interactions that do not corrupt the meaning of the interacting signals, because information transmission depends upon both sensitivity and bias, but it is similar, and much of the evidence from studies of word perception can be seen as supporting it (e.g. Morton 1969; Massaro & Cohen 1983; Massaro 1989a; Krueger & Shapiro 1979). The evidence on this issue is not wholly unambiguous, however, because accuracy in forced-choice discrimination between letters depends upon context (e.g. Reicher 1969; Johnston & McClelland 1973), and because target discriminability, as measured by d′, is also affected by context in some other paradigms (e.g. Phillips 1971; Samuel 1981, 1996). Furthermore, it is difficult to draw strong inferences about internal processing from such phenomena alone, because whether they are compatible with certain aspects of a model (e.g. its architecture) depends upon other aspects (e.g. the role of noise; Movellan & McClelland 1995).
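The separation between discriminability and bias is usually quantified with equal-variance Gaussian signal detection theory: sensitivity d′ = z(H) − z(F) and criterion c = −[z(H) + z(F)]/2, where H and F are hit and false-alarm rates and z is the inverse of the standard normal distribution function. The sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8 or later); the rates in the example are made up. Under this model, a context that only adds bias would move c while leaving d′ unchanged.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf   # inverse of the standard normal CDF
_phi = NormalDist().cdf     # standard normal CDF, used only to build example rates


def sdt_measures(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian SDT: return (d_prime, criterion_c)."""
    zh, zf = _z(hit_rate), _z(false_alarm_rate)
    return zh - zf, -0.5 * (zh + zf)


# Hypothetical rates with the same sensitivity but different response bias.
neutral = sdt_measures(_phi(0.5), _phi(-0.5))   # d' about 1, criterion about 0
liberal = sdt_measures(_phi(1.0), _phi(0.0))    # d' about 1, criterion shifted
print(neutral, liberal)
```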
12. Patient AN had a stroke in 1979 when she was 47, and she has been studied at Stirling since 1985 (Goodall & Phillips 1994; Phillips & Goodall 1994). Patient AM is her nephew; he had a stroke in 1990 at the age of 35, which left him with very similar deficits (Goodall 1994). The mental status of both patients is good, and they are cooperative and insightful during testing. In one study (Goodall 1994), five 4-letter nonwords were created and printed on a sheet placed in front of the patient, who had to say whether each of a number of separately presented test items was on the sheet. To enforce accurate visual discrimination, the test items included nonwords differing from the training items by just one letter. This training continued until performance was accurate, which took about ten minutes. At no time did either the patient or the experimenter speak or write the nonwords, and indeed the patients could not accurately read them aloud, either before or after training. After a 5-minute break, the 5 familiar nonwords and 5 unfamiliar nonwords were read aloud for the patients to write to dictation. The familiar nonwords were all written correctly and fluently, but none of the unfamiliar nonwords was written either correctly or fluently. The familiar nonwords were also read to 42 normal subjects for writing to dictation, without any prior presentation of their written form. The spellings produced by these subjects were quite different from those given by our two patients, because unlikely but plausible spellings had been chosen when selecting the stimuli to be used with the patients. This confirms that the patients were using the knowledge obtained from the visual discrimination training when writing the familiar nonwords to dictation.
13. It may be possible to preserve the view that the normal processes of word perception use local codes for words by proposing that these results reflect the use of voluntary visual imagery or episodic memory strategies. Several aspects of the results greatly weaken such interpretations, however. First, neither patient ever claimed to be visualizing. Retrieval of the learned information was fast, fluent, and automatic, in the sense that the patients showed no sign of cognitive mediation at retrieval. Indeed, both were surprised at their ability to write the nonwords, and on one occasion AM, quite unprompted, said "Now how did I do that?". Second, the information transmitted from the visual input to the written output was not at the level of a visual image, because the patients transcribed typeface into their own cursive script. Third, knowledge of the new items was acquired gradually and retained over periods of months, which is not characteristic of episodic memory functions. Fourth, writing to dictation is a typical implicit memory task, and was used as such here: no reference was made to previous presentations, and the patients were simply asked to write down what was said to them. This is exactly the kind of task that relies on procedural skills and is consequently unimpaired in patients with deficits in episodic memory (McCarthy & Warrington 1990; Shallice 1988). Fifth, familiar English words occurred as error responses to unfamiliar nonwords, but the familiar nonwords did not. This shows that the nonwords were not being processed in a way that kept them separate from knowledge of familiar English words.
14. This difficulty is made clear by studies comparing patient AN's ability to read and write familiar items with her ability to read and write closely related but unfamiliar items. She can read familiar words such as NAPKIN and MUSKET but not unfamiliar nonwords such as NAPKET and MUSKIN, made by recombining their component syllables, and this is unlikely to be due to the absence of semantic mediation, because she can learn to read meaningless nonwords with which she is familiarized (Goodall & Phillips 1994). She can also learn to copy meaningless nonwords such as BONSED and MUNIZE, but this ability does not transfer to copying BONIZE and MUNSED (Phillips & Goodall 1994). If both kinds of item, familiar or not, are processed as distributed codes that are treated as a single group, then analysis and simulation both suggest that there will be good transfer to novel items sharing components with familiar items. This applies both to feedforward networks (e.g. Baldi & Hornik 1989; Brousse & Smolensky 1989; Phillips, Hay & Smith 1993) and to recurrent architectures with attractors (e.g. Plaut & McClelland 1993). Thus these studies of AN suggest that the letter strings were processed as distributed codes to which internal processes were applied to form familiar or coherent groupings. Computational studies of coding through synchronization in the dynamic link architecture demonstrate the possibility of mapping structured descriptions as familiar wholes while specifically avoiding false combinations of their component subsets (e.g. von der Malsburg 1988; Lades et al. 1993), but whether this approach can be developed to account for word perception in detail remains to be seen.
An alternative, and simpler, possibility is that familiar items are signaled by distributed codes of the kind used in most connectionist theories but with the addition of internal grouping processes that segregate unfamiliar combinations into distinct subsets that are processed separately.
15. Becker (1996) applied the Imax algorithm to the task of learning to categorize phonemes by presenting each of two streams with different versions of the same vowel. We assume that this design could easily be modified to explore the use of sequential dependencies between successive acoustic inputs of different phonemes to learn how to categorize them. Guiding learning within streams by coherence across streams requires neither that the coherence be perfect nor that the different streams process different instances of the same thing.
16. Paper under revision: de Sa, V. R. and Ballard, D. H., Category learning through multi-modality sensing.
This work is supported by a Network grant from the Human Capital and Mobility Program of the European Community. Special thanks to Jim Kay, Dario Floreano, and Darragh Smyth for the computational studies outlined in Section 3. We thank our many colleagues at Stirling and Frankfurt for their crucial contributions to this work. We also thank Moshe Abeles, Elie Bienenstock, Ralf Der, John Hertz, Geoffrey Hinton, Nathan Intrator, Michael Jordan, Ralph Linsker, Tim Shallice, Olaf Sporns, Tom Troscianko, John Taylor, Christoph von der Malsburg, and Rolf Wurtz for valuable discussions of these issues. We also thank Sue Becker and two anonymous referees for their helpful and insightful suggestions.
Abeles, M. (1982) Local cortical circuits: Studies of brain function, vol 6. Springer.
Abeles, M. (1991) Corticonics. Cambridge University Press.
Abeles, M., Vaadia, E., Bergman, H., Prut, Y., Haalman, I., & Slovin, H. (1993a) Dynamics of neuronal interactions in the frontal cortex of behaving monkeys. Concepts in Neuroscience 4:131-58.
Abeles, M., Bergman, H., Margalit, E., & Vaadia, E. (1993b) Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. Journal of Neurophysiology 70:1629-38.
Able, K. P. & Bingham, V. P. (1987) The development of orientation and navigation behavior in birds. Quarterly Review of Biology 62:1-29.
Allman, J., Miezin, F. & McGuinness, E. (1985) Stimulus specific responses from beyond the classical receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Annual Review of Neuroscience 8:407-30.
Amit, D. J. (1989) Modeling Brain Function. Cambridge University Press.
Armstrong-James, M., Welker, E. & Callahan, C. A. (1993) The contribution of NMDA and non-NMDA receptors to fast and slow transmission of sensory information in the rat SI barrel cortex. Journal of Neuroscience 16:480-87.
Artola, A., Brocher, S. & Singer, W. (1990) Different voltage-dependent thresholds for the induction of long-term depression and long-term potentiation in slices of the rat visual cortex. Nature 347:69-72.
Artola, A. & Singer, W. (1993) Long-term depression of excitatory synaptic transmission and its relationship to long-term potentiation. Trends in the Neurosciences 15:218-26.
Ascher, P., Bregestovski, P. & Nowak, L. (1988) N-methyl-D-aspartate activated channels of mouse central neurones in magnesium-free solutions. Journal of Physiology 339:207-26.
Atick, J. J. (1992) Could information theory provide an ecological theory of sensory processing? Network 3:213-51.
Atick, J. J. & Redlich, A. N. (1990) Predicting ganglion and simple cell receptive field organizations. International Journal of Neural Systems 1:305.
Atick, J. J. & Redlich, A. N. (1993) Convergent algorithm for sensory receptive field development. Neural Computation 5:45-60.
Attneave, F. (1954) Informational aspects of visual perception. Psychological Review 61:183-93.
Baddeley, R. J. & Hancock, P. J. B. (1991) A statistical analysis of natural images matches psychophysically derived orientation tuning curves. Proceedings of the Royal Society of London B 246:219-23.
Bair, W., Koch, C., Newsome, W., Britten, K. & Niebur, E. (1994) Power spectrum analysis of bursting cells in area MT in the behaving monkey. Journal of Neuroscience 14:870-92.
Baldi, P. & Hornik, K. (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Networks 2:53-58.
Barlow, H. B. (1959) Sensory mechanisms, the reduction of redundancy, and intelligence. In The Mechanisation of Thought Processes. Her Majesty's Stationery Office.
Barlow, H. B. (1961) Possible principles underlying the transformations of sensory messages. In Sensory Communication, ed. W. A. Rosenblith. MIT Press.
Barlow, H. B. (1972) Single units and sensation: a neuron doctrine for perceptual psychology? Perception 1:371-94.
Barlow, H. B. (1989) Unsupervised learning. Neural Computation 1:295-311.
Barlow, H. B. (1993) The biological role of neocortex. In Information Processing in the Cortex, ed. A. Aertsen & V. Braitenberg. Springer-Verlag.
Barlow, H. B. & Foldiak, P. (1989) Adaptation and decorrelation in the cortex. In The Computing Neuron, ed. R. Durbin & C. Miall. Addison-Wesley.
Becker, S. (1993) Learning to categorize objects using temporal coherence. In Advances in Neural Information Processing Systems 5, 361-68. Morgan Kaufmann.
Becker, S. (1996) Mutual information maximization: models of cortical self-organization. Network 7:7-31.
Becker, S. & Hinton, G. E. (1992) Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355:161-63.
Bentin, S. (1992) Phonological awareness, reading, and reading acquisition: A survey and appraisal of current research. In Orthography, Phonology, Morphology, and Meaning, ed. R. Frost & L. Katz. Elsevier.
Bentin, S., Hammer, R. & Cahan, S. (1991) The effects of aging and first grade schooling on the development of phonological awareness. Psychological Science 2:271-74.
Bernander, O., Douglas, R. J., Martin, K. A. C. & Koch, C. (1991) Synaptic background activity influences spatiotemporal integration in single pyramidal cells. Proceedings of the National Academy of Sciences USA 88:11569-73.
Biederman, I. (1972) Perceiving real-world scenes. Science 177:77-80.
Bienenstock, E. (1995) A model of neocortex. Network 6:179-224.
Bienenstock, E. L., Cooper, L. N. & Munro, P. W. (1982) Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2:32-48.
Blakemore, C. & Tobin, E. A. (1972) Lateral inhibition between orientation detectors in the cat's visual cortex. Experimental Brain Research 15:439-40.
Braitenberg, V. (1978) Cortical architectonics: General and areal. In Architectonics of the Cerebral Cortex, ed. M. A. B. Brazier & H. Petsche. Raven Press.
Braitenberg, V. & Schuz, A. (1991) Anatomy of the Cortex. Springer-Verlag.
Brousse, O. & Smolensky, P. (1989) Virtual memories and massive generalization in connectionist combinatorial learning. In Proceedings of the 11th Annual Conference of the Cognitive Science Society. Erlbaum.
Bruce, V. (1988) Recognising Faces. Erlbaum.
Bullier, J., Munk, M. H. J. & Nowak, L. G. (1992) Synchronization of neuronal firing in areas V1 and V2 of the monkey. Society for Neuroscience Abstracts 18:11.7.
Byrne, R. W. & Whiten, A. (1992) Cognitive evolution in primates: evidence from tactical deception. Man 27:609-27.
Campbell, R. N. & Olson, D. R. (1990) Children's thinking. In Understanding Children: Essays in honour of Margaret Donaldson, ed. R. Grieve & M. Hughes. Blackwell.
Carpenter, G. A. (1989) Neural network models for pattern recognition and associative memory. Neural Networks 2:243-257.
Cattell, J. M. (1886) The time taken up by the cerebral operations. Mind 11:377-92.
Chevalier-Skolnikoff, S. (1983) Sensori-motor development in orang-utans and other primates. Journal of Human Evolution 12:545-46.
Clark, A. & Thornton, C. (1996) Trading spaces: Computation, representation and the limits of uninformed learning. Behavioral and Brain Sciences, in press.
Clothiaux, E. E., Cooper, L. N. & Bear, M. F. (1991) Synaptic plasticity in visual cortex: Comparison of theory with experiment. Journal of Neurophysiology 66:1785-804.
Crick, F. (1988) What Mad Pursuit. Penguin.
Crick, F. & Koch, C. (1990) Towards a neurobiological theory of consciousness. Seminars in the Neurosciences 2:263-75.
Damasio, A. R. (1989) The brain binds entities and events by multiregional activity from convergence zones. Neural Computation 1:123-32.
Das, A. & Gilbert, C. D. (1995) Long-range horizontal connections and their role in cortical reorganization revealed by optical recording of cat primary visual cortex. Nature 375:780-84.
De Loach, J. (1987) Rapid change in the symbolic functioning of very young children. Science 238:1556-57.
Der, R. & Smyth, D. (1996) Local online learning of coherent information. Neural Networks. In press.
De Sa, V. (1994a) Unsupervised classification learning from cross-modal environmental structure. PhD Thesis, University of Rochester, N.Y.
De Sa, V. (1994b) Learning classification with unlabeled data. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann.
Douglas, R. J. & Martin, K. A. C. (1990) Neocortex. In The Synaptic Organization of the Brain, ed. G. M. Shepherd. Oxford University Press.
Douglas, R. J. & Martin, K. A. C. (1991) Opening the grey box. Trends in the Neurosciences 14: 286-93.
Dudek, S. M. & Bear, M. F. (1992) Homosynaptic long-term depression in area CA1 of hippocampus and effects of NMDA receptor blockade. Proceedings of the National Academy of Sciences, U.S.A. 89: 4363-67.
Durgin, F. H. (1995) Contingent aftereffects of texture density: Perceptual learning and contingency. PhD Thesis, Department of Psychology, University of Virginia.
Eckhorn, R., Dicke, P., Arndt, M. & Reitboeck, H. (1991a) Flexible linking of visual features by stimulus-related synchronizations of model neurons. In Induced Rhythms in the Brain, ed. E. Basar & T. H. Bullock. Birkhauser.
Eckhorn, R., Schanze, T., Brosch, M., Salem, W. & Bauer, R. (1991b) Stimulus-specific synchronizations in cat visual cortex: Multiple microelectrode and correlation studies from several cortical areas. In Induced Rhythms in the Brain, ed. E. Basar & T. H. Bullock. Birkhauser.
Edelman, G. M. (1978) Group selection and phasic re-entrant signalling: a theory of higher brain function. In The Mindful Brain, ed. G. M. Edelman & V. B. Mountcastle. MIT Press.
Edelman, G. M. (1989) The Remembered Present: A Biological Theory of Consciousness. Basic.
Edelman, G. M. & Mountcastle, V. B. (1978) The Mindful Brain. MIT Press.
Ellis, A. W., & Young, A. W. (1988) Human Cognitive Neuropsychology. Erlbaum.
Engel, A. K., Konig, P. & Singer, W. (1991) Direct physiological evidence for scene segmentation by temporal coding. Proceedings of the National Academy of Sciences, U.S.A. 88: 9136-40.
Engel, A. K., Kreiter, A. K., Konig, P. & Singer, W. (1991a) Synchronization of oscillatory neuronal responses between striate and extrastriate visual cortical areas of the cat. Proceedings of the National Academy of Sciences, U.S.A. 88: 6048-52.
Engel, A. K., Konig, P., Kreiter, A. K., Schillen, T. B. & Singer, W. (1992) Temporal coding in the visual cortex: new vistas on integration in the nervous system. Trends in the Neurosciences 15: 218-26.
Engel, A. K., Konig, P., Kreiter, A. K. & Singer, W. (1991b) Interhemispheric synchronization of oscillatory neuronal responses in cat visual cortex. Science 252: 1177-79.
Felleman, D. J. & Van Essen, D. C. (1991) Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1: 1-47.
Field, D. J., Hayes, A. & Hess, R. F. (1993) Contour integration by the human visual system: Evidence for a local "association" field. Vision Research 33: 173-79.
Finkel, L. H. & Edelman, G. M. (1989) The integration of distributed cortical systems by reentry: a computer simulation of interactive functionally segregated visual areas. Journal of Neuroscience 9: 3188-208.
Fishman, M. C. & Michael, C. R. (1973) Integration of auditory information in the cat's visual cortex. Vision Research 13: 1415-19.
Floreano, D., Phillips, W. A. & Kay, J. (1995) A computational theory of learning visual features via contextual guidance. Perception 24S: 22.
Foldiak, P. (1990) Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics 64: 165-70.
Fodor, J. A. & Pylyshyn, Z. W. (1988) Connectionism and cognitive architecture: A critical analysis. Cognition 28: 3-71.
Fox, K. & Daw, N. (1992) A model for the action of NMDA conductances in the visual cortex. Neural Computation 4: 59-83.
Fox, K., Sato, H. & Daw, N. (1990) The effect of varying stimulus intensity on NMDA-receptor activity in cat visual cortex. Journal of Neurophysiology 64: 1413-28.
Gallistel, C. R. (1995) The replacement of general-purpose theories with adaptive specializations. In The Cognitive Neurosciences, ed. M. S. Gazzaniga. MIT Press.
Georgopoulos, A. P. (1990) Neural coding of the direction of reaching and a comparison with saccadic eye-movements. Cold Spring Harbor Symposia on Quantitative Biology 55: 849-59.
Gilbert, C. D. (1995) Dynamic properties of adult visual cortex. In The Cognitive Neurosciences, ed. M. S. Gazzaniga. MIT Press.
Gilbert, C. D., & Wiesel, T. N. (1983) Clustered intrinsic connections in cat visual cortex. Journal of Neuroscience 3: 1116-33.
Gilbert, C. D. (1992) Horizontal integration and cortical dynamics. Neuron 9: 1-13.
Gilbert, C. D., & Wiesel, T. N. (1989) Columnar specificity of intrinsic horizontal and cortico-cortical connections in cat visual cortex. Journal of Neuroscience 9: 2432-42.
Gilbert, C. D., & Wiesel, T. N. (1990) The influence of contextual stimuli on the orientation selectivity of cells in primary visual cortex of the cat. Vision Research 30: 1689-701.
Gittins, R. (1985) Canonical Analysis: A Review with Applications in Ecology. Biomathematics 12. Springer-Verlag.
Gluck, M. A., & Rumelhart, D. E. (1990) Neuroscience and Connectionist Theory. Erlbaum.
Goebel, R. (1993) Perceiving complex visual scenes: An oscillator neural network model that integrates selective attention, perceptual organisation, and invariant recognition. In Advances in Neural Information Processing Systems 5, ed. S. J. Hanson, J. D. Cowan & C. L. Giles. Morgan Kaufmann.
Goodall, W. C. (1994) Neuropsychological studies of Reading and Writing. PhD Thesis, University of Stirling, Scotland, UK.
Goodall, W. C. & Phillips, W. A. (1994) Three routes from print to sound: Evidence from a case of acquired dyslexia. Cognitive Neuropsychology 12: 113-47.
Gray, C. M., Konig, P., Engel, A. K. & Singer, W. (1989) Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature 338: 334-37.
Gray, C. M. & Singer, W. (1989) Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proceedings of the National Academy of Sciences U.S.A. 86: 1698-702.
Gray, C. M. & Viana Di Prisco, G. (1993) Properties of stimulus-dependent rhythmic activity of visual cortical neurons in the alert cat. Society for Neuroscience Abstracts 19: 359.8.
Grossberg, S. (1993) Self-organizing neural models of categorization, inference and synchrony. Behavioral and Brain Sciences 16: 460-61.
Grossberg, S. & Somers, D. (1991) Synchronized oscillations during cooperative feature linking in a cortical model of visual perception. Neural Networks 4: 453-66.
Gur, M. & Akri, V. (1992) Isoluminant stimuli may not expose the full contribution of color to visual functioning: spatial contrast sensitivity measurements indicate interaction between color and luminance processing. Vision Research 32: 1253-62.
Hamming, R. W. (1980) Coding and Information Theory. Prentice-Hall.
Hancock, P. J. B., Smith, L. S. & Phillips, W. A. (1991a) A biologically supported error-correcting learning rule. Neural Computation 3: 201-12.
Hancock, P. J. B., Smith, L. S. & Phillips, W. A. (1991b) A biologically supported error-correcting learning rule. In Proceedings of the International Conference on Artificial Neural Networks, ed. O. Simula. Elsevier.
Hebb, D. O. (1949) The Organization of Behavior. Wiley.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986) Distributed representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1, ed. D. E. Rumelhart & J. L. McClelland. MIT Press.
Hirsch, J. A. & Gilbert, C. D. (1991) Synaptic physiology of horizontal connections in cat's visual cortex. Journal of Neuroscience 11: 1800-09.
Hirsch, J. A. & Gilbert, C. D. (1993) Long-term changes in synaptic strength along specific intrinsic pathways in the cat visual cortex. Journal of Physiology 461: 247-62.
Hopfield, J. J. (1982) Neural networks and physical systems with emergent collective computational capabilities. Proceedings of the National Academy of Sciences U.S.A. 79: 2554-58.
Horgan, J. (1995) From complexity to perplexity. Scientific American June: 74-79.
Hotelling, H. (1936) Relations between two sets of variables. Biometrika 28: 321-77.
Huang, Y-Y., Colino, A., Selig, D. K. & Malenka, R. C. (1992) The influence of prior synaptic activity on the induction of long-term potentiation. Science 255: 730-33.
Hummel, J. E. & Biederman, I. (1992) Dynamic binding in a neural network for shape recognition. Psychological Review 99: 480-517.
Hummel, J. E. & Holyoak, K. J. (1993) Distributing structure over time. Behavioral and Brain Sciences 16: 464.
Humphreys, G. W. & Riddoch, M. J. (1987) To see but not to see: A case study of visual agnosia. Erlbaum.
Humphreys, G. W., Troscianko, T., Riddoch, M. J., Boucart, M., Donnelly, N., & Harding, G. F. A. (1992) Covert processing in different visual recognition systems. In The Neuropsychology of Consciousness, ed. A. D. Milner & M. D. Rugg. Academic Press.
Intrator, N. & Cooper, L. N. (1995a) Information theory and visual plasticity. In The Handbook of Brain Theory and Neural Networks, ed. M. A. Arbib. MIT Press.
Intrator, N. & Cooper, L. N. (1995b) BCM theory of visual cortical plasticity. In The Handbook of Brain Theory and Neural Networks, ed. M. A. Arbib. MIT Press.
Ishai, A. & Sagi, D. (1995) Common mechanisms of visual perception and imagery. Science 268: 1772-74.
Jerison, H. J. (1973) Evolution of the Brain and Intelligence. Academic Press.
Johnston, J. C. & McClelland, J. L. (1973) Visual factors in word perception. Perception and Psychophysics 14: 365-70.
Kaas, J. H. (1995) The reorganization of sensory and motor maps in adult mammals. In The Cognitive Neurosciences, ed. M. S. Gazzaniga. MIT Press.
Kanevsky, D. (1989) A multiple source, or, is a striped apple more than a striped orange? Behavioral and Brain Sciences, 12: 767-69.
Karni, A. & Sagi, D. (1991) Where practice makes perfect in texture discrimination: Evidence for primary visual cortex plasticity. Proceedings of the National Academy of Sciences, U.S.A. 88: 4966-70.
Kapadia, M. K., Ito, M., Gilbert, C. D. & Westheimer, G. (1995) Improvement in visual sensitivity by changes in local context: Parallel studies in human observers and in V1 of alert monkeys. Neuron 15: 843-56.
Kay, J. (1992) Feature discovery under contextual supervision using mutual information. International Joint Conference on Neural Networks. Baltimore.
Kay, J., Floreano, D. & Phillips, W. A. (1996) Contextually guided unsupervised learning using local multivariate binary processors. Technical Report 96-8, Department of Statistics, University of Glasgow (& submitted to Neural Networks).
Kay, J. & Phillips, W. A. (1994) Activation functions, computational goals and learning rules for local processors with contextual guidance. Technical Report CCCN-15, University of Stirling, Centre for Cognitive and Computational Neuroscience.
Kay, J. & Phillips, W. A. (1996) Activation functions, computational goals and learning rules for local processors with contextual guidance. Neural Computation. In press.
Kirkwood, A., Rioult, M. G. & Bear, M. F. (1996) Experience-dependent modification of synaptic plasticity in visual cortex. Nature 381: 526-28.
Kisvarday, Z. F., Martin, K. A. C., Freund, T. F., Magloczky, Z., Whitteridge, D., & Somogyi, P. (1986) Synaptic targets of HRP-filled layer III pyramidal cells in the cat striate cortex. Experimental Brain Research 64: 541-52.
Knierim, J. J. & Van Essen, D. C. (1992) Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. Journal of Neurophysiology 67: 961-80.
Konig, P., Engel, A. K., Lowel, S. & Singer, W. (1993) Squint affects synchronization of oscillatory responses in cat visual cortex. European Journal of Neuroscience 5: 501-08.
Konig, P., Engel, A. K. & Singer, W. (1995) Relation between oscillatory activity and long-range synchronization in cat visual cortex. Proceedings of the National Academy of Sciences U.S.A. 92: 290-94.
Kreiter, A. K. & Singer, W. (1992) Oscillatory neuronal responses in the visual cortex of the awake macaque monkey. European Journal of Neuroscience 4: 369-75.
Kreiter, A. K. & Singer, W. (1994) Global stimulus arrangement determines synchronization of neuronal activity in the awake macaque monkey. European Journal of Neuroscience Supplement 7: 153.
Kreiter, A. K. & Singer, W. (1996) Stimulus-dependent synchronization of neuronal responses in the visual cortex of the awake macaque monkey. Journal of Neuroscience 16: 2381-96.
Krose, B. J. A. & Julesz, B. (1989) The control and speed of shifts of attention. Vision Research 29: 1607-09.
Krueger, L. E. & Shapiro, R. G. (1979) Letter detection with rapid serial presentation: Evidence against word superiority at feature extraction. Journal of Experimental Psychology: Human Perception and Performance 5: 657-73.
Lades, M., Vorbruggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R. P. & Konen, W. (1993) Distortion invariant recognition in the dynamic link architecture. IEEE Transactions on Computers 42: 300-11.
Leonards, U., Singer, W. & Fahle, M. (1996) The influence of temporal phase differences on texture segmentation. Vision Research. In press.
Li, Z. & Atick, J. J. (1994) Efficient stereo coding in the multiscale representation. Network 5: 157-74.
Linsker, R. (1988) Self-organization in a perceptual network. Computer 21: 105-17.
Lorente de No, R. (1949) Cerebral cortex: Architecture, intracortical connections, motor projections. In Physiology of the Nervous System, ed. J. F. Fulton. Oxford University Press.
Lowel, S., & Singer, W. (1992) Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity. Science 255: 209-12.
Macphail, E. M. (1987) The comparative psychology of intelligence. Behavioral and Brain Sciences 10: 645-95.
Marr, D. (1982) Vision. Freeman.
Martin, K. A. C. (1988) The Wellcome Prize lecture: From single cells to simple circuits in the cerebral cortex. Quarterly Journal of Experimental Physiology 73: 637-702.
Massaro, D. W. (1979) Letter information and orthographic context in word perception. Journal of Experimental Psychology: Human Perception and Performance 5: 595-609.
Massaro, D. W. (1989a) Testing between the TRACE model and the fuzzy logic model of speech perception. Cognitive Psychology 21: 398-421.
Massaro, D. W. (1989b) Multiple book review of Speech perception by ear and eye: A paradigm for psychological inquiry. Behavioral and Brain Sciences 12: 741-94.
Massaro, D. W. & Cohen, M. M. (1983) Phonological constraints in speech perception. Perception and Psychophysics 34: 338-48.
Massaro, D. W. & Cohen, M. M. (1991) Integration versus interactive activation: The joint influence of stimulus and context in perception. Cognitive Psychology 23: 558-614.
Massaro, D. W. & Friedman, D. (1990) Models of integration given multiple sources of information. Psychological Review 97: 225-52.
Mayford, M., Wang, J., Kandel, E. R. & O'Dell, T. J. (1995) CaMKII regulates the frequency-response function of hippocampal synapses for the production of both LTD and LTP. Cell, in press.
McCarthy, R. A., & Warrington, E. K. (1990) Cognitive Neuropsychology. A Clinical Introduction. Academic Press.
McGuire, B. A., Gilbert, C. D., Rivlin, P. K. & Wiesel, T. N. (1991) Targets of horizontal connections in macaque primary visual cortex. Journal of Comparative Neurology 305: 370-92.
McClelland, J. L. (1978) Perception and masking of wholes and parts. Journal of Experimental Psychology: Human Perception and Performance 4: 210-23.
McClelland, J. L. (1991) Stochastic interactive processes and the effect of context on perception. Cognitive Psychology 23: 1-44.
McClelland, J. L. & Elman, J. L. (1986) The TRACE model of speech perception. Cognitive Psychology 18: 1-86.
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995) Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review 102: 419-57.
McClelland, J. L. & Rumelhart, D. E. (1981) An interactive activation model of context effects in letter perception, Part I: An account of basic findings. Psychological Review 88: 375-407.
McClelland, J. L., Rumelhart, D. E., & Hinton, G. E. (1986) The appeal of parallel distributed processing. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1, ed. D. E. Rumelhart & J. L. McClelland. MIT Press.
Michalski, A., Gerstein, G. L., Czarkowska, J. & Tarnecki, R. (1983) Interactions between cat striate cortex neurons. Experimental Brain Research 51: 97-107.
Milner, P. M. (1974) A model for visual shape recognition. Psychological Review 81: 521-35.
Morton, J. (1969) Interaction of information in word recognition. Psychological Review 76: 165-78.
Movellan, J. R. & McClelland, J. L. (1995) Stochastic interactive processing, channel separability, and optimal perceptual inference: An examination of Morton's law. Technical Report PDP.CNS.95.4, Department of Psychology, Carnegie Mellon University, Pittsburgh, PA.
Mozer, M. C., Zemel, R. S., Behrmann, M. & Williams, C. K. I. (1992) Learning to segment images using dynamic feature binding. Neural Computation 4: 650-65.
Mumford, D. (1992) On the computational architecture of the neocortex. II The role of cortico-cortical loops. Biological Cybernetics 66: 241-51.
Munk, M. H. J., Nowak, L. G., Chouvet, G., Nelson, J. I. & Bullier, J. (1992) The structural basis of cortical synchronization. European Journal of Neuroscience Supplement 5: 21.
Nakayama, K. & Mackeben, M. (1989) Sustained and transient components of focal visual attention. Vision Research 29: 1631-47.
Nelson, J. I. (1995) Binding in the visual system. In The Handbook of Brain Theory and Neural Networks, ed. M. A. Arbib. MIT Press.
Nelson, J. I. & Frost, B. J. (1978) Orientation-selective inhibition from beyond the classic visual receptive field. Brain Research 139: 359-65.
Neuenschwander, S., Engel, A. K., Konig, P., Singer, W. & Varela, F. J. (1996) Long-range synchronization of oscillatory light responses in the cat retina and lateral geniculate nucleus. Nature 379: 728-33.
Neven, H. & Aertsen, A. Rate coherence and event coherence in the visual cortex: a neuronal model of object recognition. Biological Cybernetics 67: 309-22.
Pece, A. E. C. (1992) Redundancy reduction of a Gabor representation: a possible computational role for feedback from primary visual cortex to lateral geniculate nucleus. In Artificial Neural Networks 2, eds. I. Aleksander & J. Taylor. Elsevier.
Piaget, J. (1954) The Construction of Reality in the Child. Basic Books.
Phillips, W. A. (1971) Does familiarity affect transfer from an iconic to a short-term memory? Perception and Psychophysics 10: 153-57.
Phillips, W. A. (1996) Theories of cortical computation. In Cognitive Neuroscience, ed. M. D. Rugg. University College Press.
Phillips, W. A. & Goodall, W. C. (1994) Lexical writing can be non-semantic and fluent without practice. Cognitive Neuropsychology 12: 149-74.
Phillips, W. A., Hay, I. M., & Smith, L. S. (1993) Lexicality and pronunciation in a simulated neural net. British Journal of Mathematical and Statistical Psychology 46: 193-205.
Phillips, W. A., Kay, J., & Smyth, D. (1995a) How local cortical processors that maximise coherent variation could lay foundations for representation proper. In Neural Computation and Psychology, ed. L. S. Smith & P. J. B. Hancock. Springer-Verlag.
Phillips, W. A., Kay, J., & Smyth, D. (1995b) The discovery of structure by multi-stream networks of local processors with contextual guidance. Network 6: 225-46.
Phillips, W. A. & Singer, W. (1974) Function and interaction of on and off transients in vision. I: Psychophysics. Experimental Brain Research 19: 493-506.
Plaut, D. C. & McClelland, J. L. (1993) Generalization with componential attractors: Word and nonword reading in an attractor network. In Proceedings of the 15th Annual Conference of the Cognitive Science Society. Erlbaum.
Poggio, T., Fahle, M. & Edelman, S. (1992) Fast perceptual learning in visual hyperacuity. Science 256: 1018-20.
Polat, U. & Sagi, D. (1993) Lateral interactions between spatial channels: Suppression and facilitation revealed by lateral masking experiments. Vision Research 33: 993-99.
Polat, U. & Sagi, D. (1994a) The architecture of perceptual spatial interactions. Vision Research 34: 73-78.
Polat, U. & Sagi, D. (1994b) Spatial interactions in human vision: From near to far via experience-dependent cascades of connections. Proceedings of the National Academy of Sciences U.S.A. 91: 1206-09.
Posner, M. I. & Rothbart, M. K. (1994) Constructing neuronal theories of mind. In Large-scale Neuronal Theories of the Brain, ed. C. Koch & J. L. Davis. MIT Press.
Purves, D., Riddle, D. R. & LaMantia, A-S. (1992) Iterated patterns of brain circuitry (or how the cortex gets its spots). Trends in the Neurosciences 15: 362-68.
Rakic, P. & Singer, W. (1988) Neurobiology of Neocortex. Wiley.
Redlich, A. N. (1993) Redundancy reduction as a strategy for unsupervised learning. Neural Computation 5: 289-304.
Reeke, G. N. Jr., Finkel, L. H., Sporns, O. & Edelman, G. M. (1990) Synthetic neural modeling: a multilevel approach to the analysis of brain complexity. In Signal and Sense: Local and Global Order in Perceptual Maps, eds. G. M. Edelman, W. E. Gall & W. M. Cowan. Wiley.
Reicher, G. M. (1969) Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology 81: 274-80.
Richman, H. B. & Simon, H. A. (1989) Context effects in letter perception: Comparison of two theories. Psychological Review 96: 417-32.
Rockel, A. J., Hiorns, R. W. & Powell, T. P. S. (1980) The basic uniformity in structure of the neocortex. Brain 103: 221-44.
Rockland, K. & Lund, J. S. (1983) Intrinsic laminar lattice connections in primate visual cortex. Journal of Comparative Neurology 216: 303-18.
Roelfsema, P. R., Konig, P., Engel, A. K., Sireteanu, R. & Singer, W. (1994a) Reduced synchronization in the visual cortex of cats with strabismic amblyopia. European Journal of Neuroscience 6: 1645-55.
Roelfsema, P. R., Engel, A. K., Konig, P. & Singer, W. (1994b) Oscillations and synchrony in the visual cortex: Evidence for their functional relevance. In Oscillatory Event-related Brain Dynamics, ed. C. Pantev. Plenum.
Rumelhart, D. E. & McClelland, J. L. (1982) An interactive activation model of context effects in letter perception, Part II: The contextual enhancement effect and some tests and extensions of the model. Psychological Review 89: 60-84.
Rumelhart, D. E. & McClelland, J. L. (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1. MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986) Learning internal representations by back-propagating errors. Nature 323: 533-36.
Samuel, A. G. (1981) Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General 110: 474-94.
Samuel, A. G. (1996) Does lexical information influence the perceptual restoration of phonemes? Journal of Experimental Psychology: General 125: 28-51.
Schlaggar, B. L., & O'Leary, D. D. M. (1991) Potential of visual cortex to develop an array of functional units unique to somatosensory cortex. Science 252: 1556-60.
Schmidhuber, J. & Prelinger, D. (1993) Discovering predictable classifications. Neural Computation 5: 625-35.
Schmidt, K. E., Lowel, S., Goebel, R. & Singer, W. (1996) The perceptual grouping criterion of colinearity is reflected by anisotropies of connections in primary visual cortex. European Journal of Neuroscience. In press.
Schwarz, C. & Bolz, J. (1991) Functional specificity of the long-range horizontal connections in cat visual cortex: a cross-correlation study. Journal of Neuroscience 11: 2995-3007.
Sejnowski, T. J., Koch, C. & Churchland, P. S. (1988) Computational neuroscience. Science 241: 1299-306.
Shallice, T. (1988) From Neuropsychology to Mental Structure. Cambridge University Press.
Shallice, T. (1991) Precis of From Neuropsychology to Mental Structure. Behavioral and Brain Sciences 14: 429-69.
Shastri, L. & Ajjanagadde, V. (1993) From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences 16: 417-94.
Shepherd, G. M. ed. (1990) The Synaptic Organization of the Brain. Oxford University Press.
Shepherd, G. M. & Koch, C. (1990) Introduction to synaptic circuits. In The Synaptic Organization of the Brain, ed. G. M. Shepherd. Oxford University Press.
Schillen, T. B. & Konig, P. (1994) Binding by temporal structure in multiple feature domains of an oscillatory neuronal network. Biological Cybernetics 70: 397-405.
Sillito, A. M., Jones, H. E., Gerstein, G. L., & West, D. C. (1994) Feature-linked synchronization of thalamic relay cell firing induced by feedback from the visual cortex. Nature 369: 479-82.
Singer, W. (1987) Activity-dependent self-organization of synaptic connections as a substrate of learning. In The Neural and Molecular Bases of Learning, eds. J.-P. Changeux & M. Konishi. Wiley.
Singer, W. (1990) Search for coherence: A basic principle of cortical self-organization. Concepts in Neuroscience 1: 1-26.
Singer, W. (1993) Synchronization of cortical activity and its putative role in information processing and learning. Annual Review of Physiology 55:349-74.
Singer, W. (1994) Coherence as an organizing principle of cortical functions. International Review of Neurobiology 37: 153-83.
Singer, W. (1995) Development and plasticity of cortical processing architectures. Science 270: 758-64.
Singer, W. & Artola, A. (1994) Plasticity of the mature cortex. In Cellular and Molecular Mechanisms Underlying Higher Neural Functions, eds. A. I. Selverston & P. Ascher. Wiley.
Singer, W. & Gray, C. M. (1995) Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience 18: 555-86.
Singer, W. & Phillips, W. A. (1974) Function and interaction of on and off transients in vision. II: Neurophysiology. Experimental Brain Research 19: 507-21.
Smyth, D. (1994) Simulations of networks of local processors with contextual guidance. MSc Thesis, CCCN, University of Stirling.
Smyth, D. & Der, R. (1995) Learning to bind. Poster presented at the symposium on Phenomena and Architectures of Cognitive Dynamics, University of Leipzig, June.
Smyth, D., Kay, J., & Phillips, W. A. (1994) Discovery of high-order functions in multi-stream, multi-stage nets without external supervision. Paper presented to the Neural Computation and Psychology Workshop, CCCN, University of Stirling.
Smyth, D., Phillips, W. A. & Kay, J. (1996) Measures for investigating the contextual modulation of information transmission. Network 7: 307-16.
Spinelli, D. N., Starr, A. & Barrett, T. W. (1968) Auditory specificity in unit recordings from cat's visual cortex. Experimental Neurology 22: 75-84.
Sporns, O., Gally, J. A., Reeke, G. N. Jr. & Edelman, G. M. (1989) Reentrant signaling among simulated neuronal groups leads to coherency in their oscillatory activity. Proceedings of the National Academy of Sciences U.S.A. 86: 7265-69.
Sporns, O., Tononi, G. & Edelman, G. M. (1991) Modeling perceptual grouping and figure-ground segregation by means of active reentrant connections. Proceedings of the National Academy of Sciences U.S.A. 88: 129-33.
Squire, L. R. (1992) Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans. Psychological Review 99: 195-231.
Stone, J. (1996) Learning perceptually salient visual parameters through spatio-temporal smoothness constraints. Neural Computation, in press.
Stone, J. & Bray, A. (1995) A learning rule for extracting spatio-temporal invariances. Network 6: 429-36.
Stryker, M. P. et al. (1988) Group report. Principles of cortical self-organization. In Neurobiology of Neocortex, ed. P. Rakic & W. Singer. Wiley.
Sur, M., Garraghty, P., & Roe, A. (1988) Experimentally induced visual projections into auditory thalamus and cortex. Science 242: 1437-41.
Swindale, N. V. (1990) Is the cerebral cortex modular? Trends in the Neurosciences 13: 487-92.
Thorpe, S. J., O'Regan, K. & Pouget, A. (1989) Humans fail on XOR pattern classification problems. In Neural Networks: From Models to Applications, eds. L. Personnaz & G. Dreyfus. IDSET, Paris.
Tiitinen, H., Sinkkonen, J., Reinikainen, K., Alho, K., Lavikainen, J. & Naatanen, R. (1993) Selective attention enhances the auditory 40-Hz transient response in humans. Nature 364: 59-60.
Tononi, G., Sporns, O. & Edelman, G. M. (1992a) The problem of neuronal integration: Induced rhythms and short-term correlations. In Induced Rhythms in the Brain, ed. E. Basar & T. H. Bullock. Birkhauser.
Tononi, G., Sporns, O. & Edelman, G. M. (1992b) Reentry and the problem of integrating multiple cortical areas: simulation of dynamic integration in the visual system. Cerebral Cortex 2: 310-35.
Tononi, G., Sporns, O. & Edelman, G. M. (1994) A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences U.S.A. 91: 5033-37.
Tononi, G., Sporns, O. & Edelman, G. M. (1996) A complexity measure for selective matching of signals by the brain. Proceedings of the National Academy of Sciences U.S.A. 93: 3422-27.
Tooby, J. & Cosmides, L. (1995) Mapping the evolved functional organization of mind and brain. In The Cognitive Neurosciences, ed. M. S. Gazzaniga. MIT Press.
Tovee, M. J. & Rolls, E. T. (1992) The functional nature of neuronal oscillations. Trends in the Neurosciences 15: 387.
Toyama, K., Kimura, M. & Tanaka, K. (1981) Organization of cat visual cortex as investigated by cross-correlation techniques. Journal of Neurophysiology 46: 202-14.
Troscianko, T. (1994) Contribution of colour to the motion aftereffect and motion perception. Perception 23: 1221-31.
Troscianko, T., Davidoff, J., Humphreys, G., Landis, T., Fahle, M., Greenlee, M., Brugger, P. & Phillips, W. A. (1996) Human colour discrimination based on a non-parvocellular pathway. Current Biology 6: 200-10.
Troscianko, T., Landis, T. & Phillips, W. A. (1993) Chromatic discrimination in cerebral achromatopsia: Additional evidence favouring magno-based perception, and a neural-net model. Perception 22 supplement: 8-9.
Troscianko, T., Prince, C., Fahle, M. & Regard, M. (1995) The uses of high-temporal-frequency chromatic information in visual perception. Perception 24S: 59.
Ts'o, D. Y., Gilbert, C. D. & Wiesel, T. N. (1986) Relationship between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis. Journal of Neuroscience 6: 1160-70.
Ullman, S. (1994) Sequence seeking and counterstreams: A model for bidirectional information flow in the cortex. In Large-scale Neuronal Theories of the Brain, ed. C. Koch & J. L. Davis. MIT Press.
Van Essen, D. C., Anderson, C. H. & Olshausen, B. A. (1994) Dynamic routing strategies in sensory, motor and cognitive processing. In Large-scale Neuronal Theories of the Brain, ed. C. Koch & J. L. Davis. MIT Press.
von der Malsburg, C. (1981) The correlation theory of brain function. Internal report 81-2. Department of Neurobiology, Max-Planck-Institute for Biophysical Chemistry, Gottingen, Germany.
von der Malsburg, C. (1988) Pattern recognition by labeled graph matching. Neural Networks 1: 141-48.
von der Malsburg, C. & Schneider, W. (1986) A neural cocktail-party processor. Biological Cybernetics 54: 29-40.
Wang, D., Buhmann, J. & von der Malsburg, C. (1990) Pattern segmentation in associative memory. Neural Computation 2: 94-106.
Warren, R. M. (1970) Perceptual restoration of missing speech sounds. Science 167: 392-93.
Weisstein, N. & Harris, C. S. (1974) Visual detection of line segments: An object-superiority effect. Science 186: 752-55.
White, E. L. (1989) Cortical Circuits: Synaptic Organization of the Cerebral Cortex. Structure, Function and Theory. Birkhauser.
Wimmer, H. & Perner, J. (1983) Beliefs about beliefs: representing and constraining function of wrong beliefs in children's understanding of deception. Cognition 13: 103-28.
Yamaguchi, Y. & Shimizu, H. (1994) Pattern recognition with figure-ground separation by generation of coherent oscillations. Neural Networks 3: 49-63.
Young, M. P., Tanaka, K. & Yamane, S. (1992) On oscillating neuronal responses in the visual cortex of the monkey. Journal of Neurophysiology 67: 1464-74.
Zaitchik, D. (1990) When representations conflict with reality. Cognition 35: 41-68.
Zemel, R. S. & Hinton, G. E. (1991) Discovering viewpoint-invariant relationships that characterize objects. In Advances in Neural Information Processing Systems 3, pp. 299-305. Morgan Kaufmann.
Figure 1. Local processors with contextual guidance and some of the network architectures that can be built from them. Solid black lines show receptive field (RF) inputs; dashed grey lines show contextual field (CF) inputs. a) A local processor receives input from a receptive field vector, RF, via a vector of synaptic strengths, W, and from a contextual field vector, CF, via a vector of synaptic strengths, V. Each element of the two input vectors is multiplied by its synaptic strength, and these products are summed to give the integrated RF and CF inputs, r and c, which the transfer function then uses to determine output probability. For simplicity, the synaptic strength vectors and the integrated inputs are omitted from the example architectures shown in b), c), and d).
Figure 2. The aim of learning (based upon an unpublished diagram by Geoff Hinton). Predictive relationships exist between separate input data sets, but these relations will often hold between higher-order functions of the data rather than between pairs of individual data points. The aim of learning is to find mappings into new spaces that make the predictive relationships easier to compute. Predictions are shown here going in one direction only, but they may also be reciprocal. In our approach, unlike that of Becker and Hinton (1992), the predictions guide short-term processing as well as learning.
Figure 3. A transfer function for local processors with contextual guidance, showing how output probability is related to the integrated receptive field (RF) input, r, and the integrated contextual field (CF) input, c. When the RF input provides no evidence in favour of either a positive or a negative output decision, r = 0, output remains at the neutral probability level of 0.5, and CF input has no effect. The lower panel shows the effect of context when the RF input supports a positive output decision, for three specific values of c. When c = 0, the probability of a positive output increases from 0.5 as specified by the logistic function. When the context predicts a positive output (c = 1), the probability of a positive output increases more rapidly as a function of positive RF input. When the context predicts a negative output (c = -1), it increases less rapidly as a function of positive RF input. Equivalent effects occur when the RF input favours a negative decision.
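The qualitative behaviour described for Figure 3 can be reproduced with a short sketch. It assumes a modulatory activation A(r, c) = ½ r (1 + exp(2rc)) passed through the logistic function; this form is an assumption chosen because it has the three properties in the caption (no effect of context when r = 0, the plain logistic when c = 0, amplification or attenuation by context without reversing the sign of r), and it may differ from the exact function plotted. The helper names `integrate`, `activation`, and `output_probability` are hypothetical.

```python
import math
from typing import Sequence


def integrate(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Integrated input: weighted sum of an input vector (r = W.RF or c = V.CF)."""
    return sum(x * w for x, w in zip(inputs, weights))


def activation(r: float, c: float) -> float:
    """Assumed modulatory activation A(r, c) = 0.5 * r * (1 + exp(2 * r * c)).

    A(0, c) = 0, so context alone cannot move the output away from 0.5.
    A(r, 0) = r, so neutral context leaves the plain logistic of r.
    Context of the same sign as r increases |A|; opposing context shrinks |A|
    towards 0.5 * |r| but never flips its sign.
    """
    return 0.5 * r * (1.0 + math.exp(2.0 * r * c))


def output_probability(r: float, c: float) -> float:
    """Probability of a positive output: the logistic function applied to A(r, c)."""
    return 1.0 / (1.0 + math.exp(-activation(r, c)))


if __name__ == "__main__":
    # Hypothetical RF input and weights; r is about 0.9.
    r = integrate([1.0, 0.5, -0.2], [0.8, 0.4, 0.5])
    for c in (-1.0, 0.0, 1.0):
        print(f"c = {c:+.0f}: p(positive) = {output_probability(r, c):.3f}")
```

Running the script prints three probabilities above 0.5 that rise from c = -1 to c = +1, mirroring the three curves in the lower panel.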
Figure 4. A Venn diagram illustrating the decomposition of the information in the output of a local processor, X, into four disjoint components (in the case where I(X;R;C) is positive). A goal for processing can be specified as the relative importance attached to increasing or decreasing each component. H(X) is the total information (Shannon entropy) in X, and the other terms are defined analogously. The lengths and directions of the arrows indicate the goal: to increase the transmission of coherent information, I(X;R;C), as much as possible; to increase the transmission of RF information that is independent of context, I(X;R|C), but to a lesser extent; and to reduce the transmission of any contextual information that is independent of the RF, I(X;C|R). The fourth component of H(X), H(X|R,C), is information in the output that was in neither the RF nor the CF input; it is noise generated by the processor itself. Reduction in each of these last two components can either be specified as part of the goal or be left to occur by default, as output channel capacity is dedicated to the other components. The information flow is shown by the icon at bottom left.
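The four components satisfy H(X) = I(X;R;C) + I(X;R|C) + I(X;C|R) + H(X|R,C), and each can be computed from entropies of marginals of the joint distribution of X, R, and C. The sketch below does this for binary variables given as an explicit probability table. The weights in `goal` are hypothetical numbers that follow the arrows in Figure 4 only qualitatively (largest for coherent information, smaller for RF-only information, negative for context-only information); they are not values taken from the simulations.

```python
import math
from itertools import product
from typing import Dict, Tuple

# Joint distribution over binary outcomes (x, r, c) -> probability.
Joint = Dict[Tuple[int, int, int], float]


def entropy(joint: Joint, axes: Tuple[int, ...]) -> float:
    """Shannon entropy in nats of the marginal over `axes` (0 = X, 1 = R, 2 = C)."""
    marginal: Dict[Tuple[int, ...], float] = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)


def decompose(joint: Joint) -> Dict[str, float]:
    """Split H(X) into the four components of the Venn diagram."""
    def h(*axes: int) -> float:
        return entropy(joint, axes)

    i_xr = h(0) + h(1) - h(0, 1)                          # I(X;R)
    i_xr_given_c = h(0, 2) + h(1, 2) - h(0, 1, 2) - h(2)  # I(X;R|C)
    return {
        "I(X;R;C)": i_xr - i_xr_given_c,                     # coherent information
        "I(X;R|C)": i_xr_given_c,                            # RF information independent of context
        "I(X;C|R)": h(0, 1) + h(1, 2) - h(0, 1, 2) - h(1),  # context information independent of RF
        "H(X|R,C)": h(0, 1, 2) - h(1, 2),                    # noise generated by the processor
    }


def goal(components: Dict[str, float],
         weights: Tuple[float, float, float, float] = (1.0, 0.5, -0.5, 0.0)) -> float:
    """Hypothetical weighted objective over the four components."""
    keys = ("I(X;R;C)", "I(X;R|C)", "I(X;C|R)", "H(X|R,C)")
    return sum(w * components[k] for w, k in zip(weights, keys))


if __name__ == "__main__":
    coherent = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}                       # X = R = C
    rf_only = {(r, r, c): 0.25 for r, c in product((0, 1), repeat=2)}  # X = R, C irrelevant
    xor = {(r ^ c, r, c): 0.25 for r, c in product((0, 1), repeat=2)}  # X = R xor C
    for name, joint in (("coherent", coherent), ("rf_only", rf_only), ("xor", xor)):
        parts = decompose(joint)
        print(name, {k: round(v, 3) for k, v in parts.items()}, round(goal(parts), 3))
```

The XOR case gives a negative I(X;R;C) (about -0.693 nats) while the four terms still sum to H(X); this is why the diagram is drawn only for the case where I(X;R;C) is positive.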
Figure 5. Change in the strength of the RF and CF connections (weight change) as a function of post-synaptic activity (output probability). The threshold above which connection strengths are increased is a function of prior activity.
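The caption specifies only the shape of the rule: strengthening above a threshold, weakening below it, and a threshold that depends on prior activity. A minimal sketch, assuming a BCM-style form Δw_i = η · pre_i · y · (y − θ) with θ tracking recent output by an exponential moving average, is shown below. This is a swapped-in illustrative rule applied identically to RF or CF presynaptic activity, not necessarily the rule used in the simulations; the class name and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlidingThresholdRule:
    """Hypothetical BCM-style learning rule (an assumption, not the paper's exact rule).

    Connection strengths increase when post-synaptic activity exceeds a threshold and
    decrease when it falls below; the threshold itself tracks prior activity.
    """
    learning_rate: float = 0.1
    threshold: float = 0.5   # assumed initial threshold
    tracking: float = 0.05   # assumed rate at which the threshold follows activity

    def weight_change(self, pre: List[float], post: float) -> List[float]:
        """Delta w_i = learning_rate * pre_i * post * (post - threshold)."""
        factor = self.learning_rate * post * (post - self.threshold)
        return [factor * x for x in pre]

    def update_threshold(self, post: float) -> None:
        """Exponential moving average: the threshold drifts towards recent activity."""
        self.threshold += self.tracking * (post - self.threshold)


if __name__ == "__main__":
    rule = SlidingThresholdRule()
    print(rule.weight_change([1.0, 0.5], post=0.8))  # above threshold: potentiation
    print(rule.weight_change([1.0, 0.5], post=0.3))  # below threshold: depression
    for _ in range(50):
        rule.update_threshold(0.9)                   # sustained high prior activity
    print(round(rule.threshold, 3), rule.weight_change([1.0], post=0.8))
```

After a period of sustained high activity the threshold rises above 0.8, so an output of 0.8 that previously strengthened connections now weakens them; a sliding threshold of this kind is one common way of keeping such Hebbian rules stable.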
Figure 6. The architecture and stimuli used to compare the goal of maximising coherence across streams (Coherent Infomax) with that of maximising information transmission within streams (Infomax). Within each stream, the single unit in each processor received RF input from all nine receptors. It also received CF input from the unit in the other stream. The stimuli consisted of six pairs of inputs, each presented at random with the probabilities shown. The sign of the horizontal bars was not the most informative variable within streams, but it was correlated across streams.
Figure 7. Results of the simulations using the architecture and stimuli shown in Figure 6. Learning under the goal of maximising coherence across streams is shown on the left; learning under the goal of maximising information transmission within streams is shown on the right. Each of the four panels shows the information transmitted (measured with natural logarithms, i.e. in nats) by each of the two output units at different stages of learning. The synaptic strengths produced by each form of learning are shown at the bottom, with white being positive, black being negative, and diameter proportional to absolute value. Each of the two sets of synaptic strengths is representative of what was found in both streams.
Figure 8. The architecture of the net used to study coherent grouping. Twenty-five streams of processing with non-overlapping RFs were arranged as a 5 x 5 array, with contextual connections between neighboring streams as shown by the lines joining the local processors. The enlargement on the right shows that in each stream the processor was composed of four units, each receiving RF input from each of nine input units. Units received CF input from each of the units in the neighboring streams, as well as from all other units within the same processor. Input patterns were continuous positive or negative horizontal, vertical, and diagonal bars, as shown by the examples.
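The connectivity described for Figure 8 can be made concrete by counting the inputs each unit receives. The sketch assumes that "neighboring" means the four horizontally and vertically adjacent streams (a `diagonal=True` option adds the four diagonal neighbors instead, since the caption does not say which applies); the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Stream = Tuple[int, int]


def neighbors(size: int = 5, diagonal: bool = False) -> Dict[Stream, List[Stream]]:
    """Neighboring streams in a size x size array; 4-connectivity unless diagonal=True."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return {
        (i, j): [(i + di, j + dj) for di, dj in offsets
                 if 0 <= i + di < size and 0 <= j + dj < size]
        for i in range(size) for j in range(size)
    }


def input_counts(size: int = 5, units: int = 4, rf_inputs: int = 9,
                 diagonal: bool = False) -> Dict[Stream, Tuple[int, int]]:
    """(RF inputs, CF inputs) received by any one unit in each stream.

    CF inputs = all units in every neighboring stream + the other units in the same processor.
    """
    return {
        stream: (rf_inputs, units * len(nbrs) + (units - 1))
        for stream, nbrs in neighbors(size, diagonal).items()
    }


if __name__ == "__main__":
    counts = input_counts()
    print(counts[(0, 0)], counts[(0, 2)], counts[(2, 2)])  # corner, edge, centre
```

Under the 4-neighbor assumption, a unit at the centre of the array receives 9 RF inputs and 19 CF inputs (16 from the four neighboring processors plus 3 from its own processor), while corner units receive only 11 CF inputs.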
Figure 9. The effects of opposing or supportive context on the output probabilities of the four units of the local processor at the centre of the 5 x 5 array in the architecture shown in Figure 8. The four output probabilities are shown as increases or decreases from the neutral level of 0.5.
Figure 10. Coherent grouping in a net with one hundred single-unit streams. Each unit receives just a single RF input. Two groups of nine units, each arranged as a 3 x 3 rectangle, have positive CF connections with all other members of their own group and negative CF connections with all members of the other group. Input and output strengths for each unit are shown as the absolute deviation from an output probability of 0.5, with zero shown as black and 0.05 as white. Outputs are calculated over 12 iterations. Input strengths vary randomly from iteration to iteration, but the inputs to both sets of nine units are always positive. Output probabilities increase simultaneously for all nine units of a group within a few iterations, but only one group emerges from the background at a time.
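The CF connectivity of the Figure 10 net can be written down as a weight matrix. This sketch assumes the hundred units lie on a 10 x 10 grid, places the two 3 x 3 groups at hypothetical positions, uses a unit CF strength, and computes each unit's contextual input from the other units' deviations from the neutral output of 0.5; all of these choices, and the function names, are illustrative assumptions rather than the simulation's actual parameters. It shows only the wiring and one contextual-input step, not the full 12-iteration dynamics.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def group_indices(top: int, left: int, grid: int = 10, side: int = 3) -> List[int]:
    """Flat indices of a side x side block of units in a grid x grid array."""
    return [(top + i) * grid + (left + j) for i in range(side) for j in range(side)]


def cf_weights(group_a: Sequence[int], group_b: Sequence[int],
               n: int = 100, strength: float = 1.0) -> Matrix:
    """n x n CF weights: +strength within each group, -strength between groups, 0 elsewhere."""
    v = [[0.0] * n for _ in range(n)]
    for group, other in ((group_a, group_b), (group_b, group_a)):
        for i in group:
            for j in group:
                if i != j:               # no self-connections
                    v[i][j] = strength
            for j in other:
                v[i][j] = -strength
    return v


def contextual_input(v: Matrix, outputs: Sequence[float]) -> List[float]:
    """c_i = sum_j V_ij * (p_j - 0.5): context from other units' deviations from neutral."""
    return [sum(w * (p - 0.5) for w, p in zip(row, outputs)) for row in v]


if __name__ == "__main__":
    a, b = group_indices(1, 1), group_indices(5, 6)   # hypothetical group positions
    v = cf_weights(a, b)
    outputs = [0.55 if i in a else 0.5 for i in range(100)]  # group A slightly above neutral
    c = contextual_input(v, outputs)
    print(round(c[a[0]], 3), round(c[b[0]], 3), round(c[0], 3))
```

When one group's outputs rise slightly above 0.5, its own members receive supportive context while members of the other group receive opposing context, which is the kind of mutual competition that lets only one group emerge at a time.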
Figure 11. Organization of the synchronizing connections between (A) and within (B) regions of the visual cortex (Singer 1995). RF connections are shown as solid lines; synchronizing connections are shown as grey lines in (A) and as dotted lines in (B).
Figure 12. a) Local contextual, or 'association', fields as inferred from psychophysical studies of contour integration in human vision (based on Fig. 16 of Field et al. 1993). The square patches stand for oriented RFs at a particular spatial scale and in the relative positions shown. A sample of the contextual fields through which the output of cell a can be grouped with the outputs of other cells is shown. The arrows on the left show that these connect cells whose preferred orientations form first-order curves, with a strength that decreases with deviation from collinearity. Cell a will therefore not be connected to cells whose RFs have the orientations and positions shown on the right. This joint dependence on orientation and position is similar to that of the synchronizing connections shown in Fig. 11. b) An illustration of the perceptual effects assumed to arise from such contextual grouping connections. A long upward-facing curve can be seen running through the whole array. In this example, element a is more likely to be seen as grouped with b than with c, even though it could equally well be paired with either on the basis of the pairwise connections. Such grouping may involve cooperative population effects, but another possibility is that the activity of cells operating at spatial scales that distinguish the individual elements is grouped by contextual inputs from cells operating at a coarser scale, which detect the long curve without distinguishing its elements.
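As a rough illustration of a strength that depends jointly on orientation and position, the sketch below scores a pair of oriented RFs by three factors: how far cell b lies off cell a's preferred axis, how far the pair deviates from co-circularity (both orientations tangent to a common circle, a standard geometric criterion for smooth curves, satisfied when θa + θb = 2φ modulo π, where φ is the direction of the line joining them), and distance. The Gaussian fall-offs and all sigma values are hypothetical; this is not the fitted model of Field et al. (1993).

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def wrap_orientation(angle: float) -> float:
    """Wrap an orientation difference into [-pi/2, pi/2); orientations repeat every pi."""
    return (angle + math.pi / 2) % math.pi - math.pi / 2


def association_strength(pos_a: Point, theta_a: float, pos_b: Point, theta_b: float,
                         sigma_offset: float = 0.6, sigma_cocirc: float = 0.5,
                         sigma_dist: float = 2.0) -> float:
    """Hypothetical contextual strength from cell a to cell b (orientations in radians)."""
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    phi = math.atan2(dy, dx)                                # direction of the line a -> b
    offset = wrap_orientation(theta_a - phi)                # how far b lies off a's preferred axis
    cocirc = wrap_orientation(theta_a + theta_b - 2 * phi)  # 0 if both are tangent to one circle
    dist = math.hypot(dx, dy)
    return (math.exp(-offset ** 2 / (2 * sigma_offset ** 2))
            * math.exp(-cocirc ** 2 / (2 * sigma_cocirc ** 2))
            * math.exp(-dist ** 2 / (2 * sigma_dist ** 2)))


if __name__ == "__main__":
    a, theta = (0.0, 0.0), 0.0
    bend = 2 * math.atan2(0.2, 1.0)  # orientation that keeps b co-circular with a
    print("collinear   ", round(association_strength(a, theta, (1.0, 0.0), 0.0), 3))
    print("gentle bend ", round(association_strength(a, theta, (1.0, 0.2), bend), 3))
    print("orthogonal  ", round(association_strength(a, theta, (1.0, 0.0), math.pi / 2), 3))
    print("side by side", round(association_strength(a, theta, (0.0, 1.0), 0.0), 3))
```

Collinear pairs score highest, gently curving continuations slightly lower, and orthogonal pairs near zero. Parallel elements placed side by side technically satisfy co-circularity (they are tangent to a small circle), which is why the offset term is also needed to keep them weakly connected.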