Amit, D.J. (1994). The Hebbian paradigm reintegrated: Local reverberations as internal representations. BEHAVIORAL AND BRAIN SCIENCES (1994) 18(4) 617-26

The Hebbian paradigm reintegrated: local reverberations as internal representations

Daniel J. Amit
Istituto di Fisica
Universita di Roma,
Ple Aldo Moro, Roma
damita@il.ac.huji.fiz.ilios


Abstract:

The neurophysiological evidence from Miyashita et al.'s experiments on monkeys as well as cognitive experience common to us all suggests that local neuronal spike rate distributions might persist in the absence of their eliciting stimulus. In Hebb's cell-assembly theory, learning dynamics stabilize such self-maintaining reverberations. Quasi-quantitative modelling of the experimental data on internal representations in association-cortex modules identifies the reverberations (delay spike activity) as the internal code (representation). This leads to cognitive and neurophysiological predictions, many following directly from the language used to describe the activity in the experimental delay period, others from the details of how the model captures the properties of the internal representations.


Keywords:

active memory, associative cortex, attractor dynamics, content sensitivity, internal representations, learning, modeling.




então concluiremos que uma palavra, quando dita, dura mais que o som e os sons que a formaram, fica por aí, invisivel e inaudivel para poder guardar o seu proprió segredo...

we may conclude that a word, once said, persists longer than the vibration or the sound that formed it, it stays there, invisible and inaudible so as to preserve its very own secret

Jose Saramago, A Jangada de Pedra 1988 The Stone Raft

1. Introspective Attractors

Modeling brain function by neural networks can be roughly divided along the lines separating feed-forward networks and feed-back, or attractor, networks. A very clear and informative manifestation of the presence of attractors in temporal cortex of primates is presented in the recent studies of Miyashita et al. Here I will suggest that most of the results observed in these seminal experiments could be predicted on the basis of simple observations on common cognitive phenomena, without recourse to any specific model. One can then argue that considerations of such type have, most probably, underlain Hebb's postulation of synaptic dynamics as a means for stabilizing reverberations in neural assemblies. The first consequence of this discussion is the potential usefulness of such a mechanism of reverberations as an organic way for dissecting the complexity of the connection between sensory input and motor reaction.

Consider the familiar every-day situation in which one is given the task of translating a word from one language to another. The word is known in both languages, i.e. both words are in memory. Suppose further that the task is communicated verbally (spoken). Both the task and the word in the first language are well understood. The acoustic stimulus disappears as soon as its communication is completed. It may be the case that the response, the word in the second language, is produced very rapidly and correctly. It may though be the case (equally familiar) that the first word is recognized; there is a strong sense that also the translated word is known, but the retrieval of the corresponding word is not possible. One knows that one knows, yet one does not know. One can go on with the effort of the retrieval of the translated word for quite a while, despite the absence of the provoking stimulus. Even the situation in which the entire episode, word-task, withers away from our consciousness, only to surface resolved hours or days later, is quite common.

During this long search period, conscious or unconscious, the first (the given) word must have been available. But it must have been available in a sense that goes beyond the fact that it was contained in our memory, i.e. we knew it. After all there are very many words we know in the first language. What is special about this particular word, of course, is that it has been tagged by the stimulus -- the sound of the spoken word.

This naive resolution of the problem posed by the familiar cognitive situation has quite significant implications. Somewhere in the skull, between the locus of the fully pre-processed stimulus and before the beginning of a generation of a response, there must be stations storing, passively, many memories. Those are the things we know and remember. They are, most likely, stored in the synaptic structure of each station. Each of these stations must be able to maintain one memory out of the passive stock (the one tagged by the stimulus) in a special status, for relatively long times. It must be able to maintain it in a status which will make it available for future attempts to perform the task.

In fact, this type of consideration leads also to the conclusion that the number of such stations must be relatively high. One seems to be able to maintain several items concurrently in the tagged status. Those may be several items involved in a given task; items involved in different tasks which interweave during long time intervals of inattentive processing; and also possibly the active presence of the tasks themselves, to which we return briefly below. Consequently, the cortex must be modular. Both anatomy and physiology provide hints as to the geometry (in terms of cortical volume, localization and number of neurons). See also the discussion of the experiments in Section 5.

2. The concept of attractors -- active and passive memory

The attractor picture of a cortical module is, briefly, as follows: The module consists of a relatively large ensemble of neurons (of order $10^5$ cells), in which the probability that any two neurons be connected by a synapse is high (a few percent suffice). Such an ensemble of cells is geometrically localized in about 1mm$^2$ of cortical surface. Experience (training) structures (in a Hebbian way) the set of connections, i.e. the set of synaptic efficacies. The resulting synaptic structure is such that when a stimulus activates a subset of the neurons in the module, i.e. raises their spike rates, and then is turned off, the activity of the neurons in the ensemble may, depending on the stimulus, either decay rapidly back to spontaneous levels (stimulus ignored) or, for other classes of stimuli, maintain for long times a stimulus-selective subset of the neurons at elevated rates in the absence of the stimulus. In other words, the synaptic structure provides sufficient structured feedback so that the afferents (inputs) due to the distribution of activity among neurons with elevated spike rates maintain the same set of neurons active, leaving the others at spontaneous levels. The collective nature of this dynamical state of affairs, i.e. the fact that the elevated activity of each neuron is maintained by many others, makes the attractors impressively immune to disorder in the synaptic structure as well as to dynamical noise, both of which are unavoidable in cortical conditions.

What determines which configurations of neurons in the assembly can collectively maintain each other in the elevated activity state is the synaptic matrix. This is passive memory. What determines which of the possible, self-maintaining configurations actually reverberates in the module is the stimulus. The short appearance of the stimulus `tags' one of the passive memories by activating the particular attractor associated with this stimulus. The activated configuration in the assembly is an attractor in the sense that each of these configurations is activated by a wide class of stimuli, in some sense close to each other. The active configuration then is the representative of the class. This is what often goes under the name of content addressable memory or, less commendably, associative memory, in contrast to the physical address referencing of memory used in digital computers.
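The distinction between passive memory (the synaptic matrix) and active memory (the reverberating configuration) can be illustrated with a minimal sketch. It assumes a Hopfield-type network of binary ($\pm 1$) units with outer-product couplings, which is a drastic simplification of a module of spiking neurons and is not the model analyzed in the text. The names `hebbian_weights`, `relax` and `corrupt`, the network size and the number of stored patterns are all hypothetical choices made for illustration.

```python
import random

def hebbian_weights(patterns):
    """Passive memory: outer-product (Hebbian) couplings, no self-coupling."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def relax(w, state, sweeps=10):
    """Stimulus removed: units are updated one at a time from recurrent input only."""
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # a self-maintaining configuration has been reached
            break
    return s

def corrupt(pattern, n_flips, rng):
    """A stimulus from the same class: the pattern with some units flipped."""
    s = list(pattern)
    for i in rng.sample(range(len(s)), n_flips):
        s[i] = -s[i]
    return s

rng = random.Random(0)
N, P = 200, 3  # assumed module size and number of learned patterns
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
w = hebbian_weights(patterns)

# Two different stimuli from the class of pattern 0 (10% of units flipped each)
cue_a = corrupt(patterns[0], 20, rng)
cue_b = corrupt(patterns[0], 20, rng)
final_a = relax(w, cue_a)
final_b = relax(w, cue_b)
print(final_a == patterns[0], final_b == patterns[0])
```

In this toy setting the couplings `w` play the role of the passive stock, and the configuration returned by `relax` is the tagged item. Both cues are addressed by content: nothing in the code refers to where pattern 0 is stored.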

The attractor type of memory activation contrasts with the computer in yet another sense. In the computer, when a piece of information is to be acted on, it is taken from its address and put in a special section of the processor for action. This, in some sense, is analogous to the activation of a passive memory. In the neural module, however, not only is the addressing done by content and not by physical address, but the activation leaves the item in the module. In some sense the module is both the memory and the register. Thus while in the computer the activation of a memory item is signalled by the special location (the register), in the cortical module, we argue, the activated memory is distinguished by the activity. The place remains invariant. Moreover, the computer register can hold any information configuration for processing, whereas the attractor module will hold only the representatives of classes that had been learned into the synaptic structure.

I am not emphasizing these distinctions to imply a preference. It is simply that in a system like the cortex the register option is not available, because the register would also be made of neurons, and maintaining an item there for later processing could rely only on synapses, so we are back to square one.

It is important to make a clear distinction at this point between the tagged persistent memories and concepts such as short, intermediate and long term memory. The latter refer usually to the stock of passive memories, i.e. to the duration of the synaptic programming or to the duration of its accessibility. They relate to the ability of the system (the brain) to manipulate incoming tasks. The tagged persistent item introduces an additional "temporal" category: the persistent activity distribution excited by a stimulus. A memory of this type (sometimes referred to as working memory) can belong to any of the three temporal categories. This basic distinction is sometimes overlooked, even by Hebb himself. See e.g. the second quote from Hebb and the discussion in Section 4.

The simplest conceivable carrier of such a tagging signal is the persistent distribution of elevated spike rates (Hebbian reverberations) among the neurons in the module (Hebb's cell assembly). One may contemplate other stimulus selective taggings of stored memories, but those would be much more difficult to observe. Since persistent spike distributions in the absence of the provoking stimulus are governed mainly by the synaptic structure in the local assembly, the tokens maintained there during the prolonged performance of the task will be prototypal. It is likely that the structure of such reverberations will not depend on details of the stimulus, such as the tone, the pitch or the modulation of the acoustic signal communicating the original word. In theoretical models the dynamics of multi-neuron systems, when maintaining activity distributions in the absence of the stimulus, gives rise to a global dependence on the stimulus: Large classes of stimuli will provoke the same persistent spike distribution for all stimuli in a class. Stimuli which are different enough provoke different persistent activities. In this sense the activity distributions in the reverberations can be considered as representations of the class of stimuli that provoke them.

It may be useful to clarify the role that is ascribed in the present context to the term internal `representation', given that it is at the center of so much debate in the community of cognitive science. The computational situation described above, in which the performance of the task on the stimulus (the word to be translated) is to take place long after the stimulus has disappeared, seems to leave little choice. When the task is to be finally carried out, it must have an operand. That token, which survives somewhere in the cortex, is a representation of the set of equivalent stimuli. Such a token seems to be logically required and a candidate for it is experimentally observed, as we recount below in Section 5.

The independence of the reverberation in the local assembly of the details of the stimulus does not imply that the internal representation does not depend on the task. Since the stimulus contains the word as well as the task, the persistent token may, in principle, depend on both. Yet the fact that the same word can be involved in many different tasks -- rhyming, opposite, synonyms etc. -- suggests that the task may be represented also, or only, elsewhere. The linking of two separate representations is still an open problem to be investigated both experimentally and theoretically.

3. The cortical processing cut

Admitting the presence of such reverberating stations in the cortex splits a potential feed-forward picture, describing cognitive (psychological) behavior. The split takes place inside the cortex and the boundary lines may lie at different distances from different sensory mechanisms. Roughly speaking, there is a three-way division:

  1. The formation (learning) and the function of the organic system leading from the external world to the persistence stations: i.e. the preprocessing of the input required for the formation of different internal representations (the structure of the persistent taggings) for significantly different stimuli;
  2. The formation and the activation of the internal representations (persistent taggings, Hebb's reverberations, see below) by the pre-processed afferent stimuli, and the interaction between activated reverberations in different stations (modules);
  3. The organization and the functioning of the decoding of the cortical reverberations into computationally driven reactions.

This split of the cognitive computational machine is advantageous for experimental as well as for theoretical study. The above comments imply that the persistent tagging represents (at least for a while) some abstract feature of the stimulus involved in the task. Hence the neuro-physiologist can search for these local modules (Hebb's assemblies) in the cortex of the performing mammal, in the course of the performance of a prescribed behavioral paradigm. The tentative acceptance of spike rate distributions allows an easy read-out, by the neuro-physiologist, of the relevant representations. This he can do by single unit recordings to be analyzed off-line. He can count on the fact that the internal representations he will observe will not depend on particular details in the manifestation of the stimulus, provided the stimulus has been correctly classified (interpreted) by the subject animal. The latter can be monitored by the animal's response in the course of the experimental paradigm.

From the point of view of cognitive science one representation may be as good as any other. The opportunity provided by the Hebbian cut is that it allows a direct, empirical expression for the representations to replace a metaphorical one. As is argued in Section 5 below, the quantitative properties of the representations discussed here can be measured. A few tens of milliseconds following the disappearance of the stimulus, what is in these attractors is all there is to be for the completion of the mental computation. Given a direct and measurable commitment for the representations, the speculations of cognitive science can proceed with reference to a well defined body of data on the neuro-physiological level. The realistic neural underpinning is required to inform the speculation about the potential as well as the limitation of the infrastructure.

The above properties are not common to all modeling paradigms of internal representations. For example, in a feed-forward description of computation, a particular pattern of neural activation persists only for as long as the stimulus is on. A given neural activity distribution, provoked during the imposition of the stimulus, cannot be supposed to be available for computation at a later time. Moreover, the activity distribution may be sensitive to the particular details of the imposed stimulus. Thus the activity distribution may be richer than what is actually used for continuing the computational process. It may therefore not provide sufficient constraints on the computation. The attractor, in contrast, contains measurable information for long times and that information is the same for all stimuli which are classified by the same attractor.

Moreover, the attractor dynamics distinguishes naturally between the course of an unfamiliar stimulus and a familiar one, the former being a stimulus significantly different from the ones that one has learned to classify. Following an unfamiliar stimulus, attractor dynamics leaves all neurons in the module at very low activity levels, despite the fact that during the presentation of the stimulus as many neurons may be excited as for a familiar stimulus. This distinction is not naturally available for alternative paradigms of representation.
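A small deterministic sketch can make the familiar/unfamiliar distinction concrete. It assumes binary (0/1) threshold units, sparse patterns and covariance-type couplings; these modelling choices, the parameter values `N`, `F`, `THETA`, and the function names are assumptions made for illustration, not the model discussed in the text. The stimulus is represented only by the initial activity it imposes; it is then removed and the recurrent dynamics run on their own.

```python
N, F, THETA = 100, 0.1, 0.5  # units, coding level, firing threshold (all assumed)

def make_pattern(active):
    """0/1 activity vector with the given set of active units."""
    return [1 if i in active else 0 for i in range(N)]

# Three learned ("familiar") patterns with disjoint sets of 10 active units
learned = [make_pattern(set(range(10 * m, 10 * m + 10))) for m in range(3)]

def covariance_weights(patterns):
    """Couplings built from deviations of activity around the mean level F."""
    norm = F * (1 - F) * N
    w = [[0.0] * N for _ in range(N)]
    for p in patterns:
        for i in range(N):
            for j in range(N):
                if i != j:
                    w[i][j] += (p[i] - F) * (p[j] - F) / norm
    return w

def delay_activity(w, stimulus, steps=20):
    """Activity left in the module after the stimulus has been removed."""
    s = list(stimulus)
    for _ in range(steps):
        new = [1 if sum(w[i][j] * s[j] for j in range(N)) > THETA else 0
               for i in range(N)]
        if new == s:
            break
        s = new
    return s

w = covariance_weights(learned)
familiar = learned[0]
degraded = make_pattern(set(range(8)) | {50, 51})  # 8 correct units, 2 spurious
unfamiliar = make_pattern(set(range(50, 60)))      # same number of driven units

for name, stim in [("familiar", familiar), ("degraded", degraded),
                   ("unfamiliar", unfamiliar)]:
    print(name, sum(stim), sum(delay_activity(w, stim)))
```

All three stimuli drive ten units during presentation. In this toy setting the familiar and the degraded stimulus both leave the ten units of the learned pattern active in the delay, while the unfamiliar stimulus leaves the module silent.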

The collection of such representations and of their dependence on the task can provide invaluable information about how computation is organized in the cortex. But before proceeding with the elaboration of the experimental and theoretical account and perspectives, I return to the subject of this essay's title, to Donald Hebb.

4. Multi-component Hebbian paradigm

Allusions to Hebb abound in the preceding text. They have not been explicitly formalized because I have been trying to emphasize the intuitive appeal and the almost imperative nature of the local internal representations. Yet I believe that whatever is valid in this picture must have been clearly perceived by Donald Hebb many years ago. Someone joining the field in the last decade finds innumerable references to Hebb's work. But the general tenor of these references is of a synaptic engineering type. Almost any type of synaptic learning in neural networks genuflects in Hebb's direction. Yet Hebb was not a neuro-chemist nor a neuro-physiologist. He was a psychologist searching for a neurally based infrastructure to psychology, to supplement, or replace, the mythological one.

The Hebbian paradigm is multi-dimensional. It is composed of a prescription for synaptic modifications: synapses are modified by afferent stimuli in a way that tends to stabilize the pattern of activity provoking the synaptic modifications. The stable neural activity distributions are excited in the local assembly by each of the learned stimuli. This is an unsupervised learning mode which aims at producing synaptic structures which can sustain a selected set of activity distributions in the local assembly, to use Hebb's language. The role of the resulting synaptic structure is to sustain the local activity produced by a stimulus in the absence of the provoking stimulus. To maintain the activity provoked by the stimulus in the presence of the stimulus does not require any synaptic modification.
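The prescription can be sketched as a toy calculation: each presentation of a stimulus strengthens the synapses between co-active units, and after enough presentations the pattern sustains itself once the stimulus is removed. The update rule (potentiation only, with no decay or depression), the values of `N`, `ETA` and `THETA`, and the function names are assumptions chosen for illustration.

```python
N, ETA, THETA = 20, 0.1, 1.0  # units, learning rate, firing threshold (all assumed)

def present(w, stimulus):
    """One presentation: strengthen synapses between units active together."""
    for i in range(N):
        for j in range(N):
            if i != j:
                w[i][j] += ETA * stimulus[i] * stimulus[j]

def persists(w, stimulus, steps=10):
    """Remove the stimulus and check whether its activity pattern is sustained."""
    s = list(stimulus)
    for _ in range(steps):
        s = [1 if sum(w[i][j] * s[j] for j in range(N)) > THETA else 0
             for i in range(N)]
    return s == list(stimulus)

stimulus = [1] * 5 + [0] * 15          # five units driven by the stimulus
w = [[0.0] * N for _ in range(N)]      # unstructured synapses before learning

history = []
for k in range(5):
    history.append(persists(w, stimulus))  # test after k presentations
    present(w, stimulus)
print(history)  # [False, False, False, True, True]
```

With these numbers each active unit receives input 0.4 per completed presentation from its four partners, so the recurrent input first exceeds the threshold after three presentations. Before that the activity dies out as soon as the stimulus is removed.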

Hebb's paradigm is not about the activity generated in one assembly by the activity present in another. It is not a feed-forward picture, for good or for bad. It can be summarized as a process generating the feed-back connectivity required for maintaining reverberations (persistent spike distributions) in a local network by the activity in the same network. The citations below make this point quite clearly.

Let us assume that the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its stability... When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.

Elsewhere Hebb says "It seems that short-term memory may be a reverberation in the closed loops of the cell assembly and between cell assemblies, whereas long term memory is more structural, a lasting change of synaptic connections." [p. 110, emphases in the original]

This is not intended as a hagiography of Donald Hebb, nor proof by dogma, nor a decomposition of texts. Rather, it is an attempt to salvage a profound idea from excessive fragmentation which has obscured the potential functional three-way split of the cognitive computational machine. In particular, it has caused an underestimation of phenomena necessary for mental processing, phenomena which can relatively easily be observed neuro-physiologically and which can provide precious information for deciphering some basic ideas pertinent to cortical computation.

Note that in the second quote the distinction between long term memory and active memory is implied, yet the terminology is not adopted. Clearly, the reverberation is an active state of the assembly, while the structure of the synaptic organization is not. Moreover, the reverberation must be sustained by the underlying synaptic structure, and hence it is a particular expression of the properties of the synaptic structure. A given synaptic structure may persist for short, intermediate or long terms.

To conclude this speculative part it may be of value to point out an additional bonus promised by this outlook. It is known anatomically, physiologically and neurologically that as one proceeds along the elaboration path in the cortex, one always finds back projections, as far back as into the primary sensory areas. On the other hand, it is a very familiar experience to have a given sensory power notably improved when the content of the observed stimulus is known. For example, when vision is impeded by distance or haze so that a given object cannot be discerned (or read), receiving a cue as to the nature of the object (or the written text) often produces a clear perception of the target. In other words, suppose that the sensory information about an object is not sufficient to produce recognition. Suppose further that other information about the object is given (vocally, for example) some time prior to the observation of the object. The information in the vocal signal had been recognized well, and hence has excited a reverberation in some module. The back-projections from this module to the more primary areas in which the visual signal is being processed may provide sufficient additional specific information to make recognition possible. The special role of the reverberation is to make the contingent information available long after the signal producing it has disappeared.

Hebb's idea about reverberation in cortical assemblies seems to have been motivated by observations of Lorente de No on neural diagrams of Golgi stained cortical slices made by Ramon y Cajal. The neural diagrams discussed by Lorente de No contained a small number of neurons, because the staining method makes a small fraction of the neurons in the slice visible. Hence the feed-back circuits observed seemed relatively simple and suggested simple flows. Hebb was aware of the fact that the circuits were too simple and would probably not be able to sustain a reverberation for sufficiently long time. And that to ensure long living reverberations a much larger number of neurons would be required.

5. Experimental evidence for Hebbian reverberations and beyond

The fact that local reverberations are almost a logical necessity does not remove the need to test their existence empirically. This has been done in the last few years in a particularly convincing way by Miyashita et al., in a culmination of a program started some twenty years ago by Fuster and Niki. In these experiments monkeys are trained to perform delayed image matching (delayed match to sample, DMS) of visual images. The images are supposedly meaningless for the monkey and mutually uncorrelated in their geometrical structure. For this purpose the images are generated by a computer using a graphic procedure with several stochastic components. Typical images are given in Fig. 1. Following training, testing proceeds according to the following protocol: 1. an image appears on the screen for a short period (200 ms); 2. the screen remains blank for a prolonged period (as long as 16 seconds); 3. a second image appears briefly; 4. the monkey should react selectively according to whether the first and second images have been the same or different.

Training is performed, prior to the insertion of recording electrodes, by presenting the monkeys long sequences of pairs of images as described above, and rewarding them for correct responses. In different experiments the sequence of first images is varied. For example, the first images can be drawn at random from a store of generated images; or they can form a fixed sequence shown in a fixed order; or the set of images can be divided into pairs, where the members of each pair follow each other in a fixed order as first stimulus during training and the pairs are selected at random. The second stimulus is always randomly selected with about 50% chance of being equal to the first. Note that the existence and the structure of the attractor depends only on the first stimulus. The second stimulus, the one used for matching, serves to maintain the attention to the task. The ordering of first stimuli applies exclusively to the training phase. During testing, first stimuli are drawn at random.
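The structure of the trial sequences can be sketched as follows. This is only an illustration of two of the variants described above (fixed order during training, random order during testing); the function name `make_trials`, the image indices and the numbers of images and trials are hypothetical, and the paired-image variant is not covered.

```python
import random

def make_trials(n_images, n_trials, fixed_order, rng, p_match=0.5):
    """Return a list of (first, second, is_match) trials.

    fixed_order=True  -> first stimuli cycle through a fixed sequence (training)
    fixed_order=False -> first stimuli are drawn at random (testing)
    The second stimulus equals the first with probability p_match.
    """
    trials = []
    for t in range(n_trials):
        first = t % n_images if fixed_order else rng.randrange(n_images)
        if rng.random() < p_match:
            second = first
        else:
            second = rng.choice([k for k in range(n_images) if k != first])
        trials.append((first, second, first == second))
    return trials

rng = random.Random(0)
training = make_trials(n_images=5, n_trials=2000, fixed_order=True, rng=rng)
testing = make_trials(n_images=5, n_trials=2000, fixed_order=False, rng=rng)
print(training[:3])
print(testing[:3])
```

Only the first element of each trial carries the temporal structure of training; the second stimulus is unrelated to the position in the sequence, in keeping with its role of maintaining attention to the task.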

FIGURES AVAILABLE ONLY IN HARD COPY

Figure 1. Several visual images used in the experiment.

By impressive use of circumstantial evidence Miyashita et al. succeed in identifying a small part (about 1mm$^2$) of anterior ventral temporal (AVT) cortex where stimulus selective persistent activity is manifested during the delay period, i.e. in the interval between the presentations of the first and second images, in which the stimulus is absent. The fact that the selective activity distribution can persist for as long as 16 seconds, in a rather noisy environment, is convincing evidence for the local maintenance of a reverberation by the feed-back in the synaptic structure.

Figure 2. Reverberation dynamics. Four types of neuronal behavior observed in single units. At the top of each window 12 spike rasters demonstrate the reproducibility of the delay activity on the single unit level, despite intervening presentations of other stimuli. Each row of dots is the representation of spike times recorded from a given neuron in a single trial, with the same image for first stimulus. The similar density of spikes in the delay period (the wide central interval) in all 12 rasters, despite the fact that other images intervened as first stimulus between them, is the reproducibility underpinning the representation concept. Bottom: spike rate histograms. a) Neuron active in presence of stimulus and persisting in its absence; b) Same neuron unaffected by stimulus, active in delay period (a different stimulus leading to the same attractor); c) Same neuron, third stimulus, neuron active in presence of stimulus, weakly active in reverberation, a different attractor; d) Same neuron inhibited in presence of a fourth stimulus and weakly active in delay period. Same attractor as in (c). Under each window is the time course of the trial's protocol: pre-stimulus; warning; first stimulus; delay; second stimulus.

The main findings of these experiments, as seen by a theorist, are:

  1. Self-sustained, long lived, stimulus selective spike rate distributions (reverberations) exist in the cortex;
  2. The locus of the reverberations is locally modular, i.e. within a given cortical region they appear to concentrate in one restricted part. Namely, the module could be considered as a spatially cohesive group of neurons, in contrast to a sparsely distributed collection. This fact concords well with the anatomical observations, which indicate that the probability of synaptic connectivity between neurons in cortex falls off with separation. But within a range of 1mm that probability is still of the order of several percent. Thus, despite the fact that single synaptic contacts in cortex are typically weak (on the order of 100 pre-synaptic inputs are required to elicit a spike), this level of connectivity allows for the maintenance of robust attractors. (Some studies find a small fraction of strong synapses. Those are ascribed to multiple contacts between certain pairs of neurons. If that fraction is significant, it may lead to some interesting internal dynamics inside the attractor.);
  3. The reverberations are reproducible on the single unit level, i.e. the rates recorded on the same electrode from the same cell depend only on the stimulus leading to the reverberation. In other words, during testing the first stimuli of the trials are presented in random order, hence between two presentations of the same image as first stimulus many other images intervene as first stimuli. Yet the delay activity is independent of this history. The reverberations can be considered as internal representations of the stimulus, in the sense discussed in the section above;
  4. The reverberations (representations) are not single neuron properties. Single neurons cannot maintain selective activity rates. They are most likely collective emergent properties of the local module (the assembly). (See also discussion of apparent paradox concerning single units, below). In fact, these experiments demonstrate explicitly that neurons driven by the stimulus may not be active during the reverberation and vice versa: neurons unaffected during the presentation of the stimulus may be driven in the delay period.
  5. The internal representations are distributed: many neurons participate in the representation of each memory and different representations share neurons;
  6. The local internal representations are attractors (as was mentioned above): a whole class of similar stimuli leads to the same reverberation;
  7. The representations in this particular module of cortex are of prototypes, e.g. they are blind to color, size, angular orientation. This seems rather typical of this part of cortex, see [tanaka];
  8. The representations code in their spike rate distributions, among other things, also for temporal correlations in the training phase. In other words, what is represented depends not only on what is learned but also on how it is learned. This is an embryo of context sensitivity on the neuro-physiological level (see the discussion of the implications for cognitive psychology in Section 6). To be more specific, one may suppose, as is the case in the theoretical model (see e.g. [abt]), that each of the correlated attractors contains information (expressed in neural activities) about the image activating it as well as about the images activating the attractors correlated with it. Hence, each attractor `knows' about its neighbors in the temporal sequence;
  9. In a different learning protocol, in which the sequence of first stimuli is presented as a set of permanently ordered pairs, the internal representation becomes identical for both members of each pair. So does the behavior of the monkey in the behavioral paradigm. This does not imply that the monkey cannot detect the difference between the two pictures, but only that in the module used for generating the response for this specific task the two representations merge.

Figure 3. Correlated reverberations. Correlation coefficients (Kendall rank coefficients) of spike activities in a neural population in the delay period as a function of the positional separation of the stimuli exciting the reverberations in the training sequence. (From ref. [miya2] Fig. 3c.) Full circles represent correlations of delay activity distribution for learned images (used in training). Empty circles refer to activity distributions provoked by `new' images (not used in training). The different curves represent different samplings of neurons in the module selected for the computation of the correlations. The stars are irrelevant to the present discussion.
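For readers unfamiliar with the measure, a rank correlation of this kind can be computed as in the sketch below. It implements the simplest variant (often called tau-a, which ignores ties); the text does not say which variant was used in the experiments, and the delay-period rates in the example are invented numbers, not data.

```python
def kendall_tau(x, y):
    """Kendall rank correlation (tau-a) between two equally long sequences."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                concordant += 1   # the pair is ordered the same way in x and y
            elif prod < 0:
                discordant += 1   # the pair is ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical delay-period rates (spikes/s) of five neurons, following two
# stimuli that were neighbors in the training sequence
rates_after_image_1 = [12.0, 3.0, 8.0, 1.0, 5.0]
rates_after_image_2 = [10.0, 4.0, 9.0, 2.0, 3.0]
print(kendall_tau(rates_after_image_1, rates_after_image_2))  # 0.8
```

A value near 1 means that neurons ranked high in one delay activity distribution are also ranked high in the other; a value near 0 means the two distributions are unrelated in rank.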

The above presentation of the empirical situation requires several notes of clarification. On the one hand, we have emphasized the distributive and collective nature of the internal local representations and on the other, the reproducibility of recordings of single units. This apparent contradiction disappears if one keeps in mind that the enhanced, persistent activity of any individual neuron can only be sustained due to the support of its fellow neurons participating in the same reverberation, via the synaptic local feed-back. The activity of the single neuron, therefore, carries information about the stimulus only if it is accidentally selective between the different distributed reverberations. Among the active neurons there may be several that have the same activity in different representations. On the other hand, the fact that the reverberations are dominated in their detail by the synaptic structure, and not by fine details of the stimulus, implies that every time the same stimulus is presented, the same reverberation is aroused. Consequently, the internal representations, for whatever they represent, may be perused and catalogued recording one neuron at a time, as is in fact done in the experiments of Miyashita et al.

This precludes, of course, observing phenomena related to correlations in spike emission times if those manifest themselves in different neurons. To observe those, multi-unit probes are required. Yet, I feel that the information, representational, computational and cognitive, contained in rate distributions is far from exhausted. On the other hand, the relative facility of access to this type of information as well as of its analysis and modeling, compared to multi-electrode data, makes it a very attractive subject of investigation.

6. Empirical and cognitive implications

A significant aspect of the generation of random images in the Miyashita experiments is that it is quite likely that the correlations between internal representations of stimuli are due only to their context dependence, i.e. their frequent contiguous appearance in the training phase. The effort invested in the generation of the images is recompensed by the fact that in some sense the registered correlations between the attractors are the minimal correlations among internal representations: those due purely to the constraints imposed on the learning process by the sequential training and the existence of attractors.

The attractor picture and the observed correlations it creates among internal representations have a rather universal feature. The dynamical tagging of a given memory may produce persistent activities in several modules in the cortex. The representation of one stimulus in different modules may represent different features of the stimulus, such as color, shape, etc. Different modules may be involved in the generation of different types of reactions. One gets the impression, from the experiments of the Miyashita group, that the observed module in AVT is directly related to the pair association [miya3], while it is not as directly related to the matching task. The universality intended here is that this does not matter, in the sense that wherever the attractor related to the reaction is, it must represent similar correlations of the internal representations, because those depend only on the fact that one is learning one attractor while reverberating in a previous one.

This observation suggests that one may use the lessons of these experiments to speculate directly about human cognitive psychology. The fact that correlations form between the attractors representing semantically meaningless, uncorrelated stimuli implies that priming phenomena should be observed among stimuli of this kind. (Priming is the experimental observation that the time for recognition of an incomplete pattern is shortened if the presentation of the stimulus to be recognized is preceded by the presentation of a cognitively related stimulus.) The only condition is that they be presented in an ordered sequence during training, independently of whether we can observe the attractor cortical network involved in the cognitive task or not. That priming should take place in a situation of correlated attractors can be concluded intuitively, and is confirmed in model networks. It is the mere observation that if the assembly is in a given reverberation, due to the priming stimulus, the test stimulus to be recognized will find it easier (and hence faster) to provoke a transition to another reverberation attractor, the more similar the activity distribution in the latter attractor is to the priming attractor.

Thus, purely on the phenomenological, pre-theoretical level, given the observation of correlated attractor representations, one is led to consider for example:

  1. Testing priming effects in humans with semantically meaningless stimuli, essentially imitating the Miyashita experiments, but measuring reaction time changes upon priming. One would expect a decrease in the priming effect with the distance in the training sequence, i.e. with decrease of the correlation between the internal representations.
  2. Investigating the assumption that false alarms are also caused by attractor correlations. (A false-alarm experiment is one in which a subject is required to identify whether a given stimulus belongs to a subset of stimuli or not. The effect is that when the test stimulus is correlated (cognitively) with one of the items in the subset, the number of wrong `yes' answers increases.) This hypothesis can be tested most clearly upon sequentially memorized semantic-free stimuli.
  3. Testing the effects of the intensity of temporal correlations on the internal attractor correlations. In other words, one can prepare a set of stimulus sequences with increasing proportion of sequences of fixed order, the others being random. One should expect a threshold behavior in the priming effect as a function of this proportion.

Similarly, the attractor interpretation of the experiments suggests informative extensions of the experiments in primate `cognitive-neuro-physiology':

  1. An immediate consequence of the framework proposed is that simple attractors (internal representations) must be formed prior to the correlated ones, as an intermediate stage in forming correlated attractors. Hence, one should find a point in the training process at which internal representations exist, but express no correlations;
  2. A related experiment to the one above is to train at length with patterns presented in a random sequence. Internal representations should form, but they should be uncorrelated. Their correlations with the activity distributions driven by the stimuli themselves would be invaluable to the solidification of the modeling effort;
  3. One can perform on monkeys the priming experiments, suggested for humans above, while observing attractor transitions in the cortical module under investigation;
  4. One can extend the experiments on pair association [miya3], which have been interpreted as attractor fusing. This can be done by measuring the correlations between pair representations in situations of partial ordering of the pairs in the training phase. Specifically, in the experiment first stimulus images belonging to a pair were always shown, during training, contiguous and in the same order, while pairs were selected at random. One can conceive of keeping the strict ordering within each pair and having each pair followed by a given pair with some probability p and choosing the subsequent pair at random with probability 1-p;
  5. It is of importance to establish the structure of the distribution of activity in the module following the response;
  6. But most important would be the identification of the additional representation modules in the cortex. These would be in different parts of the cortex, or even outside it, as different dimensions of a stimulus seem to be stored at different stations along the cortical elaboration path. See e.g. Damasio and Damasio.
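The partially ordered pair protocol of item 4 can be made concrete with a small sketch. The function name, the use of `(pair, member)` tuples to stand for images, and the choice of "the next pair in index order" as the designated successor are all illustrative assumptions, not details taken from the experiments.

```python
import random

def pair_training_sequence(num_pairs, num_presentations, p, seed=0):
    """Return a list of (pair_index, member) stimuli.

    Order inside each pair is strict: member 0 is always followed by member 1.
    After a pair, its designated successor (here assumed to be the next pair
    index, cyclically) follows with probability p; otherwise the next pair
    is drawn uniformly at random (probability 1 - p).
    """
    rng = random.Random(seed)
    pair = rng.randrange(num_pairs)
    sequence = []
    for _ in range(num_presentations):
        sequence.append((pair, 0))
        sequence.append((pair, 1))
        if rng.random() < p:
            pair = (pair + 1) % num_pairs   # ordered continuation
        else:
            pair = rng.randrange(num_pairs)  # random continuation
    return sequence

if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, [pair for pair, member in pair_training_sequence(6, 8, p) if member == 0])
```

Setting p = 0 corresponds to pairs selected at random, as described for the original experiment, while p = 1 gives a fully ordered sequence of pairs; intermediate values interpolate between the two.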

7. Toy models and realistic modeling

Modern physics has had it as a very powerful methodology that very structured phenomena in a complicated system are investigated in toy models. In other words, phenomena like super-conductivity, magnetism, liquid crystals etc., are not searched for by starting from the well established dynamical laws of systems of nuclei and electrons. Instead, some essential features of the elements, deemed relevant to the structured, emergent phenomenon under investigation, are represented in a simple tractable model. If the dynamics of the toy model actually produces the expected structure, the robustness of the phenomenon to the reintroduction of the omitted complexity of the underlying elements is investigated. This iteration serves both to justify the toy as well as to study further details of the emerging structures.

The discussion in the sections above has left us with a double task:

  1. The demonstration of a theoretical framework in which a plurality of stimulus selective persistent activities can be embedded in a single assembly of neuron-like elements;
  2. The demonstration that in such a framework correlations (context sensitivity) emerge between reverberations (internal representations) corresponding to uncorrelated learned stimuli.

The first task has been solved by the Hopfield model which has served as the basic toy model in describing the emergence of a diversity of structured attractors, robust to many types of random damage and noise. This was done by employing an explicit form for a synaptic matrix, that has a learning flavor, in the sense that the set of synaptic efficacies is constructed, in an additive fashion, from the correlations of activities of neural pairs in afferent patterns that are to be the attractors of the network. This is in the grain of the Hebbian paradigm. Namely, external stimuli to be learned impose activity distributions on an assembly, which in the learning process develops a set of synaptic efficacies that can maintain, autonomously, such activity distributions as reverberations. It is after all the synaptic matrix, and only the synaptic matrix that can maintain the delay activity in the absence of the stimulus. The possibility of generating a synaptic matrix which endows a network with a large variety of different robust attractors in a single module, was the main achievement of this model. Moreover, the formulation of the model allowed for the detailed computation of many properties of the dynamical response of the assembly to external afferent stimuli.
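As a rough illustration of the additive construction just described, the following sketch builds a synaptic matrix from pairwise activity correlations in a few random stimuli and then lets the network relax from a corrupted version of one of them. It uses the textbook binary (+1/-1) formulation; the network size, number of stimuli, noise level, and all function names are assumptions chosen for the example.

```python
import random

def hebbian_matrix(patterns):
    """Additive Hebbian matrix: J_ij = (1/N) * sum over stimuli of xi_i * xi_j."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-coupling
                    J[i][j] += xi[i] * xi[j] / n
    return J

def relax(J, state, max_sweeps=20):
    """Asynchronous updates s_i <- sign(sum_j J_ij s_j) until no neuron changes."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(J):
            h = sum(w * sj for w, sj in zip(row, s))
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

def overlap(x, y):
    """Normalized similarity between two activity patterns (1.0 = identical)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

def demo(num_neurons=200, num_patterns=5, num_flipped=20, seed=1):
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(num_neurons)]
                for _ in range(num_patterns)]
    J = hebbian_matrix(patterns)
    cue = list(patterns[0])
    for i in rng.sample(range(num_neurons), num_flipped):
        cue[i] = -cue[i]  # corrupt the stimulus
    # Overlap with the stored stimulus before and after relaxation.
    return overlap(cue, patterns[0]), overlap(relax(J, cue), patterns[0])

if __name__ == "__main__":
    print(demo())
```

After the cue is removed, the state is held by the matrix alone, which is the sense in which only the synaptic matrix can maintain the delay activity.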

Furthermore, this toy model has also served to generate metaphors for several psychiatric and neurological pathologies. Hoffman has used it to describe a distinction between mania and schizophrenia. Virasoro has used the properties of this model under a random destruction of synapses (a lesion) to capture phenomena such as prosopagnosia.

Yet this model has left unanswered a whole set of questions of detail. A representative list includes:

  1. The model predicts high spike rates in attractors, while recordings produce rates much below saturation. In fact, the attractors in the model are activity distributions in which about half the neurons in the assembly are quiescent while the other half emits spikes at saturation rates. This allowed modeling in terms of binary discrete variables for the neurons;
  2. The recorded coding levels (the proportion of neurons in the assembly with elevated spike rates) in a given reverberation are much lower than the 50% implied by the toy model;
  3. The model produced auto-associative networks, i.e. with attractors as close as possible to the memorized stimuli, while experiment did not;
  4. It predicted a bi-modal distribution of rates in an attractor, unlike the empirical observation.

Early criticisms of the Hopfield model have concentrated on the type of synaptic matrices used. Symmetric, fully connected matrices, with random distribution of excitatory and inhibitory synapses on the axons of each neuron, and infinite analog depth (the ability to maintain with high precision a large number of different, closely spaced values) for each synapse, have made the analysis by theoretical physicists easier. Those are unrealistic impositions, since it is unlikely that the cortex will generate symmetric synaptic structures; cortex is connected at most at a level of 10%; excitation and inhibition find their places on different neurons (Dale's rule). Yet these criticisms have been shown to be relatively innocuous. The synaptic dilution and the limited analog depth of the synapses have been treated by Sompolinsky and shown to affect the performance of the network only mildly. In fact, in some cases the performance per remaining material resource (such as storage capacity per surviving synapse) was found even to improve in the less ideal system.

The question of the low coding levels has also been found to be relatively simple, though its resolution has brought to light the fact that, if one looks for a network with uniform thresholds for the neurons, the behavior of the network is strongly dependent on the choice of the representation for the neural states. If one insists on representing neurons by two-state variables, there is a clear advantage in representing these states by (0,1) over (-1,1).

A more elaborate modification of the dynamics of the original toy model was required in order to account for the relatively low spike rates in the attractors observed in experiment. It required a more detailed treatment of the single neuron dynamics, arriving at a description of neurons in terms of coupled systems of afferent currents and efferent spike rates. This description has produced networks with attractors operating far below the saturation of the neurons composing the network. The modified networks, with low (arbitrary) coding levels and low spike rates preserved the main features of the original toy model: robust diversity of attractors, classifying stimuli auto-associatively. Two outstanding issues remained: auto-associativity and bi-modality.

The difficult problem of modifying a network to form attractors with correlations of the Miyashita type from uncorrelated stimuli learned in a fixed order found a solution with a flavor of the Miyashita training scenario, at the level of the toy model, indicating once again the usefulness of such models as drawing boards for new ideas. Auto-associative ANNs are based on the idea that the synaptic matrix codes for the correlations of activities of pairs of neurons as induced by a given afferent stimulus. The neural pair correlations in different stimuli are coded independently of each other. It was shown that when synaptic modifications, induced by training on a sequence of stimuli presented in a fixed order, record also the correlations of the activities of pairs of neurons induced by one stimulus with that of its immediate predecessor in the sequence, then the resulting attractors display correlations of the Miyashita type. The resulting attractors, each classified by the uncorrelated stimulus that had been learned and that excites it, are correlated for as far as five apart in the training sequence. This theoretic result has manifested itself in a dramatic way, in that each persistent delay activity (attractor) has a finite similarity index with exactly five of the nearest stimuli in the sequence. It is just the approximate range of significant correlations observed in the experiment. But this surprising five was deduced in the rather artificial context of the toy model.
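The mechanism described in this paragraph can be explored numerically with a minimal sketch. It assumes binary (+1/-1) neurons, a cyclic training sequence, and a synaptic matrix of the form J_ij = (1/N) sum_mu [xi_i^mu xi_j^mu + a (xi_i^(mu+1) xi_j^mu + xi_i^mu xi_j^(mu+1))]; the symbol a for the contiguity amplitude, its value 0.7, the network size, and all function names are assumptions made for illustration and should not be read as the exact model used in the cited work.

```python
import random

def contiguity_matrix(patterns, a):
    """Hebbian matrix plus a term of relative weight a that couples each
    stimulus to its successor in the (cyclic) training sequence."""
    p, n = len(patterns), len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        cur, nxt = patterns[mu], patterns[(mu + 1) % p]
        for i in range(n):
            row, ci, ni = J[i], cur[i], nxt[i]
            for j in range(n):
                row[j] += (ci * cur[j] + a * (ni * cur[j] + ci * nxt[j])) / n
    for i in range(n):
        J[i][i] = 0.0  # no self-coupling
    return J

def relax(J, state, max_sweeps=50):
    """Asynchronous sign updates until the state stops changing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(J):
            h = sum(w * sj for w, sj in zip(row, s))
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

def overlap(x, y):
    return sum(u * v for u, v in zip(x, y)) / len(x)

def neighbour_overlaps(a, num_patterns=13, num_neurons=300, seed=0):
    """Overlaps of the attractor excited by one stimulus with the stimuli
    at sequence separations -3..3 (index 3 is the exciting stimulus itself)."""
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(num_neurons)]
                for _ in range(num_patterns)]
    J = contiguity_matrix(patterns, a)
    mu = num_patterns // 2
    attractor = relax(J, patterns[mu])
    return [overlap(attractor, patterns[(mu + d) % num_patterns])
            for d in range(-3, 4)]

if __name__ == "__main__":
    print("a = 0.0:", [round(m, 2) for m in neighbour_overlaps(0.0)])
    print("a = 0.7:", [round(m, 2) for m in neighbour_overlaps(0.7)])
```

With a = 0 the matrix is purely auto-associative and the attractor should essentially coincide with the exciting stimulus. With the contiguity term switched on, the attractor is expected to acquire sizeable overlaps with neighboring stimuli in the sequence, even though the stimuli themselves are uncorrelated.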

Note that two types of correlations enter this discussion, and they should not be confused. The first is the correlations of activities of pairs of neurons during the presentation of stimuli for learning. They drive the Hebbian learning. Once learning has generated the synaptic matrix, the network's dynamics is controlled by that matrix. In particular, this synaptic matrix determines the structure of the attractors (delay activity distributions). The correlations between these attractors are the second type. They are the ones measured by Miyashita.
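For the second type of correlation, the figures quote rank coefficients between delay activity distributions. A minimal sketch of such a coefficient is given below; it implements the simple tau-a variant, in which tied values contribute zero, and the function name and example rates are hypothetical. Which variant and tie handling were used in the experiments is not specified here.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank coefficient (tau-a) between two equally long sequences,
    e.g. the delay-period rates of the same neurons in two reverberations.

    Returns (concordant pairs - discordant pairs) / (total number of pairs).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two entries")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (len(x) * (len(x) - 1) / 2)

if __name__ == "__main__":
    # Hypothetical spike rates (Hz) of four neurons in two attractors.
    rates_first = [2.0, 15.0, 30.0, 4.0]
    rates_second = [3.0, 12.0, 25.0, 9.0]
    print(kendall_tau(rates_first, rates_second))
```

Because only the rank order of the rates enters, the coefficient is insensitive to the overall scale of activity in the two reverberations being compared.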

The promise of the result was in that

  1. Synaptic information about single neighbor contiguity in the training sequence, i.e. the inclusion in the synaptic efficacies (learning) of activity correlations of pairs of neurons, one active as driven by a given stimulus and the other by the preceding one, was sufficient to induce correlations of the corresponding internal representations to a distance compatible with experiment;
  2. The attractor picture underlying the model makes the information about a preceding stimulus in a training sequence naturally available in the persistent attractor, at the time of presentation of the current stimulus. Recall that the network is supposed to code for activity correlations in successive first stimuli. Any two such stimuli are presented with a separation of many seconds. But the existence of attractors allows the information of one, first, stimulus to be around for as long as is needed for the subsequent stimulus to be presented;
  3. The form of the correlations (magnitude of coefficients and its rate of decay as function of separation in the training sequence) was found to be independent (in a certain range) of the value of the contiguity amplitude parameter, which is the relative strength of the contribution to the synaptic efficacy due to neighboring images in the training sequence to the contribution due to each image separately.

[FIGURES AVAILABLE ONLY IN HARD COPY]

Figure 4. Confrontation with theoretical model (4000 neurons): Kendall rank coefficients of attractors as a function of the separation of the stimuli in the training sequence. To each stimulus corresponds an attractor (reverberation), the one excited by it. The attractors are labeled by the serial position number (SPN) of the corresponding stimuli in the training sequence. The learned stimuli forming the synaptic matrix are uncorrelated, since the images presented for learning were uncorrelated. The activity distributions in the attractors are. The error bars are standard errors in the sample of neurons. (a) Correlations for regular sample, second curve from top; (b) correlations in enhanced sample, top curve. Full curve, model results; dotted curve, experiment.

In fact, the phenomenon persists when the formal neurons are replaced by quasi-realistic, integrate-and-fire neurons. An ANN operating with a synaptic matrix containing information on temporal contiguity in the training process preserves the main features of the attractor correlations in the systems of discrete neurons. The correlation coefficients (Kendall rank coefficients, as used in the experiments) of the network of realistic neurons, which also includes reactive, separated inhibition, agree quantitatively quite well with the measured values. Moreover, the more realistic model presents some additional features which bring the model even closer to the biological experience:

The attractors expressing the Miyashita correlations do not exhibit a simple bi-modal distribution of spike activity among the neurons in the assembly. All previous ANN models produced sharp bi-modality, either because the neurons were discrete, i.e. quiet or at saturation frequencies, or because the models implemented auto-associativity. Experiments manifest attractors, but not simple bi-modality. The model predicts a large, stimulus selective, peak at very low spike frequencies, and a wide distribution of rates among the active neurons. Consequently, the combination of realistic neurons and attractor correlations (i.e. the departure from auto-associativity) gives a potential response to the problem of the nature of the rate distribution.

To conclude this discussion we show one more measurement carried out both on the model and on the performing monkey. We show the distribution of activities produced in a given neuron in the reverberations provoked by the presentation of the complete set of learned stimuli. One sees the rate of spike emission by this neuron, in the delay period, for each of the stimuli plotted in the order in which they had been learned.

This is not the right context in which to develop a more detailed discussion of the confrontation of theory with experiment. The discussion has been opened here only to indicate that the insights gained by the interpretation of the Miyashita experiments in the language of attractor dynamics is accompanied by a candidate model which captures the experimental results to a very impressive degree of detail. Such a model can consequently serve as a drawing board for the development of future paradigms in cognitive psychology.

[FIGURES AVAILABLE ONLY IN HARD COPY]

Figure 5. Average delay discharge rate vs serial position separation on a given cell. (a) Fig. 3a (b) Model. This displays the level of activation of the particular neuron in the reverberation stimulated by each of the hundred stimuli in the learned sequence. The existence of two peaks indicates that this neuron participates in the representation of two uncorrelated stimuli. The side wings of each peak are due to the correlations of the attractors, developed in learning a fixed sequence.

8. Predictive theories

The attractor description as a language and as a set of models is a strongly predictive framework. It produces several detailed experimental predictions even prior to the elaboration of detailed models. The models enlarge the predictive set of commitments of the approach. The detailed studies lead to the following predictions:

  1. The appearance of the uncorrelated reverberations, either due to short training periods, or to training with random sequences of stimuli, should be accompanied by narrower distributions of spike rates.
  2. The relation between the coding rate of the stimuli, the fraction of the module's neurons driven into high rates when the stimulus is present, to the coding rate in the corresponding attractor;
  3. Statistics of the distribution of spike rates expressed on a given neuron by the entire set of stimuli memorized in the assembly (ibid);
  4. If pair associations are formed by the training process, then when the pair attractors become correlated, by partially ordering the pairs, the correlations of the pair attractors will be very high and will extend far down the sequence.


    The list can be continued.

9. Some provisos and defensive outlook

The above picture sounds too good. It may still have to undergo modifications. The main exposed flanks we perceive at this stage are:

  1. It may be the case that the Miyashita correlated reverberations are not autonomous, in the sense that they are not really coded in the observed area, but reflect attractor activity in one or several other areas. Such modules could then drive afferently the module observed. This is a question which should be given serious attention;
  2. The specific predictions follow from models with a prescribed synaptic matrix. This may be too restrictive. The models represent in a plausible way the learning process hypothesized. What gives it some credence is the relative robustness of the results to variations in the structure of the synaptic matrix, provided the logic of learning a sequence of stimuli with fixed order is maintained. In particular, there is the robustness under variation of the contiguity amplitude parameter.
  3. The neural elements in the model networks may still be somewhat too idealized. This may be at the origin of some systematic differences in the details of the attractor correlation coefficients and rate distributions.

Even if some modifications are required, the reverberation picture is too fertile to be rapidly discarded. We have received it from Hebb aged some fifty years. It has only recently been given a direct empirical (neuro-physiological) dimension and has been endowed with a precise mathematical model. One is tempted to start drawing a host of speculative conclusions from the new framework exposed by Miyashita's monkeys. The context provides fertile ground for adventures in cognitive psychology and even for some aspects of linguistics ranging from binding, which may be related to syntax, to priming, which may be extrapolated to semantics. It may even suggest a substrate for psychology itself.

Yet the lessons learned from these experiments include the one which advises restraint. It is just these experiments which indicate that our imagination concerning brain computation is still too much constrained by formal mathematics, by computer languages and by artificial intelligence. In this connection it is well worth recalling the wisdom of John von Neumann, writing 40 years ago:

...Thus the outward forms of our mathematics are not absolutely relevant from the point of view of evaluating what the mathematical or logical language truly used by the central nervous system is... the above remarks about reliability and logical and mathematical depth prove that whatever the system is, it cannot fail to differ considerably from what we consciously and explicitly consider as mathematics. [The Computer and the Brain, Yale 1954 p. 82, emphases in the original]

It is most likely that attending for a while longer to the details of the contact between modeling and experiment would keep options open which a premature harvest of speculation would close.

Acknowledgments

I am indebted to Prof. Peter Hillman for a critical reading of an earlier version of this manuscript and to an anonymous referee who has helped me improve the paper very significantly.

References

Miyashita Y and Chang HS 1988 Neuronal correlate of pictorial short-term memory in the primate temporal cortex, Nature 331 68

Miyashita Y 1988 Neuronal correlate of visual associative long-term memory in the primate temporal cortex, Nature 335 817

Sakai K and Miyashita Y 1991 Neural organization for the long-term memory of paired associates, Nature 354 152

Amit DJ 1993 In defense of single electrode recordings, NETWORK 3 385

McNaughton BL, Barnes CA and Andersen P 1981 Synaptic efficacy and EPSP summation in granule cells of rat fascia dentata in vitro, J Neurophysiol 46 952

Sayer RJ, Redman SJ and Andersen P 1989 Amplitude fluctuations in small EPSPs recorded from CA1 pyramidal cells in the guinea pig hippocampal slice, J Neurosci 9 840

Mason A, Nicoll A and Stratford K 1991 Synaptic transmission between individual pyramidal neurons of the rat visual cortex in vitro, J Neurosci 11 72

Tanaka K 1992 Inferotemporal cortex and higher visual function, Current Biology 2 502

O'Keefe J and Speakman A Single unit activity in the rat hippocampus during a spatial memory task, Exp. Brain Res. 68 1

Amit DJ, Brunel N and Tsodyks MV 1993 Correlations of Hebbian reverberations, Jour. of Neurosci. in press

Hebb DO 1949 The Organization of Behavior (Wiley, NY)

Hebb DO and Donderi DC 1987 Textbook of Psychology, fourth edition (Lawrence Erlbaum Associates, Hillsdale NJ)

Lorente de Nó 1949 in Fulton JF editor, Physiology of the Nervous System (Oxford University Press, NY)

Fuster JM 1973 Behavioral electrophysiology of the prefrontal cortex, J. Neurophysiol. 36 61

Niki H 1974 Prefrontal unit activity during delay alternation in the monkey, Brain Res. 68 185

Braitenberg V and Schüz A 1991 Anatomy of Cortex (Berlin: Springer-Verlag)

Anisfeld M and Knapp M 1968 Association, synonymity, and directionality in false recognition, Journal of Experimental Psychology 77 171

Damasio AR and Damasio H 1991 Cortical systems underlying knowledge retrieval: evidence from human lesion studies, Background manuscript for the Dahlem Conference on Exploring Brain Function: Models in Neuroscience, Berlin

Hopfield JJ 1982 Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79 2554

Amit DJ 1989 Modeling Brain Function (Cambridge University Press, NY)

Hoffman RE 1987 Computer simulations of neural information processing and the schizophrenia-mania dichotomy, Archives of General Psychiatry 44 178

Abeles M, Vaadia E and Bergman H 1990 Firing patterns of single units in the prefrontal cortex and neural-network models, NETWORK 1 13

Virasoro AM 1988 Categorization in neural networks and prosopagnosia, Physics Reports 184 99

Tsodyks MV and Feigel'man MV 1988 The enhanced storage capacity in neural networks with low activity level, Europhys. Lett. 46 101

Buhmann J, Divko R and Schulten K 1989 Associative memory with high information content, Phys. Rev. A39 2689

Sompolinsky H 1986 Neural networks with nonlinear synapses and static noise, Phys. Rev. A34 2571, and The theory of neural networks: The Hebb rule and beyond, in L. van Hemmen and I. Morgenstern eds. Heidelberg Colloquium on Glassy Dynamics (Springer-Verlag, Heidelberg)

Amit DJ and Tsodyks MV 1991 Quantitative study of attractor neural network retrieving at low spike rates I: Substrate -- spikes, rates and neuronal gain, NETWORK 2 259

Amit DJ and Tsodyks MV 1991 Quantitative study of attractor neural network retrieving at low spike rates II: Low-rate retrieval in symmetric networks, NETWORK 2 275

Griniasty M, Tsodyks MV and Amit DJ 1993 Conversion of temporal correlations between stimuli to spatial correlations between attractors, Neural Computation 5 1

Cugliandolo L 1994 Correlated attractors from uncorrelated stimuli, Neural Computation 6 220

On leave from Racah Institute of Physics, Hebrew University, Jerusalem