Below is the unedited penultimate draft of:
Halford, G., Wilson, W.H., & Phillips, S. (19XX). Processing Capacity Defined by Relational Complexity: Implications for Comparative, Developmental, and Cognitive Psychology Behavioral and Brain Sciences, XX (X): XXX-XXX.
The final published draft of the target article, commentaries and Author's Response currently available only in paper.
For information about subscribing or purchasing offprints of the published version, with commentaries and author's response, write to: journals_subscriptions@cup.org (North America) or journals_marketing@cup.cam.ac.uk (All other countries).

Processing Capacity Defined by Relational Complexity: Implications for Comparative, Developmental, and Cognitive Psychology

Graeme Halford
Department of Psychology
University of Queensland
4072 AUSTRALIA
gsh@psy.uq.oz.au

William H. Wilson
School of Computer Science and Engineering
University of New South Wales
Sydney
New South Wales
2052 AUSTRALIA
billw@cse.unsw.edu.au
http://www.cse.unsw.edu.au/~billw/

Steven Phillips
Cognitive Science Section
Electrotechnical Laboratory
1-1-4 Umezono
Tsukuba
Ibaraki 305
JAPAN
stevep@etl.go.jp
http://www.etl.go.jp/etl/ninchi/stevep@etl.go.jp/welcome.html

Keywords

Capacity, complexity, working memory, central executive, resource, cognitive development, comparative psychology, neural nets, representation of relations, chunking

Abstract

It is argued that working memory limitations are best defined in terms of the complexity of relations that can be processed in parallel. Relational complexity is related to processing loads in problem solving, and discriminates between higher animal species, as well as between children of different ages. Complexity is defined by the number of dimensions, or sources of variation, that are related. A unary relation has one argument and one source of variation, because its argument can be instantiated in only one way at a time. A binary relation has two arguments, and two sources of variation, because two argument instantiations are possible at once. Similarly, a ternary relation is three dimensional, a quaternary relation is four dimensional, and so on. Dimensionality is related to number of chunks, because both attributes on dimensions and chunks are independent units of information of arbitrary size. Empirical studies of working memory limitations indicate a soft limit which corresponds to processing one quaternary relation in parallel. More complex concepts are processed by segmentation or conceptual chunking. Segmentation entails breaking tasks into components which do not exceed processing capacity, and which are processed serially. Conceptual chunking entails "collapsing" representations to reduce their dimensionality and consequently their processing load, but at the cost of making some relational information inaccessible. Parallel distributed processing implementations of relational representations show that relations with more arguments entail a higher computational cost, which corresponds to empirical observations of higher processing loads in humans. Empirical evidence is presented that relational complexity discriminates between higher species, is related to processing load in reasoning and in sentence comprehension, and that the complexity of relations processed by children increases with age. Implications are considered for neural net models, and for theories of cognition and cognitive development.

1.0

We propose that information processing capacity limitations in humans and higher animals should be defined, not in terms of items, but by the complexity of relations that can be processed in parallel. The core argument will be that relational complexity is associated with processing load in human problem solving, with age differences in children's understanding of concepts, and differences between higher animal species. It is not argued of course that relational complexity is the only factor that influences difficulty, because domain expertise, skill with problem solving heuristics, memory availability, and perceptuo-motor factors are also important. Nevertheless, relational complexity is essential to the explanation of some established findings spanning a wide range of literature. We will also explore possible explanations for this limitation in neural net (parallel distributed processing, connectionist) models of cognition. First we will consider previous approaches to working memory, then present our own formulation.

1.1 Processing capacity and working memory

Processing capacity has often been treated as working memory capacity, defined as information that is stored for later processing (Hitch, 1980). Consequently capacity limitations have been defined in terms of number of items, or units of information (Miller, 1956). However storage and processing functions of working memory are partly distinct, because short term storage of information does not necessarily interfere with concurrent processing (Baddeley & Hitch, 1974; Baddeley, 1986; 1990; Halford, 1993; Halford, Bain, & Maybery, 1984; Halford, Maybery, O'Hare, & Grant, 1994; Klapp, Marshburn, & Lester, 1983). Because of this, Baddeley (1986) postulated three systems, a visuo-spatial scratchpad, a phonological loop, and a central executive. The first is concerned with storage of visual-imaginal information, while the second is involved in short-term serial recall or "memory span" tasks. Actual processing is the function of the central executive.

The distinction between information that is stored for later processing, and information that is actually being processed, can be illustrated by the task of mentally adding 79 and 86. The storage of a partial result for later processing that would occur in this task is legitimately called working memory, because the term can include storage for later process. However such storage is distinct from actual processing, in which information constrains a decision, and is not simply stored. For example, when we ask "what number results from adding 9 to 6?" we are not merely storing the addends, but are using them to carry out a computation, and they constrain our decision.

Working memory capacity has also been defined in terms of limits on activation (Anderson, Reder, & Lebiere, 1996; Just, Carpenter, & Hemphill, 1996), but this does not provide a general metric for processing complexity. Just et al. (1996) assess complexity in terms of the number of new goals generated, and Case (1985; 1992) uses a similar metric, based on number of embedded goals in a control structure. However, as explained in 6.1.3, the relational complexity metric can subsume the levels of embedding metric and applies more widely. Working memory capacity of children has been assessed using tests that combine processing and storage (Case, 1985; 1992; Pascual-Leone, 1970) but this makes it difficult to assess whether successful prediction is due to processing or storage components of working memory. Anderson et al. (1996) assess processing complexity by the number of symbols in an equation, but this does not necessarily reflect the difficulty of the underlying processes, and does not provide a metric that is general to different types of tasks.

Storage complexity can be measured relatively directly, by number of items or chunks stored as, for example, in memory span tests. Computational complexity of algorithms can also be measured (Garey & Johnson, 1979; Tsotsos, 1990), but as will be discussed in Section 5, it is not clear that this can be translated into a metric for processing complexity in human cognition. The problem can however be approached in the following way. Any cognitive process can be expressed as a function which maps input(s) to output(s). Capacity to perform the process corresponds to capacity to compute the function. A function is a special case of a relation (see 2.3.2), so relational complexity has potential as a measure of processing demand. With processing capacity defined this way, it is not just the number of items or amount of information, but the relations between entities, that is the structure, that is the limiting factor. This was first realised by assessment of the empirical cognition literature, discussed in Sections 3 and 6. However two approaches to modeling higher cognitive processes in neural nets (Halford et al., 1994; Shastri & Ajjanagadde, 1993a) have independently identified a limitation that is consistent with this one.

We are concerned with processing capacity in higher cognitive processes including reasoning, memory operations and language comprehension. Processes such as vision which are performed by modules have higher processing capacity, but are specialised for particular types of input, such as information represented on the retina (Fodor, 1983; Fodor & Pylyshyn, 1988). Modular processes do not impose measurable processing demands (as defined in 2.1), and are not associated with individual differences in intelligence (Anderson, 1992). Modular processes cannot be greatly modified by higher cognitive processes, so thought cannot be used to "re-program" the visual system. By contrast, higher cognitive processes can be modified "on line" and the way a task is performed can be influenced strategically without relearning (Clark & Karmiloff-Smith, 1993). We have attempted to provide a principled account of higher cognitive processes by identifying them with the properties of relational knowledge, defined in 2.2, and we propose that the processing limitations we are considering apply to cognitive functions that have these properties.

2.0 Relations and Processing Demand

The way relational complexity influences processing load can be illustrated by the difficulty of the following sentence:

"The boy the girl the man saw met slept." (1)

The problem here does not reside solely with storage, either of the original sentence or the results of partial processing, but also reflects the amount of information about which decisions must be made. This sentence entails an integrated structure, discussed in 6.1.4, that requires boy, girl, and man to be assigned to roles corresponding to subjects of three verbs and objects of two verbs. Strategies are not generally available to English speakers for serial processing of centre-embedded clauses, and the relative lack of semantic cues or syntactic case markers means people receive little help in deciding which nouns fill which case roles. The result is that we have to decide who saw, who met, and who slept, and identify the objects of "saw" and "met", all together, because we cannot positively identify subjects or objects of any of the verbs until we comprehend the whole sentence. Sentences with this type of reduced relative clause are known to impose high resource demands (Just & Carpenter, 1992; Kimball, 1973).

Another example is provided by Sweller (1993) who analysed the following problem: Suppose five days after the day before yesterday is Friday. What day of the week is tomorrow? Despite our expertise in reasoning about days of the week, this problem is frustratingly difficult. The reason is that, especially in the first sentence, numerous elements are related to each other and cannot be considered meaningfully in isolation. These relations have to be at least partially processed in order to segment the statement into subproblems that can be processed serially. The processing load is felt most keenly when we try to plan this procedure.

The processing load imposed by interacting components of a task can be captured with the concept of relational complexity. We will begin by considering relations between different numbers of factors. At a low level of complexity, consider a case where a cognitive process is constrained by a single factor; for example, our choice of restaurant might depend on the amount of money we have. We can express this as a binary relation, between money and restaurant-choice; that is, a set of ordered pairs, in which each amount of money is associated with a particular restaurant (or with a subset of restaurants). The money-restaurant relation could be modified by another factor, such as importance: The more important the occasion the more expensive our choice of restaurant, though importance might have more influence when we have plenty of money than when we have little. Here we have an interaction between two determining factors. This situation can be represented as a ternary relation, comprising a set of ordered triples in which each amount of money, and each level of importance, is associated with a restaurant. These variables could in turn interact with a third factor, such as hunger, which might make us profligate, but only when we are not really poor. We now have an interaction between three determining factors. This can be expressed as a quaternary relation, comprising a set of ordered 4-tuples, in which each amount of money, level of importance, and state of hunger, is associated with a restaurant choice.

It is clear that the problem becomes more complex as the number of interacting factors increases. This complexity can be measured by the dimensionality of the relation, or number of variables that are related. Problems which entail a binary relation are simpler than those that entail a ternary relation, which are simpler than those which entail a quaternary relation, and so on. The idea of relational complexity is analogous to the number of factors in an experimental design. An experimental design can be thought of as a set of relations between independent and dependent variables. A one-way experimental design is equivalent to a binary relation between one independent and one dependent variable. A two-way experimental design is equivalent to a ternary relation, between two independent and one dependent variables. Experimental designs with more factors permit more complex interactions, but at the cost of more observations (participants) being required. This is analogous to the processing load imposed by problems of high relational complexity.

The complexity of a relation may be defined by the number of arguments. A binary relation has two arguments: For example, the binary relation "bigger-than" has two arguments, a larger and a smaller entity. A ternary relation has three arguments: For example, "love-triangle" is a ternary relation, and has arguments comprising three people, two of whom love a third. Quaternary relations have four arguments, and so on. Each argument can be instantiated in more than one way. For example, each argument of "bigger-than" can be instantiated in an infinity of ways, such as bigger-than(horse,mouse), . . ., bigger-than(whale,dolphin), . . , and so on. Consequently, each argument provides a source of variation, or dimension, and thereby makes a contribution to the complexity of the relation. Our next step is to link relational complexity to the concepts of demand (or load) and resources which have been commonly used to account for performance limitations in cognitive psychology.

2.1. Processing complexity

This may depend on the strategy used by the person in a particular set of circumstances, because different performers use different strategies, and even the same person uses different strategies at different times. The optimum cognitive strategy for real human performers may not correspond to the algorithm that is best theoretically, or to any algorithm that would be used in artificial intelligence, because human cognition operates in ways that are very different in some respects from theoretically optimum algorithms. Strategies can be constrained to some extent by experimental procedures, but if this cannot be done with confidence then the strategy used must be determined by empirical investigation, a point to which we will return in section 6.0. Therefore processing complexity measures need to be specific to the actual cognitive process used.

2.1.1. Complexity of a cognitive process

is the number of interacting variables that must be represented in parallel to perform that process. Complexity of processing may also vary over time within one task, so the critical value is the complexity of the most complex step. Tasks can vary in the number of steps they require but this does not necessarily affect processing load, because a task with many steps might impose only a low demand for resources at any one time (e.g., counting peas in a box). Processing demand can be high where steps are embedded in a hierarchical structure, but this entails higher-order relations, and relational complexity is also high (to be discussed in 2.2.5).

2.1.2. Processing complexity of a task

is the number of interacting variables that must be represented in parallel to perform the most complex process entailed in the task, using the least demanding strategy available to humans for that task.

2.1.3. Processing demand

is the effect exerted by task complexity on a performer, and it reflects the cognitive resources required to perform a task. The core proposal of this paper is that demand is a function of relational complexity. That is, the more interacting variables that have to be processed in parallel, the higher demand will be. Demand is synonymous with "load" and "effort" and the three terms tend to be used interchangeably in the psychological literature. It is the psychological counterpart of computational cost which will be discussed in Section 5. Demand can be manipulated experimentally, with other aspects of the task controlled, and a number of examples of this will be considered in Section 6.

2.1.4. Resources

allocated to a task vary as a function of demand and performance. More resources must be allocated to higher demand tasks to maintain performance. The methodology for dealing with demand and resources is now highly developed, and is reviewed by Gopher (1994) and Halford (1993, Chapter 3). Resources utilized can be measured by physiological arousal indicators (Kahneman, 1973), by neural activity assessed using brain imaging techniques (Carpenter & Just, 1996), by subjective feeling of effort assessed through self-report, and by decrement in competing tasks (Navon & Gopher, 1980). Resources invested by an individual in a given task will vary over time as a function of the conditions of performance.

2.1.5. Processing capacity

is the limit of resources available. It will vary across individuals and may change over the lifespan (discussed in 6.3). Within a short time frame it is essentially constant, but can be influenced by factors such as physiological state, diurnal rhythms and drugs.

In order to put this argument on a more formal basis, we will consider the nature of relational knowledge.

2.2 Relational knowledge

Given that processing complexity can be related to number of arguments of a relation, the nature of relational knowledge becomes essential to the theory of processing capacity. Phillips, Halford and Wilson (1995; submitted) have argued, on the basis of the work of Codd (1990), that essential properties of higher cognitive processes, including data structures such as lists and trees, can be captured by a model based on processing relations. Our proposed explanation for capacity limitations in higher cognitive processes depends on the complexity of neural net models of relational knowledge, which are considered in Section 4. Therefore we need to specify the properties that relational knowledge must have for a neural net model to be considered adequate. For our present purposes it will be appropriate to say that relational knowledge consists of relational schemas, which we will now define.

2.2.1 Relational schemas

are cognitive representations that include elements and relations between elements, and represent situations or activities in the world. In general, an n-ary relation R is a subset of the cartesian product of n sets: S1x S2x... xSn. Thus if (a1,a2,...,an) [propersubset] R we say that r(a1,a2,...,an) holds; for example, (cat, mouse) [propersubset] LARGER-THAN signifies larger-than(cat, mouse). An n-ary relation comprises a set of n-tuples, where each tuple is a relational instance. We shall refer to R and "larger-than" as relation-symbols. Tuples like (a1,a2,...,an) and (cat, mouse) we refer to as relational instances. However, it may not be clear, for example, whether (cat, mouse) is being considered as an instance of the relation "larger-than" or of the relation "chases". For this reason, we frequently use the term relational instance for an expression of the form r(a1,a2,...,an) or larger-than(cat, mouse). Strictly speaking, r(a1,a2,...,an), larger-than(cat, mouse), and larger-than(mouse, cat) are propositions (see section 2.2.2). Thus we use the term "relational instance" to refer both to the tuple, and to the tuple labelled with the relation symbol. This is not an uncommon practice in cognitive science. Where it would be unclear whether r(a1,a2,...,an) is being considered as a proposition or a relational instance, we shall refer to (for example) "the relational instance r(a1,a2,...,an)".

2.2.1.1. Representation of a relation

requires a symbol to specify the relation R, a representation of the arguments a1,a2,...,an, and a set of bindings between symbol and arguments that maintain the truth of the relation. The binding must constrain the fillers for each argument role so that appropriate members of the cartesian product are bound. For example, in bigger-than(-,-), it is conventional for the entity in the first argument position to be bigger than the argument in the second argument position. Thus bigger-than(cat,mouse), bigger-than(mountain,molehill) should be bound but not bigger-than(mouse,cat). The symbol and arguments must retain their identity in the binding. That is, they must not be fused into a whole in which the components cannot be identified.

Representations of relations are valid when they conform to the structural correspondence principle, which holds that relations in the representation must correspond to relations in the world. Formally:

A relational schema comprising elements Es and relations Rs corresponds to an aspect of the world with elements Ew and relations Rw if there exists a function f that assigns each member of ES in the schema to a member of Ew in the world, in such a way that for any rs(,,...) in the schema (rs [propersubset] Rs, [propersubset] Es) there is an rw(,,...) in the world (rw [propersubset] Rw, = f ()). These criteria have been specified in more detail by Halford and Wilson (1980) and by Holland, Holyoak, Nisbett and Thagard (1986) and their neural net implementation has been specified by Halford, Wilson, et al. (1994). Structural correspondence is a soft constraint and in some cognitive models it can be overridden by other constraints if of sufficient strength (e.g., Hummel & Holyoak, in press). However learning and induction processes will tend to bring relational schemas into correspondence with the aspect of the world they represent (Holland et al., 1986). This can be done by reducing the strength of representations with inappropriate bindings.

2.2.1.2. Representation of a relational instance

r(a1,a2,...,an) requires a binding between the relation symbol r, and the fillers ai, each bound to one argument role. Thus bigger-than(whale,dolphin) is a relational instance, whereas the bigger-than relation includes this instance plus appropriate others such as bigger-than(cat,mouse), bigger-than(mountain,molehill) etc.

A relational instance in isolation can be represented by specifying the relation symbol r plus each of the role-filler bindings. The relational instance loves(John,Mary) requires a representation of "loves" plus a binding of John to the lover role and Mary to the loved role. We can write this as:

loves + lover.John + loved.Mary

Here the "." symbol signifies the role-filler binding and the "+" serves to concatenate the bindings to the relation-symbol.

This is not sufficient to represent a relation (as distinct from relational instances) because ambiguity occurs as soon as more than one relational instance is represented. Suppose we now add the instance loves(Tom,Wendy), represented as:

loves + lover.Tom + loved.Wendy

When we put the representations of both relational instances together we have:

loves + lover.John + loved.Mary + loves + lover.Tom + loved.Wendy

This represents the fact that John and Tom are lovers and that Mary and Wendy are loved, but it does not distinguish between John loving Mary and John loving Wendy, and is similarly ambiguous with respect to Tom.

The ambiguity can be removed by indicating that John and Mary belong to one instance of loves and Tom and Wendy to another. One way is to index each instance with a unique identifier. This in effect separates (or brackets) the "location" of the representation of each instance. For example, the loves relation could be represented as:

1.(loves + lover.John + loved.Mary) +

2.(loves + lover.Tom + loved.Wendy).

One implication of this method is that, in general, the index does not indicate its contents (the index "1" does not indicate that John and Mary are involved). Thus, potentially all instances must be processed to determine the filler for a given role (in the worst case one may have to search all instances to find the one in which John is the lover of Mary).

An alternative approach is to define each relational instance as a unique n-tuple. This can be done by representing bindings between the relation symbol and the fillers for each argument. For example, we can represent:

loves.John.Mary + loves.Tom.Wendy

This representation is based on symbol-argument-argument bindings, each of which comprises an intact relational instance, together with the symbol for the relation ("loves" in this case). The binding between a symbol and its arguments represents the link specified by the relational instance. Thus representing the tuple loves.John.Mary identifies this relational instance unambiguously, and directly represents the link, identified as "loves", between John and Mary.

2.2.1.3. Relations and working memory.

Working memory has often been associated with storage of items of information, as in span tasks, which entail storing strings of items. Therefore it might appear that working memory would store relational instances, that is interconnected sets of items, rather than relations. In some cases this would be sufficient. For example, to understand "John loves Mary" we need only represent relational instance loves(John,Mary). However in some cases working memory entails variables. Consider, for example, reasoning about velocity, defined as V= s/t (where s is distance and t is time). We know that, for example, if we cover the same distance in half the time, velocity is doubled, even though we know no specific values. Thus we are reasoning about the interaction of three variables, velocity, distance and time, and we are dealing with a ternary relation, rather than a relational instance.

2.2.2 A proposition

is defined as the smallest unit of knowledge that can have a truth value. If the proposition is true, then there will be a corresponding relational instance. For example, the proposition bigger-than(cat, mouse) is an instance of the bigger-than relation. However a proposition need not be true, and a proposition that is false is not a relational instance: bigger-than(mouse,cat) is false, and is not an instance of bigger-than as defined above.

True propositions and false propositions correspond to different subsets of the cartesian product. The relation > is a subset of S1xS2: the subset {(a,b) | a [propersubset] S1, b [propersubset] S2 and a > b}. Consider the false proposition >(mouse,cat). The pair (mouse, cat) is not a part of the relation >, (there are no cases of mice being bigger than cats) but the proposition >(mouse,cat) is representable, since (mouse, cat) [propersubset] S1xS2. Therefore false propositions can be represented, and correspond to subsets of the cartesian product. Learning and induction will tend to weaken representations of false propositions, and will tend to incorporate semantic constraints. Thus a proposition such as owns(car,Tom) will tend to be excluded. However, false propositions do occur in real cognitive processes and provision must be made for them to be represented.

2.2.3 Truth value

of a proposition can be assessed by matching against semantic memory, using a mechanism described in 4.2.1. Truth value can be represented as a higher-order relational instance. For example: false(bark(cats)), meaning that it is false that cats bark. There is a psychological bias to represent true propositions. For example, in mental models theory (Johnson-Laird, Byrne and Schaeken, 1992) only true contingencies are represented to reduce load on working memory.

Quantifiers are not explicitly represented in relational schemas or in mental models, but the closest psychological property is strength. A strong proposition is one that has a high probability of being true (propositions like "dogs are bigger than cats" that have no definite truth value, in the sense that they are not universally either true or false.[1])

2.2.4 Symbolisation

means that a link between the arguments of a relation or relational instance is explicitly symbolised (e.g., one link between "whale" and "dolphin" is explicitly symbolised by "bigger-than"). The link is one of the requirements for identifying a relational instance, which in turn is necessary for a relational instance to be an argument to another relation, thereby forming higher-order relations. It also allows us to distinguish between a number of different links between the same argument sequence.

2.2.5 Higher-order relations and hierarchical structures

. Higher-order relations have relational instances as arguments, whereas first order relations have entities as arguments. Higher-order relations can represent connectives; for example implies(>(a,b),<(b,a)) or and(dog(Fido),pet(Fido)). Higher-order relations can be used to represent hierarchical structures, for example: cause(shout-at(John,Tom),hit(Tom,John)).

The repeated variable constraint operates with hierarchical structures. In cause(shout-at(x,y),hit(y,x)) x and y must be bound to the same entities in both cases. For example: cause(shout-at(John,Tom), hit(Tom,John)) has the required structure but cause(shout-at(John,Tom), hit(John,Tom)) does not. The repeated variable constraint is implemented by ensuring that the first relational instance is in structural correspondence with cause(shout-at(x,y),hit(y,x)). Notice that a mapping that conforms to the structural correspondence principle (in 2.2.1.1) can be formed between them, whereas for the second relational instance such a mapping would be inconsistent (John and Tom must each be mapped to both x and y).

2.2.6 Omni-directional access

means that, given all but one of the components of a relational instance, we can access (i.e. retrieve) the remaining component. For example, given the relational instance mother-of(woman,child), and given mother-of(woman,?) we can access "child", whereas given mother-of(?,child) we can access "woman", and given ?(woman,child) we can access "mother-of".

Omni-directional access is also potentially true of relations. A relational instance r(a1,a2,...,an) contains n+1 objects - the relation symbol and the n arguments. Given any n components, we can retrieve one or more candidates for the n+1st component. However where more than one relational instance is represented, the answers obtained will not always be unique. For example, given the relation "mother-of", the query mother-of(woman,?) may yield child, toddler, infant, baby, teenager, etc. Access may not be equally efficient in each direction. For example, arithmetic addition corresponds to the ternary relation +{ . . (3,2,5), . . ,(5,4,9), . . }. It might be easier to access a sum given two addends, that is +(3,2,?), than to access an addend given the sum and the other addend, that is +(3,?,5), but access in both directions is possible. Having learned our addition tables we can perform subtraction, but perhaps not as efficiently as addition. Another example from the psychological literature is discussed in 6.2.5.1.

2.2.7 Role representation

means that relational knowledge entails representation of argument roles or slots, independent of specific instances. Thus bigger-than(-,-) entails representation of roles for a larger and a smaller entity. Roles must be distinguishable but they need not be represented explicitly: the role an argument fills can be identified by its position relative to the other arguments. Thus in "loves.John.Mary" (discussed in 2.2.1.2) we know John is in the lover role by his position in the binding.[2] The argument positions in a proposition correspond to the sets indicated by their position in the cartesian product. Given the relational instance r(a1,a2,...,an) for a relational R that is a subset of S1x S2x... xSn, each ai corresponds to a given Si.

2.2.8 Operations on relations

are adapted from those defined in the theory of relational databases (Codd, 1990). They include select, project and join, plus the usual set operators intersection, union and difference (Phillips, et al, 1995). These operations permit information stored in relational knowledge structures to be accessed and manipulated in flexible and powerful ways. The select and project operations together permit access to any element within a relation. Informally, if one thinks of a relation as a table, where rows correspond to relational instances and column names correspond to the role names, then select and project return rows and columns of a table, respectively. The join operation corresponds to combining two tables at a specified pair(s) of columns.

The operators are best described by example. Suppose the relation:

Taller = {(John,Mary),(Mary,Tom)}, where Taller is a subset of Person x Person. Formally, a select operation parameterized with condition C, takes a relation R and returns a new relation R' such that all instances of R' satisfy C. For example, select<Person1,John> Taller -> {(John,Mary)}, returns a single instance binary relation with John at role Person1. A project operation parameterized with attribute (role) name(s) A takes a relation R and returns only arguments at attribute A for each instance in R. For example, project<Person2> Taller -> {(Mary),(Tom)} (i.e., a unary relation with two instances). Taken together, select and project provide omni-directional access to all relational elements.

For example, the query "who is taller than Mary" is realised as:

project<Person1>(select<Person2=Mary> Taller)) -> {(John)}, which we write as Taller(-,Mary) -> John, for short.

There are a number of different join operators which take two relations and return a new relation. The outer (or natural) join is analogous to the cartesian product. It returns a relation containing every unique pairwise combination of instances from the argument relations. In this way it permits the construction of higher arity relations - the outer join of two unary relations is a binary relation. Of more interest for our purposes is the equi-join operator, which only joins instances having the same arguments at the specified roles. It provides a way of implementing transitive inference.

For example, equi-join<Person2,Person1>(Taller, Taller) -> {(John,Mary,Tom)} is an equi-join of the relation Taller with itself along roles Person2 and Person1. Projecting onto the first and third positions of the resulting relation results in {(John,Tom)} (i.e., the inference "John is taller than Tom"). Finally, since relations are sets the standard set operators intersect, union and difference apply in the usual way.

2.2.9 Decomposability of relations

means that relations can be composed of simpler relations. A decomposable relation is one which can be written as a conjunct of instances of relations of lower arities. For example, the ternary relation monotonically-larger(a,b,c) can be decomposed into >(a,b) & >(b,c) & >(a,c). This is discussed in more detail in Appendix A. Relations can be decomposed using operators select and project defined in 2.2.8.

2.2.10 Relational systematicity

means that certain relations imply other relations. For example >(a,b) -> <(b,a), whereas sells(seller,buyer,object) -> buys(buyer,seller,object). Implication can be handled as a higher-order relation as noted in 2.2.5.

2.2.11 An attribute

is a relation with one argument. An attribute value is an instance of a unary relation (e.g., ripe(apple23) indicates that apple23 satisfies the unary relation ripe).

2.2.12 Analogy, planning and modifiability

. Analogy is a structure-preserving map between a base or source and a target, and representation of relations is at the core of analogies (Gentner, 1983; Gick & Holyoak, 1983; Holyoak & Thagard, 1989). Planning is the main process in strategy development, and entails the organization of a sequence of actions to achieve a goal. Planning has been explicitly modeled by VanLehn and Brown (1980), Greeno, Riley and Gelman (1984) and Halford, Smith, Dickson, Maybery, Kelly, Bain, and Stewart (1995). The development of the strategy is guided by a concept or mental model of the task, which entails representing relations between components of the task.

Higher cognitive processes can be modified "on-line" without necessarily relearning all over again. A performance which distinguishes between higher and lower animal species is the ability to acquire the reversal learning set, that is having learned to choose A in preference to B, the animal must switch to B without relearning (Bitterman, 1960; Bitterman, 1975). If the animal learns the exclusion relation between A and B (i.e. one and only one of A and B is correct) the reversal can be effected by changing the mapping of the stimuli into the relational schema, so the stimulus that was formerly mapped to the positive element of the schema is remapped to the negative element, and vice verse (Halford, 1993). Clark and Karmiloff-Smith (1993) have pointed out that modifiability is a criterial attribute of human cognition. Relational representations can achieve this by switching between relations, because when a relation is changed mappings between input and output change. For example, the binary operation of arithmetical addition, a ternary relation, entails a set of mappings between addends and sum, +{ . . (3,2 -> 5), . . (4,7 -> 11) . . }. Multiplication entails a different set of mappings *{ . . (3,2 -> 6), . . (4,7 -> 28) . . }. A switch to a different relation activates a different set of mappings and modifies the performance.

Relational knowledge is symbolic, content-independent, flexible and modifiable, and can serve the functions of higher cognitive processes. Our next step is to consider the psychological properties that are associated with different levels of relational complexity.

2.3 Psychological interpretations of orders of relational complexity

Each level of relational complexity, from unary to quaternary, corresponds to a distinct category of cognitive tasks. Empirical indicators for each level will be considered in Section 6.

2.3.1 A unary relation

has relational instances r(x) that can be interpreted as propositions with one argument, as variable-constant bindings, or as zero-variate functions. A proposition with one argument can represent a state, such as happy(John), an action, such as ran(Tom), an attribute, such as big(dog), or class membership, such as dog(Fido).

2.3.2 Binary relations

have relational instances r(x,y) that can sometimes be interpreted as univariate functions; f(a) = b is a special case of a binary relation, in which the mappings are unique; it is a set of ordered pairs, (a,b) such that for each a there is precisely one b such that (a,b) [propersubset] f. A unary operator is a special case of a univariate function; for example, the unary operator change-sign comprises the set of ordered pairs {(x, -x)}.

2.3.3 Ternary relations

have relational instances r(x,y,z) that can be bivariate functions, and binary operations. A bivariate function is a special case of a ternary relation. It is a set of ordered triples (a,b,c) such that for each (a,b) there is precisely one c such that (a,b,c) [propersubset] f. A binary operation is a special case of a bivariate function: a binary operation on a set S is a function from the set S x S of ordered pairs of elements of S into S; i.e. S x S -> S. For example, the binary operation of arithmetic addition consists of the set of ordered pairs of {. . , (3,2,5), . . , (5,3,8), . . , . . }; i.e. {(x,y,z) | x + y = z, x,y,z (say) natural numbers}.

With a ternary relation, it is possible to seek x such that r(x,y,z) is a relational instance given y,z , and similarly for y given x,z , or z given x,y. It thus becomes possible to compute the effects on x of variations in y,z and so on. This emergence of three-way comparisons in ternary relations is analogous to the emergence of interactions in two-way experimental designs.

2.3.4 Quaternary relations

have relational instances of the form r(w,x,y,z). Proportion, a/b = c/d, is a quaternary relation, and entails relations between the four terms, a,b,c,d. Given any three terms, plus the knowledge that proportion is entailed, we can predict the remaining term or, given all of a,b,c,d, we can decide whether proportion is entailed (a case of omni-directional access, see 2.2.6). With a quaternary relation all the comparisons that are possible with ternary relations can be made, as well as four-way comparisons; the effect on w of variations in x,y,z, the effects on x of variations in w,y,z, and so on.

Quaternary relations also encompass trivariate functions and ternary operations. A trivariate function is a special case of a quaternary relation. It is a set of ordered 4-tuples (a,b,c,d) such that for each (a,b,c) there is precisely one d such that (a,b,c,d) [propersubset] f. Compositions of binary operations, such as a(b + c) = d correspond to quaternary relations.

2.3.5 Dimensionality of relations

: each argument xi of R(x1, . . , xn) for an n-ary relation R can be instantiated in more than one way, and therefore represents a source of variation, or dimension. An n-ary relation may be thought of as a set of points in n-dimensional space, and can represent interaction between n variables. The number of arguments, n, corresponds to the number of dimensions in the space defined by the relation. This is the basis for our proposed complexity metric. The relation symbol can be predicted in principle from the arguments; for example, given ?(3,2,5), where ? is known to be an arithmetic operation, we know the operation must be addition, whereas given *(?,2,6) we know the first multiplicand must be 3, and so on. Prediction of the relation symbol may depend on constraints, such as knowing the relevant domain, as in this example where it depended on knowing the domain was arithmetic operations. A suitable set of relational instances must be available, and in the worst case all relational instances must be known. Because the relation symbol can be predicted, at least in principle, from the arguments, there are only n independent sources of variation in an n-ary relation, and the number of dimensions equals the number of arguments.

Algorithms that embody these properties will be discussed in Section 4. However our next step is to define processing capacity in terms of relational complexity.

3.0 Processing capacity

The amount of information that can be processed in parallel has long been recognized as a critically important datum in cognitive psychology. The most notable attempt to estimate this parameter was made by Miller (1956), who suggested that human capacity was limited to a small number of chunks.

3.1 Chunks

Miller's (1956) concept of a chunk may be defined as a unit of information of arbitrary size, so a digit, an alphabetic character, and an English word may all constitute one chunk, although they vary in information content. The paradox is that the limitation seems to be, not in the amount of information, but in the number of independent units.

3.2 Chunks and dimensions

There is some correspondence between the properties of chunks and the properties of dimensions. Each chunk is a separate signal and fills a separate slot in a message. Attributes or values on different dimensions are at least partly independent of each other, by definition (even nonorthogonal dimensions must convey some independent information, or they are redundant).[3] Thus chunks, like dimensions, represent units of information that are at least partly independent. Their similarity can be illustrated by the proposition played(John,cricket,oval,Sunday) which has four roles or slots corresponding to player, game, location and day. It seems equally appropriate to regard each filler of these roles as a different chunk, or as a value on a different dimension.

The amount of information (in the information theory sense, Attneave, 1959) conveyed by a chunk depends on the number of alternatives for that slot. For example, "cat" conveys Log22 = 1 bit of information if there are two equally likely alternatives (e.g., cat or dog). If however there were 32 (equally likely) alternatives, the chunk "cat" conveys Log232 = 5 bits. An attribute on a dimension also represents varying amounts of information, depending on the number of alternative values on that dimension. Therefore the amount of information conveyed by a chunk or dimension is arbitrary. Thus both chunks and dimensions are independent units of information of arbitrary size.

3.3. Number of dimensions processed in parallel

Given the link between dimensions and chunks in 3.2, the number of dimensions that can be processed in parallel can be estimated by determining the number of chunks that can be processed in parallel. Miller (1956) proposed that approximately seven chunks were processed in parallel, but difficulties have arisen with his empirical data base (Baddeley, 1986; Halford, Maybery, & Bain, 1988; Schweickert & Boruff, 1986).

More recent research has revised Miller's estimate downwards. Broadbent (1975) examined temporal patterns in recall from semantic memory, and found that items tended to be recalled in groups of approximately three. He suggested that the "magical number seven" proposed by Miller might have reflected the combined output of two systems, each with a capacity of 3-4 items. Fisher (1984) studied visual scanning and found a modal value of four items processed in parallel, with a range of three to five. Halford, Maybery and Bain (1988) assessed the capacity of primary memory, or the information that is currently active (James, 1890) at 4-5 items. A number of other studies reviewed by Schneider and Detweiler (1987) also indicate that approximately four chunks are processed in parallel. Given the identification of chunks with dimensions in 3.2, this implies that approximately four dimensions can be processed in parallel, and humans should be limited to processing quaternary relations in parallel. Most studies indicate a range of values, indicating that this should be a soft limit. Neural net models of relations agree in predicting a soft limit, as will be discussed in Section 5.

An attempt has been made to assess the number of dimensions that can be processed in parallel using interpretation of interactions, because the factors in an interaction cannot be interpreted meaningfully alone, and so there is a constraint to process them in parallel (Halford, Wilson, et al., 1994). An interaction between N factors corresponds to a relation between N independent variables and the dependent variable, as discussed in 2.0, so ability to process four dimensions implies, prima facie, understanding of three-way interactions. Academic staff and graduate students who were experienced in interpreting statistical interactions were asked what was the most complex interaction they could interpret unambiguously and with confidence, ignoring scale effects and nonlinearity. Ten answered 2-way, 14 3-way, and 6 4-way. The variations in estimates probably reflect errors due to imprecision of the test, but the mode is 3-way, suggesting four dimensions are processed. While no single study might be definitive there is a degree of consensus across studies with a wide range of methodologies that approximately four dimensions are processed in parallel.

If it should be possible to process two relations in parallel but independently the sum of their processing demands would be less than for a single relation with the same number of arguments. If we consider a k-ary relation R on a set S with s members, then each component xi in a tuple (x1, x2, . . . ,xk) [propersubset] R might be filled in s ways: the number of possible tuples is sk. Therefore the number of tuples is 2s2 for two binary relations but s4 for a quaternary relation. When we consider neural net representation of relations based on symbol-argument-argument bindings (in 4.1.1.2) the number of binding units is of nk+1, where n is the number of elements of each vector, and a similar argument applies. Therefore the limit will be reached more quickly with a single relation than with two relations having the same total number of arguments. Notice too that more links are defined in a quaternary relation than in two binary relations. Thus R(a,b,c,d) defines links between six pairs ab, ac, . . bc, . . ,cd, whereas R(a,b) and R(c,d) collectively define links between only two pairs, ab and cd.

However it is unlikely that two or more relations can be processed in parallel and independently in the central executive, or within any one system, because they would need to be coordinated to avoid conflict. They could be coordinated by integration into a higher-order relation, but this implies that they are effectively being processed as a single relation. Another way would be to superimpose two or more relations, and this can be done in the neural net models to be discussed in 4.1.1.2. For example the relational instances mother-of(mare,foal), loves(mare,foal), feeds(mare,foal) can all be superimposed. Notice however that superimposing relational instances in this way does not increase the number of dimensions being processed (there are only two arguments in this example). When we consider neural net implementations in 4.1.1 it will be apparent that such superposition adds little computational cost. Superimposed relational instances can be treated as a whole, but they can also be processed separately, using the retrieval process in 4.2.1. On the other hand the relation between them cannot be defined by superposition (e.g., "mother-of", "loves", and "feeds" can be fused into a whole equivalent to some kind of composite motherhood concept, or they can be processed as separate relational instances, but no relation is defined between the relational instances).

If four dimensions can be processed in parallel, the next question is how more complex concepts are processed. Many concepts are more complex than quaternary relations, so we must have some means of dealing with these concepts without exceeding our processing capacity.

3.4 Using capacity efficiently

We propose two mechanisms for reducing processing loads imposed by complex concepts. These are conceptual chunking and segmentation.

3.4.1 Conceptual chunking

is the recoding of concepts into fewer dimensions. In the limiting, and most typical case, they are recoded into a single dimension. In a mnemonic chunk items function as a unit (e.g., c,a,t becomes a chunk if the three letters form a single word cat). Similarly, elements that are formed into a conceptual chunk function as a whole in a relational structure, and relations between items within the chunk cannot be accessed.

We can illustrate conceptual chunking using the concept of velocity, defined as V= s/t (where s is distance and t is time). The relation between velocity, distance and time is three dimensional, but velocity can be expressed as a single dimension, such as the position of a pointer on a dial; velocity(60 km/h). In this single dimensional representation, no relation is defined between velocity, distance and time. If we want to compute (say) the way velocity varies as distance increases and time decreases, we must return to the three-dimensional representation. Thus conceptual chunks save processing capacity, but the cost is that some relations become temporarily inaccessible. There is also a psychological factor which limits chunking, because experience is required in which there is a constant mapping of elements into chunks (Logan, 1979). Chunking is a form of learning, which takes place over time.

Chunked concepts can be combined with further dimensions to represent higher level concepts, so acceleration can be defined as A = (V2 - V1)/t. Acceleration also can be chunked, and then Force (F) can be defined as F = MA (where M = mass). In this way we can bootstrap our way to higher and higher level concepts, without ever exceeding four dimensions processed in parallel. A major function of expertise is to provide ways of chunking that permit the important features of concepts to be represented without imposing excessive processing demands.

Where a role has only one possible filler it can be chunked without loss and the number of dimensions is reduced accordingly. Consider, for example, the relation: mother-of{(Jenny,Tom),(Jenny,Mary),(Jenny,Jill)}. The mother role is always filled by Jenny, so the representation can be collapsed to the unary relation Jenny-mother-of({(Tom),(Mary),(Jill)}.

The general principles of chunking are: (1) a chunk functions as a single entity, relation-symbol or argument, in a relation, (2) no relations can be represented between items within a chunk, (3) relations between the chunk and other items, or other chunks, can be represented.

In order to assess processing capacity, chunking can be inhibited by using novel structures, for which chunks have not been learned. This does not preclude using familiar domains. For example, in testing transitivity, size relations between unknown persons can be used (e.g., John > Tom, Tom > Peter). Size relations are a familiar domain, but the specific orderings (John, Tom, Peter, etc.) will not have been prelearned as chunks.

3.4.2 Segmentation

is breaking tasks into steps which do not exceed processing capacity, and which are processed serially. Examples include algorithms for arithmetic operations, counting, and ordering tasks. Arithmetic algorithms such as multidigit addition generally have to be taught, but there is some degree of autonomy in acquisition of counting and ordering algorithms. It is not possible to determine the precise number of elements in a large set solely by parallel processing, so we use the serial procedure of counting objects one (or, at most, a few) at a time. Children have some understanding of the principles of counting, and this guides the development of their strategies (Greeno et al., 1984). People's concept of an ordered set can provide a mental model for an ordering algorithm (Halford et al., 1995).

The development of counting and ordering strategies illustrate the principle that autonomous development of serial processing strategies requires planning, which depends on representing relations (VanLehn & Brown, 1980). If processing limitations prevent the structure of the task being represented, a strategy cannot be developed without didactic help, which can only be available for a small subset of the cognitive tasks we perform. Thus the "self-programming" property of higher cognitive processes depends on ability to represent relations.

3.4.3 Effective complexity determined by reduction technique

Because complexity can be reduced by conceptual chunking and segmentation, the number of arguments of a relation does not immediately translate into effective complexity. Also, simply increasing the number of arguments by conjunction does not necessarily contribute to the complexity of the resulting relation. The important point regarding relational complexity is the nature of the interaction between the relational elements. Effective relational complexity can be determined using a reduction technique.

More specifically, the effective complexity of a relation is the minimum dimensionality a relation can be reduced to without loss of information. So, if a ternary relation can be reduced to two binary relations without loss of information then effective relational complexity is binary, not ternary. One can determine if a relation can be reduced to a combination of lower order relations by a procedure of decomposing and recombining. If the resulting relation is the same as before then the relation can be decomposed into lower order relations without loss of information. Psychologically, effective relational complexity is the minimum dimensionality to which a relation can be reduced using decomposition and recombination procedures available to human performers.

For example, suppose the following three facts:

1. John played tennis at the school

2. John played soccer at the park

3. Mark played soccer at the park.

Intuitively, it would appear that this domain consists of a ternary relation over Person, Game and Location. That is, Played(Person, Game, Location) = {(John,tennis,school),(John,soccer,park),(Mark,soccer,park)}. So, at first we may claim that the relational complexity of this domain is ternary. However, this ternary relation can be decomposed into two binary relations by splitting the ternary relation along the Game attribute. The resulting relations are: Played(Person, Game) = {(John,tennis),(John,soccer),(Mark,soccer)}; and Is-played-at(Game, Location) = {(tennis,school),(soccer,park)}. Now, recombining these two binary relations by joining them along the common attribute Game results in the original ternary relation Played(Person,Game,Location) containing exactly the same elements. Therefore, the ternary relation is decomposable into two binary relations, and effective relational complexity is binary.

However, suppose now that the domain has changed to include a new fact: 4. Mark played soccer at the school. We will see that inclusion of this fact changes the relational complexity of the domain. The ternary relation Played, consists of elements (John,tennis,school),(John,soccer,park), (Mark,soccer,park) and (Mark,soccer,school). Splitting this relation along the Game attribute results in the two binary relations: Played(Person,Game) = {(John,tennis),(John,soccer),(Mark,soccer)}; and Is-played-at(Game, Location) = {(tennis,school),(soccer,park),(soccer,school)}. However, recombining these two relations results in the new triple: (John,soccer,school), formed by joining pairs (John,soccer) and (soccer, school). However, this triple is not an element of the ternary relation Played(Person, Game, Location) before decomposing, and is not recorded in any of the 4 facts for the new domain. Therefore, decomposing and recombining along the Game attribute has not recovered the original relation, so there has been a loss of information.

The same procedure applied to the Person and Location attributes also results in triples: (John,tennis,park) and (John,soccer,school), respectively, that are not elements of the ternary relation Played(Person,Game,Location). Therefore, this new domain cannot be decomposed into two binary relations, and so its effective relational complexity is ternary. One possible rejoinder to this sort of analysis is to claim that all relations can be decomposed into binary relations by simply creating unique symbol for each element (tuple) of the higher order relation. Under this scheme, the above domain could be decomposed into the single binary relation: Involved(Event,Participant) = {(JTS, John),(JTS, tennis), (JTS, school), (JSP, John), ...}, where JTS is a unique symbol for the event "John played tennis at the school", etc. However, a general encoding scheme requires processing the original ternary relation (e.g., (John, tennis, school) -> JST). Decomposition only becomes effective once the creation process is complete. Less general encoding schemes are possible utilizing only binary relations (e.g., (John, tennis) -> JT), but such schemes are inadequate for relations containing both (John, tennis, school) and (John, tennis, park). Furthermore, its implausible that appropriate encoding strategies are immediately available for novel cognitive tasks.

To relate this example to the analysis of variance analogy, notice that with facts 1-3, location is predicted solely by game (tennis at school, soccer in the park) independently of person. When fact 4 is added person and game jointly predict location, and there are 3 interacting variables.

The reduction checking technique can also be applied to higher-order relations. For example, intuitively one might expect that the relational complexity of transitive inferences (e.g., John is taller than Mary, and Mary is taller than Sue, so John is also taller than Sue) is binary, since such inferences operate over binary relations. However transitive inference is not simply a collection of binary relations. Transitive inferences have the structure: "(A R B) and (B R C), therefore (A R C)", where R is some binary relation and A, B and C are variables ranging over the arguments of R. Transitive inference entails a constraint between two premises and a conclusion, and it is an example of systematicity, as defined in 2.2.10. Reduction analysis shows that the structure of transitive inference is ternary, since it cannot be reduced to a collection of binary relations without loss of information.

The structure of transitive inference can be expressed as a higher-order ternary relation over binary relational instances. That is, Transitive inference(P1,P2,C) = {(aRb, bRc, aRc), (aRb, bRd, aRd), (aRc, cRd, aRd), (bRc, cRd, bRd)}, where P1, P2 and C are the first and second premise, and consequent attribute names (respectively); and a, b, c and d are symbols (place holders) to which elements of specific relational instances are aligned. Using the reduction checking technique, we show that transitive inference cannot be decomposed into binary relations.

Suppose we choose to split the relation along the P2 attribute, which results in the two binary relations: And(P1,P2) = {(aRb, bRc), (aRb, bRd), (aRc, cRd), (bRc, cRd)}; and Implies(P2,C) = {(bRc, aRc), (bRd, aRd), (cRd, aRd), (cRd, bRd)}. Rejoining these two relations along attribute P2 results in two ternary relational instances (aRc, cRd, bRd) and (bRc, cRd, aRd), which were not present in the original relation. (That is to say, it is not logically valid to conclude that, for example, Tom is taller than Mark given that John is taller than Bob, and Bob is taller than Mark, if we do not know the relationship between Tom and John or Bob.) Similarly, splitting and rejoining on attributes P1 and C, results in additional relational instances not present in the original relation. Thus, transitive inference is not, in general, decomposable into binary relations.

Indecomposable relations are ultimately significant because, if a relation is indecomposable, then subjects cannot recode the problem by decomposing (ternary) relations into simpler (binary) relations. A case of a indecomposable relation is given in 6.2.4.3. Even where decomposition is theoretically possible, participants might lack the required strategies (algorithms), and the need to cope with more relational instances (of a lower arity) might impose loads of its own (e.g., if higher-order relations are involved, see 6.1.3).

3.5. Effects of processing overload

A participant who cannot construct a representation of the dimensionality required for a task has three options:

1. The concept can be chunked to a lower dimensional representation. This will only be possible if appropriate chunks have been learned or can be constructed, and it results in loss of access to relations between chunked entities.

2. The task can be segmented into smaller steps that are performed serially. This however requires a strategy, autonomous planning of which depends on the participant's ability to represent relations in the task.

3. The participant can default to a lower level representation. This is analogous to performing an experiment with (say) a 3-way design, then analysing the data by a series of 2-way ANOVAs. Just as the analysis would lead to recovery of most of the relevant data in the experiment (all main effects and 2 way interactions would be recovered[4]), the performance would probably be correct in most respects. However, just as the hypothetical experimenter would miss the 3-way interactions, our hypothetical performer could not reason about high level relations in the task.

3.6 Capacity and content

It is important to consider whether processing capacity is a function of content. As we indicated in 1.1, the capacity limitations we have defined do not apply to modular processes such as vision. They apply to higher cognitive processes which entail processing explicit relational knowledge, defined in 2.2. However it is still reasonable to ask whether complexity might be influenced by content. We suggest that complexity effects of content variations can be attributed to processes such as conceptual chunking, segmentation, and use of higher order relations. Relations in a familiar domain can be more readily chunked, or higher-order relations may be known which enable the structure to be represented hierarchically, as illustrated in 2.2.5. It can then be segmented by processing one level of the hierarchy at a time, as described in 4.2.5.

An example of content effects is discussed in 6.1.4, but here we will consider two illustrative examples of the way relational complexity can be applied to different task contents and formats. Andrews and Halford (submitted) tested young children's ability to order colored blocks using premises such as "red above green" and "green above blue". In the construction condition the children simply built towers with green above blue, then red above green, and so on. In the prediction condition children had to say in advance which of two blocks, red or blue, would be higher in the tower. The construction condition was easier, apparently because of its concrete, "hands-on" nature. Notice however that in the construction task relations can be processed one at a time: children can first place green on blue, then red on green, etc. By contrast, in prediction they must mentally integrate two relations "red above green" and "green above blue" to yield "red above blue". When number of relations that had to be considered in a single decision was manipulated, with format controlled, it was found that this factor accounted for most of the variance in the task, and there was no significant residual effect of construction-prediction. Thus format was completely subsumed under relational complexity.

The relational complexity metric has been applied successfully to a wide variety of tasks, of which those discussed in Section 6 are a sample. It was applied successfully to children's mathematics by English and Halford (1995). Consider, for example, the following relations between rational numbers: 1/2 = 3/6; 1/2 < 4/6; 1/3 < 3/6; 5/7 > 5/8.

Proportion is a notoriously difficult concept for children learning mathematics, and there seems to be some mystification as to why. However, as this example illustrates, proportions entail a quaternary relation, the variables being the two numerators and two denominators. Therefore the task is likely to be difficult because four dimensions must be processed. This simply illustrates that relational complexity has proved a very serviceable metric for conceptual complexity, in tasks as varied as proportion and ordering blocks.

4.0 Algorithmic Design

The essence of the model is defined at the mathematical (computational) level, and it is designed to account for observed capacity limitations of higher cognitive processes. However research on neural net representation of relations has discovered limitations at least broadly consistent with those observed in psychological data. Integrating the psychological and neural net work on this question has the potential to deepen our understanding of the issue, and to produce more refined questions for future research. This section considers how relations can be represented in neural nets and Section 5 develops the argument that computational cost is a function of relational complexity. Thus the underlying reason for processing capacity limitations may be found in requirements for processing relations in neural nets. However if the reader does not wish to consider neural net modeling, at least in a first reading, then it is still possible to follow the paper by skipping to Section 6.

The representation of relations in neural nets is currently the subject of extensive research, but even models which differ in architecture are in reasonable agreement about the nature of capacity limitations.

4.1 Neural net models

of relational knowledge can be categorised in two ways, type of binding and type of architecture. The models of Hummel & Holyoak (in press), Plate (1995), Shastri and Ajjanagadde (1993a) and Smolensky (1990) use role-filler bindings, while the model of Halford, Wilson, et al. (1994) uses symbol-argument-argument bindings (defined in 2.2.1.2). Architectures can be divided mainly into models based on a product operation, either tensor product (Halford et al., 1994; Smolensky, 1990) or circular convolution (Plate, 1995) and models based on synchronous oscillation (Hummel & Holyoak, in press; Shastri & Ajjanagadde, 1993a). Other networks exist which learn to represent relations, for example, the recursive autoassociative memory (Pollack, 1988) and BoltzCONS (Touretsky, 1990). However, the large number of training examples needed to learn appropriate representations makes them unsuitable for models of working memory so they are not considered here.

4.1.1 Tensor and convolution models

represent bindings by performing some type of product operation on vectors representing bound entities. Tensor and convolution models can use either role-filler bindings or symbol-argument-argument bindings.

4.1.1.1 Role-filler bindings

. In the model of Smolensky (1990) the role-filler binding is represented by the outer product of role and filler vectors, whereas in the model of Plate (1995) it is represented by the circular convolution of the vectors. Thus loves(John,Mary) can be represented in essence by tr = vrole1[circlemultiply]vJohn + vrole2[circlemultiply]vMary or by vR = vrole1*vJohn + vrole2*vMary where [circlemultiply] and * represent tensor product and circular convolution respectively, and tr and vr represents the relation symbol r in the tensor and convolution models respectively. A tensor product net that can represent a role-filler binding is shown in Figure 1A, with an arithmetic example in Figure 1B. In these models all vectors representing roles are superimposed on a single set of units, and vectors representing fillers are superimposed on another set of units. A circular convolution of the vectors in Figure 1B is shown in Figure 1F. A circular convolution is like a compression (technically a projection) of the tensor product matrix, computed by adding along the curved lines as shown. For a lucid explanation of circular convolution see Plate (1995). The elements within the matrix are the binding units and their activations are computed in one shot, rather than by incremental adjustment over trials as occurs in learning algorithms. Therefore the matrix represents a dynamic binding in the sense that it represents the currently activated representation, rather than a product of past learning.

Using circular convolution the number of elements is constant (as illustrated in Figure 1F); for example, the number of elements in each of vrole1 and vJohn is the same as the number in vrole1*vJohn, whereas the tensor product of vectors with n and m elements contains nm elements (as illustrated in Figure 1A and Figure 1B). The implications of this for computational cost will be discussed in 5.2.3.

Retrieval from circular convolution representations is noisy (Plate, in press) and requires a cleanup memory, whereas tensor product representations produce an unambiguous output, provided all vectors used form an orthonormal basis (that is, they form a basis for the vector space they span, their lengths are 1, and inner products of distinct vectors are all zero). Though orthonormal vectors are convenient they do not enable crosstalk (interference between similar representations, or similar tasks) to be modeled. This can be done using sparse random vectors, in which similar entities share some units (Wilson, Street, & Halford, 1995). There is then the possibility of confusion and crosstalk.

4.1.1.2 Symbol-argument-argument bindings

were illustrated in 2.2.1.2. With this type of model a relational instance is effectively represented by computing the outer product of symbol and argument vectors (Halford, Wilson, et al., 1994). A collection of relational instances can be superimposed on the same representation, by adding up the outer products. Thus representations of loves(John,Mary) can be represented as Vloves[circlemultiply]VJohn[circlemultiply]VMary and loves(Tom,Wendy) can be represented as Vloves[circlemultiply]VTom[circlemultiply]VWendy. These representations can be superimposed by summing the outer products, yielding Vloves[circlemultiply]VJohn[circlemultiply]VMary + Vloves[circlemultiply]VTom[circlemultiply]VWendy.

The resulting sum of outer products is referred to as a tensor. Thus the relational instance r(a1,a2,...,an) would be represented in a tensor product space VR[circlemultiply]V1[circlemultiply]V2[circlemultiply]....[circlemultiply]Vn, where VR represents alternative relation symbols including r, and vi (i>0) represents concepts appropriate to the ith argument position. A unary relational instance r(a) can be represented in a rank 2 tensor product space VR[circlemultiply]V1. A binary relational instance r(a1,a2) can be represented in a rank three tensor product VR[circlemultiply]V1[circlemultiply]V2. The net in Figure 1A represents a unary relation by this method if one vector represents the relation symbol and the other vector represents the argument. Similarly for the arithmetical example in Figure 1B. Binary relations are illustrated by this method in Figure 1C and Figure 1D. The arithmetic examples of outer products in Figure 1B and Figure 1D show that each element in the matrix (rank 2 outer product) is the product of a component from each of the symbol and argument vectors. Arguments to a relation may also be regarded as role-fillers, and "argument" and "filler" are used interchangeably in this context depending on whether symbol-argument-argument or role-filler models are being considered.

Tensor product implementations of relations from unary to quaternary are shown schematically in Figure 2. In each case there is a vector representing the relation symbol and a vector representing each argument. A ternary relational instance r(a1,a2,a3) can be represented in a rank four tensor product space VR[circlemultiply]V1[circlemultiply]V2[circlemultiply]V3. Notice that the example used in Figure 2 represents arithmetic addition and multiplication which, as noted earlier, are ternary relations, superimposed on the same tensor product. If 2 addends (multiplicands) are entered in the argument units (using the retrieval procedure in 4.2.1), and the vector representing addition (multiplication) is entered in the symbol units, the output represents the sum (product). Changing the symbol vector changes the relation that is implemented, and is an example of modifiability as discussed in 2.1.12. Our simulations have shown addition and multiplication can be superimposed in this way on a rank 4 tensor product without interference. A quaternary relational instance r(a1,a2,a3,a4) is represented in a rank five tensor product VR[circlemultiply]V1[circlemultiply]V2[circlemultiply]V3[circlemultiply]V4, the example in Figure 2 being a composition of two binary operations.

In symbol-argument-argument models roles are determined positionally, a type of coding also used in language. This implies that roles do not need to be explicitly represented and role-filler bindings are unnecessary. The role to which an argument is assigned is defined by its position in the representation. In the tensor product implementation, roles are not represented by vectors, but separate sets of units are used for each argument vector. A procedure is required to ensure structural correspondence, so arguments are represented on the correct set of units. The criteria for valid representation mentioned in 2.2.1.1 are sufficient to ensure this.

To illustrate how roles need not be explicit if arguments are defined relative to each other, consider an instance of the ternary relation arithmetic addition, +(3,5,8). Now suppose we want to superimpose another instance +(2,4,6). This can be done using tensor product representations of symbol-argument-argument bindings as mentioned above. If we were to misalign the representations by superimposing +(2,6,4) on +(3,5,8), that is interchanging the second and third arguments, the error would be detected by the tests for structural correspondence (in 2.2.1.1). Thus a valid relational representation can be established without roles being explicitly represented. The arguments are assigned to the correct role position by ensuring that they are correctly related to each other (in the current example this means that they are in the correct order).

4.1.2 Synchronous oscillation models

In synchronous oscillation models units representing a role oscillate in phase with units representing the filler bound to that role, and out of phase with units representing other roles and fillers. The relational instance loves(John,Mary) would be represented by units representing the agent role of loves oscillating in synchrony with units representing John, while units representing the patient role of loves oscillated in synchrony with units representing Mary (see Figure 3). However units representing the agent role (filler) would oscillate out of synchrony with units representing the patient role (filler). The model of Shastri and Ajjanagadde (1993a) utilises synchronous oscillation for role-filler binding, but much of its power comes from additional nodes and connections. The analogy model of Hummel and Holyoak (in press) uses synchronous oscillation to perform mappings between analogs, but much of its power also comes from other systems, including a distributed semantic memory representation.

Relational instances can be superimposed on the representation, as illustrated in Figure 3. Thus kisses(John,Mary) and marry(John,Mary) can be superimposed on the representation of loves(John,Mary), by having corresponding roles of all relational instances oscillate in synchrony. Notice that no additional phases are required for the superimposed instances, and this is analogous to tensor product representations of symbol-argument-argument bindings discussed in 4.1.1.2, where relational instances can be superimposed on the same set of vector spaces. To foreshadow a point to be made in 5.1, Shastri and Ajjanagadde (1993a) have shown that the major limitation is in the number of distinct phases, rather than the amount of information represented in each phase. This corresponds to the limitation in human processing capacity, which is defined by the number of arguments a relation has, rather than by total information processed.

4.1.3 Comparison of models

Tensor product and synchronous oscillation models appear to be equally capable of representing higher cognitive processes, and the similarity of their properties is at first sight somewhat surprising. However Tesar and Smolensky (1994) have proposed that the architectures are formally reducible to one another, the primary difference being that tensor product models use spatial role vectors whereas synchronous oscillation models use temporal role vectors. Another possible explanation is that their similar properties are due to additional features designed to give them the power to simulate higher cognitive processes. As we will see, the similarity of their properties extends to the processing capacity limitations that are inherent in them.

It is important that, in order to account for working memory, a model must deal with relations. As noted in 2.2.1.2 representation of relations by role-filler bindings requires that each relational instance be stored separately, or be uniquely identified. We will now develop this point further by considering a ternary relation, the binary operation of addition. There are three roles, corresponding to the two addends and the sum, which we will represent as a1, a2 and s. Using the role-filler approach we could bind numbers to each role, thus:

a1.2 a2.3 s.5

a1.4 a2.5 s.9

a1.3 a2.4 s.7

a1.5 a2.2 s.7 and so on.

If we were to represent all addition facts in this way, then every number would be bound to every role, since any number can serve as first or second addend, or as sum. If all these role-filler bindings are entered into the same representation (e.g., by adding the resulting vectors, as in the models discussed above), and without specific identification of the tuples, then we cannot recover role-filler bindings or relational instances. If we ask "what number is bound to the first addend role?" the answer is "every number", and the same is true for the other two roles. Furthermore, we have only stored role-filler bindings rather than relational instances, so there are no links between addends and sum. Thus even the fact that a1.2 and a2.3 are associated with s.5 is not represented, so it is not possible to access a component of the instance, given the remaining component. Thus we cannot ask: if the addends are 2 and 3, what is the sum? Suppose, for example, we were to identify first addend roles that are bound to 2. We cannot then determine which of these cases have 3 bound to the second addend role, because no link has been stored between first and second addends. We cannot retrieve the sum, given the first and second addends, for the same reason.

The solution of identifying each relational instance also has its problems, first because individual identification of every relational instance is implausible when the number of instances is very large. A relational instance such as 2+3=5 is identified by its content (e.g., 2+3=?; ?+3=5) rather than by an index, such as a context vector, that identifies the relational instance.[5] It is implausible that every addition fact we know is individually identified. Second, notice also that, even were instance identification to be used, every role-filler binding in a relational instance would require the same identifier. Thus if we identify this instance as add2,3, we must bind the identifier to all three role-filler bindings, thus:

add2,3: (add2,3).a1.2 (add2,3).a2.3 (add2,3).s.5

In other words, the context in which the three role-filler bindings are learnt/memorized must be sufficiently stable as to result in the same identification vector across all three role-filler pairs. Notice also that this representation bears a close resemblance to symbol-argument-argument bindings. Furthermore, the identifier increases the computational cost of the representation, and appears to require an additional rank in the outer product (i.e. rank 3 rather than rank 2, as in Smolensky's (1990) model).

Contrast this with representations based on symbol-argument-argument bindings. Omitting the relation symbol (as with role-filler bindings), we would represent the same facts as:

2.3.5

4.5.9

3.5.8 etc.

The computational cost is high for one relational instance (a ternary relation requires a rank 4 tensor, including the relation symbol vector), but there is no increase in cost for further relational instances because they can be superimposed, the tuples are inherently identified by the bindings, the links between addends and sums are represented, and the other properties of relational knowledge are implemented, as explained in 4.2. Thus while the initial cost of symbol-argument-argument bindings is high, their power is considerable.

Although the synchronous oscillation models of Shastri and Ajjanagadde (1993) and the tensor product symbol-argument-argument binding model of Halford, Wilson, et al. (1994) are very different, they have a common property that is important to capacity limitations. This is that they both map dimensions of the relations (as defined in 2.3.5) to separate dimensions of the representation. In the synchronous oscillation model each argument is assigned to a separate phase in the oscillation (as illustrated in Figure 3). In the symbol-argument-argument binding model each argument is assigned to a separate vector space (illustrated in Figure 1C and Figure 1D). This means that the dimensions of the relation are mapped directly into phases of oscillation, or into vector spaces. The role-filler models based on tensor products[6] (Smolensky, 1990) or circular convolution (Plate, in press) do not have this property, because all roles are superimposed, as are all fillers (see 4.1.1.1). As we will see in Section 5, models which map dimensions of the relation to dimensions of the representation imply similar capacity limitations.[7]

4.2. Modeling relational knowledge

The manner in which these models implement the properties of relational knowledge defined in 2.2 will now be considered. The role-filler model of Smolensky (1990), based on tensor products, handles the storage and retrieval of relational instances, but in its original form it does not appear to incorporate the other features of relational knowledge. The symbol-argument binding model (Halford, Wilson, et al., 1994) was based on Smolensky's tensor product formalism, but with modifications and extensions to handle all features of relational knowledge. The circular convolution model of Plate (in press) handles storage and retrieval of relational instances and gives a good account of similarity, but it does not appear to handle conceptual chunking (see 4.2.4) nor does it provide a general solution to systematicity (see 4.2.9). Role-filler binding models based on synchronous oscillation (Hummel & Holyoak, in press; Shastri & Ajjanagadde, 1993a; 1993b) appear to have been designed to incorporate the properties of relational knowledge in section 2.2. We will emphasise those models that have been designed to implement the properties of relational knowledge.

4.2.1 Retrieval of information

Information stored in a tensor memory can be retrieved by representing a question as a tensor product and computing the inner product (dot product) of the question and memory tensors. The query is a partial relational instance and can be expressed as an outer product with one entity deleted. For example given r(a1,a2,a3) stored as part of the rank four tensor Tpqrs in the tensor product VR[circlemultiply]V1[circlemultiply]V2[circlemultiply]V3, then say, a3 can be retrieved by computing the generalised inner product vr[circlemultiply]va1[circlemultiply]va2[circlemultiply]_
*T, where vr is the vector representing the relation-symbol r, va1 is the vector representing the argument a1, and va2 is the vector representing the argument a2. Generalised inner products are described in Appendix B: the _ signifies the component of the tensor which is "retrieved" by computing this generalised inner product. Effectively, the query r(a1,a2,?) has been used as input to the tensor memory, and a3 has been obtained as output. The details of how this generalised inner product may be computed are contained in Appendix B, where it is listed as operation (3).

An analogous procedure is specified for synchronous oscillation models by Shastri and Ajjanagadde (1993a, section 4.3).

4.2.2 Truth value

of a proposition can be assessed by matching against memory, in what is essentially a recognition process. The proposition bark(cats) can be represented by a rank 2 tensor product, which can be matched against semantic memory by computing a generalised inner product (dot product) of the tensor with the representations in semantic memory (Humphreys, Bain, & Pike, 1989). The relational instance bark(cats) is treated as a query by representing it as the tensor vbark[circlemultiply]vcats as shown above and the dot product of this tensor and tensor representations in memory is computed, as in 4.2.1 (the procedure corresponds to operation (0) in Appendix B). This can be done in parallel for superimposed memories. If the product is non-zero, the proposition is recognized. Thus bark(cats) and bark(dogs) would produce zero and non-zero dot products respectively, so the latter is recognised whereas the former is not. For example:

vbark[circlemultiply]vcats * (vbark[circlemultiply]vdogs + . . + vsing [circlemultiply]vbirds) = 0 but

vbark[circlemultiply]vdogs * (vbark[circlemultiply]vdogs + . . + vsing [circlemultiply]vbirds) > 0

The procedure defined by Shastri and Ajjanagadde (1993a, section 4.4) provides a way of assessing truth value of a proposition in synchronous oscillation models.

4.2.3 Relation symbols

are represented as separate vectors in the vector space VR. In synchronous oscillation models the symbol can be represented as a unit firing in a separate phase in the oscillation, or by an additional node connected to role and filler nodes.

4.2.4 Conceptual chunking

serves to reduce the rank of a tensor product representation of a relation. It can be implemented by convolution, concatenation (illustrated in Figure 1E), superposition (in which vectors representing arguments are added), and by defining a special function that associates an outer product to a single vector. The outer product representation of r(a,b,c) can be reduced to r(a,b/c), by concatenating or convolving vectors b and c into a single vector. Features of b and c can still influence the computation of the relation with a because activation can be propagated from units in b and c to a (Figure 1E), but the representation functions as a binary relation, and neither the relation between b and c, nor the three-way relation between a,b and c is directly accessible.

Unchunking can be achieved by differentiating vectors into other vectors. Algorithms for this have been defined in the STAR analogical reasoning model (Halford, Wilson, et al., 1994; Halford, Wilson, & McDonald, 1995; Halford, Wilson, & Phillips, 1996). In general, lower rank representations can be differentiated yielding more complex relations. For example, in Figure 2E, if the vector representing b/c were differentiated into separate vectors representing b and c, and if all four vectors including the relation-symbol vector (not shown) were then appropriately interconnected, as for a Rank 4 tensor product, a ternary relation could be represented.

A chunked representation is wholistic in that features are represented but are not differentiated into dimensions. Many concepts are wholistic initially and progress to dimensional representation (Smith, 1989). This is like unchunking in that it entails differentiation of a vector into two or more vectors, and to representation of the relation between them.

It is unclear how Plate's (1995) circular convolution model would handle conceptual chunking, at least without significant additions. Chunking involves a compression of a relational instance into an unstructured whole, so the relations between components become inaccessible. However the circular convolution is already a compression (a projection of the tensor product) and it is not clear how a further compression that incorporates the psychological properties of conceptual chunking could be achieved. A further problem is that circular convolution relies on component vectors randomly generated from a guassian or uniform distribution. This has the effect that there is no similarity (as measured by the dot product) between chunked and unchunked representations. Thus, for example, features from b and c would not, in general, could not influence that computation of the relation with a in R(a,b/c).

Hummel and Holyoak (in press) represent the equivalent of a chunk in a synchronous oscillation model by having units that represent part or all of a proposition. For example loves(John,Mary) can be represented as features, as roles and fillers, or as an intact proposition. In the latter case it can be an argument to a proposition such as knows(Sam,loves(John-Mary)).

4.2.5 Higher-order relations and hierarchical structures

can be modeled by representing higher-order relational instances with chunked lower-order relational instances as arguments. Consider the relational instance;

cause(shout-at(John,Tom),hit(Tom,John)).

The relational instance shout-at(John,Tom), normally represented as a rank 3 outer product in our model, is chunked into a single vector shout-at1, as described in 4.2.4, and hit(Tom,John) is chunked similarly as hit1. The higher-order relation cause(shout-at1,hit1) is then represented as a rank 3 outer product.

The repeated variable constraint

requires that fillers be bound to the correct roles, as pointed out in 2.2.5. The STAR analogy model (Halford et al., 1996) can achieve this by ensuring hierarchical structures are in correspondence. Consider the relational instances;

cause(shout-at(John,Tom),hit(Tom,John)),

cause(shout-at(Mary,Wendy),hit(Wendy,Mary))

These would be represented as chunked relational instances, as described above. The model maps one level of the hierarchy at a time, then moves to another, usually lower, level and recursively matches corresponding arguments of source and target. Thus the model would first map cause(shout-at1,hit1) to cause(shout-at2,hit2). It would then unchunk shout-at1 and shout-at2 and map shout-at(John,Tom) to shout-at(Mary,Wendy). The model has a bias to maintain the mappings of John to Mary and Tom to Wendy when processing other parts of the structure. It would map hit(Tom,John) to hit(Wendy,Mary), consistent with previous mappings, thereby maintaining structural consistency. It would also compute a goodness-of-mapping score that reflects degree of structural correspondence. The score would be higher for this mapping than for the inconsistent mapping;

cause(shout-at(John,Tom),hit(Tom,John)) to;

cause(shout-at(Mary,Wendy),hit(Mary,Wendy))

The model enforces the repeated variable constraint as a consequence of maintaining structural consistency. Because the person bound to the agent role of "shout-at" is bound to the object role of "hit" in the source, this constraint is maintained in the target because of biases in the algorithm to ensure structural correspondence between base and target.

Shastri and Ajjanagadde (1993a, section 4.5) provides a synchronous activation based mechanism that enforces the repeated variable constraint.

4.2.6 Omni-directional access

is implemented by the retrieval process in 4.2.1 because a query can be composed of a relational instance with any component missing. Thus a ternary relation represented as vR[circlemultiply]v1[circlemultiply]v2[circlemultiply]v3 can be queried by any of the following means:

vR[circlemultiply]v1[circlemultiply]v2[circlemultiply]_ * vR[circlemultiply]v1[circlemultiply]v2[circlemultiply]v3 = v3

vR[circlemultiply]v1[circlemultiply]_ [circlemultiply]v3 * vR[circlemultiply]v1[circlemultiply]v2[circlemultiply]v3 = v2

vR[circlemultiply]_[circlemultiply] v2[circlemultiply]v3 * vR[circlemultiply]v1[circlemultiply]v2[circlemultiply]v3 = v1

_[circlemultiply]v1[circlemultiply]v2[circlemultiply]v3 * vR[circlemultiply]v1[circlemultiply]v2[circlemultiply]v3 = vR

The procedure for answering wh-queries specified by Shastri and Ajjanagadde (1993a, section 4.7) essentially embodies the omni-directional access property.

4.2.7 Role representation

The role that an argument fills can be indicated by its position relative to other arguments, as discussed in 2.2.7, and its implementation in symbol-argument-argument bindings is described in 4.1.1.2. The synchronous oscillation model of Shastri and Ajjanagadde (1993a) uses role-filler bindings as described in 4.1.2.

4.2.8 Decomposability of relations

The relation represented can be decomposed into the derived relations by replacing the vector in any role with a special vector, namely the sum of all the basis vectors used to represent fillers on that axis of the tensor. Thus a representation of the ternary relation R(x,y,z) can be reduced to representation of R3 = (x,y) by entering this special vector on the units representing z. Such "collapsing" of a representation to a lower rank has been employed in models of memory (Humphreys et al., 1989) and of analogical reasoning (Halford, Wilson, et al., 1994).

In the tensor product representation of an n-ary relation with instances r(a1,a2,...,an), the effect of variations in any proper subset of {a1,a2,...,an} on the remaining argument(s) can be computed. For example, suppose that one wishes to use the fixed value b in the final role (the an role) and consider the induced (n-1)ary relation Ran=b = {(a1,a2,...,an-1) | (a1,a2,...,b) [propersubset] R}. This effect can be achieved by clamping the value in the an role of the tensor network to be b. This can, of course, be done with any role, not just the an role, and can be iterated so that, eventually, any desired set of roles are fixed in this way. For synchronous oscillation models, the representation of partially instantiated relations in Shastri and Ajjanagadde (1993a, section 3.1) effectively decomposes relations in analogous manner.

4.2.9 Relational Systematicity

can be handled by using higher-order relations, as in 4.2.5. For example implies(>(a,b),<(b,a)) can be represented by the tensor product of vectors representing implies and chunked representations of >(a,b) and <(b,a). Systematicity is achieved in the synchronous oscillation model of Shastri and Ajjanagadde (1993a, Section 4.2) by connections that ensure that corresponding arguments oscillate in synchrony (e.g., that the first role-representation in >(a,b) oscillates in synchrony with the second role-representation in <(b,a)).

The circular convolution model of Plate (1995) incorporates systematicity, but there is some doubt as to the generality of the procedure used. In order to enable the model to recognize the structural similarity between "Spot bit Jane, causing Jane to flee from Spot" and "Felix bit Mort, causing Mort to flee from Felix" (by contrast with the superficially similar, but structurally dissimilar "Rover bit Fred, causing Rover to flee from Fred"), Plate used contextualised representations. These entailed adding the property "flee-from" to the representation of Spot, Felix, and Rover, and the property "bite-object" to the representation of Jane, Mort, and Fred. This handles some structurally similar higher-order relational instances, but depends on representing dogs as entities people flee from and people as entities dogs bite. This approach lacks plausibility in relational instances such as "Jane smiled at John, causing John to like Jane" because it is implausible that smiling should be part of the representation of Jane (she may not always smile, even at John), or that liking should be part of the representation of John (he may not always like people). The circular convolution model appears to require additional means of representing structure in order to handle systematicity, and the computational cost of these additions is unknown.

4.2.10 Dimensionality of relations

The dimensionality of a relation was defined in 2.3.5 as the number of arguments. In symbol-argument-argument tensor product representations, a separate vector is used for the relation-symbol and for each argument, so rank is one more than dimensionality. We have used the convention of specifying number of arguments by n , and we will use the convention of specifying ranks by k. Therefore in this type of model k = n + 1. The components of a representation are the relation symbol and the argument representations, so the number of components = k. Furthermore if k-1 ranks are known there is at least some potential to predict the kth rank (illustrated in 2.3.5) so dimensionality = k-1 = n. Even if relations are represented by formalisms other than tensor products, symbol and arguments must be represented independently of each other, or be individually identified, so that they retain their identity when linked (bound) to other components. Note also that, in the context of neural net models, the dimensionality of a relational concept should not be confused with the number of elements in a vector, which is also sometimes referred to using the term "dimension". As noted in 4.1.3 the models of Halford, Wilson, et al., (1994) and Shastri and Ajjanagadde (1993a) map dimensions of relations directly into dimensions of representations, whereas other models do not.

4.2.11 Analogy, planning and modifiability

Analogy can be successfully modeled using the tensor product representations of relations outlined in 4.1 (Halford, Wilson, et al., 1994; Halford, et al., 1995; 1996). A sophisticated model of analogy based on synchronous oscillation has been presented by Hummel and Holyoak (in press).

With symbol-argument-argument models based on tensor products, relations can be modified on line by changing the relation symbol, which selects a new set of relational instances. Relational instances are stored as outer products of symbol and argument vectors, and outer products are summed to form a tensor. We will illustrate with arithmetic addition and multiplication. Addition would be stored as vadd[circlemultiply]v2[circlemultiply]v3[circlemultiply]v5 + vadd[circlemultiply]v3[circlemultiply]v6[circlemultiply]v9 + . . . , while multiplication would be stored as vmult[circlemultiply]v2[circlemultiply]v3[circlemultiply]v6 + vmult[circlemultiply]v3[circlemultiply]v6[circlemultiply]v18 + . . . Changing the symbol vector from vadd to vmult selects a new set of relational instances, and changes the mappings between addends and sum or product.

4.2.12 Strength

can be represented by multiplying the outer product representing the relational instance by a scalar, before adding it to the tensor. Typically the scalar would have a value between 0 and 1 indicating how frequently the relational instance is found to be true: for example bigger-than(dog, cat) would have a scalar a little less than 1 to take account of the small minority of dogs (e.g., chihuahuas) that are smaller than cats.

4.2.13 Operations on relations.

Operations on relations can be implemented using tensors. For completeness we provide one tensor implementation for each of the relational operators. The simplest implementation assumes local (with all unit values 0 except for a single unit with value 1) argument vector representations. Relaxing this assumption introduces other properties such as cross-talk, but at the expense of not implementing exact analogues of relational operators. Under the local vector assumption, then, the set operators union, intersection and difference are implemented by pairwise addition, multiplication and subtraction (respectively) of binding units with the same index. The upper limit on activation eliminates multiple occurrences of the same element (consistent with set union), and the lower limit prevents subtraction of nonexisting elements (consistent with difference).

The select operator retrieves relational instances satisfying condition C. Since C is an arbitrary boolean expression, the select operator is a very general and powerful operator. For our purposes, however, we consider a restricted version, where C has the form of a conjunction of filler-role pairs: (a1,A1) ^ ... ^ (am,Am) (i.e., select with filler a1 at role A1 and filler a2 at role A2, etc). The corresponding tensor implementation is to compute the outer product of the fillers a1 to am at the specified tensor axes A1 to Am, respectively. Axes with unspecified fillers use a special filler vector I = (1,...,1). Thus, the rank of the tensor (TC) representing the condition C is the same as the rank of the tensor (TR) representing relation R. Next, we perform a pairwise multiplication TC . TR, resulting in a tensor (Ts) representing the selected relational instances.

The project operator returns the relation between components at the specified roles. The equivalent tensor operation is summation onto the corresponding axes. Formally, given a relation R with attributes (roles) A1, ..., Ak and a corresponding tensor T with axes labeled A1, ..., Ak, then projectA R, is implemented as [Sigma]<A1, ..., Ak>-A, where A is the list of projected attributes (or tensor axes) and <A1, ..., Ak>-A is the difference of the two lists (i.e., sum onto the axes not in the list of projected attributes). The rank of the resulting tensor is the same as the arity of the projected relation.

Often one wants to cue a k-ary relational memory with k-1 components to retrieve the target at the kth role. At the relational level, the target is retrieved by successive application of the select and project operators. The select operator retrieves instance(s) containing all k-1 components at the specified roles, and the retrieved instance(s) is applied to the project operator which returns the target at the nth role. At the tensor level, this combination of select and project is realised by taking the inner product of the k-1 components with the tensor, resulting in a vector representing the retrieved target(s). This tensor operation is specified in Section 4.2.1.

The outer join of two relations is simply the outer product of the corresponding tensors. The equi-join, however, is more complicated as it requires only joining those instances that share a common argument at the specified roles. Suppose the relation Taller = {(John,Mary),(John,Bob), (Mary,Tom)}, and its corresponding tensor T = vJohn [circlemultiply] vMary + vJohn [circlemultiply] vBob + vMary [circlemultiply] vTom (the relation symbol is not part of the join operation). One way of implementing the equi-join of the Taller relation with itself is to first, cue the tensor T, with a possible argument at the Person1 role (e.g., vMary . T). This results in the vector vTom. Then, cue the tensor again, but at the Person2 role (i.e., T . vMary). This results in the vector vJohn. Provided one maintains the cue vector vMary and the two retrieved vectors vJohn and vTom, one can construct the tensor representing the equi-join as vJohn [circlemultiply] vMary [circlemultiply] vTom. Other instances are retrieved in the same manner by cueing with different fillers at the joined roles and adding the result to the tensor representing the equi-join.

To summarise section 4, the properties of relational knowledge can be incorporated in neural net models based on either symbol-argument-argument bindings using tensor products, or on role-filler bindings using synchronous oscillation. Representing these properties effectively depends on a number of additional features, but the basic properties of the representations are important. Despite their differences, both types of model have the property that the dimensions of a relation, the symbol and arguments (fillers) are mapped into dimensions of the representation, either vector spaces or phases of oscillation. This means that components of the relation are represented as intact entities, which retain their identity in the binding, and this is a major factor in the computational cost of representing relations.

5.0 Relational complexity and processing load

So far we have been considering properties of human cognitive functioning, with the aim of accounting for processing capacity limitations observed in psychological data. We have defined cognitive complexity intuitively in terms of the number of interacting variables represented in parallel, and have conceptualised it in terms of the number of arguments in a relation. However we wish to explain processing capacity limitations, and two approaches to neural net modeling of relational knowledge have independently identified possible explanations (Halford, Wilson, et al., 1994; Shastri & Ajjanagadde, 1993a). To explain how processing loads are imposed by relations we need to consider computational complexity which refers to the amount of computation required to perform a task.

Complexity analysis at the computational level is a very general, potentially algorithm-independent method of determining the inherent difficulty of a particular problem. The classic results from computer science have been to identify two very broad classes of problems, called P and NP. In the most general terms, complexity of the former is a polynomial function of some measure of the input, while for the latter it is an exponential (or worse) function. Intuitively, an NP-complete problem is intractable, in the sense that the time required by any known algorithm to solve the problem grows explosively with the size of the problem (the n). NP-complete problems can be approached in a number of ways including using an approximate/heuristic algorithm, by avoiding large instances of the problem, or by considering only subclasses of the problem. An algorithm-independent analysis is performed by showing that the problem can be transformed in polynomial time to another problem of known complexity. The power of this method was demonstrated by Tsotsos (1990) with respect to vision, by showing that certain problems in vision transform to NP-complete problems.

However while analysis at this level has been successful with vision, it does not seem to capture processing capacity limitations in cognitive tasks such as reasoning and language. The paradox is that while aspects of vision are intractable by this analysis (Tsotsos, 1990), vision tasks do not appear to impose the kind of demand defined in 2.1.3, and which has been observed in higher cognition processes. The computationally complex tasks of vision appear to be performed without measurable processing demands of the kind discussed in 2.1.3, the standard explanation being that the visual system is a module with high capacity for specialised input, as noted in 1.1. By contrast, many computationally simple tasks in higher cognition impose high processing demands. Ordinary arithmetic, for example, requires relatively little computation, but imposes a high cognitive demand on the human performer, and even such computationally simple tasks as transitive inference problems, in which for example >(a,b) and >(b,c) has to be integrated into monotonically-larger(a,b,c) impose a measurable processing load on adult humans. Tsotsos (1990) shows that visual search is inherently complex because of the combinatorial explosion that occurs as the number of elements to be searched increases. However no such combinatorial explosion occurs in ordinary arithmetic operations or transitive inferences. We must seek the explanation for observed processing demands of such tasks in the architectures employed in higher cognitive processes, for which algorithm complexity is more relevant.

Algorithmic complexity analysis considers how many steps, or how much space is required to compute a given problem for a given algorithm. In the former case, complexity depends on the function which links the number of computational steps required to the size (or length) of the input. A linear time algorithm will complete in [Theta](n) steps (i.e. of the order of n steps), where n is size of problem (e.g. number of inputs)[8]. A polynomial-time algorithm will complete in [Theta](p(n)) steps for some polynomial p(n). [9] An exponential-time algorithm will complete in [Theta](cp(n)) steps, so the number of steps grows explosively with size of input. Problem complexity is the least of the complexities of all algorithms to solve the problem.

Both synchronous oscillation and tensor product/convolution models predict limitations on the complexity of relational schemas that can be activated in parallel, though the bases of the limitations are somewhat different. We will examine both types of model in an attempt to find explanations for the limitations observed in the psychological data reviewed in 3.3.

5.1 Synchronous oscillation models

are limited by the number of distinct oscillations. This is determined by the ratio of the period of oscillation to the window of synchrony (which is related to the duration of peak, and is approximately the maximum temporal spacing between peaks that are recognized as in phase). This ratio is estimated by Shastri and Ajjanagadde (1993a) to be about five, and by Hummel and Holyoak (in press) to be four to six (to illustrate, notice that in Figure 3, approximately 5 distinct oscillations would be possible). Given the criteria for relational knowledge in 2.2, five distinct entities would permit quaternary relations (a relation symbol and four arguments) to be represented without crosstalk. However Shastri and Ajjanagadde suggest that up to 10 entities could be related with crosstalk. Psychological data discussed in 3.3 appears to correspond to Shastri and Ajjanagadde's prediction of capacity without crosstalk, possibly because performance criteria used in experiments (e.g., low error rates to facilitate analysis of latencies) would tend to preclude crosstalk. However, as noted in 4.1.2, the power of models by Shastri and Ajjanagadde, and by Hummel and Holyoak, depend on additional features, and there does not appear to be any way of calculating the cost of these using computational complexity theory.

5.2 Tensor product models

entail a computational cost in space and time. We will consider tensor product representations of relations using symbol-argument-argument bindings as proposed in 4.1.1.2, focusing on the process of accessing the kth component of a relation, given the k-1 other components, as described in 2.2.6 and 4.2.6. Then we will consider role-filler bindings, first based on tensor products to facilitate comparison, and because existing circular convolution models do not appear to incorporate all properties of relational knowledge (as noted in 4.2). Then we will consider circular convolution models insofar as they can be directly compared with tensor product models. Computational cost can be considered from either a parallel or a sequential processor point of view, but the former is more appropriate to emphasise here.

5.2.1 Complexity for symbol-argument-argument bindings

We will consider time complexity then space complexity for the symbol-argument-argument binding models.

5.2.1.1. Time complexity

In the parallel processing model, one assumes that there is a processor for each unit of the vectors representing symbol and argument(s), a processor for each binding unit, and some addition units, to be described below. In order to access the kth component of a relational instance, given arguments a1,...,ak-1, it is necessary to propagate the component values to all the relevant binding units (1 step); each binding unit then multiplies its binding memory contents with the values propagated for the k-1 arguments - this requires k-1 multiplications. Then it is necessary to add up all these products. How long this takes depends on the rank of the tensor and on the length of the vectors - let us suppose all vectors are of length n (so that there are nk binding units in the tensor network). It will be necessary to add up nk-1 products to form each component of the symbol output. This is done by the addition units referred to above. The most rapid way to add many items with many processors is to cascade the additions (see Figure 4 for a binary cascade adder).

This arrangement adds 23 items in 3 steps. In general m items can be added together in ceiling(logb(m)) steps, where b is the fan-in of the adders (two in the diagram). If we assume enough processors in the addition unit pool, then the addition step requires

ceiling(logb(nk-1)) = ceiling((k-1)logb(n)) steps, for a total of

k + ceiling((k-1)logb(n)) (2) steps.

If b is made large enough, then the second term can be made small, though always at least one.

Neurons may have of the order of 10,000 input connections, but as not all connections may be appropriate to the computation, 10,000 should probably be regarded as the upper limit. A two-level cascade of 10,000-to-1 adders permits addition of 100 million binding units, so we could approximate the second term by the constant 2. Thus at most k+2 steps are required to access a missing component of a relational instance.

Similar computations for a sequential implementation to access the kth component of a relational instance, given arguments a1,...,ak-1 yields the expression (2k-1)nk for the number of steps. The important finding however is that with full parallel implementation the tensor product representation is not expensive in terms of time (number of steps), but its spatial complexity is quite large, as shown below.

5.2.1.2. Space complexity.

The basic requirement is for the nk binding units of the rank k tensor and the kn input/output units. In addition to this, in a parallel implementation there would be a need for cascade adders for each side of the tensor network. Assuming that a two-stage cascade is adequate (i.e. that b2 >= nk-1, so that b >= n(k-1)/2), each cascade would use at most b + 1 adders, and there would be n cascades per side (one for each component) and k sides - at most nk(b + 1) adders in all. The nk binding units dominate the space complexity, providing of course that b is not larger than necessary, that is, not significantly larger than n(k-1)/2. Therefore the limiting factor with this representation is the number of binding units, which increases exponentially with dimensionality.

The representation of a single relational instance is quite expensive in terms of space, requiring nk units to represent that instance, but all combinatorially possible other relational instances can then be represented in the same tensor. (Here n is the length of the vectors, and k is the number of vectors; i.e. one more than the number of arguments). Thus superposition does not incur additional computational cost.

To achieve dynamic binding, the binding units must be interpreted as activations as noted in 4.1.1.1, and activations demand processing resources (Just & Carpenter, 1992). Therefore the rank of relations will be limited by resources available, that is by capacity as defined in 2.1.5.

This is a soft limit, because tensor product representations have the property of graceful degradation (Wilson & Halford, 1994). More recent simulations in our laboratory have extended this finding to tensor product networks of ranks up to 7. For example, a rank 5 tensor of side 16 (i.e. n=16, k=5) with up to 93.75 percent of of the binding units deleted, reliably distinguished stored facts (relational instances) from nonfacts. Such a tensor has the same number of active binding units as an intact rank 4 tensor with side 16. Our results suggest that the robustness depends on the ratio of number facts stored to number of binding units: the lower the ratio the more robust the network. Provided the value of n, the number of components in each representation vector, is reasonably large (32 was typically adequate in our simulations, for tensors of rank 3 and up, and up to 4000 facts stored) it appears to be possible to simulate a rank k+1 tensor with the number of binding units available to an (intact) rank k tensor, for k = 2 to 6 (at least). Another way of looking at this is to say that the apparently very regimented architecture of a tensor product network is not necessary in order to achieve acceptable memory performance, as 85% or more of the binding units can be removed (i.e. caused to have zero output) with impunity.

5.2.2 Complexity for the role-filler method.

Relations can be represented using role-filler bindings, as explained in 4.1.1.1, provided relational instances are identified. We will assume this is done using a separate set of units, because of the implausibility of an identifying code. Smolensky (1990) used tensor products, and Plate (1994, Appendix I) used circular convolution. For the purposes of direct comparison with 5.2.1, we calculate the time and space complexity of accessing relations using the tensor product. Again, we consider the case of accessing the kth argument of a k-ary relation given k-1 arguments.

5.2.2.1. Time complexity.

Given that the complex cue has already been composed, then the role-filler method of two major steps: (1) determining the tensor (representing the relational instance) with the highest similarity (dot product) to the complex cue; and (2) retrieving the target from that tensor.

Assuming k roles and n fillers, then each relational instance requires nk binding units. Further, since each relatio