Keith E. Stanovich & Richard F. West (2000) Individual Differences in Reasoning: Implications for the Rationality Debate?
Behavioral and Brain Sciences 22 (5): XXX-XXX.
This is the unedited draft of a BBS target article that has been accepted for publication (Copyright 1999: Cambridge University Press U.K./U.S. -- publication date provisional) and is currently being circulated for Open Peer Commentary. This preprint is for inspection only, to help prospective commentators decide whether or not they wish to prepare a formal commentary. Please do not prepare a commentary unless you have received the hard copy, invitation, instructions and deadline information.
For information on becoming a commentator on this or other BBS target articles, write to: bbs@soton.ac.uk
For information about subscribing or purchasing offprints of the published version, with commentaries and author's response, write to: journals_subscriptions@cup.org (North America) or journals_marketing@cup.cam.ac.uk (All other countries).
Individual Differences in Reasoning: Implications for the Rationality Debate?
Keith E. Stanovich
Department of Human Development and
Applied Psychology
University of Toronto
252 Bloor Street West
Toronto, ON
Canada M5S 1V6
kstanovich@oise.utoronto.ca
Richard F. West
School of Psychology
MSC 7401
James Madison University
Harrisonburg, VA 22807
USA
westrf@jmu.edu
Keith E. Stanovich is Professor of Human Development and Applied Psychology at the University of Toronto. He is the author of over 125 scientific articles in the areas of literacy and reasoning, including Who Is Rational? Studies of Individual Differences in Reasoning (Erlbaum, 1999). He is a Fellow of APA and APS and has received the Sylvia Scribner Award from the American Educational Research Association for contributions to research.
Richard F. West is a Professor in the School of Psychology at James Madison University, where he has been named a Madison Scholar. He received his Ph.D. in Psychology from the University of Michigan. He is the author of over 50 publications, and his main scientific interests are the study of rational thought, reasoning, decision making, the cognitive consequences of literacy, and cognitive processes of reading.
Abstract
Much research in the last two decades has demonstrated that human responses deviate from the performance deemed normative according to various models of decision making and rational judgment (e.g., the basic axioms of utility theory). This gap between the normative and the descriptive can be interpreted as indicating systematic irrationalities in human cognition. However, four alternative interpretations preserve the assumption that human behavior and cognition are largely rational. These explanations posit that the gap is due to (1) performance errors, (2) computational limitations, (3) the wrong norm being applied by the experimenter, and (4) a different construal of the task by the subject. In the debates about the viability of these alternative explanations, attention has been focused too narrowly on the modal response. In a series of experiments involving most of the classic tasks in the heuristics and biases literature, we have examined the implications of individual differences in performance for each of the four explanations of the normative and descriptive gap. Performance errors are a minor factor in the gap; computational limitations underlie non-normative responding on several tasks, particularly those that involve some type of cognitive decontextualization. Unexpected patterns of covariance can suggest when the wrong norm is being applied to a task or when an alternative construal of the task is called for.
Keywords: rationality, normative models, descriptive models, heuristics, biases, reasoning, individual differences
1. Introduction
A substantial research literature--one comprising literally hundreds of empirical studies conducted over nearly three decades--has firmly established that people's responses often deviate from the performance considered normative on many reasoning tasks. For example, people assess probabilities incorrectly, they display confirmation bias, they test hypotheses inefficiently, they violate the axioms of utility theory, they do not properly calibrate degrees of belief, they overproject their own opinions onto others, they allow prior knowledge to become implicated in deductive reasoning, and they display numerous other information processing biases (for summaries of the large literature, see Baron, 1994, 1998; Evans, 1989; Evans & Over, 1996; Kahneman, Slovic, & Tversky, 1982; Newstead & Evans, 1995; Nickerson, 1998; Osherson, 1995; Piattelli-Palmarini, 1994; Plous, 1993; Reyna, Lloyd, & Brainerd, in press; Shafir, 1994; Shafir & Tversky, 1995). Indeed, demonstrating that descriptive accounts of human behavior diverged from normative models was a main theme of the so-called heuristics and biases literature of the 1970s and early 1980s (see Arkes & Hammond, 1986; Kahneman, Slovic, & Tversky, 1982).
The interpretation of the gap between descriptive models and normative models in the human reasoning and decision making literature has been the subject of contentious debate for almost two decades now (a substantial portion of that debate appearing in this journal; for summaries, see Baron, 1994; Cohen, 1981, 1983; Evans & Over, 1996; Gigerenzer, 1996a; Kahneman, 1981; Kahneman & Tversky, 1983, 1996; Koehler, 1996; Stein, 1996). The debate has arisen because some investigators wished to interpret the gap between the descriptive and the normative as indicating that human cognition was characterized by systematic irrationalities. Due to the emphasis that these theorists place on reforming human cognition, they were labelled the Meliorists by Stanovich (1999). Disputing this contention were numerous investigators (termed the Panglossians, see Stanovich, 1999) who argued that there were other reasons why reasoning might not accord with normative theory (see Cohen, 1981 and Stein, 1996 for extensive discussions of the various possibilities)--reasons that prevent the ascription of irrationality to subjects. First, instances of reasoning might depart from normative standards due to performance errors--temporary lapses of attention, memory deactivation, and other sporadic information processing mishaps. Second, there may be stable and inherent computational limitations that prevent the normative response (Cherniak, 1986; Goldman, 1978; Harman, 1995; Oaksford & Chater, 1993, 1995, 1998; Stich, 1990). Third, in interpreting performance, we might be applying the wrong normative model to the task (Koehler, 1996). Alternatively, we may be applying the correct normative model to the problem as set, but the subject might have construed the problem differently and be providing the normatively appropriate answer to a different problem (Adler, 1984, 1991; Berkeley & Humphreys, 1982; Broome, 1990; Hilton, 1995; Schwarz, 1996).
However, in referring to the various alternative explanations (other than systematic irrationality) for the normative/descriptive gap, Rips (1994) warns that "a determined skeptic can usually explain away any instance of what seems at first to be a logical mistake" (p. 393). In an earlier criticism of Henle's (1978) Panglossian position, Johnson-Laird (1983) made the same point: "There are no criteria independent of controversy by which to make a fair assessment of whether an error violates logic. It is not clear what would count as crucial evidence, since it is always possible to provide an alternative explanation for an error." (p. 26). The most humorous version of this argument was made by Kahneman (1981) in his dig at the Panglossians who seem to have only two categories of errors, "pardonable errors by subjects and unpardonable ones by psychologists" (p. 340). Referring to the four classes of alternative explanation discussed above--performance errors, computational limitations, alternative problem construal, and incorrect norm application--Kahneman notes that Panglossians have "a handy kit of defenses that may be used if [subjects are] accused of errors: temporary insanity, a difficult childhood, entrapment, or judicial mistakes--one of them will surely work, and will restore the presumption of rationality" (p. 340).
These comments by Rips (1994), Johnson-Laird (1983), and Kahneman (1981) highlight the need for principled constraints on the alternative explanations of normative/descriptive discrepancies. In this target article we describe a research logic aimed at inferring such constraints from patterns of individual differences that are revealed across a wide range of tasks in the heuristics and biases literature. We argue here--using selected examples of empirical results (Stanovich, 1999; Stanovich & West, 1998a, 1998b, 1998c, 1998d, 1999)--that these individual differences and their patterns of covariance have implications for explanations of why human behavior often departs from normative models1.
2. Performance Errors
Panglossian theorists who argue that discrepancies between actual responses and those dictated by normative models are not indicative of human irrationality (e.g., Cohen, 1981) sometimes attribute the discrepancies to performance errors. Borrowing the idea of a competence/performance distinction from linguists (see Stein, 1996, pp. 8-9), these theorists view performance errors as the failure to apply a rule, strategy, or algorithm that is part of a person's competence because of a momentary and fairly random lapse in ancillary processes necessary to execute the strategy (lack of attention, temporary memory deactivation, distraction, etc.). Stein (1996) explains the idea of a performance error by referring to a "mere mistake"--a more colloquial notion that involves "a momentary lapse, a divergence from some typical behavior. This is in contrast to attributing a divergence from norm to reasoning in accordance with principles that diverge from the normative principles of reasoning. Behavior due to irrationality connotes a systematic divergence from the norm" (p. 8). Similarly, in the heuristics and biases literature, the term bias is reserved for systematic deviations from normative reasoning and does not refer to transitory processing errors ("a bias is a source of error which is systematic rather than random," Evans, 1984, p. 462).
Another way to think of the performance error explanation is to conceive of it within the true score/measurement error framework of classical test theory. Mean or modal performance might be viewed as centered on the normative response--the response all people are trying to approximate. However, scores will vary around this central tendency due to random performance factors (error variance).
It should be noted that Cohen (1981) and Stein (1996) sometimes encompass computational limitations within their notion of a performance error. In the present target article, the two are distinguished even though both are identified with the algorithmic level of analysis (see Anderson, 1990; Marr, 1982; and the discussion below on levels of analysis in cognitive theory) because they have different implications for covariance relationships across tasks. Here, performance errors represent algorithmic-level problems that are transitory in nature. Nontransitory problems at the algorithmic level that would be expected to recur on a readministration of the task are termed computational limitations.
This notion of a performance error as a momentary attention, memory, or processing lapse that causes responses to appear nonnormative even when competence is fully normative has implications for patterns of individual differences across reasoning tasks. For example, the strongest possible form of this view is that all discrepancies from normative responses are due to performance errors. This strong form of the hypothesis has the implication that there should be virtually no correlations among nonnormative processing biases across tasks. If each departure from normative responding represents a momentary processing lapse due to distraction, carelessness, or temporary confusion, then there is no reason to expect covariance among biases across tasks (or covariance among items within tasks, for that matter) because error variances should be uncorrelated.
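The strong form of the performance-error hypothesis can be written in the classical test theory terms introduced above. As a sketch, assume (our notation, not the authors') that $X_{sk}$ is subject $s$'s score on task $k$, that every subject shares the same normative true score $\tau_k$, and that the performance errors $E_{sk}$ are independent across tasks:

```latex
\begin{aligned}
X_{sk} &= \tau_k + E_{sk}, \qquad \operatorname{Cov}_s(E_{sk}, E_{sl}) = 0 \quad (k \neq l) \\
\operatorname{Cov}_s(X_{sk}, X_{sl}) &= \operatorname{Cov}_s(\tau_k + E_{sk},\ \tau_l + E_{sl})
  = \operatorname{Cov}_s(E_{sk}, E_{sl}) = 0
\end{aligned}
```

Because $\tau_k$ does not vary across subjects, it contributes nothing to the covariance, so under these assumptions any reliable cross-task correlation would have to come from some stable source other than performance error.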
In contrast, positive manifold (uniformly positive bivariate associations in a correlation matrix) among disparate tasks in the heuristics and biases literature--and among items within tasks--would call into question the notion that all variability in responding can be attributed to performance errors. This was essentially Rips and Conrad's (1983) argument when they examined individual differences in deductive reasoning: "Subjects' absolute scores on the propositional tests correlated with their performance on certain other reasoning tests....If the differences in propositional reasoning were merely due to interference from other performance factors, it would be difficult to explain why they correlate with these tests" (pp. 282-283). In fact, a parallel argument has been made in economics where, as in reasoning, models of perfect market rationality are protected from refutation by positing the existence of local market mistakes of a transitory nature (temporary information deficiency, insufficient attention due to small stakes, distractions leading to missed arbitrage opportunities, etc.).
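To make the contrast concrete, the following Python sketch simulates scores under two hypothetical models: one in which every subject has the same normative competence and tasks differ only by independent random error (the strong performance-error view), and one in which a stable subject-level factor, loosely standing in for something like cognitive capacity, contributes to every task. The sample size, the number of tasks, and the weight of 0.7 on the shared factor are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation computed from sums of cross-products."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_tasks(n_subjects, n_tasks, shared_weight, seed=0):
    """Return one list of scores per task.

    shared_weight = 0 gives the strong performance-error model: only
    independent random error varies across subjects. shared_weight > 0 adds
    a stable subject-level factor to every task (hypothetical parameter).
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n_subjects)]
    return [[shared_weight * a + rng.gauss(0, 1) for a in ability]
            for _ in range(n_tasks)]


def mean_intertask_r(tasks):
    """Average of all off-diagonal correlations among the tasks."""
    rs = [pearson(tasks[i], tasks[j])
          for i in range(len(tasks)) for j in range(i + 1, len(tasks))]
    return statistics.fmean(rs)


pure_error = mean_intertask_r(simulate_tasks(2000, 8, shared_weight=0.0))
shared_factor = mean_intertask_r(simulate_tasks(2000, 8, shared_weight=0.7))
print(round(pure_error, 3), round(shared_factor, 3))
```

Under the second model the expected correlation between any two tasks is 0.7² / (0.7² + 1), or about .33, so the simulated matrix shows a positive manifold, while the average inter-task correlation under the first model stays near zero.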
Advocates of perfect market rationality in economics admit that people make errors but defend their model of idealized competence by claiming that the errors are essentially random. The following defense of the rationality assumption in economics is typical in the way it defines performance errors as unsystematic: "In mainstream economics, to say that people are rational is not to assume that they never make mistakes, as critics usually suppose. It is merely to say that they do not make systematic mistakes--i.e., that they do not keep making the same mistake over and over again" (The Economist, December 12, 1998, p. 80). Not surprisingly, others have attempted to refute the view that the only mistakes in economic behavior are unpredictable performance errors by pointing to the systematic nature of some of the mistakes: "The problem is not just that we make random computational mistakes; rather it is that our judgmental errors are often systematic" (Frank, 1990, p. 54). Likewise, Thaler (1992) argues that "a defense in the same spirit as Friedman's is to admit that of course people make mistakes, but the mistakes are not a problem in explaining aggregate behavior as long as they tend to cancel out. Unfortunately, this line of defense is also weak because many of the departures from rational choice that have been observed are systematic" (pp. 4-5). Thus, in parallel to our application of an individual differences methodology to the tasks in the heuristics and biases literature, Thaler argues that variance and covariance patterns can potentially falsify some applications of the performance error argument in the field of economics.
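The "cancel out" defense that Thaler describes can be illustrated the same way. In this minimal sketch (all parameters hypothetical), each agent's deviation from the normative value is a bias term plus zero-mean noise; averaging over many agents removes the noise but leaves any systematic bias intact.

```python
import random
import statistics


def mean_deviation(n_agents, bias, noise_sd, seed=1):
    """Average deviation from the normative value across n_agents.

    Each agent deviates by a fixed bias plus independent Gaussian noise.
    bias = 0 corresponds to purely random performance errors.
    """
    rng = random.Random(seed)
    deviations = [bias + rng.gauss(0, noise_sd) for _ in range(n_agents)]
    return statistics.fmean(deviations)


random_only = mean_deviation(10_000, bias=0.0, noise_sd=1.0)
systematic = mean_deviation(10_000, bias=0.3, noise_sd=1.0)
print(round(random_only, 3), round(systematic, 3))
```

The random-error aggregate lands close to zero, whereas the systematic case stays close to the assumed bias of 0.3 no matter how many agents are averaged.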
Thus, as in economics, we distinguish systematic from unsystematic deviations from normative models. We label the latter performance errors and view them as inoculating against attributions of irrationality. Just as random, unsystematic errors of economic behavior do not impeach the model of perfect market rationality, transitory and random errors in thinking on a heuristics and biases problem do not impeach the Panglossian assumption of ideal rational competence. Systematic and repeatable failures in algorithmic-level functioning likewise do not impeach intentional-level rationality, but they are classified as computational limitations in our taxonomy and are discussed in Section 3. Systematic mistakes not due to algorithmic-level failure do call into question whether the intentional-level description of behavior is consistent with the Panglossian assumption of perfect rationality, provided that the normative model being applied is appropriate (see Section 4) and that the subject has not arrived at a different, intellectually defensible interpretation of the task (see Section 5).
In several studies, we have found very little evidence for the strong version of the performance error view. With virtually all of the tasks from the heuristics and biases literature that we have examined, there is considerable internal consistency. Further, at least for certain classes of task, there are significant cross-task correlations. For example, in two different studies (Stanovich & West, 1998c) we found correlations in the range of .25 to .40 (considerably higher when corrected for attenuation) among the following measures:
1. Nondeontic versions of Wason's (1966) selection task: The subject is shown four cards lying on a table showing two letters and two numbers (A, D, 3, 7). They are told that each card has a number on one side and a letter on the other and that the experimenter has the following rule (of the if P, then Q type) in mind with respect to the four cards: "If there is an A on one side then there is a 3 on the other". The subject is then told that he/she must turn over whichever cards are necessary to determine whether the experimenter's rule is true or false. Only a small number of subjects make the correct selections of the A card (P) and 7 card (not-Q) and, as a result, the task has generated a substantial literature (Evans, Newstead, & Byrne, 1993; Johnson-Laird, 1999; Newstead & Byrne, 1995).
2. A syllogistic reasoning task in which logical validity conflicted with the believability of the conclusion (see Evans, Barston, & Pollard, 1983). An example item is: All mammals walk. Whales are mammals. Conclusion: Whales walk.
3. Statistical reasoning problems of the type studied by the Nisbett group (e.g., Fong, Krantz, & Nisbett, 1986) and inspired by the finding that human judgment is overly influenced by vivid but unrepresentative personal and case evidence and under-influenced by more representative and diagnostic, but pallid, statistical evidence. The quintessential problem involves choosing between contradictory car purchase recommendations--one from a large-sample survey of car buyers and the other the heartfelt and emotional testimony of a single friend.
4. A covariation detection task modeled on the work of Wasserman, Dorner, and Kao (1990). Subjects evaluated data derived from a 2 x 2 contingency matrix.
5. A hypothesis testing task modeled on Tschirgi (1980) in which the score on the task was the number of times subjects attempted to test a hypothesis in a manner that did not unconfound variables.
6. A measure of outcome bias modeled on the work of Baron and Hershey (1988). This bias is demonstrated when subjects rate a decision with a positive outcome as superior to a decision with a negative outcome even when the information available to the decision maker was the same in both cases.
7. A measure of if/only thinking bias (Epstein, Lipson, Holstein, & Huh, 1992; Miller, Turnbull, & McFarland, 1990). If/only bias refers to the tendency for people to have differential responses to outcomes based on the differences in counterfactual alternative outcomes that might have occurred. The bias is demonstrated when subjects rate a decision leading to a negative outcome as worse than a control condition when the former makes it easier to imagine a positive outcome occurring.
8. An argument evaluation task (Stanovich & West, 1997) that tapped reasoning skills of the type studied in the informal reasoning literature (Baron, 1995; Klaczynski, Gordon, & Fauth, 1997; Perkins, Farady, & Bushey, 1991). Importantly, it was designed so that to do well on it one had to adhere to a stricture not to implicate prior belief in the evaluation of the argument.
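The remark that the cross-task correlations are considerably higher when corrected for attenuation refers to the standard correction for measurement unreliability, Spearman's formula $r_{T_x T_y} = r_{xy} / \sqrt{r_{xx} r_{yy}}$. The sketch below applies that formula; the reliabilities of .60 and .70 are hypothetical values, not the ones from the original studies.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation.

    Estimates the correlation between true scores from the observed
    correlation r_xy and the reliabilities rel_x, rel_y of the two measures.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical reliabilities; .25 and .40 are the ends of the observed range.
for observed in (0.25, 0.40):
    print(observed, round(disattenuate(observed, rel_x=0.60, rel_y=0.70), 2))
```

With those assumed reliabilities, observed correlations of .25 and .40 would correspond to estimated true-score correlations of roughly .39 and .62; lower reliabilities would push the corrected values higher still.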
3. Computational Limitations
Patterns of individual differences have implications that extend beyond testing the view that discrepancies between descriptive models and normative models arise entirely from performance errors. For example, patterns of individual differences also have implications for prescriptive models of rationality. Prescriptive models specify how reasoning should proceed given the limitations of the human cognitive apparatus and the situational constraints (e.g., time pressure) under which the decision maker operates (Baron, 1985). Thus, normative models might not always be prescriptive for a given individual and situation. Judgments about the rationality of actions and beliefs must take into account the resource-limited nature of the human cognitive apparatus (Cherniak, 1986; Goldman, 1978; Harman, 1995; Oaksford & Chater, 1993, 1995, 1998; Stich, 1990). More colloquially, Stich (1990) has argued that "it seems simply perverse to judge that subjects are doing a bad job of reasoning because they are not using a strategy that requires a brain the size of a blimp" (p. 27).
Following Dennett (1987) and the taxonomy of Anderson (1990; see also Marr, 1982; Newell, 1982), we distinguish the algorithmic/design level from the rational/intentional level of analysis in cognitive science (the first term in each pair is that preferred by Anderson, the second that preferred by Dennett). The latter provides a specification of the goals of the system's computations (what the system is attempting to compute and why). At this level, we are concerned with the goals of the system, beliefs relevant to those goals, and the choice of action that is rational given the system's goals and beliefs (Anderson, 1990; Bratman, Israel, & Pollack, 1991; Dennett, 1987; Newell, 1982, 1990; Pollock, 1995). However, even if all humans were optimally rational at the intentional level of analysis, there may still be computational limitations at the algorithmic level (e.g., Cherniak, 1986; Goldman, 1978; Oaksford & Chater, 1993, 1995). We would therefore still expect individual differences in actual performance (despite equal rational-level competence) due to differences at the algorithmic level.
Using such a framework, we view the magnitude of the correlation between performance on a reasoning task and cognitive capacity as an empirical clue about the importance of algorithmic limitations in creating discrepancies between descriptive and normative models. A strong correlation suggests important algorithmic-level limitations that might make the normative response not prescriptive for those of lower cognitive capacity (Panglossian theorists drawn to this alternative explanation of normative/descriptive gaps were termed Apologists by Stanovich, 1999). In contrast, the absence of a correlation between the normative response and cognitive capacity suggests no computational limitation and thus no reason why the normative response should not be considered prescriptive (see Baron, 1985).
In our studies, we have operationalized cognitive capacity in terms of well-known cognitive ability (intelligence) and academic aptitude tasks2 but have most often used the total score on the Scholastic Aptitude Test3,4. All are known to load highly on psychometric g (Carpenter, Just, & Shell, 1990; Carroll, 1993; Matarazzo, 1972), and such measures have been linked to neurophysiological and information processing indicators of efficient cognitive computation (Caryl, 1994; Deary, 1995; Deary & Stough, 1996; Detterman, 1994; Fry & Hale, 1996; Hunt, 1987; Stankov & Dunn, 1993; Vernon, 1991, 1993). Furthermore, measures of general intelligence have been shown to be linked to virtually all of the candidate subprocesses of mentality that have been posited as determinants of cognitive capacity (Carroll, 1993). For example, working memory is the quintessential component of cognitive capacity (in theories of computability, computational power often depends on memory for the results of intermediate computations). Consistent with this interpretation, Bara, Bucciarelli, and Johnson-Laird (1995) have found that "as working memory improves--for whatever reason--it enables deductive reasoning to improve too" (p. 185). But it has been shown that, from a psychometric perspective, variation in working memory is almost entirely captured by measures of general intelligence (Kyllonen, 1996; Kyllonen & Christal, 1990).
Measures of general cognitive ability such as those utilized in our research are direct marker variables for Spearman's (1904, 1927) positive manifold--that performance on all reasoning tasks tends to be correlated. Below, we will illustrate how we use this positive manifold to illuminate reasons for the normative/descriptive gap.
Table 1 indicates the magnitude of the correlation between one such measure--Scholastic Aptitude Test total scores--and the eight different reasoning tasks studied by Stanovich and West (1998c, Experiments 1 and 2) and mentioned in the previous section. In Experiment 1, syllogistic reasoning in the face of interfering content displayed the highest correlation (.470) and the other three correlations were roughly equal in magnitude (.347 to .394). All were statistically significant (p < .001). The remaining correlations in the table are the results from a replication and extension experiment. Three of the four tasks from the previous experiment were carried over (all but the selection task) and displayed correlations similar in magnitude to those obtained in the first experiment. The correlations involving the four new tasks introduced in Experiment 2 were also all statistically significant. The correlations involving the hypothesis testing, outcome bias, and if/only thinking tasks were negative because high scores on these tasks reflect susceptibility to non-normative cognitive biases. The correlations on the four new tasks were generally lower (range .172 to .239) than the correlations involving the other tasks (.371 to .410). The scores on all of the tasks in Experiment 2 were standardized and summed to yield a composite score. The composite's correlation with SAT scores was .547. It thus appears that to a moderate extent, discrepancies between actual performance and normative models can be accounted for by variation in computational limitations at the algorithmic level--at least with respect to the tasks investigated in these particular experiments.
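A composite of the "standardized and summed" kind can be sketched as follows. Because high scores on the hypothesis testing, outcome bias, and if/only tasks indicate more bias, this sketch assumes those tasks are reverse-keyed before summing, so that a higher composite always means more normative responding; the text does not spell out that step, and the task names and scores below are hypothetical.

```python
import statistics


def zscores(values):
    """Standardize one task's scores using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def composite(task_scores, reverse_keyed):
    """Sum of per-task z-scores for each subject.

    task_scores maps a task name to a list of scores (one per subject, in the
    same subject order for every task). Tasks named in reverse_keyed have
    their z-scores negated so that higher always means more normative.
    """
    columns = []
    for name, scores in task_scores.items():
        z = zscores(scores)
        if name in reverse_keyed:
            z = [-v for v in z]
        columns.append(z)
    return [sum(subject) for subject in zip(*columns)]


# Hypothetical scores for four subjects on two tasks.
scores = composite(
    {"syllogisms": [2, 4, 6, 8], "outcome_bias": [8, 6, 4, 2]},
    reverse_keyed={"outcome_bias"},
)
print([round(s, 2) for s in scores])
```

The resulting composite could then be correlated with an ability measure in the same way as the individual tasks.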
However, there are some tasks in the heuristics and biases literature that lack any association at all with cognitive ability. The so-called false consensus effect in the opinion prediction paradigm (Krueger & Clement, 1994; Krueger & Zeiger, 1993) is completely dissociated from cognitive ability (Stanovich, 1999; Stanovich & West, 1998c). Likewise, the overconfidence effect in the knowledge calibration paradigm (e.g., Lichtenstein, Fischhoff, & Phillips, 1982) displays a negligible correlation with cognitive ability (Stanovich, 1999; Stanovich & West, 1998c).
Collectively, these results indicate that computational limitations seem far from absolute. That is, although computational limitations appear implicated to some extent in many of the tasks, the normative responses for all of them were computed by some university students who had modest cognitive abilities (e.g., below the mean in a university sample). Such results help to situate the relationship between prescriptive and normative models for the tasks in question, because the boundaries of prescriptive recommendations for particular individuals might be explored by examining the distribution of the cognitive capacities of individuals who gave the normative response on a particular task. For most of these tasks, only a small number of the students with the very lowest cognitive ability in this sample would have prescriptive models that deviated substantially from the normative model for computational reasons. Such findings might also be taken to suggest that other factors account for variation--a prediction that will be confirmed when work on styles of epistemic regulation is examined in Section 7. Of course, the deviation between the normative and prescriptive model due to computational limitations will certainly be larger in unselected or nonuniversity populations. This point also serves to reinforce the caveat that the correlations observed in Table 1 were undoubtedly attenuated due to restriction of range in the sample. Nevertheless, if the normative/prescriptive gap is indeed modest, then there may well be true individual differences at the intentional level--that is, true individual differences in rational thought.
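The attenuating effect of restriction of range can be demonstrated with a small simulation. This hypothetical sketch draws an ability score and a task score with a population correlation of .50, then keeps only cases more than half a standard deviation above the ability mean, loosely mimicking a selective university sample. The cutoff, the correlation, and the sample size are assumptions chosen for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation computed from sums of cross-products."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def range_restriction_demo(n=20000, true_r=0.5, cutoff=0.5, seed=2):
    """Return (full-population r, r among cases with ability above cutoff)."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Task score built so that its population correlation with ability is true_r.
    noise_weight = (1 - true_r ** 2) ** 0.5
    task = [true_r * a + noise_weight * rng.gauss(0, 1) for a in ability]
    full_r = pearson(ability, task)
    kept = [(a, t) for a, t in zip(ability, task) if a > cutoff]
    kept_ability, kept_task = zip(*kept)
    return full_r, pearson(kept_ability, kept_task)


full_r, restricted_r = range_restriction_demo()
print(round(full_r, 2), round(restricted_r, 2))
```

Under these assumptions the correlation in the selected subsample falls to about .3, even though the underlying relation between ability and task performance is unchanged.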
All of the camps in the dispute about human rationality recognize that positing computational limitations as an explanation for differences between normative and descriptive models is a legitimate strategy. Meliorists agree on the importance of assessing such limitations. Likewise, Panglossians will, when it is absolutely necessary, turn themselves into Apologists to rescue subjects from the charge of irrationality. Thus, they too acknowledge the importance of assessing computational limitations. In the next section, however, we examine an alternative explanation of the normative/descriptive gap that is much more controversial--the notion that incorrect normative models have been applied to certain tasks in the heuristics and biases literature.
4. Applying the Wrong Normative Model
The possibility of incorrect norm application arises because psychologists must appeal to the normative models of other disciplines (statistics, logic, etc.) in order to interpret the responses on various tasks, and these models must be applied to a particular problem or situation. Matching a problem to a normative model is rarely an automatic or clear-cut procedure. The complexities involved in matching problems to norms make possible the argument that the gap between the descriptive and normative occurs because psychologists are applying the wrong normative model to the situation. It is a potent strategy for the Panglossian theorist to use against the advocate of Meliorism, and such claims have become quite common in critiques of the heuristics and biases literature:
"many critics have insisted that in fact it is Kahneman & Tversky, not their subjects, who have failed to grasp the logic of the problem" (Margolis, 1987, p. 158).
"if a 'fallacy' is involved, it is probably more attributable to the researchers than to the subjects" (Messer & Griggs, 1993, p. 195).
"When ordinary people reject the answers given by normative theories, they may do so out of ignorance and lack of expertise, or they may be signaling the fact that the normative theory is inadequate" (Lopes, 1981, p. 344).
"in the examples of alleged base rate fallacy considered by Kahneman and Tversky, they, and not their experimental subjects, commit the fallacies" (Levi, 1983, p. 502).
"what Wason and his successors judged to be the wrong response is in fact correct" (Wetherick, 1993, p. 107).
"Perhaps the only people who suffer any illusion in relation to cognitive illusions are cognitive psychologists" (Ayton & Hardman, 1997, p. 45).
These quotations reflect the numerous ongoing critiques of the heuristics and biases literature in which it is argued that the wrong normative standards have been applied to performance. For example, Lopes (1982) has argued that the literature on the inability of human subjects to generate random sequences (e.g., Wagenaar, 1972) has adopted a narrow concept of randomness that does not acknowledge broader conceptions that are debated in the philosophy and mathematics literature. Birnbaum (1983) has demonstrated that conceptualizing the well-known taxicab base-rate problem (see Bar-Hillel, 1980; Tversky & Kahneman, 1982) within a signal-detection framework can lead to different estimates than those assumed to be normatively correct under the less flexible Bayesian model that is usually applied. Gigerenzer (1991a, 1991b, 1993; Gigerenzer et al., 1991) has argued that the overconfidence effect in knowledge calibration experiments (Lichtenstein, Fischhoff, & Phillips, 1982) and the conjunction effect in probability judgment (Tversky & Kahneman, 1983) have been mistakenly classified as cognitive biases because of the application of an inappropriate normative model of probability assessment (i.e., requests for single-event subjective judgments when under some conceptions of probability such judgments are not subject to the rules of a probability calculus). Dawes (1989, 1990) and Hoch (1987) have argued that social psychologists have too hastily applied an overly simplified normative model in labeling performance in opinion prediction experiments as displaying a so-called false consensus (see also Krueger & Clement, 1994; Krueger & Zeiger, 1993).
4.1 From the Descriptive to the Normative in Reasoning and Decision Making
The cases just mentioned provide examples of how the existence of deviations between normative models and actual human reasoning has been called into question by casting doubt on the appropriateness of the normative models used to evaluate performance. Stein (1996, p. 239) terms this the "reject-the-norm" strategy. It is noteworthy that this strategy is used exclusively by the Panglossian camp in the rationality debate, although this connection is not a necessary one. Specifically, the reject-the-norm-application strategy is exclusively used to eliminate gaps between descriptive models of performance and normative models. When this type of critique is employed, the normative model that is suggested as a substitute for the one traditionally used in the heuristics and biases literature is one that coincides perfectly with the descriptive model of the subjects' performance--thus preserving a view of human rationality as ideal. It is rarely noted that the strategy could be used in just the opposite way--to create gaps between the normative and descriptive. Situations where the modal response coincides with the standard normative model could be critiqued, and alternative models could be suggested that would result in a new normative/descriptive gap. But this is never done. The Panglossian camp, often highly critical of empirical psychologists ("Kahneman and Tversky...and not their experimental subjects, commit the fallacies," Levi, 1983, p. 502), is never critical of psychologists who design reasoning tasks in instances where the modal subject gives the response the experimenters deem correct. Ironically, in these cases, according to the Panglossians, the same psychologists seem never to err in their task designs and interpretations.
The fact that the use of the reject-the-norm-application strategy is entirely contingent on the existence or nonexistence of a normative/descriptive gap suggests that the strategy is empirically, not conceptually, triggered (normative applications are never rejected for purely conceptual reasons when they coincide with the modal human response). What this means is that in an important sense the norms being endorsed by the Panglossian camp are conditioned (if not indexed entirely) by descriptive facts about human behavior. The debate itself is, reflexively, evidence that the descriptive models of actual behavior condition expert notions of the normative. That is, there would have been no debate (or at least much less of one) had people behaved in accord with the then-accepted norms.
Gigerenzer (1991b) is clear about his adherence to an empirically-driven reject-the-norm-application strategy: "Since its origins in the mid-seventeenth century....When there was a striking discrepancy between the judgment of reasonable men and what probability theory dictated--as with the famous St. Petersburg paradox--then the mathematicians went back to the blackboard and changed the equations (Daston, 1980). Those good old days have gone....If, in studies on social cognition, researchers find a discrepancy between human judgment and what probability theory seems to dictate, the blame is now put on the human mind, not the statistical model" (p. 109).
One way of framing the current debate between the Panglossians and Meliorists is to observe that the Panglossians wish for a return of the "good old days" where the normative was derived from the intuitions of the untutored layperson ("an appeal to people's intuitions is indispensable," Cohen, 1981, p. 318); whereas the Meliorists (with their greater emphasis on the culturally constructed nature of norms) view the mode of operation during the "good old days" as a contingent fact of history--the product of a period when few aspects of epistemic and pragmatic rationality had been codified and preserved for general diffusion through education.
Thus, the Panglossian reject-the-norm-application view can in essence be seen as a conscious application of the naturalistic fallacy (deriving ought from is). For example, Cohen (1981), like Gigerenzer, feels that the normative is indexed to the descriptive in the sense that a competence model of actual behavior can simply be interpreted as the normative model. Stein (1996) notes that proponents of this position believe that the normative can simply be "read off" from a model of competence because "whatever human reasoning competence turns out to be, the principles embodied in it are the normative principles of reasoning" (p. 231). Although both endorse this linking of the normative to the descriptive, Gigerenzer (1991b) and Cohen (1981) do so for somewhat different reasons. For Cohen (1981), it follows from his endorsement of narrow reflective equilibrium as the sine qua non of normative justification. Gigerenzer's (1991b) endorsement is related to his position in the "cognitive ecologist" camp (to use the term of Piattelli-Palmarini, 1994, p. 183) with its emphasis on the ability of evolutionary mechanisms to achieve an optimal Brunswikian tuning of the organism to the local environment (Brase, Cosmides, & Tooby, 1998; Cosmides & Tooby, 1994, 1996; Oaksford & Chater, 1994, 1998; Pinker, 1997).
That Gigerenzer and Cohen concur here--even though they have somewhat different positions on normative justification--simply shows how widespread is the acceptance of the principle that descriptive facts about human behavior condition our notions about the appropriateness of the normative models used to evaluate behavior. In fact, stated in such broad form, this principle is not restricted to the Panglossian position. For example, in decision science, there is a long tradition of acknowledging descriptive influences when deciding which normative model to apply to a particular situation. Slovic (1995) refers to this "deep interplay between descriptive phenomena and normative principles" (p. 370). Larrick, Nisbett, and Morgan (1993) have reminded us that "there is also a tradition of justifying, and amending, normative models in response to empirical considerations" (p. 332). March (1988) refers to this tradition when he discusses how actual human behavior has conditioned models of efficient problem solving in artificial intelligence and in the area of organizational decision making. The assumptions underlying the naturalistic project in epistemology (e.g., Kornblith, 1985, 1993) have the same implication--that findings about how humans form and alter beliefs should have a bearing on which normative theories are correctly applied when evaluating the adequacy of belief acquisition. This position is in fact quite widespread:
"if people's (or animals') judgments do not match those predicted by a normative model, this may say more about the need for revising the theory to more closely describe subjects' cognitive processes than it says about the adequacy of those processes" (Alloy & Tabachnik, 1984, p. 140).
"We must look to what people do in order to gather materials for epistemic reconstruction and self-improvement" (Kyburg, 1991, p. 139).
"When ordinary people reject the answers given by normative theories, they may do so out of ignorance and lack of expertise, or they may be signaling the fact that the normative theory is inadequate" (Lopes, 1981, p. 344).
Of course, in this discussion we have conjoined disparate views that are actually arrayed on a continuum. The reject-the-norm advocates represent the extreme form of this view--they simply want to read off the normative from the descriptive: "the argument under consideration here rejects the standard picture of rationality and takes the reasoning experiments as giving insight not just into human reasoning competence but also into the normative principles of reasoning" (Stein, 1996, p. 233). In contrast, other theorists (e.g., March, 1988) simply want to subtly fine-tune and adjust normative applications based on descriptive facts about reasoning performance.
One thing that all of the various camps in the rationality dispute have in common is that each conditions its beliefs about the appropriate norm to apply on the central tendency of the responses to a problem. They all seem to see that single aspect of performance as the only descriptive fact that is relevant to conditioning their views about the appropriate normative model to apply. For example, advocates of the reject-the-norm-application strategy for dealing with normative/descriptive discrepancies view the mean, or modal, response as a direct pointer to the appropriate normative model. One goal of the present research program is to expand the scope of the descriptive information used to condition our views about appropriate norms.
4.2 Putting Descriptive Facts to Work: The Understanding/Acceptance Assumption
How should we interpret situations where the majority of individuals respond in ways that depart from the normative model applied to the problem by reasoning experts? Thagard (1982) calls the two different interpretations the populist strategy and the elitist strategy: "The populist strategy, favored by Cohen (1981), is to emphasize the reflective equilibrium of the average person....The elitist strategy, favored by Stich and Nisbett (1980), is to emphasize the reflective equilibrium of experts" (p. 39). Thus, Thagard (1982) identifies the populist strategy with the Panglossian position and the elitist strategy with the Meliorist position.
But there are few controversial tasks in the heuristics and biases literature where all untutored laypersons disagree with the experts. There are always some who agree. Thus, the issue is not the untutored average person versus experts (as suggested by Thagard's formulation), but experts plus some laypersons versus other untutored individuals. Might the cognitive characteristics of those departing from expert opinion have implications for which normative model we deem appropriate? Larrick, Nisbett, and Morgan (1993) made just such an argument in their analysis of what justified the cost-benefit reasoning of microeconomics: "Intelligent people would be more likely to use cost-benefit reasoning. Because intelligence is generally regarded as being the set of psychological properties that makes for effectiveness across environments...intelligent people should be more likely to use the most effective reasoning strategies than should less intelligent people" (p. 333). Larrick et al. (1993) are alluding to the fact that we may want to condition our inferences about appropriate norms based not only on what response the majority of people make but also on what response the most cognitively competent subjects make.
Slovic and Tversky (1974) made essentially this argument years ago, although it was couched in very different terms in their paper and thus was hard to discern. Slovic and Tversky (1974) argued that descriptive facts about argument endorsement should condition the inductive inferences of experts regarding appropriate normative principles. In response to the argument that there is "no valid way to distinguish between outright rejection of the axiom and failure to understand it" (p. 372), Slovic and Tversky observed that "the deeper the understanding of the axiom, the greater the readiness to accept it" (pp. 372-373). Slovic and Tversky (1974) argued that this understanding/acceptance congruence suggested that the gap between the descriptive and normative was due to an initial failure to fully process and/or understand the task.
We might call Slovic and Tversky's argument the understanding/acceptance assumption--that more reflective and engaged reasoners are more likely to affirm the appropriate normative model for a particular situation. From their understanding/acceptance principle, it follows that if greater understanding resulted in more acceptance of the axiom, then the initial gap between the normative and descriptive would be attributed to factors that prevented problem understanding (for example lack of ability or reflectiveness on the part of the subject). Such a finding would increase confidence in the normative appropriateness of the axioms and/or in their application to a particular problem. In contrast, if better understanding failed to result in greater acceptance of the axiom, then its normative status for that particular problem might be considered to be undermined.
Using their understanding/acceptance principle, Slovic and Tversky (1974) examined the Allais (1953) problem and found little support for the applicability of the independence axiom of utility theory (the axiom stating that if the outcome in some state of the world is the same across options, then that state of the world should be ignored; Baron, 1993; Savage, 1954). When presented with arguments to explicate both the Allais (1953) and Savage (1954) positions, subjects found the Allais argument against independence at least as compelling and did not tend to change their task behavior in the normative direction (see MacCrimmon, 1968 and MacCrimmon & Larsson, 1979 for more mixed results on the independence axiom using related paradigms). Although Slovic and Tversky (1974) failed to find support for this particular normative application, they presented a principle that may be of general usefulness in theoretical debates about why human performance deviates from normative models. The central idea behind Slovic and Tversky's (1974) development of the understanding/acceptance assumption is that increased understanding should drive performance in the direction of the truly normative principle for the particular situation--so that the direction that performance moves in response to increased understanding provides an empirical clue as to what is the proper normative model to be applied.
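To make the independence axiom concrete, the following Python sketch evaluates the textbook version of the Allais gambles under several utility functions. The payoffs (in millions) and the helper names `expected_utility` and `eu_gaps` are illustrative assumptions, not the stimuli Slovic and Tversky presented. Because the second pair is obtained from the first by replacing a common 89% chance of $1M with $0, the expected-utility difference within each pair is the same for any utility function; an expected-utility maximizer who prefers A to B must therefore also prefer C to D, and the frequently reported pattern of choosing A and D violates the axiom.

```python
import math
from typing import Callable, List, Tuple

Gamble = List[Tuple[float, float]]  # (probability, payoff in millions)

# Illustrative textbook Allais payoffs (an assumption, not the original stimuli).
A: Gamble = [(1.00, 1.0)]                              # $1M for sure
B: Gamble = [(0.10, 5.0), (0.89, 1.0), (0.01, 0.0)]
# C and D are A and B with the shared 89% chance of $1M replaced by $0.
C: Gamble = [(0.11, 1.0), (0.89, 0.0)]
D: Gamble = [(0.10, 5.0), (0.90, 0.0)]


def expected_utility(gamble: Gamble, u: Callable[[float], float]) -> float:
    """Probability-weighted sum of the utilities of the payoffs."""
    return sum(p * u(x) for p, x in gamble)


def eu_gaps(u: Callable[[float], float]) -> Tuple[float, float]:
    """Return (EU(A) - EU(B), EU(C) - EU(D)) under utility function u."""
    return (expected_utility(A, u) - expected_utility(B, u),
            expected_utility(C, u) - expected_utility(D, u))


for name, u in [("linear", lambda x: x), ("sqrt", math.sqrt), ("log1p", math.log1p)]:
    gap_ab, gap_cd = eu_gaps(u)
    # The two gaps agree (up to float rounding), so expected utility must rank
    # A against B exactly as it ranks C against D.
    print(f"{name:7s} EU(A)-EU(B) = {gap_ab:+.4f}   EU(C)-EU(D) = {gap_cd:+.4f}")
```

Swapping in any other increasing utility function leaves the two gaps equal, which is precisely what the independence axiom asserts about the common 89% state.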
One might conceive of two generic strategies for applying the understanding/acceptance principle based on the fact that variation in understanding can be created or it can be studied by examining naturally occurring individual differences. Slovic and Tversky employed the former strategy by providing subjects with explicated arguments supporting the Allais or Savage normative interpretation (see also Doherty, Schiavo, Tweney, & Mynatt, 1981; Stanovich & West, 1999). Other methods of manipulating understanding have provided consistent evidence in favor of the normative principle of descriptive invariance (see Kahneman & Tversky, 1984). For example, it has been found that being forced to take more time or to provide a rationale for selections increases adherence to descriptive invariance (Larrick, Smith, & Yates, 1992; Miller & Fagley, 1991; Sieck & Yates, 1997; Takemura, 1992, 1993, 1994). Moshman and Geil (1998) found that group discussion facilitated performance on Wason's selection task.
As an alternative to manipulating understanding, the understanding/acceptance principle can be transformed into an individual differences prediction. For example, the principle might be interpreted as indicating that more reflective, engaged, and intelligent reasoners are more likely to respond in accord with normative principles. Thus, it might be expected that those individuals with cognitive/personality characteristics more conducive to deeper understanding would be more accepting of the appropriate normative principles for a particular problem. This was the emphasis of Larrick et al. (1993) when they argued that more intelligent people should be more likely to use cost-benefit principles. Similarly, need for cognition--a dispositional variable reflecting the tendency toward thoughtful analysis and reflective thinking--has been associated with aspects of epistemic and practical rationality (Cacioppo, Petty, Feinstein, & Jarvis, 1996; Kardash & Scholes, 1996; Klaczynski et al., 1997; Smith & Levin, 1996; Verplanken, 1993). This particular application of the understanding/acceptance principle derives from the assumption that a normative/descriptive gap that is disproportionately created by subjects with a superficial understanding of the problem provides no warrant for amending the application of standard normative models.
4.3 Tacit Acceptance of the Understanding/Acceptance Principle as a Mechanism for Adjudicating Disputes About the Appropriate Normative Models to Apply
It is important to point out that many theorists on all sides of the rationality debate have acknowledged the force of the understanding/acceptance argument (without always labeling the argument as such or citing Slovic & Tversky, 1974). For example, Gigerenzer and Goldstein (1996) lament the fact that Apologist theorists who emphasize Simon's (1956, 1957, 1983) concept of bounded rationality seemingly accept the normative models applied by the heuristics and biases theorists by their assumption that, if computational limitations were removed, individuals' responses would indeed be closer to the behavior those models prescribe.
Lopes and Oden (1991) also wish to deny this tacit assumption in the literature on computational limitations: "discrepancies between data and model are typically attributed to people's limited capacity to process information....There is, however, no support for the view that people would choose in accord with normative prescriptions if they were provided with increased capacity" (pp. 208-209). In stressing the importance of the lack of evidence for the notion that people would "choose in accord with normative prescriptions if they were provided with increased capacity" (p. 209), Lopes and Oden (1991) acknowledge the force of the individual differences version of the understanding/acceptance principle--because examining variation in cognitive ability is just that: looking at what subjects who have "increased capacity" actually do with that increased capacity.
In fact, critics of the heuristics and biases literature have repeatedly drawn on an individual differences version of the understanding/acceptance principle to bolster their critiques. For example, Cohen (1982) critiques the older "bookbag and poker chip" literature on Bayesian conservatism (Phillips & Edwards, 1966; Slovic, Fischhoff, & Lichtenstein, 1977) by noting that "if so-called 'conservatism' resulted from some inherent inadequacy in people's information-processing systems one might expect that, when individual differences in information-processing are measured on independently attested scales, some of them would correlate with degrees of 'conservatism.' In fact, no such correlation was found by Alker and Hermann (1971). And this is just what one would expect if 'conservatism' is not a defect, but a rather deeply rooted virtue of the system" (pp. 259-260). This is precisely how Alker and Hermann (1971) themselves argued in their paper: "Phillips et al. (1966) have proposed that conservatism is the result of intellectual deficiencies. If this is the case, variables such as rationality, verbal intelligence, and integrative complexity should have related to deviation from optimality--more rational, intelligent, and complex individuals should have shown less conservatism" (p. 40).
Wetherick (1971, 1995) has been a critic of the standard interpretation of the four-card selection task (Wason, 1966) for over 25 years. As a Panglossian theorist, he has been at pains to defend the modal response chosen by roughly 50% of the subjects (the P and Q cards). As did Cohen (1982) and Lopes and Oden (1991), Wetherick (1971) points to the lack of associations with individual differences to bolster his critique of the standard interpretation of the task: "in Wason's experimental situation subjects do not choose the not-Q card nor do they stand and give three cheers for the Queen, neither fact is interesting in the absence of a plausible theory predicting that they should....If it could be shown that subjects who choose not-Q are more intelligent or obtain better degrees than those who do not this would make the problem worth investigation, but I have seen no evidence that this is the case" (Wetherick, 1971, p. 213).
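Under the standard scoring of the selection task that Wetherick disputes, the rule "if P then Q" is read as a material conditional, and only cards whose hidden side could make the rule false need to be turned. The following minimal Python sketch, assuming that reading and using hypothetical names (`CARDS`, `violates`, `can_falsify`), enumerates the possible hidden faces of each card and shows why P and not-Q, rather than the modal P and Q, is the response the standard interpretation treats as correct.

```python
# Visible face -> (which term the visible side shows, its truth value).
CARDS = {
    "P": ("p", True),
    "not-P": ("p", False),
    "Q": ("q", True),
    "not-Q": ("q", False),
}


def violates(p: bool, q: bool) -> bool:
    """A material conditional 'if P then Q' is false only when P is true and Q false."""
    return p and not q


def can_falsify(face: str) -> bool:
    """True if some value on the hidden side would make the rule false."""
    side, value = CARDS[face]
    for hidden in (True, False):
        # The visible side fixes one term; the hidden side supplies the other.
        p, q = (value, hidden) if side == "p" else (hidden, value)
        if violates(p, q):
            return True
    return False


print([face for face in CARDS if can_falsify(face)])  # ['P', 'not-Q']
```

The Q card cannot falsify the rule whatever is on its back, which is why the standard interpretation scores the P-and-Q selection as an error.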
Funder (1987), like Cohen (1982) and Wetherick (1971), uses a finding about individual differences to argue that a particular attribution bias is not necessarily produced by a process operating suboptimally. Block and Funder (1986) analyzed the role effect observed by Ross, Amabile, and Steinmetz (1977): that people rated questioners more knowledgeable than contestants in a quiz game. Although the role effect is usually viewed as an attributional error--people allegedly failed to consider the individual's role when estimating the knowledge displayed--Block and Funder (1986) demonstrated that subjects most susceptible to this attributional "error" were more socially competent, better adjusted, and more intelligent. Funder (1987) argued that "manifestation of this 'error,' far from being a symptom of social maladjustment, actually seems associated with a degree of competence" (p. 82) and that the so-called error is thus probably produced by a judgmental process that is generally efficacious. In short, the argument is that the signs of the correlations with the individual difference variables point in the direction of the response that is produced by processes that are ordinarily useful.
Thus, Funder (1987), Lopes and Oden (1991), Wetherick (1971), and Cohen (1982) all make recourse to patterns of individual differences (or the lack of such patterns) to pump our intuitions (Dennett, 1980) in the direction of undermining the standard interpretations of the tasks under consideration. In other cases, however, examining individual differences may actually reinforce confidence in the appropriateness of the normative models applied to problems in the heuristics and biases literature.
4.4 The Understanding/Acceptance Principle and Spearman's Positive Manifold
With these arguments in mind, it is thus interesting to note that the direction of all of the correlations displayed in Table 1 is consistent with the standard normative models used by psychologists working in the heuristics and biases tradition. If we interpret the correlations in terms of the understanding/acceptance principle (a principle that, as seen in section 4.3, is endorsed in various forms by a host of Panglossian critics of the heuristics and biases literature), the directionality of the systematic correlations with intelligence is embarrassing for those reject-the-norm-application theorists who argue that norms are being incorrectly applied. Surely we would want to avoid the conclusion that individuals with more computational power are systematically computing the nonnormative response. Such an outcome would be an absolute first in a psychometric field that is one hundred years and thousands of studies old (Brody, 1997; Carroll, 1993, 1997; Lubinski & Humphreys, 1997; Neisser et al., 1996; Sternberg & Kaufman, 1998). It would mean that Spearman's (1904, 1927) positive manifold for cognitive tasks--virtually unchallenged for one hundred years--had finally broken down. Obviously, parsimony dictates that positive manifold remains a fact of life for cognitive tasks and that the response originally thought to be normative actually is.
In fact, it is probably helpful to articulate the understanding/acceptance principle somewhat more formally in terms of positive manifold--the fact that different measures of cognitive ability almost always correlate with each other (see Carroll, 1993, 1997). The individual differences version of the understanding/acceptance principle puts positive manifold to use in areas of cognitive psychology where the nature of the appropriate normative model to apply is in dispute. The point is that scoring a vocabulary item on a cognitive ability test and scoring a probabilistic reasoning response on a task from the heuristics and biases literature are not the same. The correct response in the former task has a canonical interpretation agreed upon by all investigators, whereas the normative appropriateness of responses on tasks from the latter domain has been the subject of extremely contentious dispute (Cohen, 1981, 1982, 1986; Cosmides & Tooby, 1996; Einhorn & Hogarth, 1981; Gigerenzer, 1991a, 1993, 1996a; Kahneman & Tversky, 1996; Koehler, 1996; Stein, 1996). Positive manifold between the two classes of task would only be expected if the normative model being used for directional scoring of the tasks in the latter domain is correct⁵. Likewise, given that positive manifold is the norm among cognitive tasks, the negative correlation (or, to a lesser extent, the lack of a correlation) between a probabilistic reasoning task and more standard cognitive ability measures might be taken as a signal that the wrong normative model is being applied to the former task or that there are alternative models that are equally appropriate. The latter point is relevant because the pattern of results in our studies has not always mirrored the positive manifold displayed in Table 1. We have previously mentioned the false-consensus effect and overconfidence effect as such examples, and further instances are discussed in the next section.
The statistical reasoning problems utilized in the experiments discussed so far (those derived from Fong et al., 1986) involved causal aggregate information, analogous to the causal base rates discussed by Ajzen (1977) and Bar-Hillel (1980, 1990)--that is, base rates that had a causal relationship to the criterion behavior. Noncausal base-rate problems--those involving base rates with no obvious causal relationship to the criterion behavior--have had a much more controversial history in the research literature. They have been the subject of over a decade's worth of contentious dispute (Bar-Hillel, 1990; Birnbaum, 1983; Cohen, 1979, 1982, 1986; Cosmides & Tooby, 1996; Gigerenzer, 1991b, 1993, 1996a; Gigerenzer & Hoffrage, 1995; Kahneman & Tversky, 1996; Koehler, 1996; Kyburg, 1983; Levi, 1983; Macchi, 1995)--important components of which have been articulated in this journal (e.g., Cohen, 1981, 1983; Koehler, 1996; Krantz, 1981; Kyburg, 1983; Levi, 1983).
In several experiments, we have examined some of the noncausal base-rate problems that are notorious for provoking philosophical dispute. One was an AIDS testing problem modeled on Casscells, Schoenberger, and Grayboys (1978):
"Imagine that AIDS occurs in one in every 1000 people. Imagine also there is a test to diagnose the disease that always gives a positive result when a person has AIDS. Finally, imagine that the test has a false positive rate of 5 percent. This means that the test wrongly indicates that AIDS is present in 5 percent of the cases where the person does not have AIDS. Imagine that we choose a person randomly, administer the test, and that it yields a positive result (indicates that the person has AIDS). What is the probability that the individual actually has AIDS, assuming that we know nothing else about the individual's personal or medical history?"
The Bayesian posterior probability for this problem is slightly less than .02. In several analyses and replications (see Stanovich, 1999; Stanovich & West, 1998c) in which we have classified responses of less than 10% as Bayesian, responses of over 90% as indicating strong reliance on indicant information, and responses between 10% and 90% as intermediate, we have found that subjects giving the indicant response were higher in cognitive ability than those giving the Bayesian response⁶. Additionally, when tested on causal base-rate problems (e.g., Fong et al., 1986), the greatest base-rate usage was displayed by the group highly reliant on the indicant information in the AIDS problem. The subjects giving the Bayesian answer on the AIDS problem were least reliant on the aggregate information in the causal statistical reasoning problems.
A similar violation of the expectation of positive manifold was observed on the notorious cab problem (see Bar-Hillel, 1980; Lyon & Slovic, 1976; Tversky & Kahneman, 1982)--also the subject of almost two decades' worth of dispute: "A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city in which the accident occurred. You are given the following facts: 85 percent of the cabs in the city are Green and 15 percent are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each of the two colors 80 percent of the time. What is the probability that the cab involved in the accident was Blue?"
Bayes' rule yields .41 as the posterior probability of the cab being Blue. Thus, responses over 70% were classified as reliant on indicant information, responses between 30% and 70% as Bayesian, and responses less than 30% as reliant on base-rate information. Again, it was found that subjects giving the indicant response were higher in cognitive ability and need for cognition than those giving the Bayesian or base-rate response (Stanovich & West, 1998c, 1999). Finally, both the cab problem and the AIDS problem were subjected to the second of Slovic and Tversky's (1974) methods of operationalizing the understanding/acceptance principle--presenting the subjects with arguments explicating the traditional normative interpretation (Stanovich & West, 1999). On neither problem was there a strong tendency for responses to move in the Bayesian direction subsequent to explication.
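The arithmetic behind both normative figures (slightly less than .02 for the AIDS problem, .41 for the cab problem) can be checked directly. The Python sketch below applies Bayes' rule to the parameters stated in the two problems and encodes the response bands described above. The function names are hypothetical, and treating a response exactly on a cut-off (e.g., 10% or 70%) as falling in the middle band is an assumption, since the text does not say how boundary responses were scored. The last computation illustrates that the popular 80% answer to the cab problem is what Bayes' rule returns if the 15% base rate is replaced by an uninformative 50%.

```python
def posterior(prior: float, p_d_given_h: float, p_d_given_not_h: float) -> float:
    """Bayes' rule: P(H | D) from the prior and the two conditional probabilities."""
    joint_h = prior * p_d_given_h
    joint_not_h = (1 - prior) * p_d_given_not_h
    return joint_h / (joint_h + joint_not_h)


def classify_aids(percent: float) -> str:
    """AIDS-problem bands: <10% Bayesian, >90% indicant, otherwise intermediate."""
    if percent < 10:
        return "Bayesian"
    if percent > 90:
        return "indicant"
    return "intermediate"


def classify_cab(percent: float) -> str:
    """Cab-problem bands: >70% indicant, <30% base rate, otherwise Bayesian."""
    if percent > 70:
        return "indicant"
    if percent < 30:
        return "base rate"
    return "Bayesian"


# AIDS problem: base rate 1/1000, test always positive when AIDS is present,
# 5% false-positive rate.
aids = posterior(1 / 1000, 1.0, 0.05)
# Cab problem: 15% of cabs are Blue; the witness is right 80% of the time, so
# P(says Blue | Blue) = .80 and P(says Blue | Green) = .20.
cab = posterior(0.15, 0.80, 0.20)
# Same witness, but with the base rate ignored (Blue and Green treated as 50/50).
cab_no_base_rate = posterior(0.50, 0.80, 0.20)

print(f"AIDS posterior: {aids:.4f}")                                  # 0.0196
print(f"Cab posterior: {cab:.2f}")                                    # 0.41
print(f"Cab posterior, base rate ignored: {cab_no_base_rate:.2f}")    # 0.80
print(classify_aids(2), classify_cab(80), classify_cab(41))  # Bayesian indicant Bayesian
```

Keeping one generic `posterior` function makes the structural parallel explicit: the two problems differ only in the numbers fed to the same rule.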
The results from both of these problems indicate that the noncausal base-rate problems display patterns of individual differences quite unlike those shown on the causal aggregate problems. On the latter, subjects giving the statistical response (choosing the aggregate rather than the case or indicant information) scored consistently higher on measures of cognitive ability. This pattern did not hold for the AIDS and cab problems, where the significant differences were in the opposite direction--subjects strongly reliant on the indicant information scored higher on measures of cognitive ability and were more likely to give the Bayesian response on causal base-rate problems.
We examined the processing of noncausal base rates in another task with very different task requirements (see Stanovich, 1999; Stanovich & West, 1998d)--a selection task in which individuals were not forced to compute a Bayesian posterior, but instead simply had to indicate whether or not they thought the base rate was relevant to their decision. The task was taken from the work of Doherty and Mynatt (1990). Subjects were given the following instructions: "Imagine you are a doctor. A patient comes to you with a red rash on his fingers. What information would you want in order to diagnose whether the patient has the disease Digirosa? Below are four pieces of information that may or may not be relevant to the diagnosis. Please indicate all of the pieces of information that are necessary to make the diagnosis, but only those pieces of information that are necessary to do so." Subjects then chose from the alternatives listed in the order: % of people without Digirosa who have a red rash, % of people with Digirosa, % of people without Digirosa, and % of people with Digirosa who have a red rash. These alternatives represented the choices of P(D/~H), P(H), P(~H), and P(D/H), respectively.
The normatively correct choice of P(H), P(D/H), and P(D/~H) was made by 13.4% of our sample. The most popular choice (made by 35.5% of the sample) was the two components of the likelihood ratio, P(D/H) and P(D/~H); 21.9% of the sample chose P(D/H) only; and 22.7% chose the base rate, P(H), and the numerator of the likelihood ratio, P(D/H), ignoring the denominator of the likelihood ratio, P(D/~H). Collapsed across these combinations, almost all subjects (96.0%) viewed P(D/H) as relevant and very few (2.8%) viewed P(~H) as relevant. Overall, 54.3% of the subjects deemed that P(D/~H) was necessary information and 41.5% of the sample thought it was necessary to know the base rate, P(H).
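Why P(H), P(D/H), and P(D/~H) are all needed, and why P(~H) is not, follows from Bayes' rule in odds form: the posterior odds equal the prior odds P(H)/P(~H) multiplied by the likelihood ratio P(D/H)/P(D/~H). The Digirosa task supplied no numbers, so the values below are hypothetical, as are the function names.

```python
def posterior_odds(p_h: float, p_d_given_h: float, p_d_given_not_h: float) -> float:
    """Bayes' rule in odds form: prior odds times the likelihood ratio."""
    prior_odds = p_h / (1.0 - p_h)                    # P(~H) is simply 1 - P(H)
    likelihood_ratio = p_d_given_h / p_d_given_not_h  # needs BOTH components
    return prior_odds * likelihood_ratio


def odds_to_prob(odds: float) -> float:
    return odds / (1.0 + odds)


# Hypothetical values: base rate of Digirosa 10%, rash in 90% of those with it.
p_h, p_d_h = 0.10, 0.90
for p_d_not_h in (0.05, 0.30, 0.90):
    posterior = odds_to_prob(posterior_odds(p_h, p_d_h, p_d_not_h))
    print(f"P(D/~H) = {p_d_not_h:.2f} -> P(H/D) = {posterior:.3f}")
```

Holding P(H) and P(D/H) fixed, the posterior swings widely as P(D/~H) changes; when P(D/~H) equals P(D/H) the rash is uninformative and the posterior equals the prior. A subject who asks only for P(D/H) therefore cannot make the diagnosis, while P(~H) adds nothing once P(H) is known.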
We examined the cognitive characteristics of the subjects who thought the base rate was relevant and found that they did not have higher SAT scores than those who did not choose the base rate. The pattern of individual differences was quite different for the denominator of the likelihood ratio, P(D/~H)--a component that is normatively uncontroversial. Subjects seeing this information as relevant had significantly higher SAT scores.
Interestingly, in light of these patterns of individual differences showing a lack of positive manifold when the tasks are scored in terms of the standard Bayesian approach, noncausal base-rate problems like the AIDS and cab problems have been the focus of intense debate in the literature (Cohen, 1979, 1981, 1982, 1986; Koehler, 1996; Kyburg, 1983; Levi, 1983). Several authors have argued that a rote application of the Bayesian formula to these problems is unwarranted because noncausal base rates of the AIDS-problem type lack relevance and reference-class specificity. Finally, our results also raise the possibility that the Bayesian subjects on the AIDS problem are not actually arriving at their response through anything resembling Bayesian processing (whether or not they were operating in a frequentist mode; Gigerenzer & Hoffrage, 1995), because on causal aggregate statistical reasoning problems these subjects were less likely to rely on the aggregate information.
5. Alternative Task Construals
Theorists who resist interpreting the gap between normative and descriptive models as indicating human irrationality have one more strategy available in addition to those previously described. In the context of empirical cognitive psychology, it is a commonplace argument, but it is one that continues to create enormous controversy and to bedevil efforts to compare human performance to normative standards. It is the argument that although the experimenter may well be applying the correct normative model to the problem as set, the subject might be construing the problem differently and be providing the normatively appropriate answer to a different problem--in short, that subjects have a different interpretation of the task (see, for example, Adler, 1984, 1991; Broome, 1990; Henle, 1962; Hilton, 1995; Levinson, 1995; Margolis, 1987; Schick, 1987, 1997; Schwarz, 1996).
Such an argument is somewhat different from any of the critiques examined thus far. It is not the equivalent of positing that a performance error has been made, because performance errors (attention lapses, etc.)--being transitory and random--would not be expected to recur in exactly the same way in a readministration of the same task. By contrast, a subject who has truly misunderstood the task would be expected to misunderstand it again on an identical readministration.
Correspondingly, this criticism is different from the argument that the task exceeds the computational capacity of the subject. The latter explanation locates the cause of the suboptimal performance within the subject. In contrast, the alternative task construal argument places the blame at least somewhat on the shoulders of the experimenter for failing to realize that there were task features that might lead subjects to frame the problem in a manner different from that intended7.
As with incorrect norm application, the alternative construal argument locates the problem with the experimenter. However, it is different in that in the wrong norm explanation it is assumed that the subject is interpreting the task as the experimenter intended--but the experimenter is not using the right criteria to evaluate performance. In contrast, the alternative task construal argument allows that the experimenter may be applying the correct normative model to the problem the experimenter intends the subject to solve--but posits that the subject has construed the problem in some other way and is providing a normatively appropriate answer to a different problem.
It seems that in order to comprehensively evaluate the rationality of human cognition it will be necessary to evaluate the appropriateness of various task construals. This is because--contrary to thin theories of means/ends rationality that avoid evaluating the subject's task construal (Elster, 1983; Nathanson, 1994)--it will be argued here that if we are going to have any normative standards at all, then we must also have standards for what are appropriate and inappropriate task construals. In the remainder of this section, we will sketch the arguments of philosophers and decision scientists who have made just this point. Then it will be argued that: 1) in order to tackle the difficult problem of evaluating task construals, criteria of wide reflective equilibrium come into play; 2) it will be necessary to use all descriptive information about human performance that could potentially affect expert wide reflective equilibrium; 3) included in the relevant descriptive facts are individual differences in task construal and their patterns of covariance. This argument will again make use of the understanding/acceptance principle of Slovic and Tversky (1974) discussed in Section 4.2.
5.1 The Necessity of Principles of Rational Construal
It is now widely recognized that the evaluation of the normative appropriateness of a response to a particular task is always relative to a particular interpretation of the task. For example, Schick (1987) argues that "how rationality directs us to choose depends on which understandings are ours....[and that] the understandings people have bear on the question of what would be rational for them" (pp. 53, 58). Likewise, Tversky (1975) argued that "the question of whether utility theory is compatible with the data or not, therefore, depends critically on the interpretation of the consequences" (p. 171).
However, others have pointed to the danger inherent in too permissively explaining away nonnormative responses by positing different construals of the problem. Normative theories will be drained of all of their evaluative force if we adopt an attitude that is too charitable toward alternative construals. Broome (1990) illustrates the problem by discussing the preference reversal phenomenon (Lichtenstein & Slovic, 1971; Slovic, 1995). In a choice between two gambles, A and B, a person chooses A over B. However, when pricing the gambles, the person puts a higher price on B. This violation of procedural invariance leads to what appears to be intransitivity. Presumably there is an amount of money, M, that would be preferred to A but given a choice of M and B the person would choose B. Thus, we appear to have B > M, M > A, A > B. Broome (1990) points out that when choosing A over B the subject is choosing A and is simultaneously rejecting B. Evaluating A in the M versus A comparison is not the same. Here, when choosing A, the subject is not rejecting B. The A alternative here might be considered to be a different prospect (call it A'), and if it is so considered there is no intransitivity (B > M, M > A', A > B). Broome (1990) argues that whenever the basic axioms such as transitivity, independence, or descriptive or procedural invariance are breached, the same inoculating strategy could be invoked--that of individuating outcomes so finely that the violation disappears.
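Broome's inoculating strategy can be made concrete by treating each stated strict preference as a directed edge and checking the resulting graph for a cycle. This is an illustrative sketch only; the prospect labels, including A_prime for Broome's A', and the function name are our own.

```python
def has_cycle(prefs: set[tuple[str, str]]) -> bool:
    """True if the strict-preference relation, with (x, y) meaning 'x > y', contains a cycle."""
    graph: dict[str, list[str]] = {}
    for better, worse in prefs:
        graph.setdefault(better, []).append(worse)
        graph.setdefault(worse, [])
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph[node]:
            # Reaching a node already on the current path closes a cycle.
            if color[nxt] == GRAY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Preference reversal as described in the text: B > M, M > A, A > B.
coarse = {("B", "M"), ("M", "A"), ("A", "B")}
# Broome's individuation: A chosen against M is treated as a distinct prospect A'.
fine = {("B", "M"), ("M", "A_prime"), ("A", "B")}

print(has_cycle(coarse))  # True
print(has_cycle(fine))    # False
```

Splitting a single node into two is all it takes to make the cycle disappear, which is why Broome argues that some substantive principle must constrain how finely outcomes may be individuated.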
Broome's (1990) point is that the thinner the categories we use to individuate outcomes, the harder it will be to attribute irrationality to a set of preferences if we evaluate rationality only in instrumental terms. He argues that we need, in addition to the formal principles of rationality, principles that deal with content so as to enable us to evaluate the reasonableness of a particular individuation of outcomes. Broome (1990) acknowledges that "this procedure puts principles of rationality to work at a very early stage of decision theory. They are needed in fixing the set of alternative prospects that preferences can then be defined upon. The principles in question might be called 'rational principles of indifference'" (p. 140). Broome (1990) admits that "many people think there can be no principles of rationality apart from the formal ones. This goes along with the common view that rationality can only be instrumental....[however] if you acknowledge only formal principles of rationality, and deny that there are any principles of indifference, you will find yourself without any principles of rationality at all" (pp. 140-141).
Broome cites Tversky (1975) as concurring in this view: "I believe that an adequate analysis of rational choice cannot accept the evaluation of the consequences as given, and examine only the consistency of preferences. There is probably as much irrationality in our feelings, as expressed in the way we evaluate consequences, as there is in our choice of actions. An adequate normative analysis must deal with problems such as the legitimacy of regret in Allais' problem....I do not see how the normative appeal of the axioms could be discussed without a reference to a specific interpretation" (Tversky, 1975, p. 172).
Others agree with the Broome/Tversky analysis (see Baron, 1993, 1994; Frisch, 1994; Schick, 1997). But while there is some support for Broome's generic argument, the contentious disputes about rational principles of indifference and rational construals of the tasks in the heuristics and biases literature (Adler, 1984, 1991; Berkeley & Humphreys, 1982; Cohen, 1981, 1986; Gigerenzer, 1993, 1996a; Hilton, 1995; Jepson, Krantz, & Nisbett, 1983; Kahneman & Tversky, 1983, 1996; Lopes, 1991; Nisbett, 1981; Schwarz, 1996) highlight the difficulties to be faced when attempting to evaluate specific problem construals. For example, Margolis (1987) agrees with Henle (1962) that the subjects' nonnormative responses will almost always be logical responses to some other problem representation. But unlike Henle (1962), Margolis (1987) argues that many of these alternative task construals are so bizarre--so far from what the very words in the instructions said--that they represent serious cognitive errors that deserve attention: "But in contrast to Henle and Cohen, the detailed conclusions I draw strengthen rather than invalidate the basic claim of the experimenters. For although subjects can be--in fact, I try to show, ordinarily are--giving reasonable responses to a different question, the different question can be wildly irrelevant to anything that plausibly could be construed as the meaning of the question asked. The locus of the illusion is shifted, but the force of the illusion is confirmed not invalidated or explained away" (p. 141).
5.2 Evaluating Principles of Rational Construal: The Understanding/Acceptance Assumption Revisited
Given current arguments that principles of rational construal are necessary for a full normative theory of human rationality (Broome, 1990; Einhorn & Hogarth, 1981; Jungermann, 1986; Schick, 1987, 1997; Shweder, 1987; Tversky, 1975), how are such principles to be derived? When searching for principles of rational task construal the same mechanisms of justification used to assess principles of instrumental rationality will be available. Perhaps in some cases--instances where the problem structure maps the world in an unusually close and canonical way--problem construals could be directly evaluated by how well they serve the decision maker in achieving their goals (Baron, 1993, 1994). In such cases, it might be possible to prove the superiority or inferiority of certain construals by appeals to Dutch Book or money pump arguments (de Finetti, 1970/1990; Maher, 1993; Skyrms, 1986; Osherson, 1995; Resnik, 1987).
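A Dutch Book argument shows that a set of credences exposes its holder to a guaranteed loss. As a minimal sketch, assuming unit-stake bets priced at the agent's stated probabilities, the code below builds such a book against an agent who judges a conjunction (A and B) more probable than one of its conjuncts (A); the function name and the numbers are hypothetical.

```python
def conjunction_book(p_a: float, p_a_and_b: float) -> dict[str, float]:
    """Net payoff, in each possible world, to an agent who BUYS a $1 bet on (A and B)
    at price p_a_and_b and SELLS a $1 bet on A at price p_a."""
    premium = p_a - p_a_and_b  # received for the A bet minus paid for the (A and B) bet
    return {
        "A and B": premium + 1.0 - 1.0,  # collects on the conjunction, pays out on A
        "A, not B": premium - 1.0,       # pays out on A, conjunction bet loses
        "not A": premium,                # neither bet pays
    }


book = conjunction_book(p_a=0.10, p_a_and_b=0.30)  # conjunction judged more probable
print(book)
print(all(v < 0 for v in book.values()))  # True: a loss whatever the world is like
```

With coherent credences (for example, p_a=0.30 and p_a_and_b=0.10) the same pair of bets no longer guarantees a loss. Proofs of this kind are clean for the probability axioms themselves; the difficulty raised in the text is that they are rarely available for deciding between rival task construals.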
Also available will be the expert wide reflective equilibrium view discussed by Stich and Nisbett (1980; see Stanovich, 1999; Stein, 1996). In contrast, Baron (1993, 1994) and Thagard (1982) argue that rather than any sort of reflective equilibrium, what is needed here are "arguments that an inferential system is optimal with respect to the criteria discussed" (Thagard, 1982, p. 40). But in the area of task construal, such proofs of optimality are unlikely to be available--there will be few money pumps or Dutch Books to prove that a particular task interpretation has disastrous consequences. The field will then again be thrust back upon the debate that Thagard (1982) calls the argument between the populists and the elitists. But as argued before, this label is really a misnomer. There are few controversial tasks in the heuristics and biases literature where all untutored laypersons interpret tasks differently from the experts who designed them. The issue is not the untutored average person versus experts, but experts plus some laypersons versus other untutored individuals. The cognitive characteristics of those departing from the expert construal might--for reasons parallel to those argued in section 4--have implications for how we evaluate particular task interpretations. It is argued here that Slovic and Tversky's (1974) assumption ("the deeper the understanding of the axiom, the greater the readiness to accept it," pp. 372-373) can again be used as a tool to condition the expert reflective equilibrium regarding principles of rational task construal.
Framing effects are ideal vehicles for demonstrating how the understanding/acceptance principle might be utilized. First, it has already been shown that there are consistent individual differences across a variety of framing problems (Frisch, 1993). Second, framing problems have engendered much dispute regarding issues of appropriate task construal. The Disease Problem of Tversky and Kahneman (1981) has been the subject of much contention:
Problem 1. Imagine that the U.S. is preparing for the outbreak of an unusual disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the programs are as follows: If Program A is adopted, 200 people will be saved. If Program B is adopted, there is a one-third probability that 600 people will be saved and a two-thirds probability that no people will be saved. Which of the two programs would you favor, Program A or Program B?
Problem 2. Imagine that the U.S. is preparing for the outbreak of an unusual disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the programs are as follows: If Program C is adopted, 400 people will die. If Program D is adopted, there is a one-third probability that nobody will die and a two-thirds probability that 600 people will die. Which of the two programs would you favor, Program C or Program D?
Many subjects select alternatives A and D in these two problems despite the fact that the two problems are redescriptions of each other and that Program A maps to Program C rather than D. This response pattern violates the assumption of descriptive invariance of utility theory. However, Berkeley and Humphreys (1982) argue that Programs A and C might not be descriptively invariant in subjects' interpretations. They argue that the wording of the outcome of Program A ("will be saved"), combined with the fact that its outcome is seemingly not described in the same exhaustive way as the consequences for Program B, suggests the possibility of human agency in the future which might enable the saving of more lives (see also Kuhberger, 1995). The wording of the outcome of Program C ("will die") does not suggest the possibility of future human agency working to save more lives (indeed, the possibility of losing a few more might be inferred by some people). Under such a construal of the problem, it is no longer non-normative to choose Programs A and D. Likewise, Macdonald (1986) argues that, regarding the "200 people will be saved" phrasing, "it is unnatural to predict an exact number of cases" (p. 24) and that "ordinary language reads 'or more' into the interpretation of the statement" (p. 24; see also Jou, Shanteau, & Harris, 1996).
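Under the construal intended by Tversky and Kahneman (1981), in which the stated counts are exact, each program reduces to a probability distribution over lives saved out of 600, and the redescription claim becomes a simple equality check. A minimal sketch of that bookkeeping (the variable and function names are illustrative):

```python
from fractions import Fraction

TOTAL = 600  # people expected to die if nothing is done

# Each program as a probability distribution over the number of lives saved.
programs = {
    "A": {200: Fraction(1)},                                 # "200 people will be saved"
    "B": {600: Fraction(1, 3), 0: Fraction(2, 3)},
    "C": {TOTAL - 400: Fraction(1)},                         # "400 people will die"
    "D": {TOTAL - 0: Fraction(1, 3), TOTAL - 600: Fraction(2, 3)},
}


def expected_saved(dist: dict[int, Fraction]) -> Fraction:
    """Expected number of lives saved under a program."""
    return sum(lives * p for lives, p in dist.items())


print(programs["A"] == programs["C"], programs["B"] == programs["D"])  # True True
print({name: expected_saved(d) for name, d in programs.items()})
```

The equivalence depends entirely on reading the counts as exact. Reading "200 people will be saved" as "at least 200," as Berkeley and Humphreys (1982) and Macdonald (1986) suggest, would give Program A a different distribution from Program C.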
However, consistent with the findings that being forced to provide a rationale or to take more time reduces framing effects (e.g., Larrick et al., 1992; Sieck & Yates, 1997; Takemura, 1994) and that people higher in need for cognition display reduced framing effects (Smith & Levin, 1996), our within-subjects study of framing effects on the Disease Problem (Stanovich & West, 1998b) found that subjects giving a consistent response to both descriptions of the problem--who were actually the majority in our within-subjects experiment--were significantly higher in cognitive ability than those subjects displaying a framing effect. Thus, the results of studies investigating the effects of giving a rationale, taking more time, associations with cognitive engagement, and associations with cognitive ability are all consistent in suggesting that the response dictated by the construal of the problem originally favored by Tversky and Kahneman (1981) should be considered the correct response, because it is endorsed even by untutored subjects as long as they are cognitively engaged with the problem, have enough time to process the information, and have the cognitive ability to fully process the information8.
Perhaps no finding in the heuristics and biases literature has been the subject of as much criticism as Tversky and Kahneman's (1983) claim to have demonstrated a conjunction fallacy in probabilistic reasoning. Most of the criticisms have focused on the issue of differential task construal, and several critics have argued that there are alternative construals of the tasks that are, if anything, more rational than that which Tversky and Kahneman (1983) regard as normative for examples such as the well-known Linda problem:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable.
a. Linda is a teacher in an elementary school
b. Linda works in a bookstore and takes Yoga classes
c. Linda is active in the feminist movement
d. Linda is a psychiatric social worker
e. Linda is a member of the League of Women Voters
f. Linda is a bank teller
g. Linda is an insurance salesperson
h. Linda is a bank teller and is active in the feminist movement
Because alternative h is the conjunction of alternatives c and f, the probability of h cannot be higher than that of either c or f, yet 85% of the subjects in Tversky and Kahneman's (1983) study rated alternative h as more probable than f. What concerns us here is the argument that subtle linguistic and pragmatic features of the problem lead subjects to evaluate alternatives different from those listed. For example, Hilton (1995) argues that if subjects assume that the detailed information given about the target means that the experimenter knows a considerable amount about Linda, then it is reasonable for them to think that the phrase "Linda is a bank teller" does not contain the phrase "and is not active in the feminist movement" because the experimenter already knows this to be the case. If "Linda is a bank teller" is interpreted in this way, then rating h as more probable than f no longer represents a conjunction fallacy.
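The scoring criterion used for the Linda Problem (ranking h as more probable than f, that is, giving it a smaller rank number) is easy to state precisely. The sketch below assumes each subject's ranking is recorded as a mapping from option letter to rank; the function name and the example ranking are invented for illustration.

```python
OPTIONS = "abcdefgh"


def commits_conjunction_fallacy(ranks: dict[str, int]) -> bool:
    """Score one subject's ranking (1 = most probable, 8 = least probable).

    Alternative h is the conjunction of c and f, so ranking h as more
    probable than its conjunct f violates the conjunction rule.
    """
    if sorted(ranks) != list(OPTIONS) or sorted(ranks.values()) != list(range(1, 9)):
        raise ValueError("expected each of a-h ranked exactly once, 1 through 8")
    return ranks["h"] < ranks["f"]


# A hypothetical subject who finds the feminist description most representative.
subject = dict(zip(OPTIONS, [5, 6, 1, 3, 2, 8, 7, 4]))
print(commits_conjunction_fallacy(subject))  # True: h ranked 4th, f ranked 8th
```

The rule equally forbids ranking h above c; the check above follows the h-versus-f comparison reported in the text.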
Similarly, Morier and Borgida (1984) point out that the presence of the unusual conjunction "Linda is a bank teller and is active in the feminist movement" itself might prompt an interpretation of "Linda is a bank teller" as "Linda is a bank teller and is not active in the feminist movement". Actually, Tversky and Kahneman (1983) themselves had concerns about such an interpretation of the "Linda is a bank teller" alternative and ran a condition in which this alternative was rephrased as "Linda is a bank teller whether or not she is active in the feminist movement". They found that the conjunction fallacy was reduced from 85% of their sample to 57% when this alternative was used. Several other investigators have suggested that pragmatic inferences lead to seeming violations of the logic of probability theory in the Linda Problem9 (see Adler, 1991; Dulany & Hilton, 1991; Levinson, 1995; Macdonald & Gilhooly, 1990; Politzer & Noveck, 1991; Slugoski & Wilson, 1998). These criticisms all share the implication that actually committing the conjunction fallacy is a rational response to an alternative construal of the different statements about Linda.
Assuming that those committing the so-called conjunction fallacy are making the pragmatic interpretation and that those avoiding the fallacy are making the interpretation that the investigators intended, we examined whether the subjects making the pragmatic interpretation were disproportionately those of higher cognitive ability. Because this group is in fact the majority in most studies--and because the use of such pragmatic cues and background knowledge is often interpreted as reflecting adaptive information processing (e.g., Hilton, 1995)--it might be expected that these individuals would be the subjects of higher cognitive ability.
In our study (Stanovich & West, 1998b), we examined the performance of 150 subjects on the Linda Problem presented above. Consistent with the results of previous experiments on this problem (Tversky & Kahneman, 1983), 80.7% of our sample committed the conjunction fallacy--they rated the feminist bank teller alternative as more probable than the bank teller alternative. The mean SAT score of the 121 subjects who committed the conjunction fallacy was 82 points lower than the mean score of the 29 who avoided the fallacy. This difference was highly significant and it translated into an effect size of .746, which Rosenthal and Rosnow (1991, p. 446) classify as "large."
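The reported figures can be cross-checked with a little arithmetic. The second calculation assumes that the effect size is the standardized mean difference d (mean difference divided by the pooled standard deviation); under that assumption it gives the SAT standard deviation implied by the reported values, which the text itself does not state.

```python
n_total, n_fallacy = 150, 121
n_avoid = n_total - n_fallacy                 # subjects who avoided the fallacy
print(n_avoid)                                # 29
print(round(100 * n_fallacy / n_total, 1))    # 80.7 (percent committing the fallacy)

mean_diff, d = 82.0, 0.746                    # SAT points; reported effect size
implied_pooled_sd = mean_diff / d             # ASSUMES d = mean difference / pooled SD
print(round(implied_pooled_sd, 1))            # 109.9
```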
Tversky and Kahneman (1983) and Reeves and Lockhart (1993) have demonstrated that the incidence of the conjunction fallacy can be decreased if the problem describes the event categories in some finite population or if the problem is presented in a frequentist manner (see also Fiedler, 1988; Gigerenzer, 1991b, 1993). We have replicated this well-known finding, but we have also found that frequentist representations of these problems markedly reduce--if not eliminate--cognitive ability differences (Stanovich & West, 1998b).
Another problem that has spawned many arguments about alternative construals is Wason's (1966) selection task. Performance on abstract versions of the selection task is extremely low (see Evans, Newstead, & Byrne, 1993). Typically, fewer than 10% of subjects make the correct selections of the A card (P) and the 7 card (not-Q). The most common incorrect choices made by subjects are the A card and the 3 card (P and Q) or the selection of the A card only (P). The preponderance of P and Q responses has most often been attributed to a so-called matching bias that is automatically triggered by surface-level relevance cues (Evans, 1996; Evans & Lynch, 1973), but some investigators have championed an explanation based on an alternative task construal. For example, Oaksford and Chater (1994, 1996; see also Nickerson, 1996) argue that rather than interpreting the task as one of deductive reasoning (as the experimenter intends), many subjects interpret it as an inductive problem of probabilistic hypothesis testing. They show that the P and Q response is expected under a formal Bayesian analysis which assumes such an interpretation in addition to optimal data selection.
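Under the deductive construal the experimenter intends, a card is worth turning only if its hidden face could falsify the rule. The sketch below assumes an abstract rule of the form "If a card has an A on one side, then it has a 3 on the other," a fourth card (K) that the text does not mention, and only two possible values for each hidden side; all of these are illustrative assumptions.

```python
# Hypothetical rule: "If a card has an A on one side, then it has a 3 on the other."
LETTERS = ["A", "K"]  # possible letter sides (K is an assumed not-P card)
NUMBERS = ["3", "7"]  # possible number sides (3 = Q, 7 = not-Q)


def violates(letter: str, number: str) -> bool:
    """A card falsifies the rule only if it pairs P (an A) with not-Q (anything but 3)."""
    return letter == "A" and number != "3"


def can_falsify(visible: str) -> bool:
    """Turn a card only if some possible hidden face would violate the rule."""
    if visible in LETTERS:
        return any(violates(visible, n) for n in NUMBERS)
    return any(violates(letter, visible) for letter in LETTERS)


cards = ["A", "K", "3", "7"]
print([c for c in cards if can_falsify(c)])  # ['A', '7']
```

The 3 card (Q) cannot falsify the rule whatever is on its back, which is why the popular P and Q selection counts as an error under the deductive construal, though not under Oaksford and Chater's inductive one.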
We have examined individual differences in responding on a variety of abstract and deontic selection task problems (Stanovich & West, 1998a, 1998c). Typical results are displayed in Table 2. The table presents the mean SAT scores of subjects responding correctly (as traditionally interpreted--with the responses P and not-Q) on various versions of selection task problems. One was a commonly used nondeontic problem with content, the so-called Destination Problem (e.g., Manktelow & Evans, 1979). Replicating previous research, few subjects responded correctly on this problem. However, those who did had significantly higher SAT scores than those who did not, and the difference was quite large in magnitude (effect size of .815). Also presented in the table are two well-known problems (Dominowski, 1995; Griggs, 1983; Griggs & Cox, 1982, 1983; Newstead & Evans, 1995) with deontic rules (reasoning about rules used to guide human behavior--about what "ought to" or "must" be done, see Manktelow & Over, 1991)--the Drinking-Age Problem (If a person is drinking beer then the person must be over 21 years of age) and the Sears Problem (Any sale over $30 must be approved by the section manager, Mr. Jones). Both are known to facilitate performance and this effect is clearly replicated in the data presented in Table 2. However, it is also clear that the differences in cognitive ability are much smaller in these two problems. The effect size is reduced from .815 to .347 in the case of the Drinking-Age Problem and it fails to even reach statistical significance in the case of the Sears Problem (effect size of .088). The bottom half of the table indicates that exactly the same pattern was apparent when the P and not-Q responders were compared only with the P and Q responders on the Destination Problem--the latter being the response that is most consistent with an inductive construal of the problem (see Nickerson, 1996; Oaksford & Chater, 1994, 1996).
Thus, on the selection task, it appears that cognitive ability differences are strong in cases where there is a dispute about the proper construal of the task (in nondeontic tasks). In cases where there is little controversy about alternative construals--the deontic rules of the Drinking-Age and Sears problems--cognitive ability differences are markedly attenuated. This pattern--cognitive ability differences large on problems where there is contentious dispute regarding the appropriate construal and cognitive ability differences small when there is no dispute about task construal--is mirrored in our results on the conjunction effect and framing effect (Stanovich & West, 1998b).
6. Dual Process Theories and Alternative Task Construals
The sampling of results just presented (for other examples, see Stanovich, 1999) has demonstrated that the responses associated with alternative construals of a well-known framing problem (the Disease Problem), of the Linda Problem, and of the nondeontic selection task were consistently associated with lower cognitive ability. How might we interpret this consistent pattern displayed on three tasks from the heuristics and biases literature where alternative task construals have been championed?
One possible interpretation of this pattern is in terms of two-process theories of reasoning (Epstein, 1994; Evans, 1984, 1996; Evans & Over, 1996; Sloman, 1996). A summary of the generic properties distinguished by several two-process views is presented in Table 3. Although the details and technical properties of these dual-process theories do not always match exactly, nevertheless there are clear family resemblances (for discussions, see Evans & Over, 1996; Gigerenzer & Regier, 1996; Sloman, 1996). In order to emphasize the prototypical view that is adopted here, the two systems have simply been generically labeled System 1 and System 2.
The key differences in the properties of the two systems are listed next. System 1 is characterized as automatic, largely unconscious, and relatively undemanding of computational capacity. Thus, it conjoins properties of automaticity and heuristic processing as these constructs have been variously discussed in the literature. These properties characterize what Levinson (1995) has termed interactional intelligence--a system composed of the mechanisms that support a Gricean theory of communication that relies on intention-attribution. This system has as its goal the ability to model other minds in order to read intention and to make rapid interactional moves based on those modeled intentions. System 2 conjoins the various characteristics that have been viewed as typifying controlled processing. System 2 encompasses the processes of analytic intelligence that have traditionally been studied by information processing theorists trying to uncover the computational components underlying intelligence.
For the purposes of the present discussion, the most important difference between the two systems is that they tend to lead to different types of task construals. Construals triggered by System 1 are highly contextualized, personalized, and socialized. They are driven by considerations of relevance and are aimed at inferring intentionality by the use of conversational implicature even in situations that are devoid of conversational features (see Margolis, 1987). The primacy of these mechanisms leads to what has been termed the fundamental computational bias in human cognition (Stanovich, 1999)--the tendency toward automatic contextualization of problems. In contrast, System 2's more controlled processes serve to decontextualize and depersonalize problems. This system is more adept at representing problems in terms of rules and underlying principles. It can deal with problems without social content and is not dominated by the goal of attributing intentionality or by the search for conversational relevance.
Using the distinction between System 1 and System 2 processing, it is conjectured here that in order to observe large cognitive ability differences in a problem situation, the two systems must strongly cue different responses10. It is not enough simply that both systems are engaged. If both cue the same response (as in deontic selection task problems), then this could have the effect of severely diluting any differences in cognitive ability. This outcome is predicted in part because individual differences in System 1 processes (interactional intelligence) are assumed to bear little relation to individual differences in System 2 processes (analytic intelligence). This is a conjecture for which there is a modest amount of evidence. Reber (1993) has shown preconscious processes to have low variability and to show little relation to analytic intelligence (see Jones & Day, 1997; McGeorge, Crawford, & Kelly, 1997; Reber, Walkenfeld, & Hernstadt, 1991).
In contrast, if the two systems cue opposite responses, rule-based System 2 will tend to differentially cue those of high analytic intelligence and this tendency will not be diluted by System 1 (the associative system) nondifferentially drawing subjects to the same response. For example, the Linda Problem maximizes the tendency for the two systems to prime different responses and this problem produced a large difference in cognitive ability. Similarly, in nondeontic selection tasks there is ample opportunity for the two systems to cue different responses. A deductive interpretation conjoined with an exhaustive search for falsifying instances yields the response P and not-Q. This interpretation and processing style is likely associated with the rule-based System 2--individual differences in which underlie the psychometric concept of analytic intelligence. In contrast, within the heuristic-analytic framework of Evans (1984, 1989, 1996), the matching response of P and Q reflects the heuristic processing of System 1 (in Evans' theory, a linguistically-cued relevance response).
In deontic problems, both deontic and rule-based logics are cuing construals of the problem that dictate the same response (P and not-Q). Whatever one's theory of responding in deontic tasks (preconscious relevance judgments, pragmatic schemas, or Darwinian algorithms; e.g., Cheng & Holyoak, 1989; Cosmides, 1989; Cummins, 1996; Evans, 1996), the mechanisms triggering the correct response resemble heuristic or modular structures that fall within the domain of System 1. These structures are unlikely to be strongly associated with analytic intelligence (Cummins, 1996; Levinson, 1995; McGeorge, Crawford, & Kelly, 1997; Reber, 1993; Reber, Walkenfeld, & Hernstadt, 1991). Hence they operate to draw subjects of both high and low analytic intelligence to the same response dictated by the rule-based system, thus diluting cognitive ability differences between correct and incorrect responders (see Stanovich & West, 1998a, for a data simulation).
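The dilution argument can be illustrated with a toy simulation. This is a minimal sketch under explicitly hypothetical assumptions, not a reconstruction of the Stanovich and West (1998a) simulation: analytic ability is drawn from a standard normal distribution, System 2 delivers the normative response with a logistic probability of ability, and in the agreement (deontic-like) condition System 1 independently cues the same normative response with a fixed probability (0.8 here) unrelated to ability. The point-biserial correlation between ability and correctness is then compared across the conflict and agreement conditions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; with 0/1 values in ys this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=10_000, s1_correct=0.8, seed=1):
    """Return (r_conflict, r_agreement) under hypothetical parameters."""
    rng = random.Random(seed)
    ability, conflict, agreement = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                # hypothetical analytic ability score
        p2 = 1.0 / (1.0 + math.exp(-z))        # assumed P(System 2 gives normative answer)
        s2_hit = rng.random() < p2
        s1_hit = rng.random() < s1_correct     # System 1 cue, independent of ability
        ability.append(z)
        # Conflict: System 1 cues the wrong answer, so only System 2 yields a correct response.
        conflict.append(1 if s2_hit else 0)
        # Agreement: correct if either system delivers the normative response.
        agreement.append(1 if (s2_hit or s1_hit) else 0)
    return pearson(ability, conflict), pearson(ability, agreement)


r_conflict, r_agree = simulate()
print(f"conflict r = {r_conflict:.2f}, agreement r = {r_agree:.2f}")
```

With these assumed parameters the ability-correctness correlation in the agreement condition comes out well below that in the conflict condition, because many lower-ability subjects are carried to the correct response by System 1. The specific values depend entirely on the invented parameters and carry no empirical weight.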
6.1 Alternative Construals: Evolutionary Optimization Versus Normative Rationality
The sampling of experimental results reviewed here (see Stanovich, 1999, for further examples) has demonstrated that the response dictated by the construal of the inventors of the Linda Problem (Tversky & Kahneman, 1983), the Disease Problem (Tversky & Kahneman, 1981), and the selection task (Wason, 1966) is the response favored by subjects of high analytic intelligence. The alternative responses dictated by the construals favored by critics of the heuristics and biases literature were the choices of subjects lower in analytic intelligence. In this section we explore the possibility that these alternative construals may have been triggered by heuristics that make evolutionary sense, but that subjects higher in a more flexible type of analytic intelligence (and those more cognitively engaged; see Smith & Levin, 1996) are more prone to follow normative rules that maximize personal utility. In a very restricted sense, such a pattern might be said to bear on the concept of rational task construal.
The argument depends on the distinction between evolutionary adaptation and instrumental rationality (utility maximization given goals and beliefs). The key point is that for the latter (variously termed practical, pragmatic, or means/ends rationality), maximization is at the level of the individual person, whereas adaptive optimization in the former case is at the level of the genes. In Dawkins' (1976, 1982) terms, evolutionary adaptation concerns optimization processes relevant to the so-called replicators (the genes), whereas instrumental rationality concerns utility maximization for the so-called vehicle (or interactor, to use Hull's, 1982, term), which houses the genes. Anderson (1990, 1991) emphasizes this distinction in his treatment of adaptionist models in psychology. In advocating such models, he eschews Dennett's (1987) assumption of perfect rationality in the instrumental sense (hereafter termed normative rationality) for the somewhat different assumption of evolutionary optimization (i.e., evolution as a local fitness maximizer). Anderson (1990) accepts Stich's (1990; see also Cooper, 1989; Skyrms, 1996) argument that evolutionary adaptation (hereafter termed evolutionary rationality)11 does not guarantee perfect human rationality in the normative sense: "Rationality in the adaptive sense, which is used here, is not rationality in the normative sense that is used in studies of decision making and social judgment....It is possible that humans are rational in the adaptive sense in the domains of cognition studied here but not in decision making and social judgment" (p. 31). Thus, Anderson (1991) acknowledges that there may be arguments for "optimizing money, the happiness of oneself and others, or any other goal. It is just that these goals do not produce optimization of the species" (pp. 510-511). As a result, a descriptive model of processing that is adaptively optimal could well deviate substantially from a normative model. This is because Anderson's (1990, 1991) adaptation assumption is that cognition is optimally adapted in an evolutionary sense, and this is not the same as positing that human cognitive activity will result in normatively appropriate responses.
Such a view can encompass both the impressive record of descriptive accuracy enjoyed by a variety of adaptionist models (Anderson, 1990, 1991; Oaksford & Chater, 1994, 1996, 1998) and the fact that cognitive ability sometimes dissociates from the response deemed optimal on an adaptionist analysis (Stanovich & West, 1998a). As discussed above, Oaksford and Chater (1994) have had considerable success in modeling the nondeontic selection task as an inductive problem in which optimal data selection is assumed (see also Oaksford, Chater, Grainger, & Larkin, 1997). Their model predicts the modal response of P and Q and the corresponding dearth of P and not-Q choosers. Similarly, Anderson (1990, pp. 157-160) models the 2 x 2 contingency assessment experiment using a model of optimally adapted information processing and shows how it can predict the much-replicated finding that the D cell (cause absent and effect absent) is vastly underweighted (see also Friedrich, 1993; Klayman & Ha, 1987). Finally, a host of investigators (Adler, 1984, 1991; Dulany & Hilton, 1991; Hilton, 1995; Levinson, 1995) have stressed how a model of rational conversational implicature predicts that violating the conjunction rule in the Linda Problem reflects the adaptive properties of interactional intelligence.
Yet in all three of these cases, despite the fact that the adaptionist models predict the modal response quite well, individual differences analyses demonstrate associations that must also be accounted for. Correct responders on the nondeontic selection task (P and not-Q choosers, not those choosing P and Q) are higher in cognitive ability. In the 2 x 2 covariation detection experiment, it is those subjects weighting cell D more equally (not those underweighting the cell in the way that the adaptionist model dictates) who are higher in cognitive ability and who tend to respond normatively on other tasks (Stanovich & West, 1998d). Finally, despite conversational implicatures indicating the opposite, individuals of higher cognitive ability disproportionately tend to adhere to the conjunction rule. These patterns make sense if it is assumed that the two systems of processing are optimized for different situations and different goals, and that these data patterns reflect the greater probability that the analytic intelligence of System 2 will override the interactional intelligence of System 1 in individuals of higher cognitive ability.
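Why underweighting cell D matters normatively in the 2 x 2 covariation task can be shown with a small worked example. The sketch assumes ΔP, P(effect | cause) minus P(effect | no cause), as the normative index of contingency; the text above does not name a specific index, so treat this choice as an assumption. The function `ignores_cell_d` is a hypothetical judgment rule that never consults cell D, included only for contrast; it is not Anderson's (1990) adaptive model.

```python
def delta_p(a: int, b: int, c: int, d: int) -> float:
    """Contingency index Delta-P = P(effect | cause) - P(effect | no cause).

    a: cause present, effect present    b: cause present, effect absent
    c: cause absent,  effect present    d: cause absent,  effect absent
    """
    return a / (a + b) - c / (c + d)


def ignores_cell_d(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical rule that confirms with cell A, disconfirms with B and C,
    and never uses cell D (the d argument is deliberately unused)."""
    return a - b - c


# Two hypothetical tables that differ only in cell D.
low_d = (20, 10, 10, 5)
high_d = (20, 10, 10, 25)

for table in (low_d, high_d):
    print(table, round(delta_p(*table), 3), ignores_cell_d(*table))
# (20, 10, 10, 5) 0.0 0
# (20, 10, 10, 25) 0.381 0
```

Raising cell D from 5 to 25 turns a null contingency into a clearly positive one under ΔP, while the D-ignoring rule returns the same judgment for both tables.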
In summary, the biases introduced by System 1 heuristic processing may well be universal, because the computational biases inherent in this system are ubiquitous and shared by all humans. However, it does not necessarily follow that errors on tasks from the heuristics and biases literature will be universal (we have known for some time that they are not). This is because, for some individuals, System 2 processes operating in parallel (see Evans & Over, 1996) will have the requisite computational power (or a low enough threshold) to override the response primed by System 1.
It is hypothesized that the features of System 1 are designed to track increases in the reproduction probability of genes very closely. System 2, though also clearly an evolutionary product, is primarily a control system focused on the interests of the whole person. It is the primary maximizer of an individual's personal utility12. Maximizing personal utility will occasionally result in sacrificing genetic fitness (Barkow, 1989; Cooper, 1989; Skyrms, 1996). Because System 2 is more attuned to normative rationality than is System 1, System 2 will seek to fulfill the individual's goals in the minority of cases where those goals conflict with the responses triggered by System 1.
It is proposed that just such conflicts are occurring in three of the tasks discussed previously (the Disease Problem, the Linda Problem, and the selection task). This conjecture is supported by the fact that evolutionary rationality has been conjoined with Gricean principles of conversational implicature by several theorists (Gigerenzer, 1996b; Hilton, 1995; Levinson, 1995) who emphasize the principle of "conversationally rational interpretation" (Hilton, 1995, p. 265). According to this view, the pragmatic heuristics are not simply inferior substitutes for computationally costly logical mechanisms that would work better. Instead, the heuristics are optimally designed to solve an evolutionary problem in another domain: attributing intentions to conspecifics and coordinating mutual intersubjectivity so as to optimally negotiate cooperative behavior (Cummins, 1996; Levinson, 1995; Skyrms, 1996).
It must be stressed, though, that in the vast majority of mundane situations, the evolutionary rationality embodied in System 1 processes will also serve the goals of normative rationality. Our automatic System 1 processes for accurately navigating around objects in the natural world were adaptive in an evolutionary sense, and they likewise serve our personal goals as we carry out our lives in the modern world (that is, navigational abilities are an evolutionary adaptation that serves the instrumental goals of the vehicle as well).
One way to view the difference between what we have termed here evolutionary and normative rationality is to note that they are not really two different types of rationality (see Oaksford & Chater, 1998, pp. 291-297) but are instead terms for characterizing optimization procedures operating at the subpersonal and personal levels, respectively. That there are two optimization procedures in operation here that could come into conflict is a consequence of the insight that the genes--as subpersonal replicators--can increase their fecundity