Intuitive statistics, or folk statistics,
refers to the cognitive phenomenon where organisms use data to make
generalizations and predictions about the world. The data may be a small sample or a handful of training instances, which in turn support inductive inferences about population-level properties, future data, or both.
Inferences can involve revising hypotheses, or beliefs, in light of
probabilistic data that inform and motivate future predictions. The
informal tendency for cognitive animals to intuitively generate statistical inferences, when formalized with certain axioms of probability theory, constitutes statistics as an academic discipline.
Because this capacity can accommodate a broad range of
informational domains, the subject matter is similarly broad and
overlaps substantially with other cognitive phenomena. Indeed, some have
argued that "cognition as an intuitive statistician" is an apt
companion metaphor to the computer metaphor of cognition. Others appeal to a variety of statistical and probabilistic mechanisms behind theory construction and category structuring.
Research in this domain commonly focuses on generalizations relating to
number, relative frequency, risk, and any systematic signatures in
inferential capacity that an organism (e.g., a human or a non-human primate) might have.
Background and theory
Intuitive inferences can involve generating hypotheses from incoming sense data, as in categorization and concept structuring. Data are typically probabilistic, and uncertainty is the rule rather than the exception in learning, perception, language, and thought. Recently, researchers have drawn on ideas from probability theory, philosophy of mind, computer science, and psychology to model cognition as a predictive and generative system of probabilistic representations, allowing information structures to support multiple inferences in a variety of contexts and combinations. This approach has been called a probabilistic language of thought because it builds representations probabilistically from pre-existing concepts in order to predict likely states of the world.
Probability
Statisticians and probability theorists have long debated the use of various tools and assumptions, and problems relating to inductive inference in particular. David Hume famously considered the problem of induction, questioning the logical foundations of how and why people can arrive at conclusions that extend beyond past experiences, both spatiotemporally and epistemologically.
More recently, theorists have considered the problem by emphasizing techniques for moving from data to hypotheses using formal, content-independent procedures, or, in contrast, by considering informal, content-dependent tools for inductive inference.
Searches for formal procedures have led to different developments in
statistical inference and probability theory with different assumptions,
including Fisherian frequentist statistics, Bayesian inference, and Neyman-Pearson statistics.
Gerd Gigerenzer
and David Murray argue that twentieth-century psychology as a
discipline adopted probabilistic inference as a unified set of ideas and
ignored the controversies among probability theorists. They claim that a
normative but incorrect view of how humans "ought to think rationally"
follows from this acceptance. They also maintain, however, that the
intuitive statistician metaphor of cognition is promising, and should
consider different formal tools or heuristics
as specialized for different problem domains, rather than a content- or
context-free toolkit. Signal detection theorists and object detection
models, for example, often use a Neyman-Pearson approach, whereas
Fisherian frequentist statistics might aid cause-effect inferences.
Frequentist inference
Frequentist inference
focuses on the relative proportions or frequencies of occurrences to
draw probabilistic conclusions. It is defined by its closely related
concept, frequentist probability.
This entails a view that "probability" is nonsensical in the absence of
pre-existing data, because it is understood as a relative frequency
that long-run samples would approach given large amounts of data. Leda Cosmides and John Tooby
have argued that it is not possible to derive a probability without
reference to some frequency of previous outcomes, and this likely has evolutionary origins:
Single-event probabilities, they claim, are not observable because
organisms evolved to intuitively understand and make statistical
inferences from frequencies of prior events, rather than to "see"
probability as an intrinsic property of an event.
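The long-run frequency interpretation can be illustrated with a short simulation in which the relative frequency of an event converges toward an underlying rate as the number of observations grows. This is a minimal sketch; the event rate of 0.3 is an arbitrary assumption, not a value from any cited study.

```python
import random

# Minimal sketch of frequentist probability as a long-run relative frequency.
# The underlying rate (0.3) is an illustrative assumption.
random.seed(42)
true_rate = 0.3

for n in (10, 100, 1_000, 10_000, 100_000):
    successes = sum(random.random() < true_rate for _ in range(n))
    print(f"n = {n:>6}: relative frequency = {successes / n:.3f}")
# The printed frequencies drift toward 0.3 as n grows, which is the sense in
# which "probability" is read off from frequencies of prior outcomes rather
# than being an intrinsic property of a single event.
```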
Bayesian inference
Bayesian inference generally emphasizes the subjective probability of a hypothesis, which is computed as a posterior probability using Bayes' Theorem.
It requires a "starting point" called a prior probability, which has
been contentious for some frequentists who claim that frequency data are
required to develop a prior probability, in contrast to taking a probability as an a priori assumption.
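Bayes' Theorem computes the posterior as P(H|D) = P(D|H)P(H) / P(D). The sketch below shows a single update over two rival hypotheses; the prior and likelihood values are made up purely for illustration.

```python
# Minimal sketch of one Bayesian update over two rival hypotheses.
# The prior (0.5 / 0.5) and the likelihoods are illustrative assumptions.
prior = {"H1": 0.5, "H2": 0.5}           # prior probabilities P(H)
likelihood = {"H1": 0.8, "H2": 0.2}      # P(observed data | H)

# Unnormalized posteriors, then normalization by P(D) (the evidence).
unnormalized = {h: likelihood[h] * prior[h] for h in prior}
evidence = sum(unnormalized.values())
posterior = {h: unnormalized[h] / evidence for h in prior}

print(posterior)   # {'H1': 0.8, 'H2': 0.2}
# The posterior can serve as the prior for the next observation, which is the
# iterative, predictive character that learning theorists emphasize.
```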
Bayesian models have been quite popular among psychologists,
particularly learning theorists, because they appear to emulate the
iterative, predictive process by which people learn and develop
expectations from new observations, while giving appropriate weight to
previous observations. Andy Clark,
a cognitive scientist and philosopher, recently wrote a detailed
argument in support of understanding the brain as a constructive
Bayesian engine that is fundamentally action-oriented and predictive, rather than passive or reactive.
More classic lines of evidence cited among supporters of Bayesian
inference include conservatism, or the phenomenon where people modify previous beliefs toward, but not all the way to, the conclusion implied by new observations.
This pattern of behavior is similar to the pattern of posterior
probability distributions when a Bayesian model is conditioned on data,
though critics argued that this evidence had been overstated and lacked
mathematical rigor.
Alison Gopnik more recently tackled the problem by advocating the use of Bayesian networks, or directed graph
representations of conditional dependencies. In a Bayesian network,
edges encode conditional dependencies whose strengths are updated in light of new data, and nodes represent variables. The graphical
representation itself constitutes a model, or hypothesis, about the
world and is subject to change, given new data.
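The directed-graph idea can be illustrated with a toy two-node network, Cause → Effect, whose conditional probability table is conditioned on an observed effect. The structure and numbers below are illustrative assumptions, not a model drawn from Gopnik's work.

```python
# Toy two-node Bayesian network: Cause -> Effect.
# All probabilities are illustrative assumptions.
p_cause = 0.2                              # P(Cause = True)
p_effect_given = {True: 0.9, False: 0.1}   # P(Effect = True | Cause)

# Observing Effect = True, infer the posterior probability of the cause
# (Bayes' rule applied along the single edge of the graph).
joint_true = p_effect_given[True] * p_cause
joint_false = p_effect_given[False] * (1 - p_cause)
p_cause_given_effect = joint_true / (joint_true + joint_false)

print(round(p_cause_given_effect, 3))  # 0.692
# New data would update the table entries (the dependency strengths), and a
# persistently poor fit could motivate revising the graph itself.
```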
Error management theory
Error management theory (EMT) is an application of Neyman-Pearson statistics to cognitive and evolutionary psychology. It maintains that the possible fitness costs and benefits of type I (false positive) and type II (false negative) errors are relevant to adaptively rational inferences, toward which an organism is expected to be biased due to natural selection. EMT was originally developed by Martie Haselton and David Buss,
with initial research focusing on its possible role in sexual
overperception bias in men and sexual underperception bias in women.
This is closely related to a concept called the "smoke detector
principle" in evolutionary theory. It is defined by the tendency for
immune, affective, and behavioral defenses to be hypersensitive and
overreactive, rather than insensitive or weakly expressed. Randolph Nesse maintains that this is a consequence of a typical payoff structure in signal detection:
In a system that is invariantly structured with a relatively low cost
of false positives and high cost of false negatives, naturally selected
defenses are expected to err on the side of hyperactivity in response to
potential threat cues. This general idea has been applied to hypotheses about the apparent tendency for humans to attribute agency to non-agents on the basis of uncertain or agent-like cues.
In particular, some claim that it is adaptive for potential prey to
assume agency by default if it is even slightly suspected, because
potential predator threats typically involve cheap false positives and
lethal false negatives.
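The payoff logic behind the smoke detector principle can be made concrete with an expected-cost comparison. In the sketch below, the cue probability and the costs of the two error types are illustrative assumptions; the point is only that when misses are far costlier than false alarms, the rule that minimizes expected cost is biased toward responding as if a threat is present.

```python
# Minimal sketch of error management: with asymmetric error costs, the
# expected-cost-minimizing rule is biased toward false positives.
# All numbers are illustrative assumptions.
p_threat = 0.05            # probability that an ambiguous cue signals a real threat
cost_false_positive = 1    # cost of responding when there was no threat
cost_false_negative = 100  # cost of ignoring a real threat

# Expected cost of each all-or-none policy applied to the ambiguous cue:
cost_always_respond = (1 - p_threat) * cost_false_positive   # 0.95
cost_never_respond = p_threat * cost_false_negative          # 5.0

print(cost_always_respond, cost_never_respond)
# Responding to every cue is cheaper in expectation, even though most
# responses are "errors" -- the hypersensitivity the theory predicts.
```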
Heuristics and biases
Heuristics
are efficient rules, or computational shortcuts, for producing a
judgment or decision. The intuitive statistician metaphor of cognition
led to a shift in focus for many psychologists, away from emotional or
motivational principles and toward computational or inferential
principles.
Empirical studies investigating these principles have led some to
conclude that human cognition, for example, has built-in and systematic
errors in inference, or cognitive biases.
As a result, cognitive psychologists have largely adopted the view that
intuitive judgments, generalizations, and numerical or probabilistic
calculations are systematically biased. The result is commonly an error
in judgment, including (but not limited to) recurrent logical fallacies
(e.g., the conjunction fallacy), innumeracy, and emotionally motivated
shortcuts in reasoning.
Social and cognitive psychologists have thus considered it
"paradoxical" that humans can outperform powerful computers at complex
tasks, yet be deeply flawed and error-prone in simple, everyday
judgments.
Much of this research was carried out by Amos Tversky and Daniel Kahneman as an expansion of work by Herbert Simon on bounded rationality and satisficing.
Tversky and Kahneman argue that people are regularly biased in their
judgments under uncertainty, because in a speed-accuracy tradeoff they
often rely on fast and intuitive heuristics with wide margins of error
rather than slow calculations from statistical principles.
These errors are called "cognitive illusions" because they involve
systematic divergences between judgments and accepted, normative rules
in statistical prediction.
Gigerenzer has been critical of this view, arguing that it builds
from a flawed assumption that a unified "normative theory" of
statistical prediction and probability exists. His contention is that
cognitive psychologists neglect the diversity of ideas and assumptions
in probability theory, and in some cases, their mutual incompatibility. Consequently, Gigerenzer argues that many cognitive illusions are not violations of probability theory per se,
but instead reflect experimenters' conflation of subjective probabilities, or degrees of confidence, with long-run outcome frequencies.
Cosmides and Tooby similarly claim that different probabilistic
assumptions can be more or less normative and rational in different
types of situations, and that there is no general-purpose statistical
toolkit for making inferences across all informational domains. In a
review of several experiments they conclude, in support of Gigerenzer, that previous heuristics and biases experiments did not represent problems in an ecologically valid
way, and that re-representing problems in terms of frequencies rather
than single-event probabilities can make cognitive illusions largely
vanish.
Tversky and Kahneman disputed this claim, arguing that making illusions disappear by manipulating them, whether they are cognitive or visual, does not undermine the initially discovered illusion. They also note that Gigerenzer ignores cognitive illusions resulting from frequency data, e.g., illusory correlations such as the hot hand in basketball. This, they note, is an example of an illusory positive autocorrelation that cannot be corrected by converting data to natural frequencies.
For adaptationists,
EMT can be applied to inference in any informational domain where risk or uncertainty is present, such as predator avoidance, agency detection, or foraging.
Researchers advocating this adaptive rationality view argue that
evolutionary theory casts heuristics and biases in a new light, namely,
as computationally efficient and ecologically rational shortcuts, or instances of adaptive error management.
Base rate neglect
People often neglect base rates,
or true actuarial facts about the probability or rate of a phenomenon,
and instead give inappropriate amounts of weight to specific
observations. In a Bayesian model of inference, this would amount to an underweighting of the prior probability, which has been cited as evidence against the appropriateness of a normative Bayesian framework for modeling cognition. Frequency representations can resolve base rate neglect, and some
consider the phenomenon to be an experimental artifact, i.e., a result
of probabilities or rates being represented as mathematical
abstractions, which are difficult to intuitively think about.
Gigerenzer offers an ecological explanation for this, noting that
individuals learn frequencies through successive trials in nature.
Tversky and Kahneman dispute Gigerenzer's claim, pointing to experiments in which subjects predicted a disease based on the presence vs. absence of pre-specified symptoms across 250 trials, with feedback after each trial. They note that base rate neglect was still found despite the frequency format of the trials.
Conjunction fallacy
Another popular example of a supposed cognitive illusion is the conjunction fallacy,
described in an experiment by Tversky and Kahneman known as the "Linda
problem." In this experiment, participants are presented with a short
description of a person called Linda, who is 31 years old, single,
intelligent, outspoken, and went to a university where she majored in
philosophy, was concerned about discrimination and social justice, and
participated in anti-nuclear protests. When participants were asked whether it was more probable that Linda is (1) a bank teller, or (2) a bank teller and a feminist, 85% chose option 2, even though option 1 cannot be less probable than option 2. They concluded that this
was a product of a representativeness heuristic,
or a tendency to draw probabilistic inferences based on property
similarities between instances of a concept, rather than a statistically
structured inference.
Gigerenzer argued that the conjunction fallacy is based on a
single-event probability, and would dissolve under a frequentist
approach. He and other researchers argue that conclusions drawn from the conjunction fallacy result from ambiguous language, rather than from robust statistical errors or cognitive illusions.
In an alternative version of the Linda problem, participants are told
that 100 people fit Linda's description and are asked how many are (1)
bank tellers and (2) bank tellers and feminists. Experimentally, this
version of the task appears to eliminate or mitigate the conjunction
fallacy.
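The logical point behind the fallacy, and the effect of the frequency reformulation, can both be seen in a small enumeration. The 100 hypothetical people and their attributes below are invented solely to illustrate the set relationship; they are not experimental data.

```python
# Minimal sketch: the conjunction rule P(A and B) <= P(A) as a fact about
# counts. The 100 hypothetical people and their attribute rates are invented.
import random

random.seed(0)
people = [
    {"bank_teller": random.random() < 0.10,
     "feminist": random.random() < 0.60}
    for _ in range(100)
]

tellers = sum(p["bank_teller"] for p in people)
feminist_tellers = sum(p["bank_teller"] and p["feminist"] for p in people)

print(tellers, feminist_tellers)
# feminist_tellers can never exceed tellers, because the second group is a
# subset of the first -- the relation the frequency wording makes salient.
```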
Computational models
There
has been some question about how concept structuring and generalization
can be understood in terms of brain architecture and processes. This
question is impacted by a neighboring debate among theorists about the
nature of thought, specifically between connectionist and language of thought models. Concept generalization and classification have been modeled in a variety of connectionist models, or neural networks, specifically in domains like language learning and categorization. Some emphasize the limitations of pure connectionist models when they
are expected to generalize to future instances after training on previous
instances. Gary Marcus,
for example, asserts that training data would have to be completely
exhaustive for generalizations to occur in existing connectionist
models, and that as a result, they do not handle novel observations
well. He further advocates an integrationist perspective between a
language of thought, consisting of symbol representations and
operations, and connectionist models that retain the distributed
processing that is likely used by neural networks in the brain.
Evidence in humans
In practice, humans routinely make conceptual, linguistic, and probabilistic generalizations from small amounts of data.
There is some debate about the utility of various tools of statistical
inference in understanding the mind, but it is commonly accepted that
the human mind is somehow an exceptionally apt prediction
machine, and that action-oriented processes underlying this phenomenon,
whatever they might entail, are at the core of cognition. Probabilistic inference and generalization play central roles in concept and category formation and in language learning, and infant studies are commonly used to understand the developmental trajectory of humans' intuitive statistical toolkit(s).
Infant studies
Developmental psychologists such as Jean Piaget
have traditionally argued that children do not develop the general
cognitive capacities for probabilistic inference and hypothesis testing
until the concrete operational (age 7–11 years) and formal operational (age 12 years to adulthood) stages of development, respectively.
This is sometimes contrasted with a growing body of empirical evidence suggesting that humans are capable generalizers in infancy. For example, looking-time experiments using expected proportions of red and white ping-pong balls found that 8-month-old infants appear to make inferences about the characteristics of the population from which a sample was drawn, and vice versa when given population-level data. Other experiments have similarly supported a capacity for probabilistic inference in 6- and 11-month-old infants, but not in 4.5-month-olds.
The colored-ball paradigm in these experiments did not distinguish whether infants' inferences were based on quantity or on proportion. This was addressed in follow-up research in which 12-month-old infants seemed to understand proportions, basing probabilistic judgments, motivated by preferences for the more probable outcomes, on initial evidence of the proportions in their available options.
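The inferential logic credited to infants in these looking-time studies can be expressed as a simple likelihood comparison: a sample that is mostly red is far more probable under a mostly-red population than under a mostly-white one. The box compositions and sample below are illustrative assumptions, not the stimuli of any specific experiment.

```python
# Minimal sketch: likelihood of a mostly-red sample under two candidate
# populations. Box compositions and the sample are illustrative assumptions.
from math import comb

def sample_likelihood(red_drawn, white_drawn, p_red):
    """Binomial probability of drawing this mix of red/white balls
    (sampling with replacement, for simplicity)."""
    n = red_drawn + white_drawn
    return comb(n, red_drawn) * p_red**red_drawn * (1 - p_red)**white_drawn

# Sample: 4 red, 1 white. Candidate boxes: 80% red vs. 20% red.
print(sample_likelihood(4, 1, 0.8))  # ~0.41
print(sample_likelihood(4, 1, 0.2))  # ~0.0064
# The sample is far likelier to have come from the mostly-red box, which is
# the direction in which infants' longer looking at "surprising" outcomes points.
```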
Critics of the effectiveness of looking-time tasks instead allowed infants to search for preferred objects in single-sample probability tasks; the results supported the notion that infants can infer the probabilities of single events when given a small or large initial sample size.
The researchers involved in these findings have argued that humans
possess some statistically structured, inferential system during
preverbal stages of development and prior to formal education.
It is less clear, however, how and why generalization is observed
in infants: it might extend directly from the detection and storage of similarities and differences in incoming data, or from frequency representations. Conversely, it might be produced by something like
general-purpose Bayesian inference, starting with a knowledge base that
is iteratively conditioned on data to update subjective probabilities,
or beliefs.
This ties together questions about the statistical toolkit(s) that
might be involved in learning, and how they apply to infant and
childhood learning specifically.
Gopnik advocates the hypothesis that infant and childhood learning are examples of inductive inference, a general-purpose mechanism for generalization, acting upon specialized information structures ("theories") in the brain.
On this view, infants and children are essentially proto-scientists
because they regularly use a kind of scientific method, developing
hypotheses, performing experiments via play, and updating models about
the world based on their results.
For Gopnik, this use of scientific thinking and categorization in
development and everyday life can be formalized as models of Bayesian
inference.
An application of this view is the "sampling hypothesis," or the view that individual variation in children's causal and probabilistic inferences is an artifact of random sampling from a diverse set of hypotheses, with flexible generalization based on sampling behavior and context.
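A minimal sketch of the sampling hypothesis: if each child is modeled as drawing one hypothesis at random with probability proportional to its posterior, individual answers vary even though the population of answers tracks the posterior. The hypothesis set and posterior weights below are illustrative assumptions.

```python
# Minimal sketch of the "sampling hypothesis": children as samplers from a
# posterior over hypotheses. Hypotheses and weights are illustrative.
import random

random.seed(1)
hypotheses = ["blocks of type A activate the machine",
              "blocks of type B activate the machine",
              "both types activate the machine"]
posterior = [0.6, 0.1, 0.3]   # assumed posterior after some evidence

# Each simulated child samples a single hypothesis rather than reporting
# the single most probable one.
children = [random.choices(hypotheses, weights=posterior)[0] for _ in range(20)]
for h in hypotheses:
    print(children.count(h), h)
# Individual children disagree, but the distribution of their answers
# approximates the posterior -- variation as sampling rather than noise.
```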
These views, particularly those advocating general Bayesian updating
from specialized theories, are considered successors to Piaget’s theory
rather than wholesale refutations because they maintain its
domain-generality, viewing children as randomly and unsystematically
considering a range of models before selecting a probable conclusion.
In contrast to the general-purpose mechanistic view, some researchers advocate both domain-specific information structures and similarly specialized inferential mechanisms. For example, while humans do not usually excel at conditional probability
calculations, such calculations are central to parsing continuous speech sounds into comprehensible syllables and words, a relatively straightforward and intuitive skill emerging as early as 8 months.
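The kind of conditional probability computation involved in speech segmentation is often described in terms of transitional probabilities between syllables: P(next syllable | current syllable) tends to be high inside words and low across word boundaries. The miniature "language" below is an invented example in the style of statistical-learning studies, not the stimuli of any particular experiment.

```python
# Minimal sketch: transitional probabilities P(next | current) computed from a
# continuous syllable stream. The three "words" are invented.
from collections import Counter
import random

random.seed(3)
words = [["ba", "di", "ku"], ["go", "la", "tu"], ["pi", "ro", "se"]]
stream = [syll for _ in range(200) for syll in random.choice(words)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def transitional_probability(a, b):
    return pair_counts[(a, b)] / first_counts[a]

print(transitional_probability("ba", "di"))             # 1.0   (word-internal)
print(round(transitional_probability("ku", "go"), 2))   # ~0.33 (word boundary)
# High-probability transitions cluster inside words; dips mark likely word
# boundaries, which is the statistical cue 8-month-olds appear to exploit.
```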
Infants also appear to be good at tracking not only the spatiotemporal states of objects, but also the properties of objects, and these cognitive systems appear to be developmentally distinct. This has been interpreted as evidence of domain-specific toolkits of inference, each of which corresponds to a separate type of information and has applications to concept learning.
Concept formation
Infants
use form similarities and differences to develop concepts relating to
objects, and this relies on multiple trials with multiple patterns,
exhibiting some kind of common property between trials. Infants appear to become particularly proficient at this by 12 months, but different concepts and properties draw on different relevant principles of Gestalt psychology, many of which might emerge at different stages of development. Specifically, infant categorization as early as 4.5 months involves
iterative and interdependent processes by which exemplars (data) and
their similarities and differences are crucial for drawing boundaries
around categories.
These abstract rules are statistical by nature, because they can entail
common co-occurrences of certain perceived properties in past instances
and facilitate inferences about their structure in future instances. This idea has been extrapolated by Douglas Hofstadter and Emmanuel Sander, who argue that because analogy
is a process of inference relying on similarities and differences
between concept properties, analogy and categorization are fundamentally
the same process used for organizing concepts from incoming data.
Language learning
Infants
and small children are capable generalizers not only of trait quantity and proportion, but also of abstract rule-based systems such as language and music.
These rules can be referred to as “algebraic rules” of abstract
informational structure, and are representations of rule systems, or grammars.
For language, creating generalizations with Bayesian inference and
similarity detection has been advocated by researchers as a special case
of concept formation.
Infants appear to be proficient in inferring abstract and structural
rules from streams of linguistic sounds produced in their developmental
environments, and to generate wider predictions based on those rules.
For example, 9-month-old infants are capable of more quickly and
dramatically updating their expectations when repeated syllable strings
contain surprising features, such as rare phonemes.
In general, preverbal infants appear to be capable of discriminating
between grammars on which they have been trained through experience and novel grammars.
In looking-time tasks, 7-month-old infants seemed to pay more
attention to unfamiliar grammatical structures than to familiar ones,
and in a separate study using 3-syllable strings, infants appeared to
similarly have generalized expectations based on abstract syllabic
structure previously presented, suggesting that they used surface
occurrences, or data, in order to infer deeper abstract structure. This
was taken to support the “multiple hypotheses [or models]” view by the
researchers involved.
Evidence in non-human animals
Grey parrots
Multiple studies by Irene Pepperberg and her colleagues suggested that Grey parrots (Psittacus erithacus) have some capacity for recognizing numbers or number-like concepts, appearing to understand ordinality and cardinality of numerals.
Recent experiments also indicated that, given some language training
and a capacity for referencing recognized objects, they have some
ability to make inferences about probabilities and hidden object type
ratios.
Non-human primates
Experiments found that when reasoning about preferred vs. non-preferred food proportions, capuchin monkeys were able to make inferences about proportions from sequentially sampled data. Rhesus monkeys
were similarly capable of using probabilistic and sequentially sampled
data to make inferences about rewarding outcomes, and neural activity in
the parietal cortex appeared to be involved in the decision-making
process when they made inferences. In a series of 7 experiments using a variety of relative frequency differences between banana pellets and carrots, orangutans, bonobos, chimpanzees and gorillas
also appeared to guide their decisions based on the ratios favoring the
banana pellets after this was established as their preferred food item.
Applications
Reasoning in medicine
Research
on reasoning in medicine, or clinical reasoning, usually focuses on
cognitive processes and/or decision-making outcomes among physicians and
patients. Considerations include assessments of risk, patient
preferences, and evidence-based medical knowledge. On a cognitive level, clinical inference relies heavily on interplay between abstraction, abduction, deduction, and induction. Intuitive "theories," or knowledge in medicine, can be understood as prototypes in concept spaces, or alternatively, as semantic networks.
Such models serve as a starting point for intuitive generalizations to
be made from a small number of cues, resulting in the physician's
tradeoff between the "art and science" of medical judgment. This tradeoff was captured in an artificial intelligence (AI) program
called MYCIN, which outperformed medical students, but not experienced
physicians with extensive practice in symptom recognition.
Some researchers argue that despite this, physicians are prone to
systematic biases, or cognitive illusions, in their judgment (e.g.,
satisficing to make premature diagnoses, confirmation bias when diagnoses are suspected a priori).
Communication of patient risk
Statistical literacy and risk judgments have been described as problematic for physician-patient communication. For example, physicians frequently inflate the perceived risk of non-treatment, alter patients' risk perceptions by positively or negatively framing
single statistics (e.g., 97% survival rate vs. 3% death rate), and/or
fail to sufficiently communicate "reference classes" of probability
statements to patients.
The reference class is the object of a probability statement: If a
psychiatrist says, for example, “this medication can lead to a 30-50%
chance of a sexual problem,” it is ambiguous whether this means that
30-50% of patients will develop a sexual problem at some point, or if
all patients will have problems in 30-50% of their sexual encounters.
Base rates in clinical judgment
In studies of base rate neglect,
the problems given to participants often use base rates of disease
prevalence. In these experiments, physicians and non-physicians are
similarly susceptible to base rate neglect, or errors in calculating
conditional probability. Here is an example from an empirical survey
problem given to experienced physicians: Suppose that a hypothetical
cancer had a prevalence of 0.3% in the population, and the true positive
rate of a screening test was 50% with a false positive rate of 3%.
Given a patient with a positive test result, what is the probability
that the patient has cancer? When asked this question, physicians with an average of 14 years of experience in medical practice gave answers ranging from 1% to 99%, with most answers being 47% or 50%. (The correct
answer is 5%.)
This observation of clinical base rate neglect and conditional
probability error has been replicated in multiple empirical studies.
Physicians' judgments in similar problems, however, improved
substantially when the rates were re-formulated as natural frequencies.
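The correct answer follows from Bayes' rule, and the natural-frequency reformulation makes the same arithmetic transparent. Below is a minimal worked sketch using the numbers from the survey problem above.

```python
# Worked version of the screening problem above: prevalence 0.3%,
# sensitivity (true positive rate) 50%, false positive rate 3%.
prevalence = 0.003
sensitivity = 0.50
false_positive_rate = 0.03

# Bayes' rule: P(cancer | positive test)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_cancer_given_positive = sensitivity * prevalence / p_positive
print(round(p_cancer_given_positive, 3))   # ~0.048, i.e., about 5%

# Natural-frequency reformulation: of 10,000 people, about 30 have the cancer
# and 15 of them test positive; about 299 of the 9,970 without the cancer also
# test positive, so a positive result indicates cancer in roughly 15 of 314 cases.
print(round(15 / (15 + 299), 3))           # ~0.048
```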