Scientific modelling is an activity that produces models representing empirical objects, phenomena, and physical processes, to make a particular part or feature of the world easier to understand, define, quantify, visualize, or simulate.
It requires selecting and identifying relevant aspects of a situation
in the real world and then developing a model to replicate a system with
those features. Different types of models may be used for different
purposes, such as conceptual models to better understand, operational models to operationalize, mathematical models to quantify, computational models to simulate, and graphical models to visualize the subject.
Modelling is an essential and inseparable part of many scientific
disciplines, each of which has its own ideas about specific types of
modelling. As John von Neumann put it:
... the sciences do not try to
explain, they hardly even try to interpret, they mainly make models. By a
model is meant a mathematical construct which, with the addition of
certain verbal interpretations, describes observed phenomena. The
justification of such a mathematical construct is solely and precisely
that it is expected to work—that is, correctly to describe phenomena
from a reasonably wide area.
A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way. All models are in simulacra, that is, simplified reflections of reality that, despite being approximations, can be extremely useful.
Building and disputing models is fundamental to the scientific
enterprise. Complete and true representation may be impossible, but
scientific debate often concerns which is the better model for a given
task, e.g., which is the more accurate climate model for seasonal
forecasting.
Attempts to formalize the principles of the empirical sciences use an interpretation to model reality, in the same way logicians axiomatize the principles of logic. The aim of these attempts is to construct a formal system that will not produce theoretical consequences that are contrary to what is found in reality.
Predictions or other statements drawn from such a formal system mirror
or map the real world only insofar as these scientific models are true.
For the scientist, a model is also a way in which the human thought processes can be amplified.
For instance, models that are rendered in software allow scientists to
leverage computational power to simulate, visualize, manipulate and gain
intuition about the entity, phenomenon, or process being represented.
Such computer models are in silico. Other types of scientific models are in vivo (living models, such as laboratory rats) and in vitro (in glassware, such as tissue culture).
Basics
Modelling as a substitute for direct measurement and experimentation
Models
are typically used when it is either impossible or impractical to
create experimental conditions in which scientists can directly measure
outcomes. Direct measurement of outcomes under controlled conditions (see Scientific method) will always be more reliable than modeled estimates of outcomes.
Within modeling and simulation,
a model is a task-driven, purposeful simplification and abstraction of a
perception of reality, shaped by physical, legal, and cognitive
constraints.
It is task-driven because a model is captured with a certain question
or task in mind. Simplifications leave out all known and observed
entities and their relations that are not important for the task.
Abstraction aggregates information that is important but not needed in
the same detail as the object of interest. Both activities,
simplification and abstraction, are done purposefully. However, they
are done based on a perception of reality. This perception is already a model
in itself, as it comes with a physical constraint. There are also
constraints on what we are able to legally observe with our current
tools and methods, and cognitive constraints that limit what we are able
to explain with our current theories. This model comprises the
concepts, their behavior, and their relations in informal form and is often
referred to as a conceptual model. In order to execute the model, it needs to be implemented as a computer simulation. This requires more choices, such as numerical approximations or the use of heuristics.
Despite all these epistemological and computational constraints,
simulation has been recognized as the third pillar of scientific
methods: theory building, simulation, and experimentation.
Simulation
A simulation
is a way to implement the model, often employed when the model is too
complex for an analytical solution. A steady-state simulation provides
information about the system at a specific instant in time (usually at
equilibrium, if such a state exists). A dynamic simulation provides
information over time. A simulation shows how a particular object or
phenomenon will behave. Such a simulation can be useful for testing, analysis, or training in those cases where real-world systems or concepts can be represented by models.
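As an illustrative sketch of this distinction, the cooling model below contrasts a dynamic simulation with a steady-state one; the model, its parameters, and the time step are assumptions chosen only for illustration and are not taken from the text.

```python
# Minimal sketch (assumed example): a dynamic simulation steps the state of a
# cooling model dT/dt = -k (T - T_env) through time, whereas a steady-state
# "simulation" reports only the equilibrium value.
k, T_env = 0.1, 20.0       # assumed rate constant and ambient temperature
T, dt = 90.0, 0.5          # assumed initial temperature and time step

history = []
for _ in range(200):       # dynamic simulation: the state evolves over time
    T += dt * (-k * (T - T_env))
    history.append(T)

steady_state = T_env       # steady state: the value at which dT/dt = 0
print(f"temperature after {200 * dt:.0f} time units: {history[-1]:.2f}")
print(f"steady-state temperature: {steady_state:.2f}")
```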
Structure
Structure
is a fundamental and sometimes intangible notion covering the
recognition, observation, nature, and stability of patterns and
relationships of entities. From a child's verbal description of a
snowflake, to the detailed scientific analysis of the properties of magnetic fields,
the concept of structure is an essential foundation of nearly every
mode of inquiry and discovery in science, philosophy, and art.
Systems
A system
is a set of interacting or interdependent entities, real or abstract,
forming an integrated whole. In general, a system is a construct or
collection of different elements that together can produce results not
obtainable by the elements alone.
The concept of an 'integrated whole' can also be stated in terms of a
system embodying a set of relationships which are differentiated from
relationships of the set to other elements, and from relationships
between an element of the set and elements not a part of the relational
regime. There are two types of system models: 1) discrete, in which the
variables change instantaneously at separate points in time, and 2)
continuous, where the state variables change continuously with respect to
time.
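The contrast between the two kinds of system model can be sketched in code; the queue and tank models below are hypothetical examples chosen only to illustrate the distinction.

```python
# Discrete model (assumed example): the state (queue length) changes
# instantaneously at separate points in time, one event at a time.
queue = 0
events = [("arrive", 1.0), ("arrive", 2.5), ("depart", 3.0), ("arrive", 4.2)]
for kind, t in events:
    queue += 1 if kind == "arrive" else -1
    print(f"t={t:.1f}: queue length = {queue}")

# Continuous model (assumed example): the state (tank volume) changes
# continuously with time, approximated here by small time steps.
volume, dt = 10.0, 0.01
for _ in range(500):                        # simulate 5 time units
    volume += dt * (2.0 - 0.3 * volume)     # constant inflow, outflow proportional to volume
print(f"tank volume after 5 time units: {volume:.2f}")
```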
Generating a model
Modelling
is the process of generating a model as a conceptual representation of
some phenomenon. Typically a model will deal with only some aspects of
the phenomenon in question, and two models of the same phenomenon may be
essentially different—that is to say, that the differences between them
comprise more than just a simple renaming of components.
Such differences may be due to differing requirements of the
model's end users, or to conceptual or aesthetic differences among the
modelers and to contingent decisions made during the modelling process.
Considerations that may influence the structure of a model might be the modeler's preference for a reduced ontology, preferences regarding statistical models versus deterministic models,
discrete versus continuous time, etc. In any case, users of a model
need to understand the assumptions made that are pertinent to its
validity for a given use.
Building a model requires abstraction. Assumptions are used in modelling in order to specify the domain of application of the model. For example, the special theory of relativity assumes an inertial frame of reference. This assumption was contextualized and further explained by the general theory of relativity.
A model makes accurate predictions when its assumptions are valid, and
might well not make accurate predictions when its assumptions do not
hold. Such assumptions are often the points at which older theories are
superseded by new ones (the general theory of relativity works in non-inertial reference frames as well).
A model is evaluated first and foremost by its consistency with
empirical data; any model inconsistent with reproducible observations
must be modified or rejected. One way to modify the model is by
restricting the domain over which it is credited with having high
validity. A case in point is Newtonian physics, which is highly useful
except for the very small, the very fast, and the very massive phenomena
of the universe. However, a fit to empirical data alone is not
sufficient for a model to be accepted as valid. Factors important in
evaluating a model include:
Ability to explain past observations
Ability to predict future observations
Cost of use, especially in combination with other models
Refutability, enabling estimation of the degree of confidence in the model
Simplicity, or even aesthetic appeal
People may attempt to quantify the evaluation of a model using a utility function.
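As a hedged illustration of such a quantification, the sketch below combines the evaluation factors listed above into a single score; the function, its normalized inputs, and its weights are hypothetical and not prescribed by the text.

```python
# Hypothetical utility function for comparing candidate models. All inputs
# are assumed to be normalized to [0, 1]; cost enters with a negative weight.
def model_utility(explains_past, predicts_future, cost, refutability, simplicity,
                  weights=(0.3, 0.3, 0.2, 0.1, 0.1)):
    w_past, w_pred, w_cost, w_ref, w_simp = weights
    return (w_past * explains_past + w_pred * predicts_future
            - w_cost * cost + w_ref * refutability + w_simp * simplicity)

# Example: a model that explains and predicts well but is moderately costly.
print(f"utility = {model_utility(0.9, 0.8, 0.4, 0.7, 0.6):.2f}")
```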
Visualization
Visualization
is any technique for creating images, diagrams, or animations to
communicate a message. Visualization through visual imagery has been an
effective way to communicate both abstract and concrete ideas since the
dawn of man. Examples from history include cave paintings, Egyptian hieroglyphs, Greek geometry, and Leonardo da Vinci's revolutionary methods of technical drawing for engineering and scientific purposes.
Space mapping
Space mapping
refers to a methodology that employs a "quasi-global" modelling
formulation to link companion "coarse" (ideal or low-fidelity) with
"fine" (practical or high-fidelity) models of different complexities. In
engineering optimization,
space mapping aligns (maps) a very fast coarse model with its related
expensive-to-compute fine model so as to avoid direct expensive
optimization of the fine model. The alignment process iteratively
refines a "mapped" coarse model (surrogate model).
One application of scientific modelling is the field of modelling and simulation,
generally referred to as "M&S". M&S has a spectrum of
applications which range from concept development and analysis, through
experimentation, measurement, and verification, to disposal analysis.
Projects and programs may use hundreds of different simulations,
simulators and model analysis tools.
The figure shows how modelling and simulation is used as a central
part of an integrated program in a defence capability development
process.
Internal validity
Internal validity is the extent to which a piece of evidence supports a claim about cause and effect,
within the context of a particular study. It is one of the most
important properties of scientific studies and is an important concept
in reasoning about evidence
more generally. Internal validity is determined by how well a study can
rule out alternative explanations for its findings (usually, sources of
systematic error or 'bias'). It contrasts with external validity, the extent to which results can justify conclusions about other contexts (that is, the extent to which results can be generalized). Both internal and external validity can be described using qualitative or quantitative forms of causal notation.
Details
Inferences are said to possess internal validity if a causal relationship between two variables is properly demonstrated. A valid causal inference may be made when three criteria are satisfied:
the "cause" precedes the "effect" in time (temporal precedence),
the "cause" and the "effect" tend to occur together (covariation), and
there are no plausible alternative explanations for the observed covariation (nonspuriousness).
In scientific experimental settings, researchers often change the state of one variable (the independent variable) to see what effect it has on a second variable (the dependent variable).
For example, a researcher might manipulate the dosage of a particular
drug between different groups of people to see what effect it has on
health. In this example, the researcher wants to make a causal
inference, namely, that different doses of the drug may be held responsible
for observed changes or differences. When the researcher may
confidently attribute the observed changes or differences in the
dependent variable to the independent variable (that is, when the
researcher observes an association between these variables and can rule
out other explanations or rival hypotheses), then the causal inference is said to be internally valid.
In many cases, however, the size of the effects found in the dependent
variable may not depend only on variations in the independent variable or
on the power of the instruments and statistical procedures used to measure
and detect the effects. Rather, a number of variables or circumstances
uncontrolled for (or uncontrollable) may lead to additional or alternative
explanations (a) for the effects found and/or (b) for the magnitude of the effects found.
Internal validity, therefore, is more a matter of degree than of
either-or, and that is exactly why research designs other than true
experiments may also yield results with a high degree of internal
validity.
In order to allow for inferences with a high degree of internal
validity, precautions may be taken during the design of the study. As a
rule of thumb, conclusions based on direct manipulation of the
independent variable allow for greater internal validity than
conclusions based on an association observed without manipulation.
When considering only internal validity, highly controlled true
experimental designs (i.e. with random selection, random assignment to
either the control or experimental groups, reliable instruments,
reliable manipulation processes, and safeguards against confounding
factors) may be the "gold standard" of scientific research. However,
the very methods used to increase internal validity may also limit the
generalizability or external validity
of the findings. For example, studying the behavior of animals in a zoo
may make it easier to draw valid causal inferences within that context,
but these inferences may not generalize to the behavior of animals in
the wild. In general, a typical experiment in a laboratory, studying a
particular process, may leave out many variables that normally strongly
affect that process in nature.
Example threats
To recall eight of these threats to internal validity, use the mnemonic acronym, THIS MESS, which stands for:
Testing,
History,
Instrument change,
Statistical regression toward the mean,
Maturation,
Experimental mortality,
Selection, and
Selection Interaction.
Ambiguous temporal precedence
When
it is not known which variable changed first, it can be difficult to
determine which variable is the cause and which is the effect.
Confounding
A major threat to the validity of causal inferences is confounding:
Changes in the dependent variable may rather be attributed to
variations in a third variable which is related to the manipulated
variable. Where spurious relationships cannot be ruled out, rival hypotheses to the original causal inference may be developed.
Selection bias
Selection
bias refers to the problem that, at pre-test, differences between
groups exist that may interact with the independent variable and thus be
'responsible' for the observed outcome. Researchers and participants
bring to the experiment a myriad of characteristics, some learned and
others inherent: for example, sex, weight, hair, eye, and skin color,
personality, mental capabilities, and physical abilities, but also
attitudes such as motivation or willingness to participate.
During the selection step of the research study, if an unequal
number of test subjects have similar subject-related variables, there is a
threat to internal validity. For example, suppose a researcher creates two
test groups, the experimental and the control group. The subjects in
the two groups are not alike with regard to the independent variable but
are similar in one or more of the subject-related variables.
Self-selection also has a negative effect on the interpretive
power of the dependent variable. This occurs often in online surveys
where individuals of specific demographics opt into the test at higher
rates than other demographics.
History
Events
outside of the study/experiment or between repeated measures of the
dependent variable may affect participants' responses to experimental
procedures. Often, these are large-scale events (natural disaster,
political change, etc.) that affect participants' attitudes and
behaviors such that it becomes impossible to determine whether any
change on the dependent measures is due to the independent variable, or
the historical event.
Maturation
Subjects
change during the course of the experiment or even between
measurements. For example, young children might mature and their ability
to concentrate may change as they grow up. Both permanent changes, such
as physical growth, and temporary ones, such as fatigue, provide "natural"
alternative explanations; thus, they may change the way a subject would
react to the independent variable. So upon completion of the study, the
researcher may not be able to determine if the cause of the discrepancy
is due to time or the independent variable.
Repeated testing (also referred to as testing effects)
Repeatedly
measuring the participants may lead to bias. Participants may remember
the correct answers or may be conditioned to know that they are being
tested. Repeatedly taking (the same or similar) intelligence tests
usually leads to score gains, but instead of concluding that the
underlying skills have changed for good, this threat to internal
validity provides a good rival hypothesis.
Instrument change (instrumentality)
The
instrument used during the testing process can change the experiment.
This also refers to observers being more concentrated or primed, or
having unconsciously changed the criteria they use to make judgments.
This can also be an issue with self-report measures given at different
times. In this case, the impact may be mitigated through the use of
retrospective pretesting. If any instrumentation changes occur, the
internal validity of the main conclusion is affected, as alternative
explanations are readily available.
Statistical regression (regression toward the mean)
This type of error occurs when subjects are selected on the basis of
extreme scores (one far away from the mean) during a test. For example,
when children with the worst reading scores are selected to participate
in a reading course, improvements at the end of the course might be due
to regression toward the mean and not the course's effectiveness. If the
children had been tested again before the course started, they would
likely have obtained better scores anyway.
Likewise, extreme outliers on individual scores are more likely to be
captured in one instance of testing but will likely evolve into a more
normal distribution with repeated testing.
Experimental mortality (attrition)
This error occurs if inferences are made on the basis of only those
participants who have participated from the start to the end. However,
participants may have dropped out of the study before completion, perhaps
even because of the study, programme, or experiment itself. For
example, the percentage of group members who had quit smoking at
post-test was found to be much higher in a group that had received a
quit-smoking training program than in the control group. However, in the
experimental group only 60% had completed the program.
If this attrition is systematically related to any feature of the study,
the administration of the independent variable, the instrumentation, or
if dropping out leads to relevant bias between groups, a whole class of
alternative explanations is possible that account for the observed
differences.
Selection-maturation interaction
This
occurs when the subject-related variables, color of hair, skin color,
etc., and the time-related variables, age, physical size, etc.,
interact. If a discrepancy between the two groups arises between the
testings, it may be due to differences in age between the groups rather
than to the independent variable.
Diffusion
If
treatment effects spread from treatment groups to control groups, a lack
of differences between experimental and control groups may be observed.
This does not mean, however, that the independent variable has no
effect or that there is no relationship between dependent and
independent variable.
Compensatory rivalry/resentful demoralization
Behavior
in the control groups may alter as a result of the study. For example,
control group members may work extra hard to see that the expected
superiority of the experimental group is not demonstrated. Again, this
does not mean that the independent variable produced no effect or that
there is no relationship between dependent and independent variable.
Conversely, changes in the dependent variable may be due only to a
demoralized control group working less hard or feeling less motivated,
not to the independent variable.
Experimenter bias
Experimenter
bias occurs when the individuals who are conducting an experiment
inadvertently affect the outcome by non-consciously behaving in
different ways to members of control and experimental groups. Experimenter
bias can be eliminated through the use of double-blind study designs, in
which the experimenter does not know the condition to which a participant belongs.
Mutual-internal-validity problem
Experiments
that have high internal validity can produce phenomena and results that
have no relevance in real life, resulting in the
mutual-internal-validity problem.
It arises when researchers use experimental results to develop theories
and then use those theories to design theory-testing experiments. This
mutual feedback between experiments and theories can lead to theories
that explain only phenomena and results in artificial laboratory
settings but not in real life.
Regression toward the mean
In statistics, regression toward the mean (also called reversion to the mean, and reversion to mediocrity) is the phenomenon where if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean.
Furthermore, when many random variables are sampled and the most
extreme results are intentionally picked out, it refers to the fact that
(in many cases) a second sampling of these picked-out variables will
result in "less extreme" results, closer to the initial mean of all of
the variables.
Mathematically, the strength of this "regression" effect is dependent on whether or not all of the random variables are drawn from the same distribution,
or if there are genuine differences in the underlying distributions for
each random variable. In the first case, the "regression" effect is
statistically likely to occur, but in the second case, it may occur less
strongly or not at all.
Regression toward the mean is thus a useful concept to consider
when designing any scientific experiment, data analysis, or test that
intentionally selects the most extreme events: it indicates that
follow-up checks may be useful in order to avoid jumping to false
conclusions about these events; they may be genuine extreme events, a
completely meaningless selection due to statistical noise, or a mix of
the two cases.
Conceptual examples
Simple example: students taking a test
Consider
a class of students taking a 100-item true/false test on a subject.
Suppose that all students choose randomly on all questions. Then, each
student's score would be a realization of one of a set of independent and identically distributed random variables, with an expected mean
of 50. Naturally, some students will score substantially above 50 and
some substantially below 50 just by chance. If one selects only the top
scoring 10% of the students and gives them a second test on which they
again choose randomly on all items, the mean score would again be
expected to be close to 50. Thus the mean of these students would
"regress" all the way back to the mean of all students who took the
original test. No matter what a student scores on the original test, the
best prediction of their score on the second test is 50.
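A short Monte Carlo sketch of this purely random case is given below; the class size, random seed, and library calls are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 students guess randomly on a 100-item true/false test.
n_students, n_items = 1000, 100
first = rng.binomial(n_items, 0.5, size=n_students)      # first-test scores

# Select the top-scoring 10% and let them guess randomly again.
top = np.argsort(first)[-n_students // 10:]
second = rng.binomial(n_items, 0.5, size=top.size)

print(f"selected students, test 1 mean: {first[top].mean():.1f}")   # well above 50
print(f"same students, test 2 mean:     {second.mean():.1f}")       # close to 50
```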
If choosing answers to the test questions was not random – i.e.
if there were no luck (good or bad) or random guessing involved in the
answers supplied by the students – then all students would be expected
to score the same on the second test as they scored on the original
test, and there would be no regression toward the mean.
Most realistic situations fall between these two extremes: for example, one might consider exam scores as a combination of skill and luck.
In this case, the subset of students scoring above average would be
composed of those who were skilled and had not especially bad luck,
together with those who were unskilled, but were extremely lucky. On a
retest of this subset, the unskilled will be unlikely to repeat their
lucky break, while the skilled will have a second chance to have bad
luck. Hence, those who did well previously are unlikely to do quite as
well in the second test even if the original cannot be replicated.
The following is an example of this second kind of regression
toward the mean. A class of students takes two editions of the same test
on two successive days. It has frequently been observed that the worst
performers on the first day will tend to improve their scores on the
second day, and the best performers on the first day will tend to do
worse on the second day. The phenomenon occurs because student scores
are determined in part by underlying ability and in part by chance. For
the first test, some will be lucky, and score more than their ability,
and some will be unlucky and score less than their ability. Some of the
lucky students on the first test will be lucky again on the second test,
but more of them will have (for them) average or below average scores.
Therefore, a student who was lucky and over-performed their ability on
the first test is more likely to have a worse score on the second test
than a better score. Similarly, students who unluckily score less than
their ability on the first test will tend to see their scores increase
on the second test. The larger the influence of luck in producing an
extreme event, the less likely the luck will repeat itself in multiple
events.
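This mixture of ability and chance can also be sketched as a simulation; the score model below (a stable ability component plus independent day-to-day luck) and its parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n_students = 1000
ability = rng.normal(70, 10, n_students)          # assumed stable skill component
day1 = ability + rng.normal(0, 10, n_students)    # luck on day 1
day2 = ability + rng.normal(0, 10, n_students)    # independent luck on day 2

best = np.argsort(day1)[-100:]    # best performers on day 1
worst = np.argsort(day1)[:100]    # worst performers on day 1

print(f"best 100:  day 1 mean {day1[best].mean():.1f} -> day 2 mean {day2[best].mean():.1f}")
print(f"worst 100: day 1 mean {day1[worst].mean():.1f} -> day 2 mean {day2[worst].mean():.1f}")
# The best group's mean falls toward the overall mean and the worst group's rises.
```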
Other examples
If
your favourite sports team won the championship last year, what does
that mean for their chances for winning next season? To the extent this
result is due to skill (the team is in good condition, with a top coach,
etc.), their win signals that it is more likely they will win again
next year. But the greater the extent this is due to luck (other teams
embroiled in a drug scandal, favourable draw, draft picks turned out to
be productive, etc.), the less likely it is they will win again next
year.
If a business organisation has a highly profitable quarter,
despite the underlying reasons for its performance being unchanged, it
is likely to do less well the next quarter.
Baseball players who hit well in their rookie season are likely to do worse in their second, the "sophomore slump". Similarly, regression toward the mean is an explanation for the Sports Illustrated cover jinx
— periods of exceptional performance that result in a cover feature
are likely to be followed by periods of more mediocre performance,
giving the impression that appearing on the cover causes an athlete's
decline.
History
Discovery
The concept of regression comes from genetics and was popularized by Sir Francis Galton during the late 19th century with the publication of Regression towards mediocrity in hereditary stature.
Galton observed that extreme characteristics (e.g., height) in parents
are not passed on completely to their offspring. Rather, the
characteristics in the offspring regress toward a mediocre
point (a point which has since been identified as the mean). By
measuring the heights of hundreds of people, he was able to quantify
regression to the mean, and estimate the size of the effect. Galton
wrote that, "the average regression of the offspring is a constant
fraction of their respective mid-parental
deviations". This means that the difference between a child and its
parents for some characteristic is proportional to its parents'
deviation from typical people in the population. If its parents are each
two inches taller than the averages for men and women, then, on
average, the offspring will be shorter than its parents by some factor
(which, today, we would call one minus the regression coefficient)
times two inches. For height, Galton estimated this coefficient to be
about 2/3: the height of an individual will measure around a midpoint
that is two thirds of the parents' deviation from the population
average.
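A worked reading of Galton's estimate, using the two-inch example and the coefficient of about 2/3 given above (the arithmetic is illustrative; the figures come from the surrounding text, not new data):

```latex
% Mid-parental deviation from the population average: +2 inches.
% With Galton's regression coefficient of about 2/3:
\text{expected offspring deviation} = \tfrac{2}{3} \times 2\,\text{in} \approx 1.33\,\text{in},
\qquad
\text{shortfall relative to parents} = \left(1 - \tfrac{2}{3}\right) \times 2\,\text{in} \approx 0.67\,\text{in}.
% The offspring thus regresses about one third of the way back toward the population mean.
```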
Galton also published these results using the simpler example of pellets falling through a Galton board to form a normal distribution
centred directly under their entrance point. These pellets might then
be released down into a second gallery corresponding to a second
measurement. Galton then asked the reverse question: "From where did
these pellets come?"
The answer was not 'on average directly above'. Rather it was 'on average, more towards the middle',
for the simple reason that there were more pellets above it towards the
middle that could wander left than there were in the left extreme that
could wander to the right, inwards.
Evolving usage of the term
Galton coined the term "regression" to describe an observable fact in the inheritance of multi-factorial quantitative genetic
traits: namely that traits of the offspring of parents who lie at the
tails of the distribution often tend to lie closer to the centre, the
mean, of the distribution. He quantified this trend, and in doing so
invented linear regression
analysis, thus laying the groundwork for much of modern statistical
modelling. Since then, the term "regression" has been used in other
contexts, and it may be used by modern statisticians to describe
phenomena such as sampling bias which have little to do with Galton's original observations in the field of genetics.
Galton's explanation for the regression phenomenon he observed in
biology was stated as follows: "A child inherits partly from his
parents, partly from his ancestors. Speaking generally, the further his
genealogy goes back, the more numerous and varied will his ancestry
become, until they cease to differ from any equally numerous sample
taken at haphazard from the race at large."
Galton's statement requires some clarification in light of knowledge of
genetics: Children receive genetic material from their parents, but
hereditary information (e.g. values of inherited traits) from earlier
ancestors can be passed through their parents (and may not have been expressed
in their parents). The mean for the trait may be nonrandom and
determined by selection pressure, but the distribution of values around
the mean reflects a normal statistical distribution.
The population-genetic
phenomenon studied by Galton is a special case of "regression to the
mean"; the term is often used to describe many statistical phenomena in
which data exhibit a normal distribution around a mean.
Importance
Regression toward the mean is a significant consideration in the design of experiments.
Take a hypothetical example of 1,000 individuals of a similar age
who were examined and scored on the risk of experiencing a heart
attack. Statistics could be used to measure the success of an
intervention on the 50 who were rated at the greatest risk, as measured
by a test with a degree of uncertainty. The intervention could be a
change in diet, exercise, or a drug treatment. Even if the interventions
are worthless, the test group would be expected to show an improvement
on their next physical exam, because of regression toward the mean. The
best way to combat this effect is to divide the group randomly into a
treatment group that receives the treatment, and a group that does not.
The treatment would then be judged effective only if the treatment
group improves more than the untreated group.
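The need for an untreated comparison group can be sketched with a simulation in which nothing is done at all; the risk-score model, noise levels, and sample sizes below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A noisy risk score measured twice, with no intervention in between.
n = 1000
true_risk = rng.normal(50, 10, n)           # assumed stable underlying risk
exam1 = true_risk + rng.normal(0, 10, n)    # first noisy measurement
exam2 = true_risk + rng.normal(0, 10, n)    # second noisy measurement

highest = np.argsort(exam1)[-50:]           # the 50 rated at greatest risk on exam 1
print(f"selected group, exam 1 mean: {exam1[highest].mean():.1f}")
print(f"selected group, exam 2 mean: {exam2[highest].mean():.1f}")
# The selected group "improves" on exam 2 with no treatment at all, which is
# why only a randomized untreated control group can isolate a real effect.
```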
Alternatively, a group of disadvantaged
children could be tested to identify the ones with most college
potential. The top 1% could be identified and supplied with special
enrichment courses, tutoring, counseling and computers. Even if the
program is effective, their average scores may well be less when the
test is repeated a year later. However, in these circumstances it may be
considered unethical to have a control group of disadvantaged children
whose special needs are ignored. A mathematical calculation for shrinkage can adjust for this effect, although it will not be as reliable as the control group method (see also Stein's example).
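The text does not specify which shrinkage calculation is meant; one common choice, given here purely as an assumed illustration, is a Kelley-style reliability-weighted estimate that pulls each observed score part of the way back toward the group mean before any later change is judged:

```latex
% Kelley-style shrinkage (an assumed example, not necessarily the method the
% text refers to): x_i is the observed score, \bar{x} the group mean, and
% \rho the estimated reliability of the test, with 0 \le \rho \le 1.
\hat{\theta}_i = \bar{x} + \rho\,\bigl(x_i - \bar{x}\bigr)
```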
The effect can also be exploited for general inference and
estimation. The hottest place in the country today is more likely to be
cooler tomorrow than hotter, as compared to today. The best performing
mutual fund over the last three years is more likely to see relative
performance decline than improve over the next three years. The most
successful Hollywood actor of this year is likely to gross less, rather
than more, with his or her next movie. The baseball player with the
highest batting average by the All-Star break is more likely to have a
lower average than a higher average over the second half of the season.
Misunderstandings
The concept of regression toward the mean can be misused very easily.
In the student test example above, it was assumed implicitly that
what was being measured did not change between the two measurements.
Suppose, however, that the course was pass/fail and students were
required to score above 70 on both tests to pass. Then the students who
scored under 70 the first time would have no incentive to do well, and
might score worse on average the second time. The students just over 70,
on the other hand, would have a strong incentive to study and
concentrate while taking the test. In that case one might see movement away
from 70, scores below it getting lower and scores above it getting
higher. It is possible for changes between the measurement times to
augment, offset or reverse the statistical tendency to regress toward
the mean.
Statistical regression toward the mean is not a causal
phenomenon. A student with the worst score on the test on the first day
will not necessarily increase his score substantially on the second day
due to the effect. On average, the worst scorers improve, but that is
only true because the worst scorers are more likely to have been unlucky
than lucky. To the extent that a score is determined randomly, or that a
score has random variation or error, as opposed to being determined by
the student's academic ability or being a "true value", the phenomenon
will have an effect. A classic mistake in this regard was in education.
The students who received praise for good work were observed to do more
poorly on the next measure, and the students who were punished for poor
work were observed to do better on the next measure. The educators
decided to stop praising and keep punishing on this basis.
Such a decision was a mistake, because regression toward the mean is
not based on cause and effect, but rather on random error in a natural
distribution around a mean.
Although extreme individual measurements regress toward the mean, the second sample
of measurements will be no closer to the mean than the first. Consider
the students again. Suppose the tendency of extreme individuals is to
regress 10% of the way toward the mean of 80, so a student who scored 100 the first day is expected
to score 98 the second day, and a student who scored 70 the first day
is expected to score 71 the second day. Those expectations are closer to
the mean than the first day scores. But the second day scores will vary
around their expectations; some will be higher and some will be lower.
For extreme individuals, we expect the second score to be closer to the
mean than the first score, but for all individuals, we expect the distribution of distances from the mean to be the same on both sets of measurements.
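This can be checked numerically; the correlation of 0.9 below corresponds to the assumed "10% of the way" regression in the example, while the sample size, spread, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Scores on the two days share mean 80, the same spread, and correlation 0.9.
n, mean, sd, r = 100_000, 80.0, 10.0, 0.9
shared = rng.normal(0, 1, n)
day1 = mean + sd * shared
day2 = mean + sd * (r * shared + np.sqrt(1 - r**2) * rng.normal(0, 1, n))

near_100 = np.abs(day1 - 100) < 0.5     # students who scored about 100 on day 1
print(f"day-2 mean for day-1 scores near 100: {day2[near_100].mean():.1f}")    # about 98
print(f"spread on day 1: {day1.std():.2f}, spread on day 2: {day2.std():.2f}")  # equal
```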
Related to the point above, regression toward the mean works
equally well in both directions. We expect the student with the highest
test score on the second day to have done worse on the first day. And if
we compare the best student on the first day to the best student on the
second day, regardless of whether it is the same individual or not,
there is no tendency to regress toward the mean going in either
direction. We expect the best scores on both days to be equally far from
the mean.
Many phenomena tend to be attributed to the wrong causes when regression to the mean is not taken into account.
An extreme example is Horace Secrist's 1933 book The Triumph of Mediocrity in Business,
in which the statistics professor collected mountains of data to prove
that the profit rates of competitive businesses tend toward the average
over time. In fact, there is no such effect; the variability of profit
rates is almost constant over time. Secrist had only described the
common regression toward the mean. One exasperated reviewer, Harold Hotelling,
likened the book to "proving the multiplication table by arranging
elephants in rows and columns, and then doing the same for numerous
other kinds of animals".
The calculation and interpretation of "improvement scores" on
standardized educational tests in Massachusetts probably provides
another example of the regression fallacy.
In 1999, schools were given improvement goals. For each school, the
Department of Education tabulated the difference in the average score
achieved by students in 1999 and in 2000. It was quickly noted that most
of the worst-performing schools had met their goals, which the
Department of Education took as confirmation of the soundness of their
policies. However, it was also noted that many of the supposedly best
schools in the Commonwealth, such as Brookline High School (with 18
National Merit Scholarship finalists) were declared to have failed. As
in many cases involving statistics and public policy, the issue is
debated, but "improvement scores" were not announced in subsequent years
and the findings appear to be a case of regression to the mean.
The psychologist Daniel Kahneman, winner of the 2002 Nobel Memorial Prize in Economic Sciences,
pointed out that regression to the mean might explain why rebukes can
seem to improve performance, while praise seems to backfire.
I had the most satisfying Eureka
experience of my career while attempting to teach flight instructors
that praise is more effective than punishment for promoting
skill-learning. When I had finished my enthusiastic speech, one of the
most seasoned instructors in the audience raised his hand and made his
own short speech, which began by conceding that positive reinforcement
might be good for the birds, but went on to deny that it was optimal for
flight cadets. He said, "On many occasions I have praised flight cadets
for clean execution of some aerobatic maneuver, and in general when
they try it again, they do worse. On the other hand, I have often
screamed at cadets for bad execution, and in general they do better the
next time. So please don't tell us that reinforcement works and
punishment does not, because the opposite is the case." This was a
joyous moment, in which I understood an important truth about the world:
because we tend to reward others when they do well and punish them when
they do badly, and because there is regression to the mean, it is part
of the human condition that we are statistically punished for rewarding
others and rewarded for punishing them. I immediately arranged a
demonstration in which each participant tossed two coins at a target
behind his back, without any feedback. We measured the distances from
the target and could see that those who had done best the first time had
mostly deteriorated on their second try, and vice versa. But I knew
that this demonstration would not undo the effects of lifelong exposure
to a perverse contingency.
UK law enforcement policies have encouraged the visible siting of static or mobile speed cameras at accident blackspots. This policy was justified by a perception that there is a corresponding reduction in serious road traffic accidents
after a camera is set up. However, statisticians have pointed out that,
although there is a net benefit in lives saved, failure to take into
account the effects of regression to the mean results in the beneficial
effects being overstated.
Statistical analysts have long recognized the effect of
regression to the mean in sports; they even have a special name for it:
the "sophomore slump". For example, Carmelo Anthony of the NBA's Denver Nuggets
had an outstanding rookie season in 2004. It was so outstanding that he
could not be expected to repeat it: in 2005, Anthony's numbers had
dropped from his rookie season. The reasons for the "sophomore slump"
abound, as sports rely on adjustment and counter-adjustment, but
luck-based excellence as a rookie is as good a reason as any. Regression
to the mean in sports performance may also explain the apparent "Sports Illustrated cover jinx" and the "Madden Curse". John Hollinger has an alternative name for the phenomenon of regression to the mean: the "fluke rule", while Bill James calls it the "Plexiglas Principle".
Because popular lore has focused on regression toward the mean as
an account of declining performance of athletes from one season to the
next, it has usually overlooked the fact that such regression can also
account for improved performance. For example, if one looks at the batting average of Major League Baseball
players in one season, those whose batting average was above the league
mean tend to regress downward toward the mean the following year, while
those whose batting average was below the mean tend to progress upward
toward the mean the following year.
Other statistical phenomena
Regression
toward the mean simply says that, following an extreme random event,
the next random event is likely to be less extreme. In no sense does the
future event "compensate for" or "even out" the previous event, though
this is assumed in the gambler's fallacy (and the variant law of averages). Similarly, the law of large numbers
states that in the long term, the average will tend toward the expected
value, but makes no statement about individual trials. For example,
following a run of 10 heads on a flip of a fair coin (a rare, extreme
event), regression to the mean states that the next run of heads will
likely be less than 10, while the law of large numbers states that in
the long term, this event will likely average out, and the average
fraction of heads will tend to 1/2. By contrast, the gambler's fallacy
incorrectly assumes that the coin is now "due" for a run of tails to
balance out.
The opposite effect is regression to the tail, resulting from a
distribution with non-vanishing probability density toward infinity.
Definition for simple linear regression of data points
This is the definition of regression toward the mean that closely follows Sir Francis Galton's original usage.
Suppose there are n data points {yi, xi}, where i = 1, 2, ..., n. We want to find the equation of the regression line, i.e. the straight line
which would provide a best fit for the data points. (Note that a
straight line may not be the appropriate regression curve for the given
data points.) Here the best will be understood as in the least-squares approach: such a line that minimizes the sum of squared residuals of the linear regression model. In other words, numbers α and β solve the following minimization problem:
Find \(\min_{\alpha,\beta} Q(\alpha, \beta)\), where
\(Q(\alpha, \beta) = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n} \left( y_i - \alpha - \beta x_i \right)^2 .\)
Using calculus it can be shown that the values of α and β that minimize the objective function Q are
\(\hat{\beta} = r_{xy} \frac{s_y}{s_x}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x},\)
where \(r_{xy}\) is the sample correlation coefficient between x and y, \(s_x\) is the standard deviation of x, and \(s_y\) is correspondingly the standard deviation of y. A horizontal bar over a variable means the sample average of that variable. For example:
\(\bar{x} = \tfrac{1}{n} \sum_{i=1}^{n} x_i .\)
Substituting the above expressions for \(\hat{\alpha}\) and \(\hat{\beta}\) into \(\hat{y} = \hat{\alpha} + \hat{\beta} x\) yields fitted values
\(\hat{y} - \bar{y} = r_{xy} \frac{s_y}{s_x} \left( x - \bar{x} \right),\)
which yields
\(\frac{\hat{y} - \bar{y}}{s_y} = r_{xy} \frac{x - \bar{x}}{s_x} .\)
This shows the role \(r_{xy}\) plays in the regression line of standardized data points.
If −1 < rxy < 1, then we say that
the data points exhibit regression toward the mean. In other words, if
linear regression is the appropriate model for a set of data points
whose sample correlation coefficient is not perfect, then there is
regression toward the mean. The predicted (or fitted) standardized value
of y is closer to its mean than the standardized value of x is to its mean.
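The standardized-value statement can be checked directly on synthetic data; the toy data-generating process, seed, and use of NumPy below are assumptions made only to illustrate the algebra above.

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(0, 1, 500)
y = 0.6 * x + rng.normal(0, 1, 500)     # assumed toy data with imperfect correlation

r_xy = np.corrcoef(x, y)[0, 1]
beta = r_xy * y.std() / x.std()         # least-squares slope
alpha = y.mean() - beta * x.mean()      # least-squares intercept
y_hat = alpha + beta * x                # fitted values

z_fit = (y_hat - y.mean()) / y.std()    # standardized fitted values of y
z_x = (x - x.mean()) / x.std()          # standardized values of x
print(np.allclose(z_fit, r_xy * z_x))   # True: the fitted z-score is r_xy times
                                        # the predictor z-score, hence closer to 0
```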
Definitions for bivariate distribution with identical marginal distributions
Restrictive definition
Let X1, X2 be random variables with identical marginal distributions with mean μ. In this formalization, the bivariate distribution of X1 and X2 is said to exhibit regression toward the mean if, for every number c > μ, we have
μ ≤ E[X2 | X1 = c] < c,
with the reverse inequalities holding for c < μ.
The following is an informal description of the above definition. Consider a population of widgets. Each widget has two numbers, X1 and X2 (say, its left span (X1 ) and right span (X2)). Suppose that the probability distributions of X1 and X2 in the population are identical, and that the means of X1 and X2 are both μ. We now take a random widget from the population, and denote its X1 value by c. (Note that c may be greater than, equal to, or smaller than μ.) We have no access to the value of this widget's X2 yet. Let d denote the expected value of X2 of this particular widget. (i.e. Let d denote the average value of X2 of all widgets in the population with X1=c.) If the following condition is true:
Whatever the value c is, d lies between μ and c (i.e. d is closer to μ than c is),
then we say that X1 and X2 show regression toward the mean.
This definition accords closely with the current common usage,
evolved from Galton's original usage, of the term "regression toward the
mean". It is "restrictive" in the sense that not every bivariate
distribution with identical marginal distributions exhibits regression
toward the mean (under this definition).
Theorem
If a pair (X, Y) of random variables follows a bivariate normal distribution, then the conditional mean E[Y | X] is a linear function of X. The correlation coefficient r between X and Y, along with the marginal means and variances of X and Y, determines this linear relationship:
E[Y | X = x] = E[Y] + r (σy / σx) (x − E[X]),
where E[X] and E[Y] are the expected values of X and Y, respectively, and σx and σy are the standard deviations of X and Y, respectively.
Hence the conditional expected value of Y, given that X is t standard deviations above its mean (and that includes the case where it's below its mean, when t < 0), is rt standard deviations above the mean of Y. Since |r| ≤ 1, Y is no farther from the mean than X is, as measured in the number of standard deviations.
Hence, if 0 ≤ r < 1, then (X, Y) shows regression toward the mean (by this definition).
General definition
The following definition of reversion toward the mean has been proposed by Samuels as an alternative to the more restrictive definition of regression toward the mean above.
Let X1, X2 be random variables with identical marginal distributions with mean μ. In this formalization, the bivariate distribution of X1 and X2 is said to exhibit reversion toward the mean if, for every number c, we have
μ ≤ E[X2 | X1 > c] < E[X1 | X1 > c], and
μ ≥ E[X2 | X1 < c] > E[X1 | X1 < c]
This definition is "general" in the sense that every bivariate distribution with identical marginal distributions exhibits reversion toward the mean, provided some weak criteria are satisfied (non-degeneracy and weak positive dependence as described in Samuels's paper).
Alternative definition in financial usage
Jeremy Siegel uses the term "return to the mean" to describe a financial time series in which "returns can be very unstable in the short run but very stable in the long run." More quantitatively, it is one in which the standard deviation of average annual returns declines faster than the inverse of the holding period, implying that the process is not a random walk,
but that periods of lower returns are systematically followed by
compensating periods of higher returns, as is the case in many seasonal
businesses, for example.