Thomas Robert Malthus, after whom the Malthusian trap is named
A Malthusian catastrophe (also known as a Malthusian trap, population trap, Malthusian check, Malthusian crisis, Malthusian spectre, or Malthusian crunch) occurs when population growth outpaces agricultural production, causing population to be limited by famine or war. It is named after Thomas Robert Malthus,
who suggested that while technological advances could increase a
society's supply of resources, such as food, and thereby improve the standard of living, the resource abundance would enable population growth,
which would eventually bring the per capita supply of resources back to
its original level. Some economists contend that since the Industrial Revolution, mankind has broken out of the trap. Others argue that the continuation of extreme poverty indicates that the Malthusian trap continues to operate. Others further argue that due to lack of food availability coupled with excessive pollution, developing countries show more evidence of the trap.
He argued that society has a natural propensity to increase its
population, a propensity that causes population growth to be the best
measure of the happiness of a people: "The happiness of a country does
not depend, absolutely, upon its poverty, or its riches, upon its youth,
or its age, upon its being thinly, or fully inhabited, but upon the
rapidity with which it is increasing, upon the degree in which the
yearly increase of food approaches to the yearly increase of an
unrestricted population."
However, the propensity for population increase also leads to a natural cycle of abundance and shortages:
We will suppose the means of
subsistence in any country just equal to the easy support of its
inhabitants. The constant effort towards population...increases the
number of people before the means of subsistence are increased. The food
therefore which before supported seven millions, must now be divided
among seven millions and a half or eight millions. The poor consequently
must live much worse, and many of them be reduced to severe distress.
The number of labourers also being above the proportion of the work in
the market, the price of labour must tend toward a decrease; while the
price of provisions would at the same time tend to rise. The labourer
therefore must work harder to earn the same as he did before. During
this season of distress, the discouragements to marriage, and the
difficulty of rearing a family are so great, that population is at a
stand. In the mean time the cheapness of labour, the plenty of
labourers, and the necessity of an increased industry amongst them,
encourage cultivators to employ more labour upon their land; to turn up
fresh soil, and to manure and improve more completely what is already in
tillage; till ultimately the means of subsistence become in the same
proportion to the population as at the period from which we set out. The
situation of the labourer being then again tolerably comfortable, the
restraints to population are in some degree loosened; and the same
retrograde and progressive movements with respect to happiness are
repeated.
Famine seems to be the last, the
most dreadful resource of nature. The power of population is so superior
to the power of the earth to produce subsistence for man, that
premature death must in some shape or other visit the human race. The
vices of mankind are active and able ministers of depopulation. They are
the precursors in the great army of destruction, and often finish the
dreadful work themselves. But should they fail in this war of
extermination, sickly seasons, epidemics, pestilence, and plague advance
in terrific array, and sweep off their thousands and tens of thousands.
Should success be still incomplete, gigantic inevitable famine stalks
in the rear, and with one mighty blow levels the population with the
food of the world.
Malthus faced opposition from economists both during his life and since. A vocal critic several decades later was Friedrich Engels.
Modern formulation
The modern formulation of the Malthusian theory was developed by Quamrul Ashraf and Oded Galor.
Their theoretical structure suggests that as long as (i) higher income has a positive effect on reproductive success and (ii) land is a limited factor of production, technological progress has only a temporary effect on income per capita. While in the short run technological progress increases income per capita, the resource abundance created by technological progress enables population growth, eventually bringing per capita income back to its original long-run level.
The testable prediction of the theory is that during the Malthusian epoch technologically advanced economies were characterized by higher population density, but their level of income per capita was no different from that of technologically backward societies.
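To illustrate the mechanism, here is a minimal Python sketch (a toy model of our own, not Ashraf and Galor's formulation; all parameter values are assumptions): output exhibits diminishing returns to labor on fixed land, a one-off technology shock raises income per capita, and population growth then erodes the gain back to subsistence.

```python
# Toy Malthusian dynamics (illustrative only, not the Ashraf-Galor model):
# output is A * L**alpha on a fixed amount of land, and population grows
# whenever income per capita exceeds a subsistence level.
A, L, alpha = 1.0, 400.0, 0.5   # technology, population, labor share (assumed)
subsistence = 0.05              # long-run income per capita (assumed)

def income_per_capita(A, L):
    return A * L**alpha / L     # diminishing returns to labor on fixed land

A *= 1.5                        # one-off technological improvement
for generation in range(300):
    y = income_per_capita(A, L)
    L *= 1 + 0.1 * (y - subsistence) / subsistence  # income above subsistence -> growth

print(round(income_per_capita(A, L), 4))  # back near the 0.05 subsistence level
```

The technology shock raises income per capita by 50% in the short run, but the only lasting effect is a larger population living at the same subsistence income.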
Preventive vs. positive population controls
Malthus proposed two kinds of population checks: preventive and positive.
A preventive check is a conscious decision to delay marriage or abstain from procreation based on a lack of resources.
Malthus argued that man is incapable of ignoring the consequences of
uncontrolled population growth, and would intentionally avoid
contributing to it.
According to Malthus, a positive check is any event or circumstance
that shortens the human life span. The primary examples of this are war, plague and famine. However, poor health and economic conditions are also considered instances of positive checks.
Neo-Malthusian theory
The rapid increase in the global population since 1900 exemplifies
Malthus's predicted population patterns, whereby expansion of food
supply has encouraged population growth. "Neo-Malthusianism" may be used
as a label for those who are concerned that human overpopulation may increase resource depletion or environmental degradation to a degree that is not sustainable. Many in environmental movements express concern over the potential dangers of population growth. In 1968, ecologist Garrett Hardin published an influential essay in Science that drew heavily from Malthusian theory. His essay, "The Tragedy of the Commons," argued that "a finite world can support only a finite population" and that "freedom to breed will bring ruin to all." The Club of Rome published a famous book entitled The Limits to Growth in 1972. Paul R. Ehrlich is a prominent neo-Malthusian who first raised concerns in 1968 with the publication of The Population Bomb.
Wheat yields in developing countries since 1961, in kg/ha. The steep rise in crop yields in the U.S. began in the 1940s. Percentage growth was fastest in the early rapid-growth stage. In developing countries maize yields are still rising rapidly.
A study conducted in 2009 said that food production would have to increase by 70% over the next 40 years, and food production in the developing world would need to double. This reflects the growth of the world population (projected to reach 9.1 billion in 2050, compared with about 7.8 billion today). The effects of global warming (floods, droughts, extreme weather events, ...) are expected to negatively affect food production, with different impacts in different regions. As a result, scarce natural resources will have to be used more efficiently, and agriculture will have to adapt to climate change. The use of agricultural resources for biofuels may also put downward pressure on food availability.
Evidence in support
Research
indicates that technological superiority and higher land productivity
had significant positive effects on population density but insignificant
effects on the standard of living during the time period 1–1500 AD.
In addition, scholars have reported on the lack of a significant trend
of wages in various places over the world for very long stretches of
time.
In Babylonia during the period 1800 to 1600 BC, for example, the daily
wage for a common laborer was enough to buy about 15 pounds of wheat.
In Classical Athens in about 328 BC, the corresponding wage could buy
about 24 pounds of wheat. In England in 1800 AD the wage was about 13
pounds of wheat.
In spite of the technological developments across these societies, the
daily wage hardly varied. In Britain between 1200 and 1800, only
relatively minor fluctuations from the mean (less than a factor of two)
in real wages occurred. Following depopulation by the Black Death and other epidemics, real income in Britain peaked around 1450–1500 and then declined until the British Agricultural Revolution. Historian Walter Scheidel
posits that waves of plague following the initial outbreak of the Black
Death throughout Europe had a leveling effect that changed the ratio of
land to labor, reducing the value of the former while boosting that of
the latter, which lowered economic inequality
by making employers and landowners less well off while improving the
economic prospects and living standards of workers. He says that "the
observed improvement in living standards of the laboring population was
rooted in the suffering and premature death of tens of millions over the
course of several generations." This leveling effect was reversed by a
"demographic recovery that resulted in renewed population pressure."
Robert Fogel published a study of lifespans and nutrition from about a century before Malthus to the 19th century. Examining European birth and death records and military and other records of height and weight, he found significantly stunted heights and low body weights, indicative of chronic hunger and malnutrition. He also found short lifespans, which he attributed to chronic malnourishment that left people susceptible to disease. Lifespans, height and weight began to increase steadily in the UK and France after 1750. Fogel's findings are consistent with estimates of available food supply.
Theory of breakout via technology
Industrial Revolution
Some
researchers contend that a British breakout occurred due to
technological improvements and structural change away from agricultural
production, while coal, capital, and trade played a minor role. Economic historian Gregory Clark, building on the insights of Galor and Moav, has argued, in his book A Farewell to Alms,
that a British breakout may have been caused by differences in
reproduction rates among the rich and the poor (the rich were more
likely to marry, tended to have more children, and, in a society where
disease was rampant and childhood mortality at times approached 50%,
upper-class children were more likely to survive to adulthood than poor
children.) This in turn led to sustained "downward mobility": the
descendants of the rich becoming more populous in British society and
spreading middle-class values such as hard work and literacy.
20th century
Global deaths in conflicts since the year 1400.
A chart of estimated annual growth rates in world population, 1800–2005. Rates before 1950 are annualized historical estimates from the US Census Bureau. Red = USCB projections to 2025.
Growth in food production has historically been greater than population growth. Food per person has increased since 1961. The graph runs up to slightly past 2010.
After World War II, mechanized agriculture produced a dramatic increase in productivity of agriculture and the Green Revolution
greatly increased crop yields, expanding the world's food supply while
lowering food prices. In response, the growth rate of the world's
population accelerated rapidly, resulting in predictions by Paul R. Ehrlich, Simon Hopkins,
and many others of an imminent Malthusian catastrophe. However,
populations of most developed countries grew slowly enough to be
outpaced by gains in productivity.
A 2004 study by a group of prominent economists and ecologists, including Kenneth Arrow and Paul Ehrlich,
suggests that the central concerns regarding sustainability have
shifted from population growth to the consumption/savings ratio, due to
shifts in population growth rates since the 1970s. Empirical estimates
show that public policy (taxes or the establishment of more complete
property rights) can promote more efficient consumption and investment
that are sustainable in an ecological sense; that is, given the current
(relatively low) population growth rate, the Malthusian catastrophe can
be avoided by either a shift in consumer preferences or public policy that induces a similar shift.
Criticism
Karl Marx and Friedrich Engels
argued that Malthus failed to recognize a crucial difference between
humans and other species. In capitalist societies, as Engels put it,
scientific and technological "progress is as unlimited and at least as
rapid as that of population". Marx argued, even more broadly, that the growth of both a human population in toto and the "relative surplus population" within it, occurred in direct proportion to accumulation.
Henry George in Progress and Poverty
(1879) criticized Malthus's view that population growth was a cause of
poverty, arguing that poverty was caused by the concentration of
ownership of land and natural resources. George noted that humans are
distinct from other species, because unlike most species humans can use
their minds to leverage the reproductive forces of nature to their
advantage. He wrote, "Both the jayhawk and the man eat chickens; but the
more jayhawks, the fewer chickens, while the more men, the more
chickens."
D. E. C. Eversley observed that Malthus appeared unaware of the
extent of industrialization, and either ignored or discredited the
possibility that it could improve living conditions of the poorer
classes.
Barry Commoner argued in The Closing Circle (1971) that technological progress would eventually reduce the demographic growth and environmental damage created by civilization. He also opposed the coercive measures postulated by the neo-Malthusian movements of his time, arguing that their cost would fall disproportionately on the low-income population that is already struggling.
Ester Boserup
suggested that expanding population leads to agricultural
intensification and development of more productive and less
labor-intensive methods of farming. Thus, human population levels determine agricultural methods, rather than agricultural methods determining population.
The theory’s Malthusian premise has
been proven wrong since 1963, when the rate of population growth
reached a frightening 2 percent a year but then began dropping. The 1963
inflection point showed that the imagined soaring J-curve of human
increase was instead a normal S-curve. The growth rate was leveling off.
No one thought the growth rate might go negative and the population
start shrinking in this century without an overshoot and crash, but that
is what is happening.
Short-term trends, even on the scale of decades or centuries, cannot
prove or disprove the existence of mechanisms promoting a Malthusian
catastrophe over longer periods. However, given the prosperity of a major fraction of the human population at the beginning of the 21st century, and the debatability of the predictions for ecological collapse made by Paul R. Ehrlich in the 1960s and 1970s, some people, such as economist Julian L. Simon and medical statistician Hans Rosling, have questioned its inevitability.
Joseph Tainter asserts that science has diminishing marginal returns and that scientific progress is becoming harder to achieve and more costly, which may reduce the efficiency of the factors that prevented Malthusian scenarios from happening in the past.
The view that a "breakout" from the Malthusian trap has led to an era of sustained economic growth is explored by "unified growth theory".
One branch of unified growth theory is devoted to the interaction
between human evolution and economic development. In particular, Oded
Galor and Omer Moav argue that the forces of natural selection during the Malthusian epoch selected traits beneficial to the growth process, and that this growth-enhancing change in the composition of human traits brought about the escape from the Malthusian trap, the demographic transition, and the take-off to modern growth.
Prisoner's dilemma
The prisoner's dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher while working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence rewards and named it "prisoner's dilemma", presenting it as follows:
Two members of a criminal gang are
arrested and imprisoned. Each prisoner is in solitary confinement with
no means of communicating with the other. The prosecutors lack
sufficient evidence to convict the pair on the principal charge, but
they have enough to convict both on a lesser charge. Simultaneously, the
prosecutors offer each prisoner a bargain. Each prisoner is given the
opportunity either to betray the other by testifying that the other
committed the crime, or to cooperate with the other by remaining silent.
The possible outcomes are:
If A and B each betray the other, each of them serves two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years in prison
If A remains silent but B betrays A, A will serve three years in prison and B will be set free
If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge).
It is implied that the prisoners will have no opportunity to reward
or punish their partner other than the prison sentences they get and
that their decision will not affect their reputation in the future.
Because betraying a partner offers a greater reward than cooperating
with them, all purely rational self-interested prisoners will betray the
other, meaning the only possible outcome for two purely rational
prisoners is for them to betray each other. In reality, humans display a systemic bias
towards cooperative behavior in this and similar games despite what is
predicted by simple models of "rational" self-interested action.
This bias towards cooperation has been known since the test was first
conducted at RAND; the secretaries involved trusted each other and
worked together for the best common outcome. The prisoner's dilemma became the focus of extensive experimental research.
An extended "iterated" version of the game also exists. In this
version, the classic game is played repeatedly between the same
prisoners, who continuously have the opportunity to penalize the other
for previous decisions. If the number of times the game will be played
is known to the players, then (by backward induction)
two classically rational players will betray each other repeatedly, for
the same reasons as the single-shot variant. In an infinite or unknown
length game there is no fixed optimum strategy, and prisoner's dilemma
tournaments have been held to compete and test algorithms for such
cases.
The prisoner's dilemma game can be used as a model for many real world situations
involving cooperative behavior. In casual usage, the label "prisoner's
dilemma" may be applied to situations not strictly matching the formal
criteria of the classic or iterative games: for instance, those in which
two entities could gain important benefits from cooperating or suffer
from the failure to do so, but find it difficult or expensive—not
necessarily impossible—to coordinate their activities.
Strategy for the prisoner's dilemma
Two prisoners are separated into individual rooms and cannot communicate with each other.
The normal game is shown below:

                                      Prisoner B stays silent (cooperates)   Prisoner B betrays (defects)
Prisoner A stays silent (cooperates)  Each serves 1 year                     A: 3 years, B: goes free
Prisoner A betrays (defects)          A: goes free, B: 3 years               Each serves 2 years
It is assumed that both prisoners understand the nature of the game,
have no loyalty to each other, and will have no opportunity for
retribution or reward outside the game. Regardless of what the other
decides, each prisoner gets a higher reward by betraying the other
("defecting"). The reasoning involves an argument by dilemma:
B will either cooperate or defect. If B cooperates, A should defect,
because going free is better than serving 1 year. If B defects, A should
also defect, because serving 2 years is better than serving 3. So
either way, A should defect. Parallel reasoning will show that B should
defect.
Because defection always results in a better payoff than cooperation regardless of the other player's choice, it is a dominant strategy. Mutual defection is the only strong Nash equilibrium
in the game (i.e. the only outcome from which each player could only do
worse by unilaterally changing strategy). The dilemma, then, is that
mutual cooperation yields a better outcome than mutual defection but is
not the rational outcome because the choice to cooperate, from a
self-interested perspective, is irrational.
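This argument by dilemma can be checked mechanically. Below is a minimal Python sketch (sentence lengths taken from the table above, in years of prison, so fewer is better; the helper names are ours) confirming that defection dominates and that mutual defection is the only Nash equilibrium:

```python
# Prison sentences (years) for (A's choice, B's choice); lower is better.
# C = stay silent (cooperate), D = betray (defect).
sentence = {
    ("C", "C"): (1, 1),
    ("C", "D"): (3, 0),
    ("D", "C"): (0, 3),
    ("D", "D"): (2, 2),
}

# Defection dominates: whatever B does, A serves less by playing D.
for b in ("C", "D"):
    assert sentence[("D", b)][0] < sentence[("C", b)][0]

# Mutual defection is the only Nash equilibrium: no unilateral deviation helps.
def is_nash(a, b):
    other_a = "D" if a == "C" else "C"
    other_b = "D" if b == "C" else "C"
    return (sentence[(other_a, b)][0] >= sentence[(a, b)][0]
            and sentence[(a, other_b)][1] >= sentence[(a, b)][1])

print([cell for cell in sentence if is_nash(*cell)])  # [('D', 'D')]
```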
Generalized form
The
structure of the traditional prisoner's dilemma can be generalized from
its original prisoner setting. Suppose that the two players are
represented by the colors red and blue, and that each player chooses to
either "cooperate" or "defect".
If both players cooperate, they both receive the reward R for cooperating. If both players defect, they both receive the punishment payoff P. If Blue defects while Red cooperates, then Blue receives the temptation payoff T, while Red receives the "sucker's" payoff, S. Similarly, if Blue cooperates while Red defects, then Blue receives the sucker's payoff S, while Red receives the temptation payoff T.
To be a prisoner's dilemma game in the strong sense, the following condition must hold for the payoffs:

T > R > P > S

The payoff relationship R > P implies that mutual cooperation is superior to mutual defection, while the payoff relationships T > R and P > S imply that defection is the dominant strategy for both agents.
Special case: donation game
The "donation game" is a form of prisoner's dilemma in which cooperation corresponds to offering the other player a benefit b at a personal cost c with b > c. Defection means offering nothing. The payoff matrix is thus
Red
Blue
Cooperate
Defect
Cooperate
b−c
b−c
b
−c
Defect
−c
b
0
0
Note that 2R > T + S (i.e. 2(b−c) > b − c, which holds since b > c); this qualifies the donation game to be an iterated game (see next section).
The donation game may be applied to markets. Suppose X grows oranges, Y grows apples. The marginal utility of an apple to the orange-grower X is b, which is higher than the marginal utility (c)
of an orange, since X has a surplus of oranges and no apples.
Similarly, for apple-grower Y, the marginal utility of an orange is b while the marginal utility of an apple is c.
If X and Y contract to exchange an apple and an orange, and each fulfills their end of the deal, then each receives a payoff of b−c. If one "defects" and does not deliver as promised, the defector will receive a payoff of b, while the cooperator will lose c. If both defect, then neither one gains or loses anything.
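As a sketch, the donation-game payoffs can be written directly as a function of b and c (symbols as defined above; the numeric values and function name are ours, for illustration):

```python
# Donation game payoffs: cooperation confers benefit b on the other player
# at personal cost c, with b > c. Returns (payoff_blue, payoff_red).
def donation_payoffs(blue_cooperates, red_cooperates, b, c):
    blue = (b if red_cooperates else 0) - (c if blue_cooperates else 0)
    red = (b if blue_cooperates else 0) - (c if red_cooperates else 0)
    return blue, red

b, c = 3.0, 1.0                              # illustrative values, b > c
R = donation_payoffs(True, True, b, c)[0]    # b - c = 2.0
T = donation_payoffs(False, True, b, c)[0]   # b     = 3.0
S = donation_payoffs(True, False, b, c)[0]   # -c    = -1.0
P = donation_payoffs(False, False, b, c)[0]  # 0
assert T > R > P > S and 2 * R > T + S       # strong PD + iterated condition
```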
The iterated prisoner's dilemma
If two players play prisoner's dilemma more than once in succession
and they remember previous actions of their opponent and change their
strategy accordingly, the game is called iterated prisoner's dilemma.
In addition to the general form above, the iterative version also requires that 2R > T + S, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.
The iterated prisoner's dilemma game is fundamental to some
theories of human cooperation and trust. On the assumption that the game
can model transactions between two people requiring trust, cooperative
behaviour in populations may be modeled by a multi-player, iterated,
version of the game. It has, consequently, fascinated many scholars over
the years. In 1975, Grofman and Pool estimated the count of scholarly
articles devoted to it at over 2,000. The iterated prisoner's dilemma
has also been referred to as the "peace-war game".
If the game is played exactly N times and both players know this, then it is optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive:
one might as well defect on the last turn, since the opponent will not
have a chance to later retaliate. Therefore, both will defect on the
last turn. Thus, the player might as well defect on the second-to-last
turn, since the opponent will defect on the last no matter what is done,
and so on. The same applies if the game length is unknown but has a
known upper limit.
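A minimal Python sketch of the backward-induction logic (not a general game solver; payoffs are the illustrative T > R > P > S values used earlier): the continuation of the game is the same regardless of today's move, so only the stage payoff matters, and defection strictly dominates in every round.

```python
# Backward induction in an N-round repeated PD (illustrative payoffs).
# Stage payoffs to either player: T > R > P > S.
T, R, P, S = 5, 3, 1, 0

def best_action(rounds_left):
    """Best response for either player, assuming both reason identically."""
    if rounds_left == 0:
        return []
    # Future play is unaffected by today's action, so only the stage
    # payoff matters; defection strictly dominates (T > R and P > S).
    return ["D"] + best_action(rounds_left - 1)

print(best_action(5))  # ['D', 'D', 'D', 'D', 'D'] -- defect in every round
```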
Unlike the standard prisoner's dilemma, in the iterated
prisoner's dilemma the defection strategy is counter-intuitive and fails
badly to predict the behavior of human players. Within standard
economic theory, though, this is the only correct answer. The superrational strategy in the iterated prisoner's dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies agree with the superrational version, not the game-theoretic rational one.
For cooperation to emerge between game theoretic rational players, the total number of rounds N
must be unknown to the players. In this case "always defect" may no
longer be a strictly dominant strategy, only a Nash equilibrium. Among the results shown by Robert Aumann in a 1959 paper is that rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.
According to a 2019 experimental study in the American Economic Review
which tested what strategies real-life subjects used in iterated
prisoners' dilemma situations with perfect monitoring, the majority of
chosen strategies were always defect, tit-for-tat, and grim trigger. Which strategy the subjects chose depended on the parameters of the game.
Strategy for the iterated prisoner's dilemma
Interest in the iterated prisoner's dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984). In it he reports on a tournament he organized of the N step prisoner's dilemma (with N
fixed) in which participants have to choose their mutual strategy again
and again, and have memory of their previous encounters. Axelrod
invited academic colleagues all over the world to devise computer
strategies to compete in an IPD tournament. The programs that were
entered varied widely in algorithmic complexity, initial hostility,
capacity for forgiveness, and so forth.
Axelrod discovered that when these encounters were repeated over a
long period of time with many players, each with different strategies,
greedy strategies tended to do very poorly in the long run while more altruistic
strategies did better, as judged purely by self-interest. He used this
to show a possible mechanism for the evolution of altruistic behaviour
from mechanisms that are initially purely selfish, by natural selection.
The winning deterministic strategy was tit for tat, which Anatol Rapoport developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of BASIC,
and won the contest. The strategy is simply to cooperate on the first
iteration of the game; after that, the player does what his or her
opponent did on the previous move. Depending on the situation, a
slightly better strategy can be "tit for tat with forgiveness". When the
opponent defects, on the next move, the player sometimes cooperates
anyway, with a small probability (around 1–5%). This allows for
occasional recovery from getting trapped in a cycle of defections. The
exact probability depends on the line-up of opponents.
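A minimal Python sketch of tit for tat and the forgiving variant described above (the payoff numbers, round count, and the 5% forgiveness rate are illustrative assumptions):

```python
import random

# Payoffs to the row player for (my_move, their_move); illustrative values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not history else history[-1]

def tit_for_tat_forgiving(history, p_forgive=0.05):
    """Like tit for tat, but sometimes cooperates after a defection."""
    if not history or history[-1] == "C":
        return "C"
    return "C" if random.random() < p_forgive else "D"

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)  # each sees the other's past
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat_forgiving))  # mutual cooperation: (600, 600)
```

Two nice strategies playing each other simply cooperate throughout, which is why both score the mutual-cooperation maximum here.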
By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.
Nice
The most important condition is that the strategy must be "nice",
that is, it will not defect before its opponent does (this is sometimes
referred to as an "optimistic" algorithm). Almost all of the top-scoring
strategies were nice; therefore, a purely selfish strategy will not
"cheat" on its opponent, for purely self-interested reasons first.
Retaliating
However, Axelrod contended, the successful strategy must not be a
blind optimist. It must sometimes retaliate. An example of a
non-retaliating strategy is Always Cooperate. This is a very bad choice,
as "nasty" strategies will ruthlessly exploit such players.
Forgiving
Successful strategies must also be forgiving. Though players will
retaliate, they will once again fall back to cooperating if the opponent
does not continue to defect. This stops long runs of revenge and
counter-revenge, maximizing points.
Non-envious
The last quality is being non-envious, that is not striving to score more than the opponent.
The optimal (points-maximizing) strategy for the one-time PD game is
simply defection; as explained above, this is true whatever the
composition of opponents may be. However, in the iterated-PD game the
optimal strategy depends upon the strategies of likely opponents, and
how they will react to defections and cooperations. For example,
consider a population where everyone defects every time, except for a
single individual following the tit for tat strategy. That individual is
at a slight disadvantage because of the loss on the first turn. In such
a population, the optimal strategy for that individual is to defect
every time. In a population with a certain percentage of
always-defectors and the rest being tit for tat players, the optimal
strategy for an individual depends on the percentage, and on the length
of the game.
In the strategy called Pavlov (win-stay, lose-switch), a player faced with a failure to cooperate switches strategy the next turn. In certain circumstances, Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy.
Deriving the optimal strategy is generally done in two ways:
Bayesian Nash equilibrium:
If the statistical distribution of opposing strategies can be
determined (e.g. 50% tit for tat, 50% always cooperate) an optimal
counter-strategy can be derived analytically.
Monte Carlo simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm
for finding an optimal strategy). The mix of algorithms in the final
population generally depends on the mix in the initial population. The
introduction of mutation (random variation during reproduction) lessens
the dependency on the initial population; empirical experiments with
such systems tend to produce tit for tat players (see for instance Chess
1988), but no analytic proof exists that this will always occur.
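Both routes can be sketched in Python. First, the analytic route: against a known mix of opponents (the 50% tit for tat, 50% always cooperate example above), candidate strategies can simply be scored by expected payoff. The payoff numbers and 100-round horizon are assumptions for illustration.

```python
# Expected payoff of a candidate strategy against a known distribution of
# opponents (here: 50% tit for tat, 50% always cooperate).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_payoff(candidate, opponents, weights, rounds=100):
    total = 0.0
    for opp, w in zip(opponents, weights):
        hist_c, hist_o, score = [], [], 0
        for _ in range(rounds):
            a, b = candidate(hist_o), opp(hist_c)
            score += PAYOFF[(a, b)]
            hist_c.append(a)
            hist_o.append(b)
        total += w * score
    return total

tft = lambda h: "C" if not h else h[-1]
always_c = lambda h: "C"
always_d = lambda h: "D"

mix, weights = [tft, always_c], [0.5, 0.5]
for name, s in [("always defect", always_d), ("tit for tat", tft)]:
    print(name, expected_payoff(s, mix, weights))
```

Second, the simulation route: a sketch of a population loop in the spirit described above (a simplified evolutionary scheme, not the cited experiments), in which low scorers are replaced by copies of high scorers, with occasional mutation.

```python
import random
from collections import Counter

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
STRATEGIES = ["ALLC", "ALLD", "TFT"]   # assumed strategy set for illustration

def move(strategy, opp_last):
    if strategy == "ALLC":
        return "C"
    if strategy == "ALLD":
        return "D"
    return "C" if opp_last is None else opp_last   # tit for tat

def match(s1, s2, rounds=30):
    last1 = last2 = None
    score1 = score2 = 0
    for _ in range(rounds):
        a, b = move(s1, last2), move(s2, last1)
        score1 += PAYOFF[(a, b)]
        score2 += PAYOFF[(b, a)]
        last1, last2 = a, b
    return score1, score2

population = [random.choice(STRATEGIES) for _ in range(30)]
for generation in range(50):
    scores = [0.0] * len(population)
    for i in range(len(population)):               # round-robin tournament
        for j in range(i + 1, len(population)):
            si, sj = match(population[i], population[j])
            scores[i] += si
            scores[j] += sj
    ranked = sorted(range(len(population)), key=scores.__getitem__)
    for loser, winner in zip(ranked[:5], ranked[-5:]):
        population[loser] = population[winner]     # low scorers copy high scorers
    if random.random() < 0.2:                      # occasional mutation
        population[random.randrange(len(population))] = random.choice(STRATEGIES)

print(Counter(population))  # cooperators (often TFT) tend to take over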
Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University
in England introduced a new strategy at the 20th-anniversary iterated
prisoner's dilemma competition, which proved to be more successful than
tit for tat. This strategy relied on collusion between programs to
achieve the highest number of points for a single program. The
university submitted 60 programs to the competition, which were designed
to recognize each other through a series of five to ten moves at the
start.
Once this recognition was made, one program would always cooperate and
the other would always defect, assuring the maximum number of points for
the defector. If the program realized that it was playing a
non-Southampton player, it would continuously defect in an attempt to
minimize the score of the competing program. As a result, the 2004
Prisoners' Dilemma Tournament results show University of Southampton's
strategies in the first three places, despite having fewer wins and
many more losses than the GRIM strategy. (In a PD tournament, the aim of
the game is not to "win" matches – that can easily be achieved by
frequent defection). Also, even without implicit collusion between software strategies
(exploited by the Southampton team) tit for tat is not always the
absolute winner of any given tournament; it would be more precise to say
that its long run results over a series of tournaments outperform its
rivals. (In any one event a given strategy can be slightly better
adjusted to the competition than tit for tat, but tit for tat is more
robust). The same applies for the tit for tat with forgiveness variant,
and other optimal strategies: on any given day they might not "win"
against a specific mix of counter-strategies. An alternative way of
putting it is using the Darwinian ESS
simulation. In such a simulation, tit for tat will almost always come
to dominate, though nasty strategies will drift in and out of the
population because a tit for tat population is penetrable by
non-retaliating nice strategies, which in turn are easy prey for the
nasty strategies. Richard Dawkins
showed that here, no static mix of strategies forms a stable equilibrium and the system will always oscillate between bounds. The Southampton strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.
This strategy takes advantage of the fact that multiple entries
were allowed in this particular competition and that the performance of a
team was measured by that of the highest-scoring player (meaning that
the use of self-sacrificing players was a form of minmaxing).
In a competition where one has control of only a single player, tit for
tat is certainly a better strategy. Because of this new rule, this
competition also has little theoretical significance when analyzing
single agent strategies as compared to Axelrod's seminal tournament.
However, it provided a basis for analysing how to achieve cooperative
strategies in multi-agent frameworks, especially in the presence of
noise. In fact, long before this new-rules tournament was played,
Dawkins, in his book The Selfish Gene,
pointed out the possibility of such strategies winning if multiple
entries were allowed, but he remarked that most probably Axelrod would
not have allowed them if they had been submitted. The strategy also relies on circumventing the rule of the prisoner's dilemma that forbids communication between the two players; the Southampton programs arguably communicated through their opening "ten move dance" to recognize one another, which only reinforces how valuable communication can be in shifting the balance of the game.
Stochastic iterated prisoner's dilemma
In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities". In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: P = {P_cc, P_cd, P_dc, P_dd}, where P_ab is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then P_cd is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit for tat strategy written as P = {1,0,1,0}, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy written as P = {1,0,0,1}, in which X responds as in the previous encounter if it was a "win" (i.e. cc or dc) but changes strategy if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.
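A sketch of memory-1 play in Python (the strategy vectors are those defined above; the opponent's noise parameters and the cooperative opening state are assumptions for illustration):

```python
import random

# A memory-1 strategy: probabilities of cooperating after the outcomes
# (cc, cd, dc, dd), each seen from the player's own point of view.
TIT_FOR_TAT = (1.0, 0.0, 1.0, 0.0)           # P = {1,0,1,0}
WIN_STAY_LOSE_SWITCH = (1.0, 0.0, 0.0, 1.0)  # P = {1,0,0,1}

OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def next_move(strategy, my_last, their_last):
    idx = OUTCOMES.index((my_last, their_last))
    return "C" if random.random() < strategy[idx] else "D"

def outcome_frequencies(p_strategy, q_strategy, rounds=100000):
    x, y = "C", "C"                # assume a cooperative opening state
    counts = {o: 0 for o in OUTCOMES}
    for _ in range(rounds):
        x, y = (next_move(p_strategy, x, y),
                next_move(q_strategy, y, x))   # each indexes from its own side
        counts[(x, y)] += 1                    # outcome from X's point of view
    return {o: n / rounds for o, n in counts.items()}

# Tit for tat against a noisy, mostly win-stay, lose-switch opponent.
print(outcome_frequencies(TIT_FOR_TAT, (0.9, 0.1, 0.1, 0.9)))
```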
If we define P as the above 4-element strategy vector of X and Q = {Q_cc, Q_cd, Q_dc, Q_dd} as the 4-element strategy vector of Y, a transition matrix M may be defined for X whose ij-th entry is the probability that the outcome of a particular encounter between X and Y will be j given that the previous encounter was i, where i and j are one of the four outcome indices: cc, cd, dc, or dd. For example, from X's point of view, the probability that the outcome of the present encounter is cd given that the previous encounter was cd is equal to P_cd(1 − Q_dc). (The indices for Q are from Y's point of view: a cd outcome for X is a dc outcome for Y.) Under these definitions, the iterated prisoner's dilemma qualifies as a stochastic process and M is a stochastic matrix, allowing all of the theory of stochastic processes to be applied.
One result of stochastic theory is that there exists a stationary vector v for the matrix M such that v·M = v. Without loss of generality, it may be specified that v is normalized so that the sum of its four components is unity. The ij-th entry in M^n will give the probability that the outcome of an encounter between X and Y will be j given that the encounter n steps previous was i. In the limit as n approaches infinity, M^n will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing j independent of i. In other words, the rows of M^∞ will be identical, giving the long-term equilibrium result probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that v is a stationary vector for M^n and particularly for M^∞, so that each row of M^∞ will be equal to v. Thus the stationary vector specifies the equilibrium outcome probabilities for X. Defining S_x = {R, S, T, P} and S_y = {R, T, S, P} as the short-term payoff vectors for the {cc, cd, dc, dd} outcomes (from X's point of view), the equilibrium payoffs for X and Y can now be specified as s_x = v·S_x and s_y = v·S_y, allowing the two strategies P and Q to be compared for their long-term payoffs.
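A sketch of this computation with NumPy (payoff values are the illustrative T > R > P > S numbers used earlier; the small amount of noise added to the strategies keeps the Markov chain ergodic so the stationary vector is unique):

```python
import numpy as np

T_, R_, S_, P_ = 5.0, 3.0, 0.0, 1.0        # illustrative payoffs, T > R > P > S

def transition_matrix(p, q):
    """Markov matrix over outcomes (cc, cd, dc, dd) from X's point of view.
    p = X's cooperation probabilities; q = Y's, in Y's own outcome order."""
    q_from_x = [q[0], q[2], q[1], q[3]]    # swap cd/dc into X's point of view
    rows = []
    for i in range(4):
        px, py = p[i], q_from_x[i]
        rows.append([px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)])
    return np.array(rows)

def long_term_payoffs(p, q):
    M = transition_matrix(p, q)
    # Stationary vector: left eigenvector of M with eigenvalue 1, normalized.
    w, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmin(np.abs(w - 1))])
    v /= v.sum()
    Sx = np.array([R_, S_, T_, P_])        # X's payoff per outcome
    Sy = np.array([R_, T_, S_, P_])        # Y's payoff per outcome
    return v @ Sx, v @ Sy

# Slightly noisy tit for tat against win-stay, lose-switch (assumed noise).
tft = [0.99, 0.01, 0.99, 0.01]
wsls = [0.99, 0.01, 0.01, 0.99]
print(long_term_payoffs(tft, wsls))
```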
The
relationship between zero-determinant (ZD), cooperating and defecting
strategies in the iterated prisoner's dilemma (IPD) illustrated in a Venn diagram.
Cooperating strategies always cooperate with other cooperating
strategies, and defecting strategies always defect against other
defecting strategies. Both contain subsets of strategies that are robust
under strong selection, meaning no other memory-1 strategy is selected
to invade such strategies when they are resident in a population. Only
cooperating strategies contain a subset that are always robust, meaning
that no other memory-1 strategy is selected to invade and replace such
strategies, under both strong and weak selection.
The intersection between ZD and good cooperating strategies is the set
of generous ZD strategies. Extortion strategies are the intersection
between ZD and non-robust defecting strategies. Tit-for-tat lies at the
intersection of cooperating, defecting and ZD strategies.
In 2012, William H. Press and Freeman Dyson published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies. The long-term payoffs for encounters between X and Y can be expressed as the determinant of a matrix which is a function of the two strategies and the short-term payoff vectors: s_x = D(P, Q, S_x) and s_y = D(P, Q, S_y), which do not involve the stationary vector v. Since the determinant function s_f = D(P, Q, f) is linear in f, it follows that α·s_x + β·s_y + γ = D(P, Q, α·S_x + β·S_y + γ·U) (where U = {1,1,1,1}). Any strategy for which D(P, Q, α·S_x + β·S_y + γ·U) = 0 is by definition a ZD strategy, and the long-term payoffs obey the relation α·s_x + β·s_y + γ = 0.
Tit-for-tat is a ZD strategy which is "fair" in the sense of not
gaining advantage over the other player. However, the ZD space also
contains strategies that, in the case of two players, can allow one
player to unilaterally set the other player's score or alternatively,
force an evolutionary player to achieve a payoff some percentage lower
than his own. The extorted player could defect but would thereby hurt
himself by getting a lower payoff. Thus, extortion solutions turn the
iterated prisoner's dilemma into a sort of ultimatum game. Specifically, X is able to choose a strategy for which D(P, Q, β·S_y + γ·U) = 0, unilaterally setting s_y to a specific value within a particular range of values, independent of Y's strategy, offering an opportunity for X to "extort" player Y (and vice versa). (It turns out that if X tries to set s_x to a particular value, the range of possibilities is much smaller, only consisting of complete cooperation or complete defection.)
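As a numerical sanity check of the "fairness" of tit for tat, the stationary-vector construction sketched earlier can be reused: against arbitrary stochastic opponents, X's and Y's long-run payoffs come out equal. The payoffs and opponent strategies below are assumed values for illustration.

```python
import numpy as np

# Check that tit for tat is "fair": against any interior opponent Q,
# long-run payoffs satisfy s_x == s_y. Payoff values are illustrative.
T_, R_, S_, P_ = 5.0, 3.0, 0.0, 1.0

def payoffs(p, q):
    qx = [q[0], q[2], q[1], q[3]]                  # Q seen from X's side
    M = np.array([[p[i] * qx[i], p[i] * (1 - qx[i]),
                   (1 - p[i]) * qx[i], (1 - p[i]) * (1 - qx[i])]
                  for i in range(4)])
    w, vecs = np.linalg.eig(M.T)                   # stationary vector of M
    v = np.real(vecs[:, np.argmin(np.abs(w - 1))])
    v /= v.sum()
    return v @ np.array([R_, S_, T_, P_]), v @ np.array([R_, T_, S_, P_])

rng = np.random.default_rng(0)
tft = [1.0, 0.0, 1.0, 0.0]                         # pure tit for tat
for _ in range(5):
    q = rng.uniform(0.05, 0.95, size=4)            # arbitrary noisy opponent
    sx, sy = payoffs(tft, q)
    assert abs(sx - sy) < 1e-8                     # X never outscores Y long-run
print("tit for tat is fair against random opponents")
```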
An extension of the IPD is an evolutionary stochastic IPD, in
which the relative abundance of particular strategies is allowed to
change, with more successful strategies relatively increasing. This
process may be accomplished by having less successful players imitate
the more successful strategies, or by eliminating less successful
players from the game, while multiplying the more successful ones. It
has been shown that unfair ZD strategies are not evolutionarily stable.
The key intuition is that an evolutionarily stable strategy must not
only be able to invade another population (which extortionary ZD
strategies can do) but must also perform well against other players of
the same type (which extortionary ZD players do poorly, because they
reduce each other's surplus).
Theory and simulations confirm that beyond a critical population
size, ZD extortion loses out in evolutionary competition against more
cooperative strategies, and as a result, the average payoff in the
population increases when the population is larger. In addition, there
are some cases in which extortioners may even catalyze cooperation by
helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.
While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is
both stable and robust. In fact, when the population is not too small,
these strategies can supplant any other ZD strategy and even perform
well against a broad array of generic strategies for iterated prisoner's
dilemma, including win–stay, lose–switch. This was proven specifically
for the donation game by Alexander Stewart and Joshua Plotkin in 2013.
Generous strategies will cooperate with other cooperative players, and
in the face of defection, the generous player loses more utility than
its rival. Generous strategies are the intersection of ZD strategies and
so-called "good" strategies, which were defined by Akin (2013)
to be those for which the player responds to past mutual cooperation
with future cooperation and splits expected payoffs equally if he
receives at least the cooperative expected payoff. Among good
strategies, the generous (ZD) subset performs well when the population
is not too small. If the population is very small, defection strategies
tend to dominate.
Continuous iterated prisoner's dilemma
Most
work on the iterated prisoner's dilemma has focused on the discrete
case, in which players either cooperate or defect, because this model is
relatively simple to analyze. However, some researchers have looked at
models of the continuous iterated prisoner's dilemma, in which players
are able to make a variable contribution to the other player. Le and
Boyd
found that in such situations, cooperation is much harder to evolve
than in the discrete iterated prisoner's dilemma. The basic intuition
for this result is straightforward: in a continuous prisoner's dilemma,
if a population starts off in a non-cooperative equilibrium, players who
are only marginally more cooperative than non-cooperators get little
benefit from assorting
with one another. By contrast, in a discrete prisoner's dilemma, tit
for tat cooperators get a big payoff boost from assorting with one
another in a non-cooperative equilibrium, relative to non-cooperators.
Since nature arguably offers more opportunities for variable cooperation
rather than a strict dichotomy of cooperation or defection, the
continuous prisoner's dilemma may help explain why real-life examples of
tit for tat-like cooperation are extremely rare in nature (e.g. Hammerstein) even though tit for tat seems robust in theoretical models.
Emergence of stable strategies
Players cannot seem to coordinate mutual cooperation, and thus often get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies. Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay, lose-shift: if you can get away with cheating, repeat that behavior; if you get caught, switch. This strategy can outperform simple tit for tat. A weakness of tit-for-tat is its vulnerability to signal error: when one individual cheats in retaliation but the other interprets it as unprovoked cheating, the second individual retaliates in turn, starting a see-saw pattern of cheating in a chain reaction.
Real-life examples
The
prisoner setting may seem contrived, but there are in fact many
examples in human interaction as well as interactions in nature that
have the same payoff matrix. The prisoner's dilemma is therefore of
interest to the social sciences such as economics, politics, and sociology, as well as to the biological sciences such as ethology and evolutionary biology.
Many natural processes have been abstracted into models in which living
beings are engaged in endless games of prisoner's dilemma. This wide
applicability of the PD gives the game its substantial importance.
Environmental studies
In environmental studies, the PD is evident in crises such as global climate change. It is argued all countries will benefit from a stable climate, but any single country is often hesitant to curb CO2 emissions. The immediate benefit to any one country from maintaining current behavior is wrongly perceived to be greater than the purported eventual benefit to that country if all countries' behavior was changed, therefore explaining the impasse concerning climate change in 2007.
An important difference between climate-change politics and the
prisoner's dilemma is uncertainty; the extent and pace at which
pollution can change climate is not known. The dilemma faced by
government is therefore different from the prisoner's dilemma in that
the payoffs of cooperation are unknown. This difference suggests that
states will cooperate much less than in a real iterated prisoner's
dilemma, so that the probability of avoiding a possible climate
catastrophe is much smaller than that suggested by a game-theoretical
analysis of the situation using a real iterated prisoner's dilemma.
Osang and Nandy (2003) provide a theoretical explanation with
proofs for a regulation-driven win-win situation along the lines of Michael Porter's hypothesis, in which government regulation of competing firms is substantial.
Animals
Cooperative
behavior of many animals can be understood as an example of the
prisoner's dilemma. Often animals engage in long term partnerships,
which can be more specifically modeled as iterated prisoner's dilemma.
For example, guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.
Vampire bats
are social animals that engage in reciprocal food exchange. Applying
the payoffs from the prisoner's dilemma can help explain this behavior:
C/C: "Reward: I get blood on my unlucky nights, which saves me
from starving. I have to give blood on my lucky nights, which doesn't
cost me too much."
D/C: "Temptation: You save my life on my poor night. But then I get
the added benefit of not having to pay the slight cost of feeding you on
my good night."
C/D: "Sucker's Payoff: I pay the cost of saving your life on my good
night. But on my bad night you don't feed me and I run a real risk of
starving to death."
D/D: "Punishment: I don't have to pay the slight costs of feeding
you on my good nights. But I run a real risk of starving on my poor
nights."
Psychology
In addiction research / behavioral economics, George Ainslie points out that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing,
and it is easy to see that not defecting both today and in the future
is by far the best outcome. The case where one abstains today but
relapses in the future is the worst outcome – in some sense the
discipline and self-sacrifice involved in abstaining today have been
"wasted" because the future relapse means that the addict is right back
where he started and will have to start over (which is quite
demoralizing, and makes starting over more difficult). Relapsing today
and tomorrow is a slightly "better" outcome, because while the addict is
still addicted, they haven't put the effort in to trying to stop. The
final case, where one engages in the addictive behavior today while
abstaining "tomorrow" will be familiar to anyone who has struggled with
an addiction. The problem here is that (as in other PDs) there is an
obvious benefit to defecting "today", but tomorrow one will face the
same PD, and the same obvious benefit will be present then, ultimately
leading to an endless string of defections.
John Gottman, in research described in The Science of Trust, defines good relationships as those where partners know not to enter the (D,D) cell, or at least not to get dynamically stuck there in a loop.
Economics
The prisoner's dilemma has been called the E. coli of social psychology, and it has been used widely to research various topics such as oligopolistic competition and collective action to produce a collective good.
Advertising is sometimes cited as a real-life example of the prisoner's dilemma. When cigarette advertising was legal in the United States, competing cigarette manufacturers had to decide how much money to spend on advertising. The effectiveness of Firm A's advertising was partially determined by the advertising conducted by Firm B, and likewise the profit derived from advertising for Firm B was affected by the advertising conducted by Firm A. If both Firm A and Firm B chose to advertise during a given period, then the advertisement from each firm negated the other's, receipts remained constant, and expenses increased due to the cost of advertising. Both
firms would benefit from a reduction in advertising. However, should
Firm B choose not to advertise, Firm A could benefit greatly by
advertising. Nevertheless, the optimal amount of advertising by one firm
depends on how much advertising the other undertakes. As the best
strategy is dependent on what the other firm chooses there is no
dominant strategy, which makes it slightly different from a prisoner's
dilemma. The outcome is similar, though, in that both firms would be
better off were they to advertise less than in the equilibrium.
Sometimes cooperative behaviors do emerge in business situations. For
instance, cigarette manufacturers endorsed the making of laws banning
cigarette advertising, understanding that this would reduce costs and
increase profits across the industry. This analysis is likely to be pertinent in many other business situations involving advertising.
Without enforceable agreements, members of a cartel are also involved in a (multi-player) prisoner's dilemma.
'Cooperating' typically means keeping prices at a pre-agreed minimum
level. 'Defecting' means selling under this minimum level, instantly
taking business (and profits) from other cartel members. Anti-trust authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for consumers.
Sport
Doping in sport has been cited as an example of a prisoner's dilemma.
Two competing athletes have the option to use an illegal and/or
dangerous drug to boost their performance. If neither athlete takes the
drug, then neither gains an advantage. If only one does, then that
athlete gains a significant advantage over their competitor, reduced by
the legal and/or medical dangers of having taken the drug. If both
athletes take the drug, however, the benefits cancel out and only the
dangers remain, putting them both in a worse position than if neither
had used doping.
International politics
In international political theory, the Prisoner's Dilemma is often used to demonstrate the coherence of strategic realism,
which holds that in international relations, all states (regardless of
their internal policies or professed ideology), will act in their
rational self-interest given international anarchy. A classic example is an arms race like the Cold War and similar conflicts. During the Cold War the opposing alliances of NATO and the Warsaw Pact
both had the choice to arm or disarm. From each side's point of view,
disarming whilst their opponent continued to arm would have led to
military inferiority and possible annihilation. Conversely, arming
whilst their opponent disarmed would have led to superiority. If both
sides chose to arm, neither could afford to attack the other, but both
incurred the high cost of developing and maintaining a nuclear arsenal.
If both sides chose to disarm, war would be avoided and there would be
no costs.
Although the 'best' overall outcome is for both sides to disarm,
the rational course for both sides is to arm, and this is indeed what
happened. Both sides poured enormous resources into military research
and armament in a war of attrition for the next thirty years until the Soviet Union could not withstand the economic cost. The same logic could be applied in any similar scenario, be it economic or technological competition between sovereign states.
Multiplayer dilemmas
Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's tragedy of the commons
may be viewed as an example of a multi-player generalization of the PD:
Each villager makes a choice for personal gain or restraint. The
collective reward for unanimous (or even frequent) defection is very low
payoffs (representing the destruction of the "commons"). A commons
dilemma most people can relate to is washing the dishes in a shared
house. By not washing dishes an individual can gain by saving his time,
but if that behavior is adopted by every resident the collective cost
is no clean plates for anyone.
The commons are not always exploited: William Poundstone,
in a book about the prisoner's dilemma, describes a situation in New
Zealand where newspaper boxes are left unlocked. It is possible for
people to take a paper without paying (defecting), but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by Elinor Ostrom, winner of the 2009 Nobel Memorial Prize in Economic Sciences, hypothesized that the tragedy of the commons is oversimplified, with the negative outcome driven by outside pressures. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best-case outcome for PD.
Related games
Closed-bag exchange
The prisoner's dilemma as a briefcase exchange
Douglas Hofstadter
once suggested that people often find problems such as the PD easier to understand when they are illustrated in the form of a simple game, or trade-off. One of several examples he used was "closed bag exchange":
Two people meet and exchange closed
bags, with the understanding that one of them contains money, and the
other contains a purchase. Either player can choose to honor the deal by
putting into his or her bag what he or she agreed, or he or she can
defect by handing over an empty bag.
Defection always gives a game-theoretically preferable outcome.
Friend or Foe?
Friend or Foe? is a game show that aired from 2002 to 2003 on the Game Show Network
in the US. It is an example of the prisoner's dilemma game tested on
real people, but in an artificial setting. On the game show, three pairs
of people compete. When a pair is eliminated, they play a game similar
to the prisoner's dilemma to determine how the winnings are split. If
they both cooperate (Friend), they share the winnings 50–50. If one
cooperates and the other defects (Foe), the defector gets all the
winnings and the cooperator gets nothing. If both defect, both leave
with nothing. Notice that the reward matrix is slightly different from
the standard one given above, as the rewards for the "both defect" and
the "cooperate while the opponent defects" cases are identical. This
makes the "both defect" case a weak equilibrium, compared with being a
strict equilibrium in the standard prisoner's dilemma. If a contestant
knows that their opponent is going to vote "Foe", then their own choice
does not affect their own winnings. In a specific sense, Friend or Foe has a rewards model between prisoner's dilemma and the game of Chicken.
The rewards matrix is (payoffs given as Pair 1, Pair 2):

                               Pair 2: "Friend" (cooperates)   Pair 2: "Foe" (defects)
Pair 1: "Friend" (cooperates)  1, 1                            0, 2
Pair 1: "Foe" (defects)        2, 0                            0, 0
This payoff matrix has also been used on the British television programmes Trust Me, Shafted, The Bank Job and Golden Balls, on the American game show Take It All, and for the winning couple on the reality show Bachelor Pad. Game data from the Golden Balls
series has been analyzed by a team of economists, who found that
cooperation was "surprisingly high" for amounts of money that would seem
consequential in the real world, but were comparatively low in the
context of the game.
Iterated snowdrift
Researchers from the University of Lausanne and the University of Edinburgh
have suggested that the "Iterated Snowdrift Game" may more closely
reflect real-world social situations. Although this model is actually a chicken game,
it will be described here. In this model, the risk of being exploited
through defection is lower, and individuals always gain from taking the
cooperative choice. The snowdrift game imagines two drivers who are
stuck on opposite sides of a snowdrift,
each of whom is given the option of shoveling snow to clear a path, or
remaining in their car. A player's highest payoff comes from leaving the
opponent to clear all the snow by themselves, but the opponent is still
nominally rewarded for their work.
This may better reflect real world scenarios, the researchers
giving the example of two scientists collaborating on a report, both of
whom would benefit if the other worked harder. "But when your
collaborator doesn’t do any work, it’s probably better for you to do all
the work yourself. You’ll still end up with a completed project."
Example snowdrift payouts (A, B):

                B cooperates    B defects
A cooperates    200, 200        100, 300
A defects       300, 100        0, 0

Example PD payouts (A, B):

                B cooperates    B defects
A cooperates    200, 200        −100, 300
A defects       300, −100       0, 0
Coordination games
In coordination games, players must coordinate their strategies for a
good outcome. An example is two cars that abruptly meet in a blizzard;
each must choose whether to swerve left or right. If both swerve left,
or both right, the cars do not collide. The local left- and right-hand traffic convention helps to co-ordinate their actions.
A
more general set of games are asymmetric. As in the prisoner's dilemma,
the best outcome is co-operation, and there are motives for defection.
Unlike the symmetric prisoner's dilemma, though, one player has more to
lose and/or more to gain than the other. Some such games have been
described as a prisoner's dilemma in which one prisoner has an alibi, whence the term "alibi game".
In experiments, players getting unequal payoffs in repeated games
may seek to maximize profits, but only under the condition that both
players receive equal payoffs; this may lead to a stable equilibrium
strategy in which the disadvantaged player defects every X games, while
the other always co-operates. Such behaviour may depend on the
experiment's social norms around fairness.
Software
Several
software packages have been created to run prisoner's dilemma
simulations and tournaments, some of which have available source code.
The source code for the second tournament run by Robert Axelrod (written by Axelrod and many contributors in Fortran) is available online.
Prison, a library written in Java, last updated in 1998
In fiction
Hannu Rajaniemi set the opening scene of his The Quantum Thief
trilogy in a "dilemma prison". The main theme of the series has been
described as the "inadequacy of a binary universe" and the ultimate
antagonist is a character called the All-Defector. Rajaniemi is
particularly interesting as an artist treating this subject in that he
is a Cambridge-trained mathematician and holds a PhD in mathematical physics –
the interchangeability of matter and information is a major feature of
the books, which take place in a "post-singularity" future. The first
book in the series was published in 2010, with the two sequels, The Fractal Prince and The Causal Angel, published in 2012 and 2014, respectively.
In The Mysterious Benedict Society and the Prisoner's Dilemma by Trenton Lee Stewart,
the main characters start by playing a version of the game and escaping
from the "prison" altogether. Later they become actual prisoners and
escape once again.
In The Adventure Zone: Balance during The Suffering Game
subarc, the player characters are twice presented with the prisoner's
dilemma during their time in two liches' domain, once cooperating and
once defecting.
In Tiamat's Wrath, the eighth novel by James S. A. Corey, Winston Duarte explains the prisoner's dilemma to his 14-year-old daughter, Teresa, to train her in strategic thinking.
This is examined literally in the 2019 film The Platform,
where inmates in a vertical prison may only eat whatever is left over
by those above them. If everyone were to eat their fair share, there
would be enough food, but those in the lower levels are shown to starve
because of the higher inmates' overconsumption.