Saturday, November 17, 2018

Prisoner's dilemma

From Wikipedia, the free encyclopedia

Prisoner's dilemma payoff matrix (payoffs are years in prison, shown as negative numbers)

                     B stays silent      B betrays
A stays silent       A: -1, B: -1        A: -3, B: 0
A betrays            A: 0, B: -3         A: -2, B: -2

The prisoner's dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher while working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence rewards and named it "prisoner's dilemma", presenting it as follows:

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain. Each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent.

The offer is:
  • If A and B each betray the other, each of them serves two years in prison
  • If A betrays B but B remains silent, A will be set free and B will serve three years in prison (and vice versa)
  • If A and B both remain silent, both of them will only serve one year in prison (on the lesser charge).
It is implied that the prisoners will have no opportunity to reward or punish their partner other than the prison sentences they get and that their decision will not affect their reputation in the future. Because betraying a partner offers a greater reward than cooperating with them, all purely rational self-interested prisoners will betray the other, meaning the only possible outcome for two purely rational prisoners is for them to betray each other. The interesting part of this result is that pursuing individual reward logically leads both of the prisoners to betray when they would get a better reward if they both kept silent. In reality, humans display a systemic bias towards cooperative behavior in this and similar games despite what is predicted by simple models of "rational" self-interested action.

An extended "iterated" version of the game also exists. In this version, the classic game is played repeatedly between the same prisoners, who continuously have the opportunity to penalize the other for previous decisions. If the number of times the game will be played is known to the players, then (by backward induction) two classically rational players will betray each other repeatedly, for the same reasons as the single-shot variant. In an infinite or unknown length game there is no fixed optimum strategy, and prisoner's dilemma tournaments have been held to compete and test algorithms for such cases.

The prisoner's dilemma game can be used as a model for many real world situations involving cooperative behavior. In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games: for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it difficult or expensive—not necessarily impossible—to coordinate their activities.

Strategy for the prisoner's dilemma

Two prisoners are separated into individual rooms and cannot communicate with each other. The normal game is shown below:

                               Prisoner B stays silent     Prisoner B betrays
                               (cooperates)                (defects)
Prisoner A stays silent        Each serves 1 year          Prisoner A: 3 years
(cooperates)                                               Prisoner B: goes free
Prisoner A betrays             Prisoner A: goes free       Each serves 2 years
(defects)                      Prisoner B: 3 years

It is assumed that both prisoners understand the nature of the game, have no loyalty to each other, and will have no opportunity for retribution or reward outside the game. Regardless of what the other decides, each prisoner gets a higher reward by betraying the other ("defecting"). The reasoning involves an argument by dilemma: B will either cooperate or defect. If B cooperates, A should defect, because going free is better than serving 1 year. If B defects, A should also defect, because serving 2 years is better than serving 3. So either way, A should defect. Parallel reasoning will show that B should defect.

Because defection always results in a better payoff than cooperation regardless of the other player's choice, it is a dominant strategy. Mutual defection is the only strong Nash equilibrium in the game (i.e. the only outcome from which each player could only do worse by unilaterally changing strategy). The dilemma, then, is that mutual cooperation yields a better outcome than mutual defection but is not the rational outcome because the choice to cooperate, from a self-interested perspective, is irrational.
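To make the argument by dilemma concrete, here is a minimal Python sketch (illustrative only, not part of the original article) that encodes the payoff matrix above as negative years in prison and checks that betraying is a dominant strategy for each prisoner and that mutual betrayal is the only Nash equilibrium; the function names are chosen purely for illustration.

# Payoffs as (A's payoff, B's payoff), in negative years of prison.
# Moves: "C" = stay silent (cooperate), "D" = betray (defect).
PAYOFFS = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3,  0),
    ("D", "C"): ( 0, -3),
    ("D", "D"): (-2, -2),
}
MOVES = ["C", "D"]

def best_response_A(b_move):
    """A's best move given B's move."""
    return max(MOVES, key=lambda a: PAYOFFS[(a, b_move)][0])

def best_response_B(a_move):
    """B's best move given A's move."""
    return max(MOVES, key=lambda b: PAYOFFS[(a_move, b)][1])

# Defection is dominant: it is the best response to every move of the other player.
assert all(best_response_A(b) == "D" for b in MOVES)
assert all(best_response_B(a) == "D" for a in MOVES)

# Nash equilibria: profiles in which each move is a best response to the other.
nash = [(a, b) for a in MOVES for b in MOVES
        if best_response_A(b) == a and best_response_B(a) == b]
print(nash)  # [('D', 'D')] -- mutual defection is the only equilibrium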

Generalized form

The structure of the traditional Prisoner's Dilemma can be generalized from its original prisoner setting. Suppose that the two players are represented by the colors red and blue, and that each player chooses to either "Cooperate" or "Defect".

If both players cooperate, they both receive the reward R for cooperating. If both players defect, they both receive the punishment payoff P. If Blue defects while Red cooperates, then Blue receives the temptation payoff T, while Red receives the "sucker's" payoff, S. Similarly, if Blue cooperates while Red defects, then Blue receives the sucker's payoff S, while Red receives the temptation payoff T.

This can be expressed in normal form:

Canonical PD payoff matrix

                   Red cooperates     Red defects
Blue cooperates    R, R               S, T
Blue defects       T, S               P, P

(in each cell, the row player Blue's payoff is listed first, then Red's)

and to be a prisoner's dilemma game in the strong sense, the following condition must hold for the payoffs:
T > R > P > S
The payoff relationship R > P implies that mutual cooperation is superior to mutual defection, while the payoff relationships T > R and P > S imply that defection is the dominant strategy for both agents.
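The ordering can be checked mechanically; the short Python sketch below (illustrative only, not from the article) tests whether four candidate payoffs form a prisoner's dilemma in the strong sense.

def is_strong_pd(T, R, P, S):
    """True if the payoffs satisfy T > R > P > S, i.e. the game is a
    prisoner's dilemma in the strong sense."""
    return T > R > P > S

# The prison-sentence payoffs from the opening example (negative years):
print(is_strong_pd(T=0, R=-1, P=-2, S=-3))   # True
# Not a strong-sense PD if mutual defection pays as well as mutual cooperation:
print(is_strong_pd(T=0, R=-1, P=-1, S=-3))   # False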

Special case: Donation game

The "donation game" is a form of prisoner's dilemma in which cooperation corresponds to offering the other player a benefit b at a personal cost c with b > c. Defection means offering nothing. The payoff matrix is thus

                   Red cooperates     Red defects
Blue cooperates    b-c, b-c           -c, b
Blue defects       b, -c              0, 0

(in each cell, the row player Blue's payoff is listed first, then Red's)

Note that 2R > T + S (i.e. 2(b-c) > b-c, which holds whenever b > c), the condition that qualifies the donation game for the iterated version of the game (see next section).

The donation game may be applied to markets. Suppose X grows oranges, Y grows apples. The marginal utility of an apple to the orange-grower X is b, which is higher than the marginal utility (c) of an orange, since X has a surplus of oranges and no apples. Similarly, for apple-grower Y, the marginal utility of an orange is b while the marginal utility of an apple is c. If X and Y contract to exchange an apple and an orange, and each fulfills their end of the deal, then each receive a payoff of b-c. If one "defects" and does not deliver as promised, the defector will receive a payoff of b, while the cooperator will lose c. If both defect, then neither one gains or loses anything.
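As a rough numerical check (a sketch with arbitrarily chosen b and c, not taken from the article), the Python snippet below builds the donation-game payoffs from a benefit b and a personal cost c and confirms that, for b > c > 0, they satisfy both the strong-sense condition T > R > P > S and the extra iterated-game condition 2R > T + S introduced in the next section.

def donation_game(b, c):
    """Return the canonical payoffs (T, R, P, S) of the donation game
    with benefit b and personal cost c."""
    return b, b - c, 0, -c   # T, R, P, S

b, c = 3.0, 1.0              # arbitrary example values with b > c > 0
T, R, P, S = donation_game(b, c)

print(T > R > P > S)         # True: a prisoner's dilemma in the strong sense
print(2 * R > T + S)         # True: 2(b - c) > b - c, the iterated-game condition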

The iterated prisoner's dilemma

If two players play prisoner's dilemma more than once in succession and they remember previous actions of their opponent and change their strategy accordingly, the game is called iterated prisoner's dilemma.

In addition to the general form above, the iterative version also requires that 2R > T + S, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.

The iterated prisoner's dilemma game is fundamental to some theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modeled by a multi-player, iterated, version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner's dilemma has also been referred to as the "Peace-War game".

If the game is played exactly N times and both players know this, then it is optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to later retaliate. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.

Unlike the standard prisoner's dilemma, in the iterated prisoner's dilemma the defection strategy is counter-intuitive and fails badly to predict the behavior of human players. Within standard economic theory, though, this is the only correct answer. The superrational strategy in the iterated prisoner's dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies agree with the superrational version, not the game-theoretic rational one.

For cooperation to emerge between game theoretic rational players, the total number of rounds N must be unknown to the players. In this case 'always defect' may no longer be a strictly dominant strategy, only a Nash equilibrium. Amongst results shown by Robert Aumann in a 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.

Strategy for the iterated prisoner's dilemma

Interest in the iterated prisoner's dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984). In it he reports on a tournament he organized of the N step prisoner's dilemma (with N fixed) in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more altruistic strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behaviour from mechanisms that are initially purely selfish, by natural selection.

The winning deterministic strategy was tit for tat, which Anatol Rapoport developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of BASIC, and won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness". When the opponent defects, on the next move, the player sometimes cooperates anyway, with a small probability (around 1–5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the line-up of opponents.
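As an illustration (a Python sketch, not Rapoport's original four-line BASIC program), tit for tat and the forgiving variant described above can be written as follows, with the forgiveness probability left as a parameter in the 1-5% range.

import random

def tit_for_tat(my_history, opponent_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def tit_for_tat_with_forgiveness(my_history, opponent_history, forgive_prob=0.05):
    """Like tit for tat, but after an opponent defection still cooperate with a
    small probability, allowing recovery from cycles of mutual retaliation."""
    if not opponent_history:
        return "C"
    if opponent_history[-1] == "D" and random.random() < forgive_prob:
        return "C"
    return opponent_history[-1]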

By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.
Nice
The most important condition is that the strategy must be "nice", that is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice; in other words, even a purely selfish strategy has purely self-interested reasons not to "cheat" on its opponent first.
Retaliating
However, Axelrod contended, the successful strategy must not be a blind optimist. It must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as "nasty" strategies will ruthlessly exploit such players.
Forgiving
Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.
Non-envious
The last quality is being non-envious, that is not striving to score more than the opponent.
The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.

In the strategy called Pavlov, win-stay, lose-switch, faced with a failure to cooperate, the player switches strategy the next turn. In certain circumstances, Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy.
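In the same style as the sketches above, here is an illustrative Python version of the Pavlov rule (win-stay, lose-switch), treating a round in which the opponent cooperated as a "win" and one in which the opponent defected as a "loss", consistent with the memory-1 description given later.

def pavlov(my_history, opponent_history):
    """Win-stay, lose-switch: repeat my previous move after a win
    (the opponent cooperated), switch moves after a loss (the opponent defected)."""
    if not my_history:
        return "C"
    last_mine, last_theirs = my_history[-1], opponent_history[-1]
    if last_theirs == "C":                      # cc or dc from my point of view: a win
        return last_mine
    return "D" if last_mine == "C" else "C"     # cd or dd: a loss, so switch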

Deriving the optimal strategy is generally done in two ways:
  • Bayesian Nash Equilibrium: If the statistical distribution of opposing strategies can be determined (e.g. 50% tit for tat, 50% always cooperate) an optimal counter-strategy can be derived analytically.
  • Monte Carlo simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit for tat players (see for instance Chess 1988), but no analytic proof exists that this will always occur.
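A deliberately simplified sketch of the second (Monte Carlo) approach, not a reproduction of any published experiment: a small population of always-defect, always-cooperate, and tit-for-tat players plays round-robin iterated games each generation, the lowest scorers are replaced by copies of the highest scorers, and the surviving mix is printed. Population size, round counts, and payoff values are arbitrary choices.

import random
from itertools import combinations

R, S, T, P = 3, 0, 5, 1          # illustrative payoffs with T > R > P > S
ROUNDS, GENERATIONS, POP = 50, 30, 30
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def always_defect(me, opp):    return "D"
def always_cooperate(me, opp): return "C"
def tit_for_tat(me, opp):      return opp[-1] if opp else "C"

STRATEGIES = [always_defect, always_cooperate, tit_for_tat]

def play(strat_a, strat_b):
    """Play an iterated PD and return the two players' total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(ROUNDS):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

population = [random.choice(STRATEGIES) for _ in range(POP)]
for _ in range(GENERATIONS):
    scores = [0] * POP
    for i, j in combinations(range(POP), 2):
        si, sj = play(population[i], population[j])
        scores[i] += si; scores[j] += sj
    ranked = sorted(range(POP), key=lambda k: scores[k], reverse=True)
    third = POP // 3
    for loser, winner in zip(ranked[-third:], ranked[:third]):
        population[loser] = population[winner]   # worst third copies the best third

print({s.__name__: population.count(s) for s in STRATEGIES})

Adding mutation, as described in the bullet above, would mean occasionally replacing a player with a random strategy during the copy step; it is omitted here for brevity.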
Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England (led by Professor Nicholas Jennings and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers, Perukrishnen Vytelingum) introduced a new strategy at the 20th-anniversary iterated prisoner's dilemma competition, which proved to be more successful than tit for tat. This strategy relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result, this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.

This strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of minmaxing). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analysing single agent strategies as compared to Axelrod's seminal tournament. However, it provided a basis for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Richard Dawkins in his book The Selfish Gene pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that most probably Axelrod would not have allowed them if they had been submitted. It also relies on circumventing rules about the prisoner's dilemma in that there is no communication allowed between the two players, which the Southampton programs arguably did with their opening "ten move dance" to recognize one another; this only reinforces just how valuable communication can be in shifting the balance of the game.

Stochastic iterated prisoner's dilemma

In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities". In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: P = {P_cc, P_cd, P_dc, P_dd}, where P_ab is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then P_cd is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the "tit for tat" strategy written as P = {1,0,1,0}, in which X responds as Y did in the previous encounter. Another is the win-stay, lose-switch strategy written as P = {1,0,0,1}, in which X repeats its previous move if the previous encounter was a "win" (i.e. cc or dc) but switches if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy that gives the same statistical results, so only memory-1 strategies need be considered.

If we define P as the above 4-element strategy vector of X and Q = {Q_cc, Q_cd, Q_dc, Q_dd} as the 4-element strategy vector of Y, a transition matrix M may be defined for X whose ij-th entry is the probability that the outcome of a particular encounter between X and Y will be j given that the previous encounter was i, where i and j are one of the four outcome indices: cc, cd, dc, or dd. For example, from X's point of view, the probability that the outcome of the present encounter is cd given that the previous encounter was cd is equal to P_cd (1 - Q_dc). (Note that the indices for Q are from Y's point of view: a cd outcome for X is a dc outcome for Y.) Under these definitions, the iterated prisoner's dilemma qualifies as a stochastic process and M is a stochastic matrix, allowing all of the theory of stochastic processes to be applied.

One result of stochastic theory is that there exists a stationary vector v for the matrix M such that v·M = v. Without loss of generality, it may be specified that v is normalized so that the sum of its four components is unity. The ij-th entry of M^n gives the probability that the outcome of an encounter between X and Y will be j given that the encounter n steps previous was i. In the limit as n approaches infinity, M^n converges to a matrix with fixed values, giving the long-term probability of an encounter producing outcome j, independent of i. In other words, the rows of M^∞ will be identical, giving the long-term equilibrium outcome probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that v is a stationary vector for M^n and particularly for M^∞, so that each row of M^∞ will be equal to v. Thus the stationary vector specifies the equilibrium outcome probabilities for X. Defining S_x = {R, S, T, P} and S_y = {R, T, S, P} as the short-term payoff vectors for the {cc, cd, dc, dd} outcomes (from X's point of view), the equilibrium payoffs for X and Y can now be specified as s_x = v·S_x and s_y = v·S_y, allowing the two strategies P and Q to be compared for their long-term payoffs.
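To make the calculation concrete, here is a short numerical sketch (using numpy, with the standard illustrative payoffs T=5, R=3, P=1, S=0 rather than any values from the article). It builds the transition matrix M for slightly noisy versions of tit for tat and win-stay, lose-switch, finds the stationary vector v, and computes the long-term payoffs v·S_x and v·S_y; the 1% noise is added only so that the Markov chain is ergodic and v is unique.

import numpy as np

# Memory-1 cooperation probabilities over previous outcomes {cc, cd, dc, dd},
# each seen from that player's own point of view.
eps = 0.01
P = np.array([1 - eps, eps, 1 - eps, eps])    # tit for tat, with 1% noise
Q = np.array([1 - eps, eps, eps, 1 - eps])    # win-stay lose-switch, with 1% noise

# Y sees X's outcome cd as dc and vice versa, so swap those two entries.
Q_from_X_view = Q[[0, 2, 1, 3]]

# Transition matrix M: row = previous outcome, column = next outcome (cc, cd, dc, dd).
M = np.zeros((4, 4))
for i in range(4):
    px, py = P[i], Q_from_X_view[i]
    M[i] = [px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]

# Stationary vector v with v M = v and components summing to one:
# take the left eigenvector of M associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(M.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
v = v / v.sum()

# Short-term payoff vectors over {cc, cd, dc, dd} from X's point of view,
# using the illustrative values T=5, R=3, P=1, S=0.
S_x = np.array([3, 0, 5, 1])
S_y = np.array([3, 5, 0, 1])

print("equilibrium outcome probabilities:", np.round(v, 3))
print("long-term payoff for X:", round(float(v @ S_x), 3))
print("long-term payoff for Y:", round(float(v @ S_y), 3))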

Zero-determinant strategies

The relationship between zero-determinant (ZD), cooperating and defecting strategies in the Iterated Prisoner’s Dilemma (IPD) illustrated in a Venn diagram. Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that are always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies, under both strong and weak selection. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit-for-tat lies at the intersection of cooperating, defecting and ZD strategies.

In 2012, William H. Press and Freeman Dyson published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies. The long-term payoffs for encounters between X and Y can be expressed as the determinant of a matrix that is a function of the two strategies and the short-term payoff vectors: s_x = D(P, Q, S_x) and s_y = D(P, Q, S_y), which do not involve the stationary vector v. Since the determinant function D(P, Q, f) is linear in f, it follows that αs_x + βs_y + γ = D(P, Q, αS_x + βS_y + γU) (where U = {1,1,1,1}). Any strategy for which D(P, Q, αS_x + βS_y + γU) = 0 is by definition a ZD strategy, and the long-term payoffs then obey the relation αs_x + βs_y + γ = 0.

Tit-for-tat is a ZD strategy which is "fair" in the sense of not gaining advantage over the other player. However, the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score or, alternatively, force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect, but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of ultimatum game. Specifically, X is able to choose a strategy for which D(P, Q, βS_y + γU) = 0, unilaterally setting s_y to a specific value within a particular range of values, independent of Y's strategy, offering an opportunity for X to "extort" player Y (and vice versa). (It turns out that if X tries to set s_x to a particular value, the range of possibilities is much smaller, consisting only of complete cooperation or complete defection.)

An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).

Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is bigger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.

While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. In fact, when the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the donation game by Alexander Stewart and Joshua Plotkin in 2013. Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Akin (2013) to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.

Continuous iterated prisoner's dilemma

Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. The basic intuition for this result is straightforward: in a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoner's dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit for tat-like cooperation are extremely rare in nature (e.g. Hammerstein) even though tit for tat seems robust in theoretical models.

Emergence of stable strategies

Players often cannot coordinate mutual cooperation, and thus get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies. Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay, lose-shift, which can outperform simple tit for tat: if you can get away with cheating, repeat that behavior; if you get caught, switch.

One problem with such tit-for-tat-like strategies is that they are vulnerable to signal error. The problem arises when one individual behaves cooperatively but the other interprets the behavior as cheating. As a result, the second individual now cheats, which starts a see-saw pattern of cheating in a chain reaction.

Real-life examples

The prisoner setting may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics, and sociology, as well as to the biological sciences such as ethology and evolutionary biology. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.

In environmental studies

In environmental studies, the PD is evident in crises such as global climate change. It is argued that all countries will benefit from a stable climate, but any single country is often hesitant to curb CO2 emissions. The immediate benefit to any one country from maintaining current behavior is wrongly perceived to be greater than the purported eventual benefit to that country if all countries' behavior were changed, which explains the impasse concerning climate change as of 2007.

An important difference between climate-change politics and the prisoner's dilemma is uncertainty; the extent and pace at which pollution can change climate is not known. The dilemma faced by government is therefore different from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests that states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a possible climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.

Osang and Nandy provide a theoretical explanation with proofs for a regulation-driven win-win situation along the lines of Michael Porter's hypothesis, in which government regulation of competing firms is substantial.

In animals

Cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Often animals engage in long term partnerships, which can be more specifically modeled as iterated prisoner's dilemma. For example, guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.

Vampire bats are social animals that engage in reciprocal food exchange. Applying the payoffs from the prisoner's dilemma can help explain this behavior:
  • C/C: "Reward: I get blood on my unlucky nights, which saves me from starving. I have to give blood on my lucky nights, which doesn't cost me too much."
  • D/C: "Temptation: You save my life on my poor night. But then I get the added benefit of not having to pay the slight cost of feeding you on my good night."
  • C/D: "Sucker's Payoff: I pay the cost of saving your life on my good night. But on my bad night you don't feed me and I run a real risk of starving to death."
  • D/D: "Punishment: I don't have to pay the slight costs of feeding you on my good nights. But I run a real risk of starving on my poor nights."

In psychology

In addiction research / behavioral economics, George Ainslie points out that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing, and it is easy to see that not defecting both today and in the future is by far the best outcome. The case where one abstains today but relapses in the future is the worst outcome – in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted" because the future relapse means that the addict is right back where he started and will have to start over (which is quite demoralizing, and makes starting over more difficult). Relapsing today and tomorrow is a slightly "better" outcome, because while the addict is still addicted, they haven't put the effort in to trying to stop. The final case, where one engages in the addictive behavior today while abstaining "tomorrow" will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.

John Gottman, in his research described in The Science of Trust, defines good relationships as those where partners know not to enter the (D,D) cell or at least not to get dynamically stuck there in a loop.

In economics

Advertising is sometimes cited as a real-life example of the prisoner's dilemma. When cigarette advertising was legal in the United States, competing cigarette manufacturers had to decide how much money to spend on advertising. The effectiveness of Firm A's advertising was partially determined by the advertising conducted by Firm B. Likewise, the profit derived from advertising for Firm B is affected by the advertising conducted by Firm A. If both Firm A and Firm B chose to advertise during a given period, then the advertising cancels out, receipts remain constant, and expenses increase due to the cost of advertising. Both firms would benefit from a reduction in advertising. However, should Firm B choose not to advertise, Firm A could benefit greatly by advertising. Nevertheless, the optimal amount of advertising by one firm depends on how much advertising the other undertakes. As the best strategy is dependent on what the other firm chooses there is no dominant strategy, which makes it slightly different from a prisoner's dilemma. The outcome is similar, though, in that both firms would be better off were they to advertise less than in the equilibrium. Sometimes cooperative behaviors do emerge in business situations. For instance, cigarette manufacturers endorsed the making of laws banning cigarette advertising, understanding that this would reduce costs and increase profits across the industry. This analysis is likely to be pertinent in many other business situations involving advertising.

Without enforceable agreements, members of a cartel are also involved in a (multi-player) prisoner's dilemma. 'Cooperating' typically means keeping prices at a pre-agreed minimum level. 'Defecting' means selling under this minimum level, instantly taking business (and profits) from other cartel members. Anti-trust authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for consumers.

In sport

Doping in sport has been cited as an example of a prisoner's dilemma.

Two competing athletes have the option to use an illegal and/or dangerous drug to boost their performance. If neither athlete takes the drug, then neither gains an advantage. If only one does, then that athlete gains a significant advantage over their competitor, reduced by the legal and/or medical dangers of having taken the drug. If both athletes take the drug, however, the benefits cancel out and only the dangers remain, putting them both in a worse position than if neither had used doping.

In international politics

In international political theory, the Prisoner's Dilemma is often used to demonstrate the coherence of strategic realism, which holds that in international relations, all states (regardless of their internal policies or professed ideology), will act in their rational self-interest given international anarchy. A classic example is an arms race like the Cold War and similar conflicts. During the Cold War the opposing alliances of NATO and the Warsaw Pact both had the choice to arm or disarm. From each side's point of view, disarming whilst their opponent continued to arm would have led to military inferiority and possible annihilation. Conversely, arming whilst their opponent disarmed would have led to superiority. If both sides chose to arm, neither could afford to attack the other, but both incurred the high cost of developing and maintaining a nuclear arsenal. If both sides chose to disarm, war would be avoided and there would be no costs.

Although the 'best' overall outcome is for both sides to disarm, the rational course for both sides is to arm, and this is indeed what happened. Both sides poured enormous resources into military research and armament in a war of attrition for the next thirty years until the Soviet Union could not withstand the economic cost. The same logic could be applied in any similar scenario, be it economic or technological competition between sovereign states.

Multiplayer dilemmas

Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's tragedy of the commons may be viewed as an example of a multi-player generalization of the PD: Each villager makes a choice for personal gain or restraint. The collective reward for unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons"). A commons dilemma most people can relate to is washing the dishes in a shared house. By not washing dishes an individual can gain by saving his time, but if that behavior is adopted by every resident the collective cost is no clean plates for anyone.

The commons are not always exploited: William Poundstone, in a book about the prisoner's dilemma (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to take a paper without paying (defecting) but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by Elinor Ostrom, winner of the 2009 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, hypothesized that the tragedy of the commons is oversimplified, with the negative outcome influenced by outside influences. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best case outcome for PD.

Related games

Closed-bag exchange

The prisoner's dilemma as a briefcase exchange

Douglas Hofstadter once suggested that people often find problems such as the PD problem easier to understand when it is illustrated in the form of a simple game, or trade-off. One of several examples he used was "closed bag exchange":
Two people meet and exchange closed bags, with the understanding that one of them contains money, and the other contains a purchase. Either player can choose to honor the deal by putting into his or her bag what he or she agreed, or he or she can defect by handing over an empty bag.
In this game, defection always gives a game-theoretically preferable outcome.

Friend or Foe?

Friend or Foe? is a game show that aired from 2002 to 2005 on the Game Show Network in the US. It is an example of the prisoner's dilemma game tested on real people, but in an artificial setting. On the game show, three pairs of people compete. When a pair is eliminated, they play a game similar to the prisoner's dilemma to determine how the winnings are split. If they both cooperate (Friend), they share the winnings 50–50. If one cooperates and the other defects (Foe), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the reward matrix is slightly different from the standard one given above, as the rewards for the "both defect" and the "cooperate while the opponent defects" cases are identical. This makes the "both defect" case a weak equilibrium, compared with being a strict equilibrium in the standard prisoner's dilemma. If a contestant knows that their opponent is going to vote "Foe", then their own choice does not affect their own winnings. In a specific sense, Friend or Foe has a rewards model between prisoner's dilemma and the game of Chicken.

The rewards matrix is
                      Pair 2: "Friend"     Pair 2: "Foe"
                      (cooperate)          (defect)
Pair 1: "Friend"      1, 1                 0, 2
(cooperate)
Pair 1: "Foe"         2, 0                 0, 0
(defect)

(in each cell, Pair 1's share of the winnings is listed first, then Pair 2's)

This payoff matrix has also been used on the British television programmes Trust Me, Shafted, The Bank Job and Golden Balls, and on the American shows Bachelor Pad and Take It All. Game data from the Golden Balls series has been analyzed by a team of economists, who found that cooperation was "surprisingly high" for amounts of money that would seem consequential in the real world, but were comparatively low in the context of the game.

Iterated snowdrift

Researchers from the University of Lausanne and the University of Edinburgh have suggested that the "Iterated Snowdrift Game" may more closely reflect real-world social situations. Although this model is actually a chicken game, it will be described here. In this model, the risk of being exploited through defection is lower, and individuals always gain from taking the cooperative choice. The snowdrift game imagines two drivers who are stuck on opposite sides of a snowdrift, each of whom is given the option of shoveling snow to clear a path, or remaining in their car. A player's highest payoff comes from leaving the opponent to clear all the snow by themselves, but the opponent is still nominally rewarded for their work.

This may better reflect real world scenarios, the researchers giving the example of two scientists collaborating on a report, both of whom would benefit if the other worked harder. "But when your collaborator doesn’t do any work, it’s probably better for you to do all the work yourself. You’ll still end up with a completed project."

Example snowdrift payouts (A, B)

                  B cooperates     B defects
A cooperates      200, 200         100, 300
A defects         300, 100         0, 0

Example PD payouts (A, B)

                  B cooperates     B defects
A cooperates      200, 200         -100, 300
A defects         300, -100        0, 0

Software

Several software packages have been created to run prisoner's dilemma simulations and tournaments, some of which have available source code.
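As a sketch of what such software typically automates (a self-contained toy in Python, not the interface of any particular package), the following program runs a round-robin tournament among a few classic strategies and prints their total scores; the strategy set, payoffs, and match length are illustrative choices only.

import random
from itertools import combinations

R, S, T, P = 3, 0, 5, 1          # illustrative payoffs with T > R > P > S
ROUNDS = 200
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def always_defect(me, opp):    return "D"
def always_cooperate(me, opp): return "C"
def tit_for_tat(me, opp):      return opp[-1] if opp else "C"
def grudger(me, opp):          return "D" if "D" in opp else "C"
def random_player(me, opp):    return random.choice("CD")

ENTRANTS = [always_defect, always_cooperate, tit_for_tat, grudger, random_player]

def match(a, b):
    """Play one iterated match and return both players' total scores."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(ROUNDS):
        ma, mb = a(ha, hb), b(hb, ha)
        pa, pb = PAYOFF[(ma, mb)]
        sa, sb = sa + pa, sb + pb
        ha.append(ma); hb.append(mb)
    return sa, sb

totals = {s.__name__: 0 for s in ENTRANTS}
for a, b in combinations(ENTRANTS, 2):
    sa, sb = match(a, b)
    totals[a.__name__] += sa
    totals[b.__name__] += sb

for name, score in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:17s} {score}")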

In fiction

Hannu Rajaniemi set the opening scene of his The Quantum Thief trilogy in a "dilemma prison". The main theme of the series has been described as the "inadequacy of a binary universe" and the ultimate antagonist is a character called the All-Defector. Rajaniemi is particularly interesting as an artist treating this subject in that he is a Cambridge-trained mathematician and holds a PhD in mathematical physics – the interchangeability of matter and information is a major feature of the books, which take place in a "post-singularity" future. The first book in the series was published in 2010, with the two sequels, The Fractal Prince and The Causal Angel, published in 2012 and 2014, respectively.

A game modeled after the (iterated) prisoner's dilemma is a central focus of the 2012 video game Zero Escape: Virtue's Last Reward and a minor part in its 2016 sequel Zero Escape: Zero Time Dilemma.

In The Mysterious Benedict Society and the Prisoner's Dilemma by Trenton Lee Stewart, the main characters start by playing a version of the game and escaping from the "prison" altogether. Later they become actual prisoners and escape once again.

Collective intentionality

From Wikipedia, the free encyclopedia

Collective intentionality demonstrated in a human formation.

In the philosophy of mind, collective intentionality characterizes the intentionality that occurs when two or more individuals undertake a task together. Examples include two individuals carrying a heavy table up a flight of stairs or dancing a tango.

This phenomenon is approached from psychological and normative perspectives, among others. Prominent philosophers working in the psychological manner are Raimo Tuomela, Kaarlo Miller, John R. Searle, and Michael E. Bratman. Margaret Gilbert takes a normative approach dealing specifically with group formation. David Velleman is also concerned with how groups are formed, but his account lacks the normative element present in Gilbert.

The notion that collectives are capable of forming intentions can be found, whether implicitly or explicitly, in literature going back thousands of years. For example, ancient texts such as Plato's Republic discuss the cooperative determination of laws and social order by the group composed of society as a whole. This theme was later expanded into social contract theory by Enlightenment-era philosophers such as Thomas Hobbes and John Locke. In the 20th century, the likes of Wilfrid Sellars and Anthony Quinton noted the existence of "We-Intentions" amid broader discussion of the concept of intentionality, and thus laid the groundwork for the focused philosophical analysis of collective intentionality that began in the late 1980s.

Raimo Tuomela and Kaarlo Miller

Contemporary philosophical discussion of collective intentionality was initiated by Raimo Tuomela and Kaarlo Miller's "We-Intentions". In this paper, Tuomela and Miller assert three conditions necessary for a collective intention, highlighting the importance of beliefs among the agents of the group. After citing examples that are commonly accepted as requiring more than one member to participate (carrying a table upstairs, playing tennis, toasting to a friend, conversing, etc.), they state their criteria:

A member (A) of a collective (G) we-intends to do a group action (X) if and only if:
  • (A) intends to do his or her part of X
  • (A) believes that accomplishing X is possible, and that all members of G intend to do their part towards accomplishing X
  • (A) believes that all the members of G also believe that accomplishing X is possible.
To illustrate this idea, imagine Anne and Bob intend to carry a table (that is far too heavy for one person to carry) upstairs. In order for this action to qualify as a we-intention, Anne first needs to intend to do her part in carrying the table. Next, Anne needs to believe that carrying the table upstairs is possible, and that Bob intends to do his part in carrying. Finally, Anne needs to believe that Bob also believes that carrying the table upstairs is possible. If all of these conditions are met, then Anne and Bob have collective intentions under Tuomela and Miller's criteria.

John Searle

John Searle's 1990 paper, "Collective Intentions and Actions" offers another interpretation of collective action. In contrast to Tuomela and Miller, Searle claims that collective intentionality is a "primitive phenomenon, which cannot be analyzed as the summation of individual intentional behavior". He exemplifies the fundamental distinction between "I-intentions" and "We-intentions" by comparing the hypothetical case of a set of picnickers and a dance troupe. During a rainstorm, each picnicker spontaneously runs for cover. On the other hand, the members of the dance troupe run for cover as part of a preconceived routine. Searle claims that the picnickers, whose intentions are individually oriented and simply happen to coincide, do not display collective intentionality, while members of the dance troupe do, because they deliberately cooperate with one another.

Searle's rebuttal to Tuomela and Miller's account begins with a counterexample involving a group of business school graduates who intend to pursue their own selfish interests, but believe that by doing so, they will indirectly serve humanity. These young businessmen believe that their fellow graduates will do likewise, but do not actively cooperate with one another in pursuing their goals. Searle holds that this example fulfills all of Tuomela and Miller's criteria for collective intentionality. However, he claims that collective intentionality does not actually exist in such a situation unless the graduates have organized and formed an explicit pact with one another to serve humanity through self-interested action.

He proceeds to specify two criteria that must be satisfied by any proper account of collective intentionality:
  • It "must be consistent with the fact that society consists of nothing but individuals."
  • It must take into account that any individual's intentions are independent of "the fact of whether or not he is getting things right."
Although a "we-intention" is always held by an individual, it must make fundamental reference to a collective formed in conjunction with the other individual(s). For instance, two individuals who, while sharing the labor of hollandaise-sauce production, each believe the proposition "We are making hollandaise sauce", have formed a collective intention. This would not exist if they only held beliefs to the effect of "I am stirring", or "I am pouring". It is thus, Searle claims, that collective intentionality is not reducible to individual intentionality.

Michael Bratman

Michael Bratman's 1992 paper "Shared Cooperative Activity", contends that shared cooperative activity (SCA) can be reduced to "I-intentions". In other words, just as an individual can plan to act by him or herself, that same individual can also plan for a group to act. With this in mind, he presents three characteristics of shared cooperative activity:
  • Each participant must be mutually responsive to the intentions and actions of the others,
  • The participants must each be committed to the joint activity,
  • The participants must each be committed to supporting the efforts of the others.
One aspect of Bratman's argument that supports these criteria is the idea of meshing subplans. Bratman claims that in a shared cooperative activity, individuals' secondary plans do not need to be the same, but they cannot conflict. Consider, for example, his case of two people who intend to paint a house together. Let us call these two people Alice and Bill. Suppose Alice wants to paint the house red and Bill wants to paint the house blue. Both are aware that their subplans conflict, and that the other is aware of it as well. Bratman argues that even if Alice and Bill do end up painting the house together, they do not have a shared cooperative activity, because their subplans are in conflict. Furthermore, each participant must also be committed to having subplans that mesh. Without this commitment, participants might disregard others' subplans, leading to a lack of cooperation. However, he additionally claims that their subplans need not be identical. For instance, suppose Alice wants to use an inexpensive paint and Bill wants paint from a specific hardware store. In this case, there is a way that both subplans can be achieved: they could buy an inexpensive paint from Bill's store of choice. The details of Bratman's view are as follows:

For a cooperatively neutral action, our doing an action J is an SCA if and only if:
  • (A) We do J (in a way that could involve cooperation, but does not have to)
  • (B) It is common knowledge between us that we are both committed to meshing subplans, and
  • (C) (B) leads to (A) by way of mutual responsiveness (in the pursuit of completing our action) of intention and in action.

Responses to Bratman

One work associated with Bratman is Facundo Alonso's "Shared Intention, Reliance, and Interpersonal Obligations". Alonso contends that shared intention is a basis for interpersonal obligation. He begins the paper by asserting characteristics of joint action, which do not include multiple agents acting individually or factors of body movements, but instead are shared or collective intentions to act. Alonso distinguishes the normative theory supplied by Gilbert and the descriptive theory supplied by Bratman. Whereas Bratman focuses on intents, Alonso is also careful to point out Tuomela and Miller's focus on action to describe the roots of joint action. Alonso attempts to compromise both views by taking a path where joint action is not necessarily a normative or descriptive case. He argues for a system built off Bratman's that can take place in a descriptive nature addressed by Margaret Gilbert.

Stephen Butterfill offers another response to Bratman's view. He argues that Bratman's account is unable to explain simple interactions between agents. For example, Butterfill states that Bratman cannot explain cooperative actions between very young children, who do not yet have an understanding of other minds.

Margaret Gilbert

Whereas Bratman argues for a descriptive account of collective intentionality, other authors have taken a normative approach. Margaret Gilbert, in "Walking Together: A Paradigmatic Social Phenomenon", sets the conditions for people entering, enduring, and exiting acts of collective intentionality. Gilbert asserts that social groups in general can be defined by something as simple as two people walking together. In her analysis the basic conditions for collective intentions that must be satisfied are as follows:
  • People must know they are entering into an agreement by communicating it clearly (even if they are coerced). Gilbert states that this act of agreement is sufficient to set a goal for a group. Furthermore, the agreement groups the agents who comprise the group into a plural subject.
  • The agreement implies that each member is obligated to completing the final goal.
  • Because of this implied obligation, any and all members may rebuke any one else who fails to do their part towards the completion of the goal. The "right to rebuke" is stated as a necessary feature of the group arrangement. This functions as a tool for each member of the group to ensure the goal is accomplished.
  • In order to break the agreement there has to be joint consent among all members of the group.
A paradigmatic social phenomenon: two people walking together

Responses to Gilbert

A number of philosophers have responded to the normative theory of Gilbert with papers that consider obligations, promises and commitments. One of these, Christopher McMahon, argues that Gilbert has observed crucial behavioral phenomena involved in acts of collective intentionality, but has misidentified the psychological dynamics underlying these phenomena. Specifically, he holds that the behaviors characterizing collective intentionality arise not from a set of mutual obligations which facilitate a "right to rebuke" but from the existence of de facto authority, or some kind of social decision-making process. This de facto authority gives one party a right to partially determine another's intentions.

Facundo M. Alonso sets conditions for how the normative phenomenon of shared intention can arise. Alonso claims that shared intention involves mutual reliance between participants. He further argues for a cognitive requirement that each member publicly intends the joint activity. Thus, Alonso states, "[R]elations of mutual reliance generate...interpersonal obligations between the participants". As a result, shared intentions generate normative promises that are enforced by mutual reliance and relevant obligation.

A. S. Roth offers his own modifications to Gilbert's account of intentionality. He, too, relies on a normative notion to explain collective intentions. Rather than obligations, however, Roth is interested in commitments. Roth enumerates four different types of commitment: participatory, contralateral, executive, and ipsilateral commitments. Roth claims that the contralateral commitments are necessary for joint actions to occur, and that they may have a moral component (though not necessarily). This opposes Gilbert's claim that the obligations found in joint activity have no moral component.

Christopher Kutz's work "Acting Together" contests the basis for what is considered a group. When speaking of a group, it becomes common to say "they" did whatever action the group is seen as doing. However, Kutz explains that each person may have varying levels of involvement in their group or their group efforts. He also questions what obligations each member is considered to have to the group and what binds those individuals to their group. To illustrate his objections, Kutz describes two group types: executive and participatory. An "executive" commitment would extend to those members of a group who participate with others of a group only superficially but still carry the name of the group as a title. This includes people working in an office or an assembly line. A "participatory" group is involved directly with the process and end results of an action. Each member is assumed to have at least some knowledge of all of the plans and sub-plans for the actions taken by the group. This opens Kutz to a discussion about who, within the group, may be considered responsible for the actions of the group.

J. David Velleman

J. David Velleman provides a reaction to Gilbert as well as Searle. Velleman is concerned with explaining how a group is capable of making a decision, or, as he puts it, "how... several different minds (can) submit themselves to a single making up". To that end, he picks up Gilbert's notion of the 'pool of wills', that is, "a single will forged from the wills of different individuals". However, according to Velleman, Gilbert does not explain how such a thing can be formed. To solve this problem, he turns to a portion of Searle's theory of intentions, namely that an "intention is a mental representation that causes behavior by representing itself as causing it".

Velleman explains that, since a representation is capable of causing behavior, and speech acts are a form of representation, it is possible for a speech act to cause a behavior. That is, saying a thing can cause one to do that thing. Thus, a speech act can, in itself, be an intention. This is critical for him to make the case that an agent, having made a decision or an intending speech act, can "remain decided". In other words, that agent can continue to intend after the speech act has been accomplished. With this, Velleman shows how an agent can make a decision for a group. If an agent utters a conditional intention, and another agent utters an intention that fulfills the conditions present in the previous utterance, then the second agent has effectively decided the question for the first agent. Thus, a single collective will has been formed from multiple individual wills.

Therefore, Velleman argues that collective intention is not the summation of multiple individual intentions, but rather one shared intention. This is accomplished by perceiving intentions as existing outside the mind of an individual and within a verbal statement. The verbal statements have causal power because of the desire to not speak falsely.

Natalie Gold and Robert Sugden

Collective intentionality has also been approached in light of economic theories, including game theory. According to Natalie Gold and Robert Sugden, efforts to define collective intentions as individual intentions and related beliefs (such as those of Tuomela & Miller and Michael Bratman) fail because they allow obviously non-cooperative actions to be counted as cooperative. For example, in many simple games analyzed by game theory, the players are counted as acting jointly when they achieve the Nash equilibrium, even though that equilibrium state is neither optimal nor achieved cooperatively. In the prisoner's dilemma, the Nash equilibrium occurs when each player defects against the other, even though they would both do better if they cooperated.

The normal game for prisoners' dilemma is shown below:


                                        Prisoner B stays silent     Prisoner B betrays
                                        (cooperates)                (defects)
Prisoner A stays silent (cooperates)    Each serves 1 month         Prisoner A: 12 months
                                                                    Prisoner B: goes free
Prisoner A betrays (defects)            Prisoner A: goes free       Each serves 3 months
                                        Prisoner B: 12 months

Standard game theory bases rationality in individual self-interest, and thus predicts that all rational agents will choose defect. However, as Gold and Sugden note, between 40 and 50 percent of participants in prisoner's dilemma trials instead choose cooperate. They argue that by employing we-reasoning, a team of people can intend and act in rational ways to achieve the outcome they, as a group, desire. Members of a group reason with the goal of achieving not "what is best for me", but "what is best for us". This distinction draws on Searle's claim that "the notion of a we-intention...implies the notion of cooperation". As a result, if each prisoner recognizes that he or she belongs to a team, he or she will conclude that cooperation is in the best interest of the group.

Operator (computer programming)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Operator_(computer_programmin...