A Medley of Potpourri

Wednesday, January 8, 2020

Gambler's ruin

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Gambler's_ruin

The term gambler's ruin is a statistical concept expressed in a variety of forms:

The original meaning is that a persistent gambler who raises his bet to a fixed fraction of bankroll when he wins, but does not reduce it when he loses, will eventually and inevitably go broke, even if he has a positive expected value on each bet.
Another common meaning is that a persistent gambler with finite wealth, playing a fair game (that is, each bet has expected value zero to both sides), will eventually and inevitably go broke against an opponent with infinite wealth. Such a situation can be modeled by a random walk on the real number line. In that context it is provable that the agent will return to his point of origin or go broke, and will be ruined an infinite number of times if the random walk continues forever.
The result above is a corollary of a general theorem by Christiaan Huygens which is also known as gambler's ruin. That theorem shows how to compute the probability of each player winning a series of bets that continues until one's entire initial stake is lost, given the initial stakes of the two players and the constant probability of winning. This is the oldest mathematical idea that goes by the name gambler's ruin, but not the first idea to which the name was applied.
The most common use of the term today is that a gambler playing a negative expected value game will eventually go broke, regardless of betting system. This is another corollary to Huygens's result.
The concept may be stated as an ironic paradox: Persistently taking beneficial chances is never beneficial at the end. This paradoxical form of gambler's ruin should not be confused with the gambler's fallacy, a different concept.

The concept has specific relevance for gamblers; however it also leads to mathematical theorems with wide application and many related results in probability and statistics. Huygens's result in particular led to important advances in the mathematical theory of probability.

History

The earliest known mention of the gambler's ruin problem is a letter from Blaise Pascal to Pierre Fermat in 1656 (two years after the more famous correspondence on the problem of points). Pascal's version was summarized in a 1656 letter from Pierre de Carcavi to Huygens:

Let two men play with three dice, the first player scoring a point whenever 11 is thrown, and the second whenever 14 is thrown. But instead of the points accumulating in the ordinary way, let a point be added to a player's score only if his opponent's score is nil, but otherwise let it be subtracted from his opponent's score. It is as if opposing points form pairs, and annihilate each other, so that the trailing player always has zero points. The winner is the first to reach twelve points; what are the relative chances of each player winning?

Huygens reformulated the problem and published it in De ratiociniis in ludo aleae ("On Reasoning in Games of Chance", 1657):

Problem (2-1) Each player starts with 12 points, and a successful roll of the three dice for a player (getting an 11 for the first player or a 14 for the second) adds one to that player's score and subtracts one from the other player's score; the loser of the game is the first to reach zero points. What is the probability of victory for each player?

This is the classic gambler's ruin formulation: two players begin with fixed stakes, transferring points until one or the other is "ruined" by getting to zero points. However, the term "gambler's ruin" was not applied until many years later.
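As a rough numerical check of Huygens's formulation, the sketch below enumerates the three-dice rolls and applies the closed-form ruin formula that this text gives later under "Unfair coin flipping". It assumes that rolls which are neither 11 nor 14 simply change nothing, so only the conditional chance of each scoring roll matters. The function names are illustrative and not taken from any source.

```python
from fractions import Fraction
from itertools import product

def count_sum(target):
    """Number of ordered three-dice rolls whose faces sum to target."""
    return sum(1 for roll in product(range(1, 7), repeat=3) if sum(roll) == target)

ways_11 = count_sum(11)
ways_14 = count_sum(14)

# Assumption: condition on a scoring roll, since other rolls leave the scores unchanged.
p = Fraction(ways_11, ways_11 + ways_14)  # first player gains a point
q = 1 - p                                 # second player gains a point

def ruin_probability(p, q, own_stake, other_stake):
    """Chance that the player who wins each round with probability p is ruined (p != q)."""
    r = p / q
    return (1 - r ** other_stake) / (1 - r ** (own_stake + other_stake))

first_loses = ruin_probability(p, q, 12, 12)
first_wins = 1 - first_loses
odds = first_wins / first_loses  # odds in favour of the first player

print(ways_11, ways_14, p)
print(odds.numerator, ":", odds.denominator)
```

Under these assumptions the odds in favour of the first player come out as 9^12 to 5^12, because the two scoring rolls occur in the ratio 27 to 15, that is 9 to 5.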

Reasons for the four results

Let "bankroll" be the amount of money a gambler has at his disposal at any moment, and let N be any positive integer. Suppose that he raises his stake to $\frac{\text{bankroll}}{N}$ when he wins, but does not reduce his stake when he loses. This general pattern is not uncommon among real gamblers, and casinos encourage it by "chipping up" winners (giving them higher denomination chips). Under this betting scheme, it will take at most N losing bets in a row to bankrupt him. If his probability of winning each bet is less than 1 (if it is 1, then he is no gambler), he will eventually lose N bets in a row, however big N is. It is not necessary that he follow the precise rule, just that he increase his bet fast enough as he wins. This is true even if the expected value of each bet is positive.

The gambler playing a fair game (with 0.5 probability of winning) will eventually either go broke or double his wealth. Define the game as ending upon either event. These events are equally likely, or the game would not be fair. So he has a 0.5 chance of going broke before doubling his money. Given that he doubles his money, a new game begins and he again has a 0.5 chance of doubling his money before going broke. After the second game there is a 1/2 x 1/2 chance that he has not gone broke in the first and second games. Continuing this way, his chance of not having gone broke after n successive games is 1/2 x 1/2 x ... x 1/2 = 1/2^n, which approaches 0. His chance of having gone broke within n successive games is 0.5 + 0.25 + 0.125 + ... + 1/2^n = 1 - 1/2^n, which approaches 1.
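The doubling argument above can be checked with exact fractions. The following is a minimal sketch in which the function names are illustrative. It treats each "double or go broke" round as a single event with probability 1/2, exactly as the paragraph does.

```python
from fractions import Fraction

def survival_probability(n):
    """Chance of winning n consecutive 'double or go broke' rounds of a fair game."""
    return Fraction(1, 2) ** n

def ruin_within(n):
    """Chance of going broke in round 1, 2, ..., or n, summed round by round."""
    return sum(Fraction(1, 2) ** k for k in range(1, n + 1))

for n in (1, 2, 3, 10):
    # The two columns always add up to 1.
    print(n, survival_probability(n), ruin_within(n))
```

The round-by-round sum and the complement of the survival probability agree for every n, which is the identity the paragraph relies on.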

Huygens's result is illustrated in the next section.

The eventual fate of a player at a negative expected value game cannot be better than the player at a fair game, so he will go broke as well.

Example of Huygens's result

Fair coin flipping

Consider a coin-flipping game with two players where each player has a 50% chance of winning with each flip of the coin. After each flip of the coin the loser transfers one penny to the winner. The game ends when one player has all the pennies.

If there are no other limitations on the number of flips, the probability that the game will eventually end this way is 1. (One way to see this is as follows. Any given finite string of heads and tails will eventually be flipped with certainty: the probability of not seeing this string, while high at first, decays exponentially. In particular, the players would eventually flip a string of heads as long as the total number of pennies in play, by which time the game must have already ended.)

If player one has n₁ pennies and player two n₂ pennies, the probabilities P₁ and P₂ that players one and two, respectively, will end penniless are:

{\displaystyle {\begin{aligned}P_{1}&={\frac {n_{2}}{n_{1}+n_{2}}}\\[5pt]P_{2}&={\frac {n_{1}}{n_{1}+n_{2}}}\end{aligned}}}

Two examples of this are if one player has more pennies than the other; and if both players have the same number of pennies. In the first case, say player one ($P_{1}$) has 8 pennies and player two ($P_{2}$) has 5 pennies; then the probability of each losing is:

{\displaystyle {\begin{aligned}P_{1}&={\frac {5}{8+5}}={\frac {5}{13}}=0.3846{\text{ or }}38.46\%\\[6pt]P_{2}&={\frac {8}{8+5}}={\frac {8}{13}}=0.6154{\text{ or }}61.54\%\end{aligned}}}

It follows that even with equal odds of winning the player that starts with fewer pennies is more likely to fail.

In the second case where both players have the same number of pennies (in this case 6) the likelihood of each losing is:

{\displaystyle {\begin{aligned}P_{1}&={\frac {6}{6+6}}={\frac {6}{12}}={\frac {1}{2}}=0.5\\[5pt]P_{2}&={\frac {6}{6+6}}={\frac {6}{12}}={\frac {1}{2}}=0.5\end{aligned}}}
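The two worked cases above can be reproduced with a few lines of exact arithmetic. This is a minimal sketch; the function name `fair_ruin_probabilities` is illustrative.

```python
from fractions import Fraction

def fair_ruin_probabilities(n1, n2):
    """Return (P1, P2): the chances that player one / player two ends penniless."""
    total = n1 + n2
    return Fraction(n2, total), Fraction(n1, total)

# First case: 8 pennies against 5.
p1, p2 = fair_ruin_probabilities(8, 5)
print(p1, p2, round(float(p1), 4), round(float(p2), 4))

# Second case: 6 pennies each.
e1, e2 = fair_ruin_probabilities(6, 6)
print(e1, e2)
```

Using `Fraction` keeps the results as exact ratios such as 5/13, with the decimal form only computed for display.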

Unfair coin flipping

In the event of an unfair coin, where player one wins each toss with probability p, and player two wins with probability q = 1 − p, then the probability of each ending penniless is:

{\displaystyle {\begin{aligned}P_{1}&={\frac {1-({\frac {p}{q}})^{n_{2}}}{1-({\frac {p}{q}})^{n_{1}+n_{2}}}}\\[5pt]P_{2}&={\frac {1-({\frac {q}{p}})^{n_{1}}}{1-({\frac {q}{p}})^{n_{1}+n_{2}}}}\end{aligned}}}

This can be shown as follows. Consider the probability of player 1 experiencing gambler's ruin having started with $n>1$ amount of money, $P(R_{n})$. Then, using the law of total probability, we have

P(R_{n})=P(R_{n}\mid W)P(W)+P(R_{n}\mid {\bar {W}})P({\bar {W}}),

where W denotes the event that player 1 wins the first bet. Then clearly $P(W)=p$ and $P({\bar {W}})=1-p=q$. Also, $P(R_{n}\mid W)$ is the probability that player 1 experiences gambler's ruin having started with $n+1$ amount of money, $P(R_{n+1})$; and $P(R_{n}\mid {\bar {W}})$ is the probability that player 1 experiences gambler's ruin having started with $n-1$ amount of money, $P(R_{n-1})$.

Denoting $q_{n}=P(R_{n})$, we get the linear homogeneous recurrence relation

q_{n}=q_{n+1}p+q_{n-1}q,

which we can solve using the fact that $q_{0}=1$ (i.e. the probability of gambler's ruin given that player 1 starts with no money is 1), and $q_{n_{1}+n_{2}}=0$ (i.e. the probability of gambler's ruin given that player 1 starts with all the money is 0). For a more detailed description of the method see e.g. Feller (1970), An introduction to probability theory and its applications, 3rd ed.
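One way to gain confidence in the closed form is to check that it satisfies both the recurrence and the two boundary conditions. The sketch below does this with exact fractions. The values p = 3/5, n1 = 8 and n2 = 5 are arbitrary choices for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def ruin_probability(n, total, p):
    """Closed-form P(R_n): player 1 holds n of `total` pennies and wins each toss with probability p."""
    q = 1 - p
    if p == q:
        return Fraction(total - n, total)  # fair-coin case
    r = p / q
    return (1 - r ** (total - n)) / (1 - r ** total)

p = Fraction(3, 5)  # assumed win probability for player 1
q = 1 - p
n1, n2 = 8, 5       # assumed starting stakes
total = n1 + n2

values = [ruin_probability(n, total, p) for n in range(total + 1)]

# Boundary conditions: q_0 = 1 and q_{n1+n2} = 0.
boundary_ok = values[0] == 1 and values[total] == 0

# Recurrence q_n = p*q_{n+1} + q*q_{n-1} at every interior n.
recurrence_ok = all(
    values[n] == p * values[n + 1] + q * values[n - 1]
    for n in range(1, total)
)

print(boundary_ok, recurrence_ok, float(values[n1]))
```

Because the arithmetic is exact, the recurrence check is an equality test rather than a floating-point comparison with a tolerance.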

N-player ruin problem

The above described problem (2 players) is a special case of the so-called N-player ruin problem. Here $N\geq 2$ players with initial capital $x_{1},x_{2},\ldots ,x_{N}$ dollars, respectively, play a sequence of (arbitrary) independent games and win and lose certain amounts of dollars from/to each other according to fixed rules. The sequence of games ends as soon as at least one player is ruined. Standard Markov chain methods can be applied to solve this more general problem in principle, but the computations quickly become prohibitive as soon as the number of players or their initial capital increases. For $N=2$ and large initial capitals $x_{1},x_{2}$ the solution can be well approximated by using two-dimensional Brownian motion. (For $N\geq 3$ this is not possible.) In practice the true problem is to find the solution for the typical cases of $N\geq 3$ and limited initial capital. Swan (2006) proposed an algorithm based on Matrix-analytic methods (Folding algorithm for ruin problems) which significantly reduces the order of the computational task in such cases.

Probability interpretations

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Probability_interpretations

The word probability has been used in a variety of ways since it was first applied to the mathematical study of games of chance. Does probability measure the real, physical tendency of something to occur or is it a measure of how strongly one believes it will occur, or does it draw on both these elements? In answering such questions, mathematicians interpret the probability values of probability theory.

There are two broad categories of probability interpretations which can be called "physical" and "evidential" probabilities. Physical probabilities, which are also called objective or frequency probabilities, are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given type of event (such as a die yielding a six) tends to occur at a persistent rate, or "relative frequency", in a long run of trials. Physical probabilities either explain, or are invoked to explain, these stable frequencies. The two main kinds of theory of physical probability are frequentist accounts (such as those of Venn, Reichenbach and von Mises) and propensity accounts (such as those of Popper, Miller, Giere and Fetzer).

Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved, as a way to represent its subjective plausibility, or the degree to which the statement is supported by the available evidence. On most accounts, evidential probabilities are considered to be degrees of belief, defined in terms of dispositions to gamble at certain odds. The four main evidential interpretations are the classical (e.g. Laplace's) interpretation, the subjective interpretation (de Finetti and Savage), the epistemic or inductive interpretation (Ramsey, Cox) and the logical interpretation (Keynes and Carnap). There are also evidential interpretations of probability covering groups, which are often labelled as 'intersubjective' (proposed by Gillies and Rowbottom).

Some interpretations of probability are associated with approaches to statistical inference, including theories of estimation and hypothesis testing. The physical interpretation, for example, is taken by followers of "frequentist" statistical methods, such as Ronald Fisher, Jerzy Neyman and Egon Pearson. Statisticians of the opposing Bayesian school typically accept the existence and importance of physical probabilities, but also consider the calculation of evidential probabilities to be both valid and necessary in statistics. This article, however, focuses on the interpretations of probability rather than theories of statistical inference.

The terminology of this topic is rather confusing, in part because probabilities are studied within a variety of academic fields. The word "frequentist" is especially tricky. To philosophers it refers to a particular theory of physical probability, one that has more or less been abandoned. To scientists, on the other hand, "frequentist probability" is just another name for physical (or objective) probability. Those who promote Bayesian inference view "frequentist statistics" as an approach to statistical inference that recognises only physical probabilities. Also the word "objective", as applied to probability, sometimes means exactly what "physical" means here, but is also used of evidential probabilities that are fixed by rational constraints, such as logical and epistemic probabilities.

It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel. Doubtless, much of the disagreement is merely terminological and would disappear under sufficiently sharp analysis.

— (Savage, 1954, p 2)

Philosophy

The philosophy of probability presents problems chiefly in matters of epistemology and the uneasy interface between mathematical concepts and ordinary language as it is used by non-mathematicians. Probability theory is an established field of study in mathematics. It has its origins in correspondence discussing the mathematics of games of chance between Blaise Pascal and Pierre de Fermat in the seventeenth century, and was formalized and rendered axiomatic as a distinct branch of mathematics by Andrey Kolmogorov in the twentieth century. In axiomatic form, mathematical statements about probability theory carry the same sort of epistemological confidence within the philosophy of mathematics as are shared by other mathematical statements.

The mathematical analysis originated in observations of the behaviour of game equipment such as playing cards and dice, which are designed specifically to introduce random and equalized elements; in mathematical terms, they are subjects of indifference. This is not the only way probabilistic statements are used in ordinary human language: when people say that "it will probably rain", they typically do not mean that the outcome of rain versus not-rain is a random factor that the odds currently favor; instead, such statements are perhaps better understood as qualifying their expectation of rain with a degree of confidence. Likewise, when it is written that "the most probable explanation" of the name of Ludlow, Massachusetts "is that it was named after Roger Ludlow", what is meant here is not that Roger Ludlow is favored by a random factor, but rather that this is the most plausible explanation of the evidence, which admits other, less likely explanations.

Thomas Bayes attempted to provide a logic that could handle varying degrees of confidence; as such, Bayesian probability is an attempt to recast the representation of probabilistic statements as an expression of the degree of confidence by which the beliefs they express are held.

Though probability initially had somewhat mundane motivations, its modern influence and use is widespread ranging from evidence-based medicine, through six sigma, all the way to the probabilistically checkable proof and the string theory landscape.

A summary of some interpretations of probability
	Classical	Frequentist	Subjective	Propensity
Main hypothesis	Principle of indifference	Frequency of occurrence	Degree of belief	Degree of causal connection
Conceptual basis	Hypothetical symmetry	Past data and reference class	Knowledge and intuition	Present state of system
Conceptual approach	Conjectural	Empirical	Subjective	Metaphysical
Single case possible	Yes	No	Yes	Yes
Precise	Yes	No	No	Yes
Problems	Ambiguity in principle of indifference	Circular definition	Reference class problem	Disputed concept

Classical definition

The first attempt at mathematical rigour in the field of probability, championed by Pierre-Simon Laplace, is now known as the classical definition. Developed from studies of games of chance (such as rolling dice) it states that probability is shared equally between all the possible outcomes, provided these outcomes can be deemed equally likely.

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.

— Pierre-Simon Laplace, A Philosophical Essay on Probabilities

The classical definition of probability works well for situations with only a finite number of equally-likely outcomes.

This can be represented mathematically as follows: If a random experiment can result in N mutually exclusive and equally likely outcomes and if N_A of these outcomes result in the occurrence of the event A, the probability of A is defined by

P(A)={N_{A} \over N}.
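The ratio N_A / N can be computed directly by listing the outcomes and counting. The following is a minimal sketch assuming fair six-sided dice, so that all listed outcomes are equally likely. The helper name `classical_probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, event):
    """P(A) = N_A / N for a finite collection of equally likely outcomes."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

# Two dice: N = 36 ordered outcomes, assumed equally likely.
two_dice = list(product(range(1, 7), repeat=2))
p_seven = classical_probability(two_dice, lambda roll: sum(roll) == 7)

# One die: N = 6 outcomes.
p_six_on_one_die = classical_probability(range(1, 7), lambda face: face == 6)

print(p_seven, p_six_on_one_die)
```

The equal-likelihood assumption enters through the choice of outcome list, which the code cannot verify.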

There are two clear limitations to the classical definition. Firstly, it is applicable only to situations in which there is only a finite number of possible outcomes. But some important random experiments, such as tossing a coin until it comes up heads, give rise to an infinite set of outcomes. Secondly, one needs to determine in advance that all the possible outcomes are equally likely without relying on the notion of probability, so as to avoid circularity; symmetry considerations are one way to do this.

Frequentism

For frequentists, the probability of the ball landing in any pocket can be determined only by repeated trials in which the observed result converges to the underlying probability in the long run.

Frequentists posit that the probability of an event is its relative frequency over time, (3.4) i.e., its relative frequency of occurrence after repeating a process a large number of times under similar conditions. This is also known as aleatory probability. The events are assumed to be governed by some random physical phenomena, which are either phenomena that are predictable, in principle, with sufficient information (see determinism); or phenomena which are essentially unpredictable. Examples of the first kind include tossing dice or spinning a roulette wheel; an example of the second kind is radioactive decay. In the case of tossing a fair coin, frequentists say that the probability of getting a heads is 1/2, not because there are two equally likely outcomes but because repeated series of large numbers of trials demonstrate that the empirical frequency converges to the limit 1/2 as the number of trials goes to infinity.

If we denote by $n_{a}$ the number of occurrences of an event ${\mathcal {A}}$ in $n$ trials, then if

\lim _{n\to +\infty }{n_{a} \over n}=p

we say that $P({\mathcal {A}})=p$.
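The limiting relative frequency can be illustrated, though not demonstrated, with a simulation. The sketch below rests on assumptions that a real experiment would not have: a pseudo-random generator stands in for the physical process, and the "true" value `p_true` is known in advance. The function name and parameters are illustrative.

```python
import random

def relative_frequency(n_trials, p_true=0.5, seed=0):
    """Return n_a / n for n_trials simulated trials of an event with chance p_true."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same series
    n_a = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return n_a / n_trials

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

Any finite run only approximates the limit, and a different seed gives a slightly different relative frequency.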

The frequentist view has its own problems. It is of course impossible to actually perform an infinity of repetitions of a random experiment to determine the probability of an event. But if only a finite number of repetitions of the process are performed, different relative frequencies will appear in different series of trials. If these relative frequencies are to define the probability, the probability will be slightly different every time it is measured. But the real probability should be the same every time. If we acknowledge the fact that we only can measure a probability with some error of measurement attached, we still get into problems as the error of measurement can only be expressed as a probability, the very concept we are trying to define. This renders even the frequency definition circular; see for example “What is the Chance of an Earthquake?”

Subjectivism

Gambling odds reflect the average bettor's 'degree of belief' in the outcome.

Subjectivists, also known as Bayesians or followers of epistemic probability, give the notion of probability a subjective status by regarding it as a measure of the 'degree of belief' of the individual assessing the uncertainty of a particular situation. Epistemic or subjective probability is sometimes called credence, as opposed to the term chance for a propensity probability.

Some examples of epistemic probability are to assign a probability to the proposition that a proposed law of physics is true, and to determine how probable it is that a suspect committed a crime, based on the evidence presented.

Gambling odds don't reflect the bookies' belief in a likely winner, so much as the other bettors' belief, because the bettors are actually betting against one another. The odds are set based on how many people have bet on a possible winner, so that even if the high odds players always win, the bookies will always make their percentages anyway.

The use of Bayesian probability raises the philosophical debate as to whether it can contribute valid justifications of belief.

Bayesians point to the work of Ramsey (p 182) and de Finetti (p 103) as proving that subjective beliefs must follow the laws of probability if they are to be coherent. Evidence casts doubt that humans will have coherent beliefs.

The use of Bayesian probability involves specifying a prior probability. This may be obtained from consideration of whether the required prior probability is greater or lesser than a reference probability associated with an urn model or a thought experiment. The issue is that for a given problem, multiple thought experiments could apply, and choosing one is a matter of judgement: different people may assign different prior probabilities, known as the reference class problem. The "sunrise problem" provides an example.

Propensity

Propensity theorists think of probability as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind or to yield a long run relative frequency of such an outcome. This kind of objective probability is sometimes called 'chance'.

Propensities, or chances, are not relative frequencies, but purported causes of the observed stable relative frequencies. Propensities are invoked to explain why repeating a certain kind of experiment will generate given outcome types at persistent rates, which are known as propensities or chances. Frequentists are unable to take this approach, since relative frequencies do not exist for single tosses of a coin, but only for large ensembles or collectives (see "single case possible" in the table above). In contrast, a propensitist is able to use the law of large numbers to explain the behaviour of long-run frequencies. This law, which is a consequence of the axioms of probability, says that if (for example) a coin is tossed repeatedly many times, in such a way that its probability of landing heads is the same on each toss, and the outcomes are probabilistically independent, then the relative frequency of heads will be close to the probability of heads on each single toss. This law allows that stable long-run frequencies are a manifestation of invariant single-case probabilities. In addition to explaining the emergence of stable relative frequencies, the idea of propensity is motivated by the desire to make sense of single-case probability attributions in quantum mechanics, such as the probability of decay of a particular atom at a particular time.

The main challenge facing propensity theories is to say exactly what propensity means. (And then, of course, to show that propensity thus defined has the required properties.) At present, unfortunately, none of the well-recognised accounts of propensity comes close to meeting this challenge.

A propensity theory of probability was given by Charles Sanders Peirce. A later propensity theory was proposed by philosopher Karl Popper, who had only slight acquaintance with the writings of C. S. Peirce, however. Popper noted that the outcome of a physical experiment is produced by a certain set of "generating conditions". When we repeat an experiment, as the saying goes, we really perform another experiment with a (more or less) similar set of generating conditions. To say that a set of generating conditions has propensity p of producing the outcome E means that those exact conditions, if repeated indefinitely, would produce an outcome sequence in which E occurred with limiting relative frequency p. For Popper then, a deterministic experiment would have propensity 0 or 1 for each outcome, since those generating conditions would have same outcome on each trial. In other words, non-trivial propensities (those that differ from 0 and 1) only exist for genuinely nondeterministic experiments.

A number of other philosophers, including David Miller and Donald A. Gillies, have proposed propensity theories somewhat similar to Popper's.

Other propensity theorists (e.g. Ronald Giere) do not explicitly define propensities at all, but rather see propensity as defined by the theoretical role it plays in science. They argued, for example, that physical magnitudes such as electrical charge cannot be explicitly defined either, in terms of more basic things, but only in terms of what they do (such as attracting and repelling other electrical charges). In a similar way, propensity is whatever fills the various roles that physical probability plays in science.

What roles does physical probability play in science? What are its properties? One central property of chance is that, when known, it constrains rational belief to take the same numerical value. David Lewis called this the Principal Principle, (3.3 & 3.5) a term that philosophers have mostly adopted. For example, suppose you are certain that a particular biased coin has propensity 0.32 to land heads every time it is tossed. What is then the correct price for a gamble that pays $1 if the coin lands heads, and nothing otherwise? According to the Principal Principle, the fair price is 32 cents.

Logical, epistemic, and inductive probability

It is widely recognized that the term "probability" is sometimes used in contexts where it has nothing to do with physical randomness. Consider, for example, the claim that the extinction of the dinosaurs was probably caused by a large meteorite hitting the earth. Statements such as "Hypothesis H is probably true" have been interpreted to mean that the (presently available) empirical evidence (E, say) supports H to a high degree. This degree of support of H by E has been called the logical probability of H given E, or the epistemic probability of H given E, or the inductive probability of H given E.

The differences between these interpretations are rather small, and may seem inconsequential. One of the main points of disagreement lies in the relation between probability and belief. Logical probabilities are conceived (for example in Keynes' Treatise on Probability) to be objective, logical relations between propositions (or sentences), and hence not to depend in any way upon belief. They are degrees of (partial) entailment, or degrees of logical consequence, not degrees of belief. (They do, nevertheless, dictate proper degrees of belief, as is discussed below.) Frank P. Ramsey, on the other hand, was skeptical about the existence of such objective logical relations and argued that (evidential) probability is "the logic of partial belief". (p 157) In other words, Ramsey held that epistemic probabilities simply are degrees of rational belief, rather than being logical relations that merely constrain degrees of rational belief.

Another point of disagreement concerns the uniqueness of evidential probability, relative to a given state of knowledge. Rudolf Carnap held, for example, that logical principles always determine a unique logical probability for any statement, relative to any body of evidence. Ramsey, by contrast, thought that while degrees of belief are subject to some rational constraints (such as, but not limited to, the axioms of probability) these constraints usually do not determine a unique value. Rational people, in other words, may differ somewhat in their degrees of belief, even if they all have the same information.

Prediction

An alternative account of probability emphasizes the role of prediction – predicting future observations on the basis of past observations, not on unobservable parameters. In its modern form, it is mainly in the Bayesian vein. This was the main function of probability before the 20th century, but fell out of favor compared to the parametric approach, which modeled phenomena as a physical system that was observed with error, such as in celestial mechanics.

The modern predictive approach was pioneered by Bruno de Finetti, with the central idea of exchangeability – that future observations should behave like past observations. This view came to the attention of the Anglophone world with the 1974 translation of de Finetti's book, and has since been propounded by such statisticians as Seymour Geisser.

Axiomatic probability

The mathematics of probability can be developed on an entirely axiomatic basis that is independent of any interpretation: see the articles on probability theory and probability axioms for a detailed treatment.

Likelihood principle

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Likelihood_principle

In statistics, the likelihood principle is the proposition that, given a statistical model, all the evidence in a sample relevant to model parameters is contained in the likelihood function.

A likelihood function arises from a probability density function considered as a function of its distributional parameterization argument. For example, consider a model which gives the probability density function $f_{X}(x\mid \theta )$ of observable random variable X as a function of a parameter θ. Then for a specific value x of X, the function ${\mathcal {L}}(\theta \mid x)=f_{X}(x\mid \theta )$ is a likelihood function of θ: it gives a measure of how "likely" any particular value of θ is, if we know that X has the value x. The density function may be a density with respect to counting measure, i.e. a probability mass function.

Two likelihood functions are equivalent if one is a scalar multiple of the other. The likelihood principle is this: all information from the data that is relevant to inferences about the value of the model parameters is in the equivalence class to which the likelihood function belongs. The strong likelihood principle applies this same criterion to cases such as sequential experiments where the sample of data that is available results from applying a stopping rule to the observations earlier in the experiment.

Example

Suppose

X is the number of successes in twelve independent Bernoulli trials with probability θ of success on each trial, and
Y is the number of independent Bernoulli trials needed to get three successes, again with probability θ (= 1/2 for a coin-toss) of success on each trial.

Then the observation that X = 3 induces the likelihood function

{\mathcal {L}}(\theta \mid X=3)={\binom {12}{3}}\theta ^{3}(1-\theta )^{9}=220\theta ^{3}(1-\theta )^{9},

while the observation that Y = 12 induces the likelihood function

{\mathcal {L}}(\theta \mid Y=12)={\binom {11}{2}}\theta ^{3}(1-\theta )^{9}=55\theta ^{3}(1-\theta )^{9}.

The likelihood principle says that, as the data are the same in both cases, the inferences drawn about the value of θ should also be the same. In addition, all the inferential content in the data about the value of θ is contained in the two likelihoods, and is the same if they are proportional to one another. This is the case in the above example, reflecting the fact that the difference between observing X = 3 and observing Y = 12 lies not in the actual data, but merely in the design of the experiment. Specifically, in one case, one has decided in advance to try twelve times; in the other, to keep trying until three successes are observed. The inference about θ should be the same, and this is reflected in the fact that the two likelihoods are proportional to each other.

This is not always the case, however. The use of frequentist methods involving p-values leads to different inferences for the two cases above, showing that the outcome of frequentist methods depends on the experimental procedure, and thus violates the likelihood principle.
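The proportionality of the two likelihoods in the example can be checked numerically. The following is a minimal sketch; the function names and the sample values of θ are illustrative choices, not part of the original article.

```python
from math import comb

def binomial_likelihood(theta, n=12, k=3):
    """L(theta | X = k): k successes observed in n fixed trials."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def negative_binomial_likelihood(theta, n=12, k=3):
    """L(theta | Y = n): the k-th success arrives exactly on trial n."""
    return comb(n - 1, k - 1) * theta**k * (1 - theta)**(n - k)

# Arbitrary illustrative values of theta (assumption, not from the source).
thetas = [0.1, 0.25, 0.5, 0.9]

# The ratio of the two likelihoods should not depend on theta.
ratios = [binomial_likelihood(t) / negative_binomial_likelihood(t) for t in thetas]
print(ratios)
```

The ratio is the constant 220/55 = 4 for every θ, which is what it means for the two likelihood functions to belong to the same equivalence class.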

The law of likelihood

A related concept is the law of likelihood, the notion that the extent to which the evidence supports one parameter value or hypothesis against another is equal to the ratio of their likelihoods, their likelihood ratio. That is,

\Lambda ={{\mathcal {L}}(a\mid X=x) \over {\mathcal {L}}(b\mid X=x)}={P(X=x\mid a) \over P(X=x\mid b)}

is the degree to which the observation x supports parameter value or hypothesis a against b. If this ratio is 1, the evidence is indifferent; if greater than 1, the evidence supports the value a against b; or if less, then vice versa.

In Bayesian statistics, this ratio is known as the Bayes factor, and Bayes' rule can be seen as the application of the law of likelihood to inference.

In frequentist inference, the likelihood ratio is used in the likelihood-ratio test, but other non-likelihood tests are used as well. The Neyman–Pearson lemma states the likelihood-ratio test is the most powerful test for comparing two simple hypotheses at a given significance level, which gives a frequentist justification for the law of likelihood.
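As a small illustration of the ratio Λ defined above, the sketch below compares two candidate parameter values for the earlier data (3 successes and 9 failures). The values a = 1/4 and b = 1/2 are assumptions chosen for illustration; the binomial coefficient is omitted because it cancels in the ratio.

```python
from fractions import Fraction

def likelihood(theta, successes=3, failures=9):
    """Likelihood of theta up to a constant factor that cancels in ratios."""
    return theta**successes * (1 - theta)**failures

def likelihood_ratio(a, b):
    """Lambda = L(a | x) / L(b | x) for the same observed data x."""
    return likelihood(a) / likelihood(b)

# Illustrative hypotheses (assumed here): a = 1/4 versus b = 1/2.
ratio = likelihood_ratio(Fraction(1, 4), Fraction(1, 2))
print(ratio, float(ratio))
```

Here the ratio is 19683/4096, roughly 4.8, so under the law of likelihood the data support θ = 1/4 over θ = 1/2 by a factor of about 4.8.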

Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value which is most strongly supported by the evidence. This is the basis for the widely used method of maximum likelihood.
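For the coin-toss data used earlier (3 successes, 9 failures), the maximizing value can be found with a simple grid search. This is only a sketch under assumed settings (a grid of step 0.001 over the open interval (0, 1)); practical maximum likelihood estimation would normally use calculus or a numerical optimizer.

```python
from math import log

def log_likelihood(theta, successes=3, failures=9):
    """Log-likelihood of theta, dropping the constant binomial coefficient."""
    return successes * log(theta) + failures * log(1 - theta)

# Candidate values 0.001, 0.002, ..., 0.999 (grid resolution is an assumption).
grid = [i / 1000 for i in range(1, 1000)]

# The maximum likelihood estimate is the grid point with the largest log-likelihood.
mle = max(grid, key=log_likelihood)
print(mle)
```

The search agrees with the closed-form answer, the observed success fraction 3/12 = 0.25.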

History

The likelihood principle was first identified by that name in print in 1962 (Barnard et al., Birnbaum, and Savage et al.), but arguments for the same principle, unnamed, and the use of the principle in applications go back to the works of R.A. Fisher in the 1920s. The law of likelihood was identified by that name by I. Hacking (1965). More recently the likelihood principle as a general principle of inference has been championed by A. W. F. Edwards. The likelihood principle has been applied to the philosophy of science by R. Royall.

Birnbaum proved that the likelihood principle follows from two more primitive and seemingly reasonable principles, the conditionality principle and the sufficiency principle. The conditionality principle says that if an experiment is chosen by a random process independent of the states of nature θ, then only the experiment actually performed is relevant to inferences about θ. The sufficiency principle says that if T(X) is a sufficient statistic for θ, and if in two experiments with data x₁ and x₂ we have T(x₁) = T(x₂), then the evidence about θ given by the two experiments is the same.

Arguments for and against

Some widely used methods of conventional statistics, for example many significance tests, are not consistent with the likelihood principle.

Let us briefly consider some of the arguments for and against the likelihood principle.

The original Birnbaum argument

Birnbaum's proof of the likelihood principle has been disputed by philosophers of science, including Deborah Mayo, and by statisticians, including Michael Evans. On the other hand, a new proof of the likelihood principle has been provided by Greg Gandenberger.

Experimental design arguments on the likelihood principle

Unrealized events play a role in some common statistical methods. For example, the result of a significance test depends on the p-value, the probability of a result as extreme or more extreme than the observation, and that probability may depend on the design of the experiment. To the extent that the likelihood principle is accepted, such methods are therefore denied.

Some classical significance tests are not based on the likelihood. A commonly cited example is the optional stopping problem. Suppose I tell you that I tossed a coin 12 times and in the process observed 3 heads. You might make some inference about the probability of heads and whether the coin was fair. Suppose now I tell you that I tossed the coin until I observed 3 heads, and that this took 12 tosses. Will you now make some different inference?

The likelihood function is the same in both cases: it is proportional to

p^{3}(1-p)^{9}.

According to the likelihood principle, the inference should be the same in either case.

Suppose a number of scientists are assessing the probability of a certain outcome (which we shall call 'success') in experimental trials. Conventional wisdom suggests that if there is no bias towards success or failure then the success probability would be one half. Adam, a scientist, conducted 12 trials and obtained 3 successes and 9 failures. Then he left the lab.

Bill, a colleague in the same lab, continued Adam's work and published Adam's results, along with a significance test. He tested the null hypothesis H₀ that p, the success probability, is equal to a half, versus the alternative p < 0.5. The probability of a result at least as extreme as the one observed, that is, 3 or fewer successes (equivalently, 9 or more failures) out of 12 trials, if H₀ is true, is

\left({12 \choose 9}+{12 \choose 10}+{12 \choose 11}+{12 \choose 12}\right)\left({1 \over 2}\right)^{12}

which is 299/4096 = 7.3%. Thus the null hypothesis is not rejected at the 5% significance level.

Charlotte, another scientist, reads Bill's paper and writes a letter, saying that it is possible that Adam kept trying until he obtained 3 successes, in which case the probability of needing to conduct 12 or more experiments is given by

1-\left({10 \choose 2}\left({1 \over 2}\right)^{11}+{9 \choose 2}\left({1 \over 2}\right)^{10}+\cdots +{2 \choose 2}\left({1 \over 2}\right)^{3}\right)

which is 134/4096 = 3.27%. Now the result is statistically significant at the 5% level. Note that there is no contradiction between these two results; both computations are correct.
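Both tail probabilities can be reproduced with exact rational arithmetic. The sketch below assumes the two designs described above (a fixed 12 trials for Bill's test, and sampling until the third success for Charlotte's); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)

# Bill's design: 12 trials fixed in advance.
# p-value = P(X <= 3) for X ~ Binomial(12, 1/2).
p_fixed_trials = sum(comb(12, k) for k in range(0, 4)) * half**12

# Charlotte's design: keep sampling until the 3rd success.
# P(Y = n) = C(n - 1, 2) * (1/2)^n, so P(Y >= 12) = 1 - sum over n = 3..11.
p_stop_at_three = 1 - sum(comb(n - 1, 2) * half**n for n in range(3, 12))

print(p_fixed_trials, float(p_fixed_trials))
print(p_stop_at_three, float(p_stop_at_three))
```

The first value is 299/4096 (about 7.3%) and the second is 67/2048, the reduced form of 134/4096 (about 3.27%), so the same data fall on opposite sides of the 5% threshold depending on the assumed stopping rule.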

To these scientists, whether a result is significant or not depends on the design of the experiment, not on the likelihood (in the sense of the likelihood function) of the parameter value being 1/2.

Results of this kind are considered by some as arguments against the likelihood principle. For others it exemplifies the value of the likelihood principle and is an argument against significance tests.

Similar themes appear when comparing Fisher's exact test with Pearson's chi-squared test.

The voltmeter story

An argument in favor of the likelihood principle is given by Edwards in his book Likelihood. He cites the following story from J.W. Pratt, slightly condensed here. Note that the likelihood function depends only on what actually happened, and not on what could have happened.

An engineer draws a random sample of electron tubes and measures their voltages. The measurements range from 75 to 99 Volts. A statistician computes the sample mean and a confidence interval for the true mean. Later the statistician discovers that the voltmeter reads only as far as 100 Volts, so technically, the population appears to be “censored”. If the statistician is orthodox this necessitates a new analysis. However, the engineer says he has another meter reading to 1000 Volts, which he would have used if any voltage had been over 100. This is a relief to the statistician, because it means the population was effectively uncensored after all. But later, the statistician ascertains that the second meter was not working at the time of the measurements. The engineer informs the statistician that he would not have held up the original measurements until the second meter was fixed, and the statistician informs him that new measurements are required. The engineer is astounded. “Next you'll be asking about my oscilloscope!”

This story can be translated to Adam's stopping rule above, as follows. Adam stopped immediately after 3 successes, because his boss Bill had instructed him to do so. After the publication of the statistical analysis by Bill, Adam realizes that he has missed a second instruction from Bill to conduct 12 trials instead, and that Bill's paper is based on this second instruction. Adam is very glad that he got his 3 successes after exactly 12 trials, and explains to his friend Charlotte that by coincidence he executed the second instruction. Later, he is astonished to hear about Charlotte's letter explaining that now the result is significant.
