A strange loop is a cyclic structure that goes through several levels in a hierarchical system. It arises when, by moving only upwards or downwards through the system, one finds oneself back where one started.
Strange loops may involve self-reference and paradox. The concept of a strange loop was proposed and extensively discussed by Douglas Hofstadter in Gödel, Escher, Bach, and is further elaborated in Hofstadter's book I Am a Strange Loop, published in 2007.
A tangled hierarchy is a hierarchical system in which a strange loop appears.
Definitions
A strange loop is a hierarchy of levels, each of which is linked to
at least one other by some type of relationship. A strange loop
hierarchy is "tangled" (Hofstadter refers to this as a "heterarchy"),
in that there is no well defined highest or lowest level; moving
through the levels, one eventually returns to the starting point, i.e.,
the original level. Examples of strange loops that Hofstadter offers
include: many of the works of M. C. Escher, the Canon 5. a 2 (the "endlessly rising" Canon per Tonos) from J. S. Bach's Musical Offering, the information flow network between DNA and enzymes through protein synthesis and DNA replication, and self-referential Gödelian statements in formal systems. Hofstadter characterizes the notion more fully in I Am a Strange Loop:
And yet when I say "strange loop", I have something else
in mind — a less concrete, more elusive notion. What I mean by "strange
loop" is — here goes a first stab, anyway — not a physical circuit but
an abstract loop in which, in the series of stages that constitute the
cycling-around, there is a shift from one level of abstraction (or
structure) to another, which feels like an upwards movement in an
hierarchy, and yet somehow the successive "upward" shifts turn out to
give rise to a closed cycle. That is, despite one's sense of departing
ever further from one's origin, one winds up, to one's shock, exactly
where one had started out. In short, a strange loop is a paradoxical
level-crossing feedback loop. (pp. 101–102)
In cognitive science
According to Hofstadter, strange loops take form in human
consciousness as the complexity of active symbols in the brain
inevitably leads to the same kind of self-reference which Gödel proved was inherent in any sufficiently complex logical or arithmetical system (that allows for arithmetic by means of the Peano axioms) in his incompleteness theorem. Gödel showed that mathematics and logic contain strange loops: propositions that not only refer to mathematical and logical truths, but also to the symbol systems expressing those truths. This leads to the sort of paradoxes seen in statements such as "This statement is false," wherein the sentence's basis of truth is found in referring to itself and its assertion, causing a logical paradox.
Hofstadter argues that the psychological self arises out of a similar kind of paradox. The brain is not born with an "I" – the ego
emerges only gradually as experience shapes the brain's dense web of
active symbols into a tapestry rich and complex enough to begin twisting back upon itself.
According to this view, the psychological "I" is a narrative fiction,
something created only from intake of symbolic data and the brain's
ability to create stories about itself from that data. The consequence
is that a self-perspective is a culmination of a unique pattern of
symbolic activity in the brain, which suggests that the pattern of
symbolic activity that makes identity, that constitutes subjectivity,
can be replicated within the brains of others, and likely even in artificial brains.
Strangeness
The "strangeness" of a strange loop comes from the brain's
perception, because the brain categorizes its input in a small number of
"symbols" (by which Hofstadter means groups of neurons standing for
something in the outside world). So the difference between the
video-feedback loop and the brain's strange loops, is that while the
former converts light to the same pattern on a screen, the latter
categorizes a pattern and outputs its "essence", so that as the brain
gets closer and closer to its "essence", it goes further down its
strange loop.
Downward causality
Hofstadter thinks that minds appear to determine the world by way of "downward causality", a situation in which a cause-and-effect relationship in a system gets flipped upside down. Hofstadter says this happens in the proof of Gödel's incompleteness theorem:
Merely from knowing the formula's meaning, one can infer
its truth or falsity without any effort to derive it in the
old-fashioned way, which requires one to trudge methodically "upwards"
from the axioms. This is not just peculiar; it is astonishing. Normally,
one cannot merely look at what a mathematical conjecture says and simply appeal to the content of that statement on its own to deduce whether the statement is true or false. (pp. 169–170)
Hofstadter claims a similar "flipping around of causality" appears to happen in minds possessing self-consciousness; the mind perceives itself as the cause of certain feelings.
The parallels between downward causality in formal systems and downward causality in brains are explored by Theodor Nenu in 2022, together with other aspects of Hofstadter's metaphysics of mind. Nenu
also questions the correctness of the above quote by focusing on the
sentence which "says about itself" that it is provable (also known as a
Henkin-sentence, named after logician Leon Henkin). It turns out that under suitable meta-mathematical choices (where the Hilbert-Bernays provability conditions
do not obtain), one can construct formally undecidable (or even
formally refutable) Henkin-sentences for the arithmetical system under
investigation. This system might very well be Hofstadter's Typographical Number Theory used in Gödel, Escher, Bach or the more familiar Peano Arithmetic
or some other sufficiently rich formal arithmetic. Thus, there are
examples of sentences "which say about themselves that they are
provable", but they don't exhibit the sort of downward causal powers
described in the displayed quote.
The "chicken or the egg" paradox is perhaps the best-known strange loop problem.
The "ouroboros",
which depicts a dragon eating its own tail, is perhaps one of the most
ancient and universal symbolic representations of the reflexive loop
concept.
A Shepard tone is another illustrative example of a strange loop. Named after Roger Shepard, it is a sound consisting of a superposition of tones separated by octaves. When played with the base pitch of the tone moving upwards or downwards, it is referred to as the Shepard scale. This creates the auditory illusion
of a tone that continually ascends or descends in pitch, yet which
ultimately seems to get no higher or lower. In a similar way a sound
with seemingly ever increasing tempo can be constructed, as was
demonstrated by Jean-Claude Risset.
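For illustration, the following Python sketch (an assumption-laden example, not taken from any source: the base frequency, number of octaves, sweep length, and envelope width are arbitrary choices) synthesizes a few seconds of a rising Shepard tone by summing octave-spaced sine components under a fixed bell-shaped loudness envelope, so that components fade in at the bottom of the range as they fade out at the top.

```python
import numpy as np
import wave

RATE = 44100            # sample rate (Hz)
DURATION = 8.0          # seconds; the base pitch sweeps up one octave over this span
BASE = 27.5             # lowest component frequency (Hz), an arbitrary choice
N_OCTAVES = 8           # number of octave-spaced components

t = np.arange(int(RATE * DURATION)) / RATE
sweep = t / DURATION    # base pitch position, 0 -> 1 octave over the clip

signal = np.zeros_like(t)
for k in range(N_OCTAVES):
    # Instantaneous frequency of the k-th octave-spaced component.
    freq = BASE * 2.0 ** (k + sweep)
    # Phase is the running integral of frequency.
    phase = 2 * np.pi * np.cumsum(freq) / RATE
    # Bell-shaped loudness envelope centred in the middle of the log-frequency range,
    # so each component grows, peaks, and fades as it rises.
    pos = (k + sweep) / N_OCTAVES
    amp = np.exp(-0.5 * ((pos - 0.5) / 0.18) ** 2)
    signal += amp * np.sin(phase)

signal /= np.max(np.abs(signal))          # normalize to [-1, 1]
pcm = (signal * 32767).astype(np.int16)   # 16-bit PCM samples

with wave.open("shepard.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(RATE)
    f.writeframes(pcm.tobytes())
```

Playing the resulting clip on repeat gives the impression of a tone that rises continually yet never gets higher overall.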
A quine
in software programming is a program that produces a copy of its own source code
as output, without any input from the outside. A similar concept is metamorphic code.
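A minimal Python quine illustrates the idea; the two-line program below prints a character-for-character copy of its own source, with no file reading or other outside input.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running the program and comparing its output with its source shows they are identical, which is exactly the self-referential closure a strange loop describes.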
Efron's dice are four dice that are intransitive under gambler's preference. I.e., the dice are ordered A > B > C > D > A, where x > y means "a gambler prefers x to y".
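The intransitivity can be checked directly. The following Python sketch uses one standard set of face values for Efron's dice (an assumption; other equivalent sets exist) and confirms that each die beats the next in the cycle A > B > C > D > A with probability 2/3.

```python
from itertools import product
from fractions import Fraction

# One standard version of Efron's dice (face values are an assumption, not from this article).
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def win_probability(x, y):
    """Exact probability that die x shows a strictly higher face than die y."""
    wins = sum(a > b for a, b in product(DICE[x], DICE[y]))
    return Fraction(wins, len(DICE[x]) * len(DICE[y]))

for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"P({x} beats {y}) = {win_probability(x, y)}")   # 2/3 in every case
```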
Individual preferences are always transitive (excluding cases governed by explicit rules, such as Efron's dice or rock-paper-scissors); however, the aggregate preferences of a group may be intransitive. This can result in a Condorcet paradox
wherein following a path from one candidate across a series of majority
preferences may return to the original candidate, leaving no clear
preference by the group. In this case, some candidate beats an
opponent, who in turn beats another opponent, and so forth, until a
candidate is reached who beats the original candidate.
The mathematical phenomenon of polysemy
has been observed to be a strange loop. At the denotational level, the
term refers to situations where a single entity can be seen to mean more than one mathematical object. See Tanenbaum (1999).
The Stonecutter is an old Japanese fairy tale with a story that explains social and natural hierarchies as a strange loop.
A strange loop can be found by traversing the links in the “See also” sections of the respective English Wikipedia articles. For instance: this article → Mise en abyme → Recursion → this article.
In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion confidence interval is an interval estimate of a success probability when only the number of experiments and the number of successes are known.
There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution.
In general, a binomial distribution applies when an experiment is
repeated a fixed number of times, each trial of the experiment has two
possible outcomes (success and failure), the probability of success is
the same for each trial, and the trials are statistically independent. Because the binomial distribution is a discrete probability distribution
(i.e., not continuous) and difficult to calculate for large numbers of
trials, a variety of approximations are used to calculate this
confidence interval, all with their own tradeoffs in accuracy and
computational intensity.
A simple example of a binomial distribution is the set of various
possible outcomes, and their probabilities, for the number of heads
observed when a coin is flipped
ten times. The observed binomial proportion is the fraction of the
flips that turn out to be heads. Given this observed proportion, the
confidence interval for the true probability of the coin landing on
heads is a range of possible proportions, which may or may not contain
the true proportion. A 95% confidence interval for the proportion, for
instance, will contain the true proportion 95% of the times that the
procedure for constructing the confidence interval is employed.
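The following Python sketch (illustrative only; the true probability, sample size, and use of the simple normal-approximation interval described below are assumptions) simulates this procedure many times and reports how often the constructed interval actually contains the true proportion. For this crude interval and a small sample, the empirical coverage falls noticeably short of the nominal 95%, which motivates the alternative intervals discussed in the rest of the article.

```python
import numpy as np

rng = np.random.default_rng(0)

p_true, n, z = 0.5, 10, 1.96      # assumed fair coin, 10 flips, 95% interval
trials = 100_000                  # number of repetitions of the whole experiment

heads = rng.binomial(n, p_true, size=trials)        # heads observed in each experiment
p_hat = heads / n                                   # observed proportions
half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)   # normal-approximation half-width
covered = (p_hat - half_width <= p_true) & (p_true <= p_hat + half_width)

# Fraction of intervals containing the true proportion; with n = 10 this is
# noticeably below 0.95 for the crude normal-approximation interval.
print(f"empirical coverage: {covered.mean():.3f}")
```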
Problems with using a normal approximation or "Wald interval"
Plotting the normal approximation interval on an arbitrary logistic curve reveals problems of overshoot and zero-width intervals.
A commonly used formula for a binomial confidence interval relies on
approximating the distribution of error about a binomially-distributed
observation with a normal distribution. The normal approximation depends on the de Moivre–Laplace theorem (the original, binomial-only version of the central limit theorem)
and becomes unreliable when it violates the theorem's premises, as the
sample size becomes small or the success probability grows close to
either 0 or 1.
Using the normal approximation, the success probability $p$ is estimated by
$$ \hat p \pm z \sqrt{\frac{\hat p\,(1 - \hat p)}{n}}, $$
where $\hat p = \frac{n_S}{n}$ is the proportion of successes in a Bernoulli trial process and an estimator for $p$ in the underlying Bernoulli distribution. The equivalent formula in terms of observation counts is
$$ \frac{n_S}{n} \pm z \sqrt{\frac{n_S\, n_F}{n^3}}, $$
where the data are the results of $n$ trials that yielded $n_S$ successes and $n_F = n - n_S$ failures. The distribution function argument $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate $\alpha$. For a 95% confidence level, the error $\alpha = 1 - 0.95 = 0.05$, so $1 - \tfrac{\alpha}{2} = 0.975$ and $z = 1.96$.
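As a sketch, the formula above can be transcribed directly into Python (the function name and example counts are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(n_s, n, confidence=0.95):
    """Normal-approximation ("Wald") CI for a binomial proportion, from n_s successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g. 1.96 for 95%
    p_hat = n_s / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

print(wald_interval(7, 10))   # roughly (0.416, 0.984)
print(wald_interval(0, 10))   # (0.0, 0.0): the zero-width problem discussed below
```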
When using the Wald formula to estimate $p$, or just considering the possible outcomes of this calculation, two problems immediately become apparent:
First, for $\hat p$ approaching either 1 or 0, the interval narrows to zero width (falsely implying certainty).
Second, for values of $\hat p < \frac{z^2}{n + z^2}$ (probability too low / too close to 0), the interval boundaries extend beyond $[0, 1]$ (overshoot).
(Another version of the second, overshoot problem arises when $1 - \hat p$ instead falls below the same upper bound: probability too high / too close to 1.)
An important theoretical derivation of this confidence interval
involves the inversion of a hypothesis test. Under this formulation, the
confidence interval represents those values of the population parameter
that would have large p-values if they were tested as a hypothesized population proportion. The collection of values $\theta$ for which the normal approximation is valid can be represented as
$$ \left\{ \theta \;\middle|\; -z \le \frac{\hat p - \theta}{\sqrt{\frac{1}{n}\,\hat p\,(1 - \hat p)}} \le z \right\}. $$
Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval or Wald method, after Abraham Wald, but it was first described by Laplace (1812).
Bracketing the confidence interval
Extending the normal approximation and Wald–Laplace interval concepts, Michael Short has shown that inequalities on the approximation error
between the binomial distribution and the normal distribution can be
used to accurately bracket the estimate of the confidence interval around $p$,
where $p$ is again the (unknown) proportion of successes in a Bernoulli trial process (as opposed to $\hat p$, which estimates it), measured with $n$ trials yielding $n_S$ successes, $z$ is the quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate $\alpha$, and the bounding constants are simple algebraic functions of $\hat p$. For a fixed $\alpha$ (hence fixed $z$),
these inequalities give easily computed one- or two-sided intervals
which bracket the exact binomial upper and lower confidence limits
corresponding to the error rate $\alpha$.
Standard error of a proportion estimation when using weighted data
Let there be a simple random sample $X_1, \ldots, X_n$, where each $X_i$ is i.i.d. from a Bernoulli($p$) distribution and weight $w_i$ is the weight for each observation, with the (positive) weights normalized so they sum to 1. The weighted sample proportion is
$$ \hat p = \sum_{i=1}^{n} w_i X_i . $$
Since each of the $X_i$ is independent of all the others, and each one has variance $\operatorname{Var}(X_i) = p\,(1 - p)$ for every $i = 1, \ldots, n$, the sampling variance of the proportion therefore is:
$$ \operatorname{Var}(\hat p) = \sum_{i=1}^{n} \operatorname{Var}(w_i X_i) = p\,(1 - p) \sum_{i=1}^{n} w_i^2 . $$
The standard error of $\hat p$ is the square root of this quantity. Because we do not know $p\,(1 - p)$, we have to estimate it. Although there are many possible estimators, a conventional one is to use $\hat p$, the sample mean, and plug this into the formula. That gives:
$$ \operatorname{SE}(\hat p) = \sqrt{\hat p\,(1 - \hat p) \sum_{i=1}^{n} w_i^2} . $$
For otherwise unweighted data, the effective weights are uniform, giving $w_i = \tfrac{1}{n}$. The $\operatorname{SE}(\hat p)$ becomes $\sqrt{\hat p\,(1 - \hat p)/n}$, leading to the familiar formulas and showing that the calculation for weighted data is a direct generalization of them.
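A minimal Python sketch of this calculation (function name and example data are illustrative) shows the weighted formula reducing to the familiar unweighted one when the weights are uniform:

```python
import numpy as np

def weighted_proportion_se(x, w):
    """Weighted sample proportion and its estimated standard error for 0/1 observations x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize the (positive) weights to sum to 1
    p_hat = np.sum(w * x)                # weighted sample proportion
    se = np.sqrt(p_hat * (1 - p_hat) * np.sum(w ** 2))
    return p_hat, se

x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
# Uniform weights reproduce sqrt(p_hat * (1 - p_hat) / n).
print(weighted_proportion_se(x, np.ones(len(x))))
# Unequal weights give a (generally larger) standard error.
print(weighted_proportion_se(x, [0.3, 0.1, 0.1, 0.05, 0.05, 0.1, 0.1, 0.1, 0.05, 0.05]))
```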
Wilson score interval
Wilson score intervals plotted on a logistic curve, revealing asymmetry and good performance for small n and where p is at or near 0 or 1.
The Wilson score interval was developed by E. B. Wilson (1927). It is an improvement over the normal approximation interval in multiple
respects. Unlike the symmetric normal approximation interval (above),
the Wilson score interval is asymmetric, and it does not suffer from the problems of overshoot and zero-width intervals that afflict the normal interval. It can be safely employed with small samples and skewed observations. Its observed coverage probability is consistently closer to the nominal value, and,
like the normal interval, it can be computed directly from a formula.
Wilson started with the normal approximation to the binomial:
$$ z \approx \frac{\hat p - p}{\sigma_n}, $$
where $z$ is the standard normal interval half-width corresponding to the desired confidence $1 - \alpha$. The analytic formula for a binomial sample standard deviation is
$$ \sigma_n = \sqrt{\frac{p\,(1 - p)}{n}} . $$
Combining the two, and squaring out the radical, gives an equation that is quadratic in $p$:
$$ (\hat p - p)^2 = z^2\,\frac{p\,(1 - p)}{n} . $$
Transforming the relation into a standard-form quadratic equation for $p$, treating $\hat p$ and $n$ as known values from the sample (see prior section), and using the value of $z$ that corresponds to the desired confidence for the estimate of $p$ gives this:
$$ \left(1 + \frac{z^2}{n}\right) p^2 - \left(2\hat p + \frac{z^2}{n}\right) p + \hat p^2 = 0, $$
where all of the values bracketed by parentheses are known quantities.
The solution for $p$ estimates the upper and lower limits of the confidence interval for $p$. Hence the probability of success is estimated by $\hat p$ and, with $1 - \alpha$ confidence, bracketed in the interval
$$ p \approx (w^-, w^+) = \frac{1}{1 + \frac{z^2}{n}} \left( \hat p + \frac{z^2}{2n} \pm \frac{z}{2n} \sqrt{4 n \hat p\,(1 - \hat p) + z^2} \right), $$
where $z$ is an abbreviation for $z_{1 - \frac{\alpha}{2}}$, the $1 - \tfrac{\alpha}{2}$ quantile of the standard normal distribution.
An equivalent expression using the observation counts $n_S$ and $n_F$ is
$$ p \approx \frac{n_S + \frac{z^2}{2}}{n + z^2} \pm \frac{z}{n + z^2} \sqrt{\frac{n_S\, n_F}{n} + \frac{z^2}{4}}, $$
with the counts as above: $n_S$ the count of observed "successes", $n_F = n - n_S$ the count of observed "failures", and $n = n_S + n_F$ the total number of observations.
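The counts form translates directly into a short Python sketch (function name and example values are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(n_s, n, confidence=0.95):
    """Wilson score interval for a binomial proportion, from n_s successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_f = n - n_s
    center = (n_s + z * z / 2) / (n + z * z)
    half_width = z / (n + z * z) * sqrt(n_s * n_f / n + z * z / 4)
    return center - half_width, center + half_width

print(wilson_interval(7, 10))   # asymmetric about p_hat = 0.7
print(wilson_interval(0, 10))   # nonzero width even when no successes are observed
```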
In practical tests of the formula's results, users find that this
interval has good properties even for a small number of trials and/or
extreme values of the probability estimate.
Intuitively, the center value of this interval is the weighted average of $\hat p$ and $\tfrac{1}{2}$, with $\hat p$ receiving greater weight as the sample size increases. Formally, the center value corresponds to using a pseudocount of $\tfrac{1}{2} z^2$, half the square of
the number of standard deviations of the confidence interval: add this
number to both the count of successes and of failures to yield the
estimate of the ratio. For the common two-standard-deviations-in-each-direction interval (approximately 95% coverage, which itself is
approximately 1.96 standard deviations), this yields the estimate $(n_S + 2)/(n + 4)$, which is known as the "plus four rule".
Although the quadratic can be solved explicitly, in most cases Wilson's equations can also be solved numerically using the fixed-point iteration
$$ p_{k+1} = \hat p \pm z \sqrt{\frac{p_k\,(1 - p_k)}{n}} $$
with $p_0 = \hat p$.
Equivalently, the interval can be obtained by inverting a test of a hypothesized proportion $\theta$: the collection of values
$$ \left\{ \theta \;\middle|\; y \le \frac{\hat p - \theta}{\sqrt{\frac{1}{n}\,\theta\,(1 - \theta)}} \le z \right\} $$
(with $y = z_{\alpha/2}$ the lower quantile) can then be solved for $\theta$ to produce the Wilson score interval. The test in the middle of the inequality is a score test.
The interval equality principle
The probability density function (pdf) for the Wilson score interval, plus pdfs at interval bounds. Tail areas are equal.
Since the interval is derived by solving from the normal approximation to the binomial, the Wilson score interval has the property of being guaranteed to obtain the same result as the equivalent z-test or chi-squared test.
This property can be visualised by plotting the probability density function for the Wilson score interval (see Wallis) and then also plotting a normal pdf
across each bound. The tail areas of the resulting Wilson and normal
distributions, which represent the chance of a significant result in that
direction, must be equal.
The continuity-corrected Wilson score interval and the Clopper-Pearson interval are also compliant with this property. The practical import is that these intervals may be employed as significance tests, with identical results to the source test, and new tests may be derived by geometry.
Wilson score interval with continuity correction
The Wilson interval may be modified by employing a continuity correction, in order to align the minimum coverage probability, rather than the average coverage probability, with the nominal value, $1 - \alpha$.
The following formulae for the lower and upper bounds of the Wilson score interval with continuity correction $(w_{cc}^-, w_{cc}^+)$ are derived from Newcombe:
$$ w_{cc}^- = \max\left\{0,\; \frac{2 n \hat p + z^2 - \left[ z \sqrt{z^2 - \frac{1}{n} + 4 n \hat p\,(1 - \hat p) + (4 \hat p - 2)} + 1 \right]}{2\,(n + z^2)} \right\} $$
$$ w_{cc}^+ = \min\left\{1,\; \frac{2 n \hat p + z^2 + \left[ z \sqrt{z^2 - \frac{1}{n} + 4 n \hat p\,(1 - \hat p) - (4 \hat p - 2)} + 1 \right]}{2\,(n + z^2)} \right\} $$
for $\hat p \ne 0$ and $\hat p \ne 1$.
If $\hat p = 0$, then $w_{cc}^-$ must instead be set to $0$; if $\hat p = 1$, then $w_{cc}^+$ must instead be set to $1$.
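Assuming the Newcombe formulas above, a Python sketch might look like this (function name and example values are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def wilson_cc_interval(n_s, n, confidence=0.95):
    """Continuity-corrected Wilson score interval (Newcombe) from n_s successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = n_s / n
    if n_s == 0:                      # boundary case: lower bound is set to 0
        lower = 0.0
    else:
        lower = (2 * n * p + z * z
                 - (z * sqrt(z * z - 1 / n + 4 * n * p * (1 - p) + (4 * p - 2)) + 1)
                 ) / (2 * (n + z * z))
    if n_s == n:                      # boundary case: upper bound is set to 1
        upper = 1.0
    else:
        upper = (2 * n * p + z * z
                 + (z * sqrt(z * z - 1 / n + 4 * n * p * (1 - p) - (4 * p - 2)) + 1)
                 ) / (2 * (n + z * z))
    return max(0.0, lower), min(1.0, upper)

print(wilson_cc_interval(7, 10))   # slightly wider than the uncorrected Wilson interval
```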
Wallis (2021) identifies a simpler method for computing continuity-corrected Wilson intervals that employs a special function based on Wilson's lower-bound formula, evaluated at the selected tolerable error level $\alpha$. This method has the advantage of being further decomposable.
Jeffreys interval
Jeffreys intervals plotted on a logistic curve, revealing asymmetry and good performance for small n and where p is at or near 0 or 1.
The Jeffreys interval has a Bayesian derivation, but good
frequentist properties (outperforming most frequentist constructions).
In particular, it has coverage properties that are similar to those of
the Wilson interval, but it is one of the few intervals with the
advantage of being equal-tailed (e.g., for a 95% confidence
interval, the probabilities of the interval lying above or below the
true value are both close to 2.5%). In contrast, the Wilson interval has
a systematic bias such that it is centred too close to $p = 0.5$.
When $x \ne 0$ and $x \ne n$, the Jeffreys interval is taken to be the equal-tailed posterior probability interval, i.e., the $\tfrac{\alpha}{2}$ and $1 - \tfrac{\alpha}{2}$ quantiles of a Beta distribution with parameters $\left(x + \tfrac{1}{2},\; n - x + \tfrac{1}{2}\right)$.
In order to avoid the coverage probability tending to zero when $p \to 0$ or $1$: when $x = 0$ the upper limit is calculated as before but the lower limit is set to $0$, and when $x = n$ the lower limit is calculated as before but the upper limit is set to $1$.
Jeffreys' interval can also be thought of as a frequentist interval based on inverting the p-value from the G-test after applying the Yates correction to avoid a potentially-infinite value for the test statistic.
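A Python sketch of the Jeffreys interval, using the Beta quantile function from SciPy (the function name and example values are illustrative):

```python
from scipy.stats import beta

def jeffreys_interval(x, n, confidence=0.95):
    """Equal-tailed Jeffreys interval: quantiles of Beta(x + 1/2, n - x + 1/2), with boundary fixes."""
    alpha = 1 - confidence
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x + 0.5, n - x + 0.5)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5)
    return lower, upper

print(jeffreys_interval(7, 10))
```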
Clopper–Pearson interval
The Clopper–Pearson interval is an early and very common method for calculating binomial confidence intervals. This is often called an 'exact' method, as it attains the nominal
coverage level in an exact sense, meaning that the coverage level is
never less than the nominal $1 - \alpha$.
The Clopper–Pearson interval can be written as
$$ S_{\le} \cap S_{\ge} $$
or equivalently,
$$ \left( \inf S_{\ge},\; \sup S_{\le} \right) $$
with
$$ S_{\le} := \left\{ p \;\middle|\; P\!\left[\operatorname{Bin}(n; p) \le x\right] > \frac{\alpha}{2} \right\} $$
and
$$ S_{\ge} := \left\{ p \;\middle|\; P\!\left[\operatorname{Bin}(n; p) \ge x\right] > \frac{\alpha}{2} \right\}, $$
where $0 \le x \le n$ is the number of successes observed in the sample and $\operatorname{Bin}(n; p)$ is a binomial random variable with $n$ trials and probability of success $p$.
Equivalently, we can say that the Clopper–Pearson interval is $\left(\frac{x}{n} - \varepsilon_1,\; \frac{x}{n} + \varepsilon_2\right)$ with confidence level $1 - \alpha$ if each $\varepsilon_i$ is the infimum of those values such that the following tests of hypothesis succeed with significance $\frac{\alpha}{2}$:
H0: $p = \frac{x}{n} - \varepsilon_1$ with HA: $p > \frac{x}{n} - \varepsilon_1$
H0: $p = \frac{x}{n} + \varepsilon_2$ with HA: $p < \frac{x}{n} + \varepsilon_2$
Because of a relationship between the binomial distribution and the beta distribution, the Clopper–Pearson interval is sometimes presented in an alternative format that uses quantiles from the beta distribution:
$$ B\!\left(\frac{\alpha}{2};\, x,\, n - x + 1\right) < p < B\!\left(1 - \frac{\alpha}{2};\, x + 1,\, n - x\right), $$
where $x$ is the number of successes, $n$ is the number of trials, and $B(q;\, a,\, b)$ is the $q$th quantile from a beta distribution with shape parameters $a$ and $b$.
When $x$ is either $0$ or $n$, closed-form expressions for the interval bounds are available: when $x = 0$ the interval is
$$ \left(0,\; 1 - \left(\frac{\alpha}{2}\right)^{\frac{1}{n}}\right) $$
and when $x = n$
it is
$$ \left(\left(\frac{\alpha}{2}\right)^{\frac{1}{n}},\; 1\right). $$
The beta distribution is, in turn, related to the F-distribution, so a third formulation of the Clopper–Pearson interval can be written using F quantiles:
$$ \left( 1 + \frac{n - x + 1}{x\, F\!\left[\frac{\alpha}{2};\, 2x,\, 2(n - x + 1)\right]} \right)^{-1} < p < \left( 1 + \frac{n - x}{(x + 1)\, F\!\left[1 - \frac{\alpha}{2};\, 2(x + 1),\, 2(n - x)\right]} \right)^{-1}, $$
where $x$ is the number of successes, $n$ is the number of trials, and $F[c;\, d_1,\, d_2]$ is the $c$ quantile from an F-distribution with $d_1$ and $d_2$ degrees of freedom.
The Clopper–Pearson interval is an 'exact' interval, since it is
based directly on the binomial distribution rather than any
approximation to the binomial distribution. This interval never has less
than the nominal coverage for any population proportion, but that means
that it is usually conservative. For example, the true coverage rate of
a 95% Clopper–Pearson interval may be well above 95%, depending on $n$ and $p$. Thus the interval may be wider than it needs to be to achieve 95%
confidence, and wider than other intervals. In contrast, it is worth
noting that other confidence intervals may have coverage levels that are
lower than the nominal $1 - \alpha$; e.g., the normal approximation (or "standard") interval, Wilson interval, Agresti–Coull interval, etc., with a nominal coverage of 95% may in fact cover less than 95%, even for large sample sizes.
The definition of the Clopper–Pearson interval can also be
modified to obtain exact confidence intervals for different
distributions. For instance, it can also be applied to the case where
the samples are drawn without replacement from a population of a known
size, instead of repeated draws of a binomial distribution. In this
case, the underlying distribution would be the hypergeometric distribution.
The interval boundaries can be computed with numerical functions qbeta in R and scipy.stats.beta.ppf in Python.
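For example, a Python sketch of the beta-quantile form using scipy.stats.beta.ppf (function name and example values are illustrative):

```python
from scipy.stats import beta

def clopper_pearson_interval(x, n, confidence=0.95):
    """Clopper-Pearson 'exact' interval via beta quantiles, with the x = 0 and x = n boundary cases."""
    alpha = 1 - confidence
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper

print(clopper_pearson_interval(7, 10))   # wider (more conservative) than Wilson or Jeffreys
```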
Agresti–Coull interval
The Agresti–Coull interval is another approximate binomial confidence interval.
Given $x$ successes in $n$ trials, define
$$ \tilde n = n + z^2 $$
and
$$ \tilde p = \frac{1}{\tilde n} \left( x + \frac{z^2}{2} \right). $$
Then, a confidence interval for $p$ is given by
$$ \tilde p \pm z \sqrt{\frac{\tilde p\,(1 - \tilde p)}{\tilde n}}, $$
where $z$ is the quantile of a standard normal distribution, as before (for example, a 95% confidence interval requires $z = 1.96$, thereby producing $\tilde n = n + 3.84$). According to Brown, Cai, & DasGupta (2001), taking $z = 2$ instead of 1.96 produces the "add 2 successes and 2 failures" interval previously described by Agresti & Coull.
This interval can be summarised as employing the centre-point adjustment, $\tilde p$, of the Wilson score interval, and then applying the normal approximation to this point.
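A Python sketch of the Agresti–Coull interval as defined above (function name and example values are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def agresti_coull_interval(x, n, confidence=0.95):
    """Agresti-Coull interval: normal approximation applied at the adjusted centre point p_tilde."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2) / n_tilde
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width

print(agresti_coull_interval(7, 10))   # centred on the Wilson centre point p_tilde
```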
Arcsine transformation
The arcsine transformation has the effect of pulling out the ends of the distribution. While it can stabilize the variance (and thus the confidence intervals) of
proportion data, its use has been criticized in several contexts.
Let $X$ be the number of successes in $n$ trials and let $\hat p = X/n$. The variance of $\hat p$ is
$$ \operatorname{Var}(\hat p) = \frac{p\,(1 - p)}{n} . $$
Using the arcsine transform, the variance of $\arcsin\sqrt{\hat p}$ is approximately
$$ \operatorname{Var}\!\left(\arcsin\sqrt{\hat p}\right) \approx \frac{1}{4n} . $$
So, the confidence interval itself has the form
$$ \sin^2\!\left(\arcsin\sqrt{\hat p} - \frac{z}{2\sqrt{n}}\right) < p < \sin^2\!\left(\arcsin\sqrt{\hat p} + \frac{z}{2\sqrt{n}}\right), $$
where $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution.
This method may be used to estimate the variance of $\hat p$, but its use is problematic when $\hat p$ is close to 0 or 1.
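A Python sketch of the arcsine interval above (function name and example values are illustrative; the transformed endpoints are clamped to $[0, \pi/2]$ so the bounds stay within $[0, 1]$):

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist

def arcsine_interval(x, n, confidence=0.95):
    """Variance-stabilizing arcsine-transform interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    theta = asin(sqrt(x / n))
    lower = sin(max(theta - z / (2 * sqrt(n)), 0.0)) ** 2
    upper = sin(min(theta + z / (2 * sqrt(n)), pi / 2)) ** 2
    return lower, upper

print(arcsine_interval(7, 10))
```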
$t_a$ transform
Let $p$ be the proportion of successes. The $t_a$ family of transforms is a generalisation of the logit transform, which is the special case $a = 1$, and it can be used to transform a proportional data distribution to an approximately normal distribution. The parameter $a$ has to be estimated for the data set.
Rule of three — for when no successes are observed
The rule of three is used to provide a simple way of stating an approximate 95% confidence interval for $p$, in the special case that no successes ($\hat p = 0$) have been observed. The interval is $(0,\; 3/n)$.
By symmetry, in the case of only successes ($\hat p = 1$), the interval is $(1 - 3/n,\; 1)$.
Comparison and discussion
There are several research papers that compare these and other confidence intervals for the binomial proportion.
Both Ross (2003) and Agresti & Coull (1998) point out that exact methods such as the Clopper–Pearson interval may
not work as well as some approximations. The normal approximation
interval and its presentation in textbooks has been heavily criticised,
with many statisticians advocating that it not be used. The principal problems are overshoot (bounds extend beyond $[0, 1]$), zero-width intervals at $\hat p = 0$ or $1$ (falsely implying certainty), and overall inconsistency with significance testing.
Of the approximations listed above, Wilson score interval methods
(with or without continuity correction) have been shown to be the most
accurate and the most robust, though some prefer the Agresti–Coull approach for larger sample sizes. Wilson and Clopper–Pearson methods obtain consistent results with source significance tests, and this property is decisive for many researchers.
Many of these intervals can be calculated in R using packages like binom.