A Medley of Potpourri

Monday, January 31, 2022

Information theory

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Information_theory


Information theory is the scientific study of the quantification, storage, and communication of digital information. The field was fundamentally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.

A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy. Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory and information-theoretic security.

Applications of fundamental topics of information theory include source coding/data compression (e.g. for ZIP files), and channel coding/error detection and correction (e.g. for DSL). Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones and the development of the Internet. The theory has also found applications in other areas, including statistical inference, cryptography, neurobiology, perception, linguistics, the evolution and function of molecular codes (bioinformatics), thermal physics, molecular dynamics, quantum computing, black holes, information retrieval, intelligence gathering, plagiarism detection, pattern recognition, anomaly detection, and even art creation.

Overview

Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was formalized in 1948 by Claude Shannon in a paper entitled A Mathematical Theory of Communication. In that paper, information is thought of as a set of possible messages, and the goal is to send these messages over a noisy channel and to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem, showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.

Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.

A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. See the article ban (unit) for a historical application.

Historical background

The landmark event establishing the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.

Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation $W = K \log m$ (recalling Boltzmann's constant), where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as $H = \log S^{n} = n \log S$, where S was the number of possible symbols, and n the number of symbols in a transmission. The unit of information was therefore the decimal digit, which since has sometimes been called the hartley in his honor as a unit or scale or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German Second World War Enigma ciphers.

Much of the mathematics behind information theory with events of different probabilities was developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.

In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion:

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

With it came the ideas of

the information entropy and redundancy of a source, and its relevance through the source coding theorem;
the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
the practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; as well as
the bit—a new way of seeing the most fundamental unit of information.

Quantities of information

Information theory is based on probability theory and statistics. Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.

The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.

In what follows, an expression of the form $p \log p$ is considered by convention to be equal to zero whenever $p = 0$. This is justified because $\lim _{p\rightarrow 0+}p\log p=0$ for any logarithmic base.

Entropy of an information source

Based on the probability mass function of each source symbol to be communicated, the Shannon entropy $H$ , in units of bits (per symbol), is given by

H=-\sum _{i}p_{i}\log _{2}(p_{i})

where $p_{i}$ is the probability of occurrence of the $i$-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the shannon in his honor. Entropy is also commonly computed using the natural logarithm (base $e$, where $e$ is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base 2⁸ = 256 will produce a measurement in bytes per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.
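As a minimal sketch of the formula above, the following Python computes the entropy of a discrete distribution and shows how the logarithm base sets the unit. The function name `shannon_entropy` and its `base` parameter are illustrative choices, not something defined in the article.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Entropy of a discrete distribution given as a list of probabilities.

    Terms with p == 0 are skipped, following the convention p log p = 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
fair_die = [1 / 6] * 6

print(shannon_entropy(fair_coin))          # bits per symbol
print(shannon_entropy(fair_die))           # bits per symbol
print(shannon_entropy(fair_die, math.e))   # nats per symbol
print(shannon_entropy(fair_die, 10))       # hartleys per symbol
print(shannon_entropy(fair_die, 256))      # bytes per symbol
```

The fair die comes out with higher entropy than the fair coin in every unit, matching the coin-versus-die comparison made in the introduction.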

Intuitively, the entropy $H_{X}$ of a discrete random variable $X$ is a measure of the amount of uncertainty associated with the value of $X$ when only its distribution is known.

The entropy of a source that emits a sequence of $N$ symbols that are independent and identically distributed (iid) is $N \cdot H$ bits (per message of $N$ symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length $N$ will be less than $N \cdot H$ .

The entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, $H_{\mathrm{b}}(p)$. The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.

If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If $\mathbb {X}$ is the set of all messages $\{x_{1}, \ldots, x_{n}\}$ that $X$ could be, and $p(x)$ is the probability of some $x\in \mathbb {X}$, then the entropy, $H$, of $X$ is defined:

H(X)=\mathbb {E} _{X}[I(x)]=-\sum _{x\in \mathbb {X} }p(x)\log p(x).

(Here, $I(x)$ is the self-information, which is the entropy contribution of an individual message, and $\mathbb {E} _{X}$ is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable, $p(x) = 1/n$; i.e., most unpredictable, in which case $H(X) = \log n$.

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:

H_{\mathrm {b} }(p)=-p\log _{2}p-(1-p)\log _{2}(1-p).
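The binary entropy function can be evaluated directly from this definition. The sketch below is illustrative only (the name `binary_entropy` is an assumption); it treats the endpoints $p = 0$ and $p = 1$ with the $p \log p = 0$ convention.

```python
import math

def binary_entropy(p):
    """H_b(p) in shannons (bits) for a two-outcome random variable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, binary_entropy(p))
```

The printed values rise from 0 at $p = 0$ to 1 bit at $p = 0.5$ and fall back symmetrically, consistent with the maximum described for an unbiased coin toss.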

Joint entropy

The joint entropy of two discrete random variables $X$ and $Y$ is merely the entropy of their pairing: $(X, Y)$ . This implies that if $X$ and $Y$ are independent, then their joint entropy is the sum of their individual entropies.

For example, if $(X, Y)$ represents the position of a chess piece— $X$ the row and $Y$ the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

H(X,Y)=\mathbb {E} _{X,Y}[-\log p(x,y)]=-\sum _{x,y}p(x,y)\log p(x,y)\,

Despite similar notation, joint entropy should not be confused with cross entropy.
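To make the chess example concrete, the sketch below assumes a piece placed uniformly at random on an 8×8 board with row and column independent (an assumption made here for illustration). Under that assumption the joint entropy equals the sum of the row and column entropies.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed distributions: row and column each uniform over 8 values.
rows = {r: 1 / 8 for r in range(8)}
cols = {c: 1 / 8 for c in range(8)}

# Independence means p(x, y) = p(x) * p(y).
joint = {(r, c): rows[r] * cols[c] for r, c in product(rows, cols)}

h_row = entropy(rows.values())
h_col = entropy(cols.values())
h_joint = entropy(joint.values())

print(h_row, h_col, h_joint)  # 3 bits + 3 bits = 6 bits for 64 squares
```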

Conditional entropy (equivocation)

The conditional entropy or conditional uncertainty of $X$ given random variable $Y$ (also called the equivocation of $X$ about $Y$ ) is the average conditional entropy over $Y$ :

H(X|Y)=\mathbb {E} _{Y}[H(X|y)]=-\sum _{y\in Y}p(y)\sum _{x\in X}p(x|y)\log p(x|y)=-\sum _{x,y}p(x,y)\log p(x|y).

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

H(X|Y)=H(X,Y)-H(Y).\,
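The identity above can be checked numerically. The following sketch uses a small, made-up joint distribution (the table `joint` is hypothetical) and computes $H(X|Y)$ from the definition, then compares it with $H(X,Y) - H(Y)$.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution p(x, y); the values are chosen only for illustration.
joint = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal_y(joint):
    """Marginal p(y) obtained by summing the joint over x."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    return dict(p_y)

def conditional_entropy(joint):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y) / p(y)."""
    p_y = marginal_y(joint)
    return -sum(p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)

h_x_given_y = conditional_entropy(joint)
chain_rule = entropy(joint.values()) - entropy(marginal_y(joint).values())
print(h_x_given_y, chain_rule)  # the two values agree up to rounding
```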

Mutual information (transinformation)

Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of $X$ relative to $Y$ is given by:

I(X;Y)=\mathbb {E} _{X,Y}[SI(x,y)]=\sum _{x,y}p(x,y)\log {\frac {p(x,y)}{p(x)\,p(y)}}

where $SI$ (Specific mutual Information) is the pointwise mutual information.

A basic property of the mutual information is that

I(X;Y)=H(X)-H(X|Y).\,

That is, knowing Y, we can save an average of $I(X;Y)$ bits in encoding X compared to not knowing Y.

Mutual information is symmetric:

I(X;Y)=I(Y;X)=H(X)+H(Y)-H(X,Y).\,
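As a rough numerical check of these properties, the sketch below computes mutual information from its definition for a hypothetical joint table and compares it with $H(X)+H(Y)-H(X,Y)$. The helper names are illustrative assumptions.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Return (p_x, p_y) as dicts, summing the joint over the other variable."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )."""
    p_x, p_y = marginals(joint)
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distribution, chosen only for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
swapped = {(y, x): p for (x, y), p in joint.items()}

p_x, p_y = marginals(joint)
via_entropies = entropy(p_x.values()) + entropy(p_y.values()) - entropy(joint.values())

print(mutual_information(joint), mutual_information(swapped), via_entropies)
```

All three printed values should coincide up to floating-point rounding, which illustrates both the symmetry and the entropy identity.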

Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of X given the value of Y and the prior distribution on X:

I(X;Y)=\mathbb {E} _{p(y)}[D_{\mathrm {KL} }(p(X|Y=y)\|p(X))].

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

I(X;Y)=D_{\mathrm {KL} }(p(X,Y)\|p(X)p(Y)).

Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ² test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

Kullback–Leibler divergence (information gain)

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution $p(X)$ , and an arbitrary probability distribution $q(X)$ . If we compress data in a manner that assumes $q(X)$ is the distribution underlying some data, when, in reality, $p(X)$ is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

D_{\mathrm {KL} }(p(X)\|q(X))=\sum _{x\in X}-p(x)\log {q(x)}\,-\,\sum _{x\in X}-p(x)\log {p(x)}=\sum _{x\in X}p(x)\log {\frac {p(x)}{q(x)}}.

Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).

Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution $p(x)$ . If Alice knows the true distribution $p(x)$ , while Bob believes (has a prior) that the distribution is $q(x)$ , then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
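The sketch below illustrates the definition and the lack of symmetry, using two made-up distributions `p` and `q`. It also checks the "extra bits" reading: the divergence equals the cross entropy of `p` under `q` minus the entropy of `p`. Function names are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same ordered outcomes."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf   # p puts mass where q puts none
        total += pi * math.log2(pi / qi)
    return total

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average code length in bits when data from p is coded as if it came from q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]      # hypothetical "true" distribution (Alice)
q = [1 / 3, 1 / 3, 1 / 3]  # hypothetical assumed distribution (Bob)

print(kl_divergence(p, q))               # extra bits per datum when assuming q
print(kl_divergence(q, p))               # differs: the divergence is not symmetric
print(cross_entropy(p, q) - entropy(p))  # matches the first value
```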

Other quantities

Other important information theoretic quantities include Rényi entropy (a generalization of entropy), differential entropy (a generalization of quantities of information to continuous distributions), and the conditional mutual information.

Coding theory

Scratches on the readable surface of a CD-R. Music and data CDs are encoded with error-correcting codes, so they can still be read despite minor scratches.

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.

Data compression (source coding): There are two formulations for the compression problem:
- lossless data compression: the data must be reconstructed exactly;
- lossy data compression: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called rate–distortion theory.
Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.
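To illustrate the link between lossless compression and entropy, the sketch below builds a Huffman code. Huffman coding is not discussed in this article; it is used here only as one well-known lossless code, and the symbol probabilities are made up. For this particular dyadic distribution the average code length equals the entropy; in general it lies between $H$ and $H + 1$ bits per symbol.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return the Huffman codeword length (in bits) for each symbol index."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie_breaker = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie_breaker, s1 + s2))
        tie_breaker += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]  # hypothetical source distribution
lengths = huffman_code_lengths(probs)
average_length = sum(p * l for p, l in zip(probs, lengths))
source_entropy = -sum(p * math.log2(p) for p in probs if p > 0)

print(lengths, average_length, source_entropy)
```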

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

Source theory

Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

Rate

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

r=\lim _{n\to \infty }H(X_{n}|X_{n-1},X_{n-2},X_{n-3},\ldots );

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

r=\lim _{n\to \infty }{\frac {1}{n}}H(X_{1},X_{2},\dots X_{n});

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.
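As an illustration of the two expressions agreeing, the sketch below assumes a stationary two-state Markov chain with transition probabilities `a` (from state 0 to 1) and `b` (from state 1 to 0). This model and the function names are assumptions for the example. For such a chain the conditional entropy given the past reduces to the entropy given the previous symbol, and the joint entropy is $H(X_1) + (n-1)\,r$.

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def markov_entropy_rate(a, b):
    """Rate via H(X_n | X_{n-1}) under the stationary distribution (needs a + b > 0)."""
    pi0 = b / (a + b)  # stationary probability of state 0
    pi1 = a / (a + b)  # stationary probability of state 1
    return pi0 * binary_entropy(a) + pi1 * binary_entropy(b)

def joint_entropy_per_symbol(a, b, n):
    """(1/n) H(X_1, ..., X_n) for the chain started in its stationary distribution."""
    pi0 = b / (a + b)
    return (binary_entropy(pi0) + (n - 1) * markov_entropy_rate(a, b)) / n

a, b = 0.1, 0.3  # hypothetical transition probabilities
print(markov_entropy_rate(a, b))
for n in (1, 10, 1000, 10**6):
    print(n, joint_entropy_per_symbol(a, b, n))  # approaches the rate as n grows
```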

Information rate is defined as

r=\lim _{n\to \infty }{\frac {1}{n}}I(X_{1},X_{2},\dots X_{n};Y_{1},Y_{2},\dots Y_{n});

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.

Channel capacity

Communication over a channel is the primary motivation of information theory. However, channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

Consider the communications process over a discrete channel. A simple model of the process is shown below:

{\xrightarrow[{\text{Message}}]{W}}{\begin{array}{|c| }\hline {\text{Encoder}}\\f_{n}\\\hline \end{array}}{\xrightarrow[{\mathrm {Encoded \atop sequence} }]{X^{n}}}{\begin{array}{|c| }\hline {\text{Channel}}\\p(y|x)\\\hline \end{array}}{\xrightarrow[{\mathrm {Received \atop sequence} }]{Y^{n}}}{\begin{array}{|c| }\hline {\text{Decoder}}\\g_{n}\\\hline \end{array}}{\xrightarrow[{\mathrm {Estimated \atop message} }]{\hat {W}}}

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let $p(y|x)$ be the conditional probability distribution function of Y given X. We will consider $p(y|x)$ to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of $f(x)$, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:

C=\max _{f}I(X;Y).\!

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.
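The definition $C=\max _{f}I(X;Y)$ can be explored numerically for small channels. The sketch below is a crude grid search over binary input distributions, offered only as an illustration and not as a practical method. The channel matrix `bsc` and the function names are assumptions introduced here.

```python
import math

def mutual_information(input_dist, channel):
    """I(X;Y) in bits, where channel[x][y] = p(y|x) and input_dist[x] = f(x)."""
    n_out = len(channel[0])
    p_y = [sum(input_dist[x] * channel[x][y] for x in range(len(input_dist)))
           for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(input_dist):
        for y in range(n_out):
            pxy = px * channel[x][y]
            if pxy > 0:
                total += pxy * math.log2(channel[x][y] / p_y[y])
    return total

def capacity_binary_input(channel, steps=1000):
    """Approximate max over f = (q, 1 - q) by scanning q on a uniform grid."""
    return max(mutual_information([k / steps, 1 - k / steps], channel)
               for k in range(steps + 1))

# Hypothetical channel: each input bit is flipped with probability 0.1.
bsc = [[0.9, 0.1],
       [0.1, 0.9]]

print(capacity_binary_input(bsc))  # maximized by the uniform input for this channel
```

For a symmetric channel such as this one, the maximum is attained at the uniform input distribution; for asymmetric channels the maximizing `q` generally differs from 0.5.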

Capacity of particular channel models

A continuous-time analog communications channel subject to Gaussian noise—see Shannon–Hartley theorem.
A binary symmetric channel (BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of $1 - H_{\mathrm{b}}(p)$ bits per channel use, where $H_{\mathrm{b}}$ is the binary entropy function to the base-2 logarithm.

A binary erasure channel (BEC) with erasure probability p is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is 1 − p bits per channel use.
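The two closed-form capacities above are simple to evaluate. The following sketch (function names are illustrative) tabulates both for a few values of p.

```python
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

def bec_capacity(p):
    """Capacity of a binary erasure channel with erasure probability p."""
    return 1.0 - p

for p in (0.0, 0.1, 0.25, 0.5):
    print(p, bsc_capacity(p), bec_capacity(p))
```

At p = 0.5 the BSC output is independent of the input and its capacity drops to zero, whereas a BEC with the same p still delivers half a bit per channel use, because the receiver knows which symbols were erased.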

Channels with memory and directed information

In practice many channels have memory. Namely, at time $i$ the channel is given by the conditional probability $P(y_{i}|x_{i},x_{i-1},x_{i-2},...,x_{1},y_{i-1},y_{i-2},...,y_{1})$. It is often more convenient to use the notation $x^{i}=(x_{i},x_{i-1},x_{i-2},...,x_{1})$, so that the channel becomes $P(y_{i}|x^{i},y^{i-1})$. In such a case the capacity is given by the mutual information rate when no feedback is available, and by the directed information rate whether or not feedback is available (without feedback, the directed information equals the mutual information).

Applications to other fields

Intelligence uses and secrecy applications

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.

Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.

Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.
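The two mutual-information claims can be checked on a toy one-bit one-time pad. The sketch below assumes a biased plaintext bit and an independent, uniformly random key bit combined by XOR; the distributions and names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

p_plain = {0: 0.8, 1: 0.2}  # hypothetical biased plaintext bit
p_key = {0: 0.5, 1: 0.5}    # uniform key bit, independent of the plaintext

# Joint distribution of (plaintext, ciphertext) when the key is unknown.
joint_pc = defaultdict(float)
for m, pm in p_plain.items():
    for k, pk in p_key.items():
        joint_pc[(m, m ^ k)] += pm * pk

mi_without_key = mutual_information(joint_pc)

# Conditional mutual information: average over k of I(P;C | K = k).
mi_given_key = sum(
    pk * mutual_information({(m, m ^ k): pm for m, pm in p_plain.items()})
    for k, pk in p_key.items()
)

print(mi_without_key)  # approximately 0: ciphertext alone reveals nothing
print(mi_given_key)    # equals the plaintext entropy: the key holder recovers the bit
```

If the same key bit were reused for several plaintext bits, the ciphertexts would no longer be independent of the plaintexts, which is the kind of misuse the Venona example refers to.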

Pseudorandom number generation

Pseudorandom number generators are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed cryptographically secure pseudorandom number generators, but even they require random seeds external to the software to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is min-entropy, a value related to Shannon entropy through Rényi entropy; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.
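The gap between Shannon entropy and min-entropy can be seen on a small example. The sketch below uses the standard definition of min-entropy, $-\log_2 \max_x p(x)$, and a made-up distribution in which one outcome has probability 1/2 while the remaining mass is spread evenly over 1024 other outcomes.

```python
import math

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """Min-entropy in bits: determined solely by the most likely outcome."""
    return -math.log2(max(probs))

# Hypothetical source: one heavy outcome plus 1024 equally unlikely ones.
probs = [0.5] + [0.5 / 1024] * 1024

print(shannon_entropy(probs))  # about 6 bits on average
print(min_entropy(probs))      # only 1 bit: the heavy outcome is guessed half the time
```

A guesser who always bets on the heavy outcome succeeds half the time, so the source offers only about one bit of worst-case unpredictability despite its much higher Shannon entropy.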

Seismic exploration

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.

Semiotics

Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics. Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing."

Concepts from information theory such as redundancy and code control have been used by semioticians such as Umberto Eco and Ferruccio Rossi-Landi to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.

Miscellaneous applications

Information theory also has applications in gambling, black holes, and bioinformatics.

Shunning

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Shunning

Shunning can be the act of social rejection, or emotional distance. In a religious context, shunning is a formal decision by a denomination or a congregation to cease interaction with an individual or a group, and follows a particular set of rules. It differs from, but may be associated with, excommunication.

Social rejection occurs when a person or group deliberately avoids association with, and habitually keeps away from an individual or group. This can be a formal decision by a group, or a less formal group action which will spread to all members of the group as a form of solidarity. It is a sanction against association, often associated with religious groups and other tightly knit organizations and communities. Targets of shunning can include persons who have been labeled as apostates, whistleblowers, dissidents, strikebreakers, or anyone the group perceives as a threat or source of conflict. Social rejection has been established to cause psychological damage and has been categorized as torture or punishment. Mental rejection is a more individual action, where a person subconsciously or willfully ignores an idea, or a set of information related to a particular viewpoint. Some groups are made up of people who shun the same ideas.

Social rejection was and is a punishment in many customary legal systems. Such sanctions include the ostracism of ancient Athens and the still-used kasepekang in Balinese society.

Overview

Shunning can be broken down into behaviours and practices that seek to accomplish either or both of two primary goals.

To modify the behaviour of a member. This approach seeks to influence, encourage, or coerce normative behaviours from members, and may seek to dissuade, provide disincentives for, or to compel avoidance of certain behaviours. Shunning may include disassociating from a member by other members of the community who are in good standing. It may include more antagonistic psychological behaviours (described below). This approach may be seen as either corrective or punitive (or both) by the group membership or leadership, and may also be intended as a deterrent.
To remove or limit the influence of a member (or former member) over other members in a community. This approach may seek to isolate, to discredit, or otherwise dis-empower such a member, often in the context of actions or positions advocated by that member. For groups with defined membership criteria, especially based on key behaviours or ideological precepts, this approach may be seen as limiting damage to the community or its leadership. This is often paired with some form of excommunication.

Some less often practiced variants may seek to:

Remove a specific member from general external influence to provide an ideological or psychological buffer against external views or behaviour. The amount can vary from severing ties to opponents of the group up to and including severing all non-group-affiliated intercourse.

Shunning is usually approved of (if sometimes with regret) by the group engaging in the shunning, and usually highly disapproved of by the target of the shunning, resulting in a polarization of views. Those subject to the practice respond differently, usually depending both on the circumstances of the event, and the nature of the practices being applied. Extreme forms of shunning have damaged some individuals' psychological and relational health. Responses to the practice have developed, mostly around anti-shunning advocacy; such advocates highlight the detrimental effects of many such behaviors, and seek to limit the practice through pressure or law. Such groups often operate supportive organizations or institutions to help victims of shunning to recover from damaging effects, and sometimes to attack the organizations practicing shunning, as a part of their advocacy.

In many civil societies, kinds of shunning are practiced de facto or de jure, to coerce or avert behaviours or associations deemed unhealthy. This can include:

restraining orders or peace bonds (to avoid abusive relationships)
court injunctions to disassociate (to avoid criminal association or temptation)
medical or psychological instruction to avoid associating (to avoid hazardous relations, e.g. alcoholics being instructed to avoid friendship with non-recovering alcoholics, or asthmatics being medically instructed to keep to smoke-free environs)
using background checks to avoid hiring people who have criminal records (to avoid association with felons, even when the crimes have nothing to do with the job description)

Stealth shunning

Stealth shunning is a practice where a person or an action is silently banned. When a person is silently banned, the group they have been banned from does not interact with them. This can be done by secretly distributing a blacklist announcing the person's wrongdoing.

It can happen informally when all people in a group or email list each conclude that they do not want to interact with the person. When an action is silently banned, requests for that action are either ignored or refused with faked explanations.

Effects

Shunning is often used as a pejorative term to describe any organizationally mandated disassociation, and has acquired a connotation of abuse and relational aggression. This is due to the sometimes extreme damage caused by its disruption to normal relationships between individuals, such as friendships and family relations. Disruption of established relationships certainly causes pain, which is at least an unintended consequence of the practices described here, though it may also in many cases be an intended, coercive consequence. This pain, especially when seen as unjustly inflicted, can have secondary general psychological effects on self-worth and self-confidence, trust and trustworthiness, and can, as with other types of trauma, impair psychological function.

Shunning often involves implicit or explicit shame for a member who commits acts seen as wrong by the group or its leadership. Such shame may not be psychologically damaging if the membership is voluntary and the rules of behavior were clear before the person joined. However, if the rules are arbitrary, if the group membership is seen as essential for personal security, safety, or health, or if the application of the rules is inconsistent, such shame can be highly destructive. This can be especially damaging if perceptions are attacked or controlled, or certain tools of psychological pressure applied. Extremes of this cross over the line into psychological torture and can be permanently scarring.

A key detrimental effect of some of the practices associated with shunning relates to their effect on relationships, especially family relationships. At their extremes, the practices may destroy marriages, break up families, and separate children and their parents. The effect of shunning can be very dramatic or even devastating on the shunned, as it can damage or destroy the shunned member's closest familial, spousal, social, emotional, and economic bonds.

Shunning contains aspects of what is known as relational aggression in psychological literature. When used by church members and member-spouse parents against excommunicant parents it contains elements of what psychologists call parental alienation. Extreme shunning may cause traumas to the shunned (and to their dependents) similar to what is studied in the psychology of torture.

Shunning is also a mechanism in family estrangement. When an adult child, sibling, or parent physically and/or emotionally cuts himself off from the family without proper justification, the act traumatizes the family.

Civil rights implications

Some aspects of shunning may also be seen as being at odds with civil rights or human rights, especially those behaviours that coerce and attack. When a group seeks to have an effect through such practices outside its own membership, for instance when a group seeks to cause financial harm through isolation and disassociation, they can come at odds with their surrounding civil society, if such a society enshrines rights such as freedom of association, conscience, or belief. Many civil societies do not extend such protections to the internal operations of communities or organizations so long as an ex-member has the same rights, prerogatives, and power as any other member of the civil society.

In cases where a group or religion is state-sanctioned, a key power, or in the majority (e.g. in Singapore), a shunned former member may face severe social, political, and/or financial costs.

In religion

Christianity

Passages in the New Testament, such as 1 Corinthians 5:11–13 and Matthew 18:15–17, suggest shunning as an internal practice of early Christians and are cited as such by its modern-day practitioners within Christianity. However, not all Christian scholars or denominations agree on this interpretation of these verses. Douglas A. Jacoby interprets 1 Corinthians 5:11 and Titus 3:9–11 as evidence that members can be excluded from fellowship for matters perceived within the church as grave sin without a religiously acceptable repentance.

Amish

Certain sects of the Amish practice shunning or meidung.

Catholicism

Prior to the Code of Canon Law of 1983, in rare cases (known as excommunication vitandi) the Catholic Church expected adherents to shun an excommunicated member in secular matters.

In 1983, the distinction between vitandi and others (tolerandi) was abolished, and thus the expectation no longer applies.

Jehovah's Witnesses

Jehovah's Witnesses practise a form of shunning which they refer to as "disfellowshipping". A disfellowshipped person is not to be greeted either socially or at their meetings. Disfellowshipping follows a decision of a judicial committee established by a local congregation that a member is unrepentantly guilty of a "serious sin".

Sociologist Andrew Holden's research indicates that many Witnesses who would otherwise defect because of disillusionment with the organization and its teachings retain affiliation out of fear of being shunned and losing contact with friends and family members.

Judaism

Cherem is the highest ecclesiastical censure in the Jewish community. It is the total exclusion of a person from the Jewish community. It is still used in the Ultra-Orthodox and Chassidic community. In the 21st century, sexual abuse victims and their families who have reported abuse to civil authorities have experienced shunning in the Orthodox communities of New York and Australia.

Baháʼí faith

Members of the Baháʼí Faith are expected to shun those that have been declared Covenant-breakers, and expelled from the religion, by the head of their faith. Covenant-breakers are defined as leaders of schismatic groups that resulted from challenges to legitimacy of Baháʼí leadership, as well as those who follow or refuse to shun them. Unity is considered the highest value in the Baháʼí Faith, and any attempt at schism by a Baháʼí is considered a spiritual sickness, and a negation of that for which the religion stands.

Church of Scientology

The Church of Scientology asks its members to quit all communication with Suppressive Persons (those whom the Church deems antagonistic to Scientology). The practice of shunning in Scientology is termed disconnection. Members can disconnect from any person they already know, including existing family members. Many examples of this policy's application have been established in court. It used to be customary to write a "disconnection letter" to the person being disconnected from, and to write a public disconnection notice, but these practices have not continued. The Church states that typically only people with "false data" about Scientology are antagonistic, so it encourages members to first attempt to provide "true data" to these people. According to official Church statements, disconnection is only used as a last resort and only lasts until the antagonism ceases. Failure to disconnect from a Suppressive Person is itself labelled a Suppressive act. In the United States, the Church has tried to argue in court that disconnection is a constitutionally protected religious practice. However, this argument was rejected because the pressure put on individual Scientologists to disconnect means it is not voluntary.

Uncertainty

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Uncertainty

Situations often arise wherein a decision must be made when the results of each possible choice are uncertain.

Uncertainty refers to epistemic situations involving imperfect or unknown information. It applies to predictions of future events, to physical measurements that are already made, or to the unknown. Uncertainty arises in partially observable or stochastic environments, as well as due to ignorance, indolence, or both. It arises in any number of fields, including insurance, philosophy, physics, statistics, economics, finance, medicine, psychology, sociology, engineering, metrology, meteorology, ecology and information science.

Concepts

Although the terms are used in various ways among the general public, many specialists in decision theory, statistics and other quantitative fields have defined uncertainty, risk, and their measurement as:

Uncertainty

The lack of certainty, a state of limited knowledge where it is impossible to exactly describe the existing state, a future outcome, or more than one possible outcome.

Measurement of uncertainty: A set of possible states or outcomes where probabilities are assigned to each possible state or outcome – this also includes the application of a probability density function to continuous variables.
Second-order uncertainty: In statistics and economics, second-order uncertainty is represented in probability density functions over (first-order) probabilities. Opinions in subjective logic carry this type of uncertainty.
Risk: A state of uncertainty where some possible outcomes have an undesired effect or significant loss.
Measurement of risk: A set of measured uncertainties where some possible outcomes are losses, and the magnitudes of those losses – this also includes loss functions over continuous variables.

Uncertainty versus Variability

There is a difference between uncertainty and variability. Uncertainty is quantified by a probability distribution which depends upon our state of information about the likelihood of what the single, true value of the uncertain quantity is. Variability is quantified by a distribution of frequencies of multiple instances of the quantity, derived from observed data.

Knightian uncertainty

In economics, Frank Knight in 1921 distinguished uncertainty from risk, with uncertainty being a lack of knowledge that is immeasurable and impossible to calculate. Because clearly defined statistics are absent in most economic decisions where people face uncertainty, he believed that probabilities cannot be measured in such cases; this is now referred to as Knightian uncertainty.

Uncertainty must be taken in a sense radically distinct from the familiar notion of risk, from which it has never been properly separated.... The essential fact is that 'risk' means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomena depending on which of the two is really present and operating.... It will appear that a measurable uncertainty, or 'risk' proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all.
— Frank Knight (1885–1972), Risk, Uncertainty, and Profit (1921), University of Chicago.

There is a fundamental distinction between the reward for taking a known risk and that for assuming a risk whose value itself is not known. It is so fundamental, indeed, that … a known risk will not lead to any reward or special payment at all.
— Frank Knight

Knight pointed out that the unfavorable outcome of known risks can be insured during the decision-making process because it has a clearly defined expected probability distribution. Unknown risks have no known expected probability distribution, which can lead to extremely risky company decisions.

Other taxonomies of uncertainties and decisions include a broader sense of uncertainty and how it should be approached from an ethics perspective:

A taxonomy of uncertainty

There are some things that you know to be true, and others that you know to be false; yet, despite this extensive knowledge that you have, there remain many things whose truth or falsity is not known to you. We say that you are uncertain about them. You are uncertain, to varying degrees, about everything in the future; much of the past is hidden from you; and there is a lot of the present about which you do not have full information. Uncertainty is everywhere and you cannot escape from it.

Dennis Lindley, Understanding Uncertainty (2006)

For example, if it is unknown whether or not it will rain tomorrow, then there is a state of uncertainty. If probabilities are applied to the possible outcomes using weather forecasts or even just a calibrated probability assessment, the uncertainty has been quantified. Suppose it is quantified as a 90% chance of sunshine. If there is a major, costly, outdoor event planned for tomorrow then there is a risk since there is a 10% chance of rain, and rain would be undesirable. Furthermore, if this is a business event and $100,000 would be lost if it rains, then the risk has been quantified (a 10% chance of losing $100,000). These situations can be made even more realistic by quantifying light rain vs. heavy rain, the cost of delays vs. outright cancellation, etc.

Some may represent the risk in this example as the "expected opportunity loss" (EOL), the chance of the loss multiplied by the amount of the loss (10% × $100,000 = $10,000). That is useful if the organizer of the event is "risk neutral", which most people are not. Most would be willing to pay a premium to avoid the loss. An insurance company, for example, would compute an EOL as a minimum for any insurance coverage, then add onto that other operating costs and profit. Because many people are willing to buy insurance, for many reasons, the EOL alone is clearly not the perceived value of avoiding the risk.
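
The arithmetic in the rain example can be sketched in a few lines of Python. The function name `expected_opportunity_loss` and the 25% insurer loading are illustrative assumptions, not figures from the text; only the 10% probability and the $100,000 loss come from the example.

```python
def expected_opportunity_loss(p_loss: float, loss_amount: float) -> float:
    """Chance of the loss multiplied by the amount of the loss."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be a probability between 0 and 1")
    return p_loss * loss_amount


# Figures from the example: 10% chance of rain, $100,000 lost if it rains.
eol = expected_opportunity_loss(0.10, 100_000)

# Hypothetical loading for operating costs and profit (assumed to be 25%).
ASSUMED_LOADING = 0.25
premium = eol * (1 + ASSUMED_LOADING)

print(f"EOL: ${eol:,.0f}")
print(f"Illustrative premium: ${premium:,.0f}")
```

A risk-neutral organizer would value the risk at the EOL itself; anyone willing to pay the higher premium is, by this measure, risk averse.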

Quantitative uses of the terms uncertainty and risk are fairly consistent across fields such as probability theory, actuarial science, and information theory. Some also create new terms without substantially changing the definitions of uncertainty or risk. For example, surprisal is a variation on uncertainty sometimes used in information theory. But outside of the more mathematical uses of the term, usage may vary widely. In cognitive psychology, uncertainty can be real, or just a matter of perception, such as expectations, threats, etc.

Vagueness is a form of uncertainty where the analyst is unable to clearly differentiate between two different classes, such as 'person of average height' and 'tall person'. This form of vagueness can be modelled by some variation on Zadeh's fuzzy logic or subjective logic.

Ambiguity is a form of uncertainty where even the possible outcomes have unclear meanings and interpretations. The statement "He returns from the bank" is ambiguous because its interpretation depends on whether the word 'bank' is meant as "the side of a river" or "a financial institution". Ambiguity typically arises in situations where multiple analysts or observers have different interpretations of the same statements.

Uncertainty may be a consequence of a lack of knowledge of obtainable facts. That is, there may be uncertainty about whether a new rocket design will work, but this uncertainty can be removed with further analysis and experimentation.

At the subatomic level, uncertainty may be a fundamental and unavoidable property of the universe. In quantum mechanics, the Heisenberg uncertainty principle puts limits on how much an observer can ever know about the position and velocity of a particle. This may not just be ignorance of potentially obtainable facts but that there is no fact to be found. There is some controversy in physics as to whether such uncertainty is an irreducible property of nature or if there are "hidden variables" that would describe the state of a particle even more exactly than Heisenberg's uncertainty principle allows.

Measurements

The most commonly used procedure for calculating measurement uncertainty is described in the "Guide to the Expression of Uncertainty in Measurement" (GUM) published by ISO. A derived work is for example the National Institute of Standards and Technology (NIST) Technical Note 1297, "Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results", and the Eurachem/Citac publication "Quantifying Uncertainty in Analytical Measurement". The uncertainty of the result of a measurement generally consists of several components. The components are regarded as random variables, and may be grouped into two categories according to the method used to estimate their numerical values:

Type A, those evaluated by statistical methods
Type B, those evaluated by other means, e.g., by assigning a probability distribution

By propagating the variances of the components through a function relating the components to the measurement result, the combined measurement uncertainty is given as the square root of the resulting variance. The simplest form is the standard deviation of a repeated observation.
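
As a rough sketch of this propagation, the snippet below combines component uncertainties by root-sum-square after weighting each by a sensitivity coefficient (the partial derivative of the result with respect to that component). It assumes the components are uncorrelated and that a first-order linear approximation is adequate; the rectangle example and all numbers are hypothetical.

```python
import math
from typing import Sequence


def combined_standard_uncertainty(sensitivities: Sequence[float],
                                  uncertainties: Sequence[float]) -> float:
    """Square root of the summed variances (c_i * u_i)^2.

    Assumes uncorrelated components; correlated inputs would need
    additional covariance terms that are not handled here.
    """
    if len(sensitivities) != len(uncertainties):
        raise ValueError("need one sensitivity coefficient per component")
    variance = sum((c * u) ** 2 for c, u in zip(sensitivities, uncertainties))
    return math.sqrt(variance)


# Hypothetical measurement: area A = L * W of a rectangle.
length, u_length = 2.0, 0.03   # metres, standard uncertainty of L
width, u_width = 1.0, 0.04     # metres, standard uncertainty of W

# Sensitivity coefficients: dA/dL = W and dA/dW = L.
u_area = combined_standard_uncertainty([width, length], [u_length, u_width])
print(f"A = {length * width:.2f} m^2, combined uncertainty {u_area:.3f} m^2")
```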

In metrology, physics, and engineering, the uncertainty or margin of error of a measurement, when explicitly stated, is given by a range of values likely to enclose the true value. This may be denoted by error bars on a graph, or by the following notations:

measured value ± uncertainty
measured value $^{+\text{uncertainty}}_{-\text{uncertainty}}$
measured value (uncertainty)

In the last notation, the parentheses are a concise form of the ± notation. For example, a length of 10 1⁄2 meters in a scientific or engineering application could be written 10.5 m or 10.50 m, by convention meaning accurate to within one tenth of a meter or one hundredth, respectively. The precision is symmetric around the last digit: for 10.5 it is half a tenth up and half a tenth down, so 10.5 means between 10.45 and 10.55. Thus it is understood that 10.5 means 10.5±0.05, and 10.50 means 10.50±0.005, also written 10.50(5) and 10.500(5) respectively. But if the accuracy is within two tenths (or two hundredths), the uncertainty is ± one tenth (or one hundredth), and it must be stated explicitly: 10.5±0.1 and 10.50±0.01, or 10.5(1) and 10.50(1). The numbers in parentheses apply to the numeral to their left; they are not part of that number but part of the notation of uncertainty, and they apply to the least significant digits. For instance, 1.00794(7) stands for 1.00794±0.00007, while 1.00794(72) stands for 1.00794±0.00072. This concise notation is used, for example, by IUPAC in stating the atomic mass of elements.
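
The rule that the parenthesised digits align with the least significant digits of the value can be illustrated with a small parser. This is a minimal sketch: the function name `parse_concise` is hypothetical, and it handles only plain decimal strings such as `1.00794(7)`, not exponents or units.

```python
import re
from decimal import Decimal
from typing import Tuple

# value with optional fractional part, then uncertainty digits in parentheses
_CONCISE = re.compile(r"^([+-]?\d+)(?:\.(\d+))?\((\d+)\)$")


def parse_concise(text: str) -> Tuple[Decimal, Decimal]:
    """Split 'value(uncertainty)' into (value, absolute uncertainty)."""
    match = _CONCISE.match(text.strip())
    if match is None:
        raise ValueError(f"not in value(uncertainty) form: {text!r}")
    integer_part, fraction, digits = match.groups()
    fraction = fraction or ""
    value = Decimal(f"{integer_part}.{fraction}" if fraction else integer_part)
    # Shift the uncertainty digits so they line up with the last digits
    # of the value: one decimal place per fractional digit.
    uncertainty = Decimal(digits).scaleb(-len(fraction))
    return value, uncertainty


for example in ("1.00794(7)", "1.00794(72)", "10.50(5)"):
    value, uncertainty = parse_concise(example)
    print(f"{example} -> {value} ± {uncertainty}")
```

`Decimal` is used rather than `float` so that trailing zeros and the exact digit positions are preserved.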

The middle notation is used when the error is not symmetrical about the value, for example $3.4^{+0.3}_{-0.2}$. This can occur when using a logarithmic scale, for example.

Uncertainty of a measurement can be determined by repeating a measurement to arrive at an estimate of the standard deviation of the values. Then, any single value has an uncertainty equal to the standard deviation. However, if the values are averaged, then the mean measurement value has a much smaller uncertainty, equal to the standard error of the mean, which is the standard deviation divided by the square root of the number of measurements. This procedure neglects systematic errors, however.
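
A minimal sketch of this procedure, using Python's standard `statistics` module on a set of made-up readings (the values are hypothetical, not real data):

```python
import math
import statistics

# Hypothetical repeated readings of the same length, in metres.
readings = [10.48, 10.52, 10.51, 10.49, 10.50, 10.53, 10.47, 10.50]

mean = statistics.mean(readings)
# Sample standard deviation (n - 1 in the denominator): the uncertainty
# attributed to any single reading.
s = statistics.stdev(readings)
# Standard error of the mean: the uncertainty of the averaged value.
sem = s / math.sqrt(len(readings))

print(f"single reading: ± {s:.4f} m")
print(f"mean of {len(readings)} readings: {mean:.4f} ± {sem:.4f} m")
```

As the text notes, neither figure accounts for systematic errors; a miscalibrated instrument would shift every reading without changing `s` or `sem`.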

When the uncertainty represents the standard error of the measurement, then about 68.3% of the time, the true value of the measured quantity falls within the stated uncertainty range. For example, it is likely that for 31.7% of the atomic mass values given on the list of elements by atomic mass, the true value lies outside of the stated range. If the width of the interval is doubled, then probably only 4.6% of the true values lie outside the doubled interval, and if the width is tripled, probably only 0.3% lie outside. These values follow from the properties of the normal distribution, and they apply only if the measurement process produces normally distributed errors. In that case, the quoted standard errors are easily converted to 68.3% ("one sigma"), 95.4% ("two sigma"), or 99.7% ("three sigma") confidence intervals.
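
The coverage figures quoted above can be checked from the normal distribution: the probability of falling within k standard deviations of the mean is erf(k/√2). The sketch below assumes normally distributed errors, as the text does; the helper name `coverage` is an illustrative choice.

```python
import math


def coverage(k: float) -> float:
    """Probability that a normal variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    inside = coverage(k)
    print(f"{k} sigma: {inside:.1%} inside, {1 - inside:.1%} outside")
```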

In this context, uncertainty depends on both the accuracy and precision of the measurement instrument. The lower the accuracy and precision of an instrument, the larger the measurement uncertainty is. Precision is often determined as the standard deviation of the repeated measures of a given value, namely using the same method described above to assess measurement uncertainty. However, this method is correct only when the instrument is accurate. When it is inaccurate, the uncertainty is larger than the standard deviation of the repeated measures, and it appears evident that the uncertainty does not depend only on instrumental precision.

In the media

Uncertainty in science, and science in general, may be interpreted differently in the public sphere than in the scientific community. This is due in part to the diversity of the public audience, and the tendency for scientists to misunderstand lay audiences and therefore not communicate ideas clearly and effectively. One example is explained by the information deficit model. Also, in the public realm, there are often many scientific voices giving input on a single topic. For example, depending on how an issue is reported in the public sphere, discrepancies between outcomes of multiple scientific studies due to methodological differences could be interpreted by the public as a lack of consensus in a situation where a consensus does in fact exist. This interpretation may have even been intentionally promoted, as scientific uncertainty may be managed to reach certain goals. For example, climate change deniers took the advice of Frank Luntz to frame global warming as an issue of scientific uncertainty, which was a precursor to the conflict frame used by journalists when reporting the issue.

"Indeterminacy can be loosely said to apply to situations in which not all the parameters of the system and their interactions are fully known, whereas ignorance refers to situations in which it is not known what is not known." These unknowns, indeterminacy and ignorance, that exist in science are often "transformed" into uncertainty when reported to the public in order to make issues more manageable, since scientific indeterminacy and ignorance are difficult concepts for scientists to convey without losing credibility. Conversely, uncertainty is often interpreted by the public as ignorance. The transformation of indeterminacy and ignorance into uncertainty may be related to the public's misinterpretation of uncertainty as ignorance.

Journalists may inflate uncertainty (making the science seem more uncertain than it really is) or downplay uncertainty (making the science seem more certain than it really is). One way that journalists inflate uncertainty is by describing new research that contradicts past research without providing context for the change. Journalists may give scientists with minority views equal weight as scientists with majority views, without adequately describing or explaining the state of scientific consensus on the issue. In the same vein, journalists may give non-scientists the same amount of attention and importance as scientists.

Journalists may downplay uncertainty by eliminating "scientists' carefully chosen tentative wording, and by losing these caveats the information is skewed and presented as more certain and conclusive than it really is". Also, stories with a single source or without any context of previous research mean that the subject at hand is presented as more definitive and certain than it is in reality. There is often a "product over process" approach to science journalism that aids, too, in the downplaying of uncertainty. Finally, and most notably for this investigation, when science is framed by journalists as a triumphant quest, uncertainty is erroneously framed as "reducible and resolvable".

Some media routines and organizational factors affect the overstatement of uncertainty; other media routines and organizational factors help inflate the certainty of an issue. Because the general public (in the United States) generally trusts scientists, when science stories are covered without alarm-raising cues from special interest organizations (religious groups, environmental organizations, political factions, etc.) they are often covered in a business related sense, in an economic-development frame or a social progress frame. The nature of these frames is to downplay or eliminate uncertainty, so when economic and scientific promise are focused on early in the issue cycle, as has happened with coverage of plant biotechnology and nanotechnology in the United States, the matter in question seems more definitive and certain.

Sometimes, stockholders, owners, or advertising will pressure a media organization to promote the business aspects of a scientific issue, and therefore any uncertainty claims which may compromise the business interests are downplayed or eliminated.

Applications

Uncertainty is designed into games, most notably in gambling, where chance is central to play.
In scientific modelling, the prediction of future events should be understood to have a range of expected values.
In optimization, uncertainty permits one to describe situations where the user does not have full control over the final outcome of the optimization procedure; see scenario optimization and stochastic optimization.
In weather forecasting, it is now commonplace to include data on the degree of uncertainty in a weather forecast.
Uncertainty or error is used in science and engineering notation. Numerical values should only have to be expressed in those digits that are physically meaningful, which are referred to as significant figures. Uncertainty is involved in every measurement, such as measuring a distance, a temperature, etc., the degree depending upon the instrument or technique used to make the measurement. Similarly, uncertainty is propagated through calculations so that the calculated value has some degree of uncertainty depending upon the uncertainties of the measured values and the equation used in the calculation.
In physics, the Heisenberg uncertainty principle forms the basis of modern quantum mechanics.
In metrology, measurement uncertainty is a central concept quantifying the dispersion one may reasonably attribute to a measurement result. Such an uncertainty can also be referred to as a measurement error. In daily life, measurement uncertainty is often implicit ("He is 6 feet tall" give or take a few inches), while for any serious use an explicit statement of the measurement uncertainty is necessary. The expected measurement uncertainty of many measuring instruments (scales, oscilloscopes, force gages, rulers, thermometers, etc.) is often stated in the manufacturers' specifications.
In engineering, uncertainty can be used in the context of validation and verification of material modeling.
Uncertainty has been a common theme in art, both as a thematic device (see, for example, the indecision of Hamlet), and as a quandary for the artist (such as Martin Creed's difficulty with deciding what artworks to make).
Uncertainty is an important factor in economics. According to economist Frank Knight, it is different from risk, where there is a specific probability assigned to each outcome (as when flipping a fair coin). Knightian uncertainty involves a situation that has unknown probabilities.
Investing in financial markets such as the stock market involves Knightian uncertainty when the probability of a rare but catastrophic event is unknown.

Philosophy

In Western philosophy, the first philosopher to embrace uncertainty was Pyrrho, resulting in the Hellenistic philosophies of Pyrrhonism and Academic Skepticism, the first schools of philosophical skepticism. Aporia and acatalepsy represent key concepts in ancient Greek philosophy regarding uncertainty.

Introduction to M-theory