
Thursday, July 27, 2023

Inductive probability

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Inductive_probability

Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world.

There are three sources of knowledge: inference, communication, and deduction. Communication relays information found using other methods. Deduction establishes new facts based on existing facts. Inference establishes new facts from data. Its basis is Bayes' theorem.

Information describing the world is written in a language. For example, a simple mathematical language of propositions may be chosen. Sentences may be written down in this language as strings of characters. But in the computer it is possible to encode these sentences as strings of bits (1s and 0s). Then the language may be encoded so that the most commonly used sentences are the shortest. This internal language implicitly represents probabilities of statements.

Occam's razor says the "simplest theory, consistent with the data, is most likely to be correct". The "simplest theory" is interpreted as the representation of the theory written in this internal language. The theory with the shortest encoding in this internal language is most likely to be correct.

History

Probability and statistics were focused on probability distributions and tests of significance. Probability was formal and well defined, but limited in scope. In particular, its application was limited to situations that could be defined as an experiment or trial, with a well defined population.

Bayes' theorem is named after Rev. Thomas Bayes (1701–1761). Bayesian inference broadened the application of probability to many situations where a population was not well defined. But Bayes' theorem always depended on prior probabilities to generate new probabilities. It was unclear where these prior probabilities should come from.

Circa 1964, Ray Solomonoff developed algorithmic probability, which gave an explanation for what randomness is and for how patterns in the data may be represented by computer programs that give shorter representations of the data.

Chris Wallace and D. M. Boulton developed minimum message length circa 1968. Later, Jorma Rissanen developed minimum description length circa 1978. These methods allow information theory to be related to probability, in a way that can be compared to the application of Bayes' theorem, but which gives a source and explanation for the role of prior probabilities.

Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory of Pareto optimal behavior for an intelligent agent, circa 1998.

Minimum description/message length

The program with the shortest length that matches the data is the most likely to predict future data. This is the thesis behind the minimum message length and minimum description length methods.

At first sight Bayes' theorem appears different from the minimum message/description length principle. On closer inspection it turns out to be the same. Bayes' theorem is about conditional probabilities, and states the probability that event B happens if firstly event A happens:

P(A ∧ B) = P(A) P(B|A)

which becomes, in terms of message length L,

L(A ∧ B) = L(A) + L(B|A)

This means that if all the information is given describing an event then the length of the information may be used to give the raw probability of the event. So if the information describing the occurrence of A is given, along with the information describing B given A, then all the information describing A and B has been given.
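
As a small illustration, the following sketch (in Python, with made-up example probabilities) checks numerically that the message-length form is just Bayes' theorem rewritten with L(X) = -log2 P(X):

    import math

    def length(p):
        """Message length in bits of an event with probability p."""
        return -math.log2(p)

    # Hypothetical example values, not taken from the article.
    p_a = 0.25                 # P(A)
    p_b_given_a = 0.5          # P(B|A)
    p_ab = p_a * p_b_given_a   # P(A and B) by the product rule

    # The same identity in message-length form: L(A and B) = L(A) + L(B|A).
    assert abs(length(p_ab) - (length(p_a) + length(p_b_given_a))) < 1e-9

    print(length(p_a), length(p_b_given_a), length(p_ab))   # 2.0 1.0 3.0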

Overfitting

Overfitting occurs when the model matches the random noise and not the pattern in the data. For example, take the situation where a curve is fitted to a set of points. If a polynomial with many terms is fitted then it can more closely represent the data. Then the fit will be better, and the information needed to describe the deviations from the fitted curve will be smaller. Smaller information length means higher probability.

However, the information needed to describe the curve must also be considered. The total information for a curve with many terms may be greater than for a curve with fewer terms that has a worse fit but needs less information to describe the polynomial.
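
A rough sketch of this trade-off in Python (the data, the fixed 32-bits-per-coefficient model cost, and the Gaussian-style residual cost are all illustrative assumptions, not part of the article), comparing a two-part description length for polynomials of different degree:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 20)
    y = 2 * x + 1 + rng.normal(0, 0.1, size=x.size)   # noisy straight line (assumed data)

    BITS_PER_COEFF = 32   # assumed cost of describing one polynomial coefficient

    def description_length(degree):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        sigma = max(residuals.std(), 1e-6)
        model_bits = BITS_PER_COEFF * (degree + 1)                       # cost of the curve
        data_bits = 0.5 * x.size * np.log2(2 * np.pi * np.e * sigma**2)  # idealised cost of the deviations
        return model_bits + data_bits

    for degree in (1, 3, 7):
        print(degree, round(description_length(degree), 1))
    # Higher-degree fits shrink the residual term but pay more to describe the curve itself.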

Inference based on program complexity

Solomonoff's theory of inductive inference is itself a form of inductive inference. A bit string x is observed. Then consider all programs that generate strings starting with x. Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x.

The method used here to give probabilities for inductive inference is based on Solomonoff's theory of inductive inference.

Detecting patterns in the data

Consider a series of coin flips recorded as a string of bits, with 1 for heads. If all the bits are 1, then people infer that there is a bias in the coin and that it is more likely that the next bit is also 1. This is described as learning from, or detecting a pattern in, the data.

Such a pattern may be represented by a computer program. A short computer program may be written that produces a series of bits which are all 1. If the length of the program is K bits, then its prior probability is,

P = 2^(-K)

The length of the shortest program that represents the string of bits is called the Kolmogorov complexity.
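
For instance, in a toy sketch with made-up program lengths standing in for real Kolmogorov complexities, a short "output 1 forever" program receives a much larger prior than a program that stores n arbitrary bits literally:

    # Prior probability of a program that is K bits long.
    def prior(k_bits):
        return 2.0 ** -k_bits

    K_REPEAT_ONES = 40        # assumed length of a tiny "output 1s forever" program
    n = 200                   # number of observed bits
    K_LITERAL = 40 + n        # assumed length of a program that stores the bits verbatim

    print(prior(K_REPEAT_ONES))                      # ~9.1e-13
    print(prior(K_LITERAL))                          # vastly smaller
    print(prior(K_REPEAT_ONES) / prior(K_LITERAL))   # 2**n times more probable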

Kolmogorov complexity is not computable. This is related to the halting problem. When searching for the shortest program some programs may go into an infinite loop.

Considering all theories

The Greek philosopher Epicurus is quoted as saying "If more than one theory is consistent with the observations, keep all theories".

As in a crime novel all theories must be considered in determining the likely murderer, so with inductive probability all programs must be considered in determining the likely future bits arising from the stream of bits.

Programs that are already longer than n, the number of observed bits, have no predictive power. The raw (or prior) probability that the pattern of bits is random (has no pattern) is 2^(-n).

Each program that produces the sequence of bits, but is shorter than n, is a theory/pattern about the bits with a probability of 2^(-k), where k is the length of the program.

The probability of receiving a sequence of bits y after receiving a series of bits x is then the conditional probability of receiving y given x, which is the probability of x with y appended, divided by the probability of x,

P(y|x) = P(xy) / P(x)
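
The toy sketch below is purely illustrative: three hand-picked generators with assumed lengths stand in for the full enumeration of programs. Each program that reproduces the observed prefix is weighted by 2^(-k), and the surviving weight is used to predict the next bit.

    # Toy hypotheses standing in for programs: (name, assumed length in bits, bit generator).
    HYPOTHESES = [
        ("all ones",    10, lambda i: 1),
        ("alternating", 14, lambda i: (i + 1) % 2),   # 1, 0, 1, 0, ...
        ("all zeros",   10, lambda i: 0),
    ]

    def predict_next(observed):
        """P(next bit = 1 | observed prefix) under the 2**-k prior."""
        n = len(observed)
        weight_one = weight_total = 0.0
        for name, k, gen in HYPOTHESES:
            if all(gen(i) == bit for i, bit in enumerate(observed)):   # program matches the prefix
                w = 2.0 ** -k
                weight_total += w
                weight_one += w * gen(n)
        return weight_one / weight_total if weight_total else 0.5

    print(predict_next([1, 1, 1, 1]))   # only "all ones" matches -> next bit predicted to be 1
    print(predict_next([1, 0, 1, 0]))   # only "alternating" matches -> next bit predicted to be 1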

Universal priors

The programming language affects the predictions of the next bit in the string. The language acts as a prior probability. This is particularly a problem where the programming language codes for numbers and other data types. Intuitively we think that 0 and 1 are simple numbers, and that prime numbers are somehow more complex than numbers that may be composite.

Using the Kolmogorov complexity gives an unbiased estimate (a universal prior) of the prior probability of a number. As a thought experiment an intelligent agent may be fitted with a data input device giving a series of numbers, after applying some transformation function to the raw numbers. Another agent might have the same input device with a different transformation function. The agents do not see or know about these transformation functions. Then there appears no rational basis for preferring one function over another. A universal prior ensures that although two agents may have different initial probability distributions for the data input, the difference will be bounded by a constant.

So universal priors do not eliminate an initial bias, but they reduce and limit it. Whenever we describe an event in a language, whether a natural language or another, the language has encoded in it our prior expectations. So some reliance on prior probabilities is inevitable.

A problem arises where an intelligent agent's prior expectations interact with the environment to form a self-reinforcing feedback loop. This is the problem of bias or prejudice. Universal priors reduce but do not eliminate this problem.

Universal artificial intelligence

The theory of universal artificial intelligence applies decision theory to inductive probabilities. The theory shows how the best actions to optimize a reward function may be chosen. The result is a theoretical model of intelligence.

It is a fundamental theory of intelligence, which optimizes the agent's behavior in,

  • Exploring the environment; performing actions to get responses that broaden the agent's knowledge.
  • Competing or co-operating with another agent; games.
  • Balancing short and long term rewards.

In general no agent will always provide the best actions in all situations. A particular choice made by an agent may be wrong, and the environment may provide no way for the agent to recover from an initial bad choice. However the agent is Pareto optimal in the sense that no other agent will do better than this agent in this environment, without doing worse in another environment. No other agent may, in this sense, be said to be better.

At present the theory is limited by incomputability (the halting problem). Approximations may be used to avoid this. Processing speed and combinatorial explosion remain the primary limiting factors for artificial intelligence.

Probability

Probability is the representation of uncertain or partial knowledge about the truth of statements. Probabilities are subjective and personal estimates of likely outcomes based on past experience and inferences made from the data.

This description of probability may seem strange at first. In natural language we refer to "the probability" that the sun will rise tomorrow. We do not refer to "your probability" that the sun will rise. But in order for inference to be correctly modeled probability must be personal, and the act of inference generates new posterior probabilities from prior probabilities.

Probabilities are personal because they are conditional on the knowledge of the individual. Probabilities are subjective because they always depend, to some extent, on prior probabilities assigned by the individual. Subjective should not be taken here to mean vague or undefined.

The term intelligent agent is used to refer to the holder of the probabilities. The intelligent agent may be a human or a machine. If the intelligent agent does not interact with the environment then the probability will converge over time to the frequency of the event.

If however the agent uses the probability to interact with the environment there may be a feedback, so that two agents in the identical environment starting with only slightly different priors, end up with completely different probabilities. In this case optimal decision theory as in Marcus Hutter's Universal Artificial Intelligence will give Pareto optimal performance for the agent. This means that no other intelligent agent could do better in one environment without doing worse in another environment.

Comparison to deductive probability

In deductive probability theories, probabilities are absolutes, independent of the individual making the assessment. But deductive probabilities are based on,

  • Shared knowledge.
  • Assumed facts, that should be inferred from the data.

For example, in a trial the participants are aware of the outcomes of all previous trials. They also assume that each outcome is equally probable. Together this allows a single unconditional value of probability to be defined.

But in reality each individual does not have the same information. And in general the probability of each outcome is not equal. The dice may be loaded, and this loading needs to be inferred from the data.

Probability as estimation

The principle of indifference has played a key role in probability theory. It says that if N statements are symmetric so that one condition cannot be preferred over another then all statements are equally probable.

Taken seriously, in evaluating probability this principle leads to contradictions. Suppose there are 3 bags of gold in the distance and you are asked to select one. Because of the distance you cannot see the bag sizes. Using the principle of indifference you estimate that each bag has an equal amount of gold: each bag has one third of the gold.

Now, while you are not looking, someone takes one of the bags and divides it into 3 bags. Now there are 5 bags of gold. The principle of indifference now says each bag has one fifth of the gold. A bag that was estimated to have one third of the gold is now estimated to have one fifth of the gold.

Taken as values associated with the bag, the two estimates are different and therefore contradictory. But taken as estimates given under particular scenarios, both values are separate estimates given under different circumstances, and there is no reason to believe they are equal.

Estimates of prior probabilities are particularly suspect. Estimates will be constructed that do not follow any consistent frequency distribution. For this reason prior probabilities are considered as estimates of probabilities rather than probabilities.

A full theoretical treatment would associate with each probability,

  • The statement
  • Prior knowledge
  • Prior probabilities
  • The estimation procedure used to give the probability.

Combining probability approaches

Inductive probability combines two different approaches to probability.

  • Probability and information
  • Probability and frequency

Each approach gives a slightly different viewpoint. Information theory is used in relating probabilities to quantities of information. This approach is often used in giving estimates of prior probabilities.

Frequentist probability defines probabilities as objective statements about how often an event occurs. This approach may be stretched by defining the trials to be over possible worlds. Statements about possible worlds define events.

Probability and information

Whereas logic represents only two values, true and false, as the values of a statement, probability associates a number in [0,1] with each statement. If the probability of a statement is 0, the statement is false. If the probability of a statement is 1, the statement is true.

In considering some data as a string of bits, the prior probabilities of 1 and 0 are equal. Therefore each extra bit halves the probability of a sequence of bits. This leads to the conclusion that,

P(x) = 2^(-L(x))

where P(x) is the probability of the string of bits x and L(x) is its length.

The prior probability of any statement is calculated from the number of bits needed to state it. See also information theory.

Combining information

Two statements A and B may be represented by two separate encodings. Then the length of the encoding is,

L(A ∧ B) = L(A) + L(B)

or in terms of probability,

P(A ∧ B) = P(A) P(B)

But this law is not always true, because there may be a shorter method of encoding B if we assume A. So the above probability law applies only if A and B are "independent".
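
A small numeric check in Python (with made-up probabilities) that the additive law for lengths is the product law for probabilities, and that it overstates the length when the statements are not independent:

    import math

    L = lambda p: -math.log2(p)   # encoding length in bits

    # Independent statements (assumed example values).
    p_a, p_b = 0.5, 0.25
    print(L(p_a) + L(p_b), L(p_a * p_b))   # 3.0 3.0 -> lengths add

    # Dependent statements: here B is the same statement as A, so P(A and B) = P(A).
    p_a_and_b = p_a
    print(L(p_a) + L(p_a), L(p_a_and_b))   # 2.0 1.0 -> the additive law overstates the length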

The internal language of information

The primary use of the information approach to probability is to provide estimates of the complexity of statements. Recall that Occam's razor states that "All things being equal, the simplest theory is the most likely to be correct". In order to apply this rule, first there needs to be a definition of what "simplest" means. Information theory defines simplest to mean having the shortest encoding.

Knowledge is represented as statements. Each statement is a Boolean expression. Expressions are encoded by a function that takes a description (as against the value) of the expression and encodes it as a bit string.

The length of the encoding of a statement gives an estimate of the probability of a statement. This probability estimate will often be used as the prior probability of a statement.

Technically this estimate is not a probability because it is not constructed from a frequency distribution. The probability estimates given by it do not always obey the law of total probability. Applying the law of total probability to various scenarios will usually give a more accurate estimate of the prior probability than the estimate from the length of the statement.

Encoding expressions

An expression is constructed from sub expressions,

  • Constants (including function identifiers).
  • Application of functions.
  • Quantifiers.

A Huffman code must distinguish these 3 cases. The length of each code is based on the frequency of each type of sub-expression.

Initially constants are all assigned the same length/probability. Later constants may be assigned a probability using the Huffman code based on the number of uses of the function id in all expressions recorded so far. In using a Huffman code the goal is to estimate probabilities, not to compress the data.

The length of a function application is the length of the function identifier constant plus the sum of the sizes of the expressions for each parameter.

The length of a quantifier is the length of the expression being quantified over.
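
A minimal sketch in Python of the Huffman coding of sub-expression kinds described above: the more frequent kinds receive shorter codes and therefore higher implied probabilities. The frequency counts here are invented for illustration.

    import heapq
    from itertools import count

    def huffman_code(freqs):
        """Return {symbol: code string} for a Huffman code over the given frequencies."""
        tiebreak = count()
        heap = [(f, next(tiebreak), [s]) for s, f in freqs.items()]
        codes = {s: "" for s in freqs}
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, syms1 = heapq.heappop(heap)
            f2, _, syms2 = heapq.heappop(heap)
            for s in syms1:
                codes[s] = "0" + codes[s]   # prepend a bit as this subtree moves one level down
            for s in syms2:
                codes[s] = "1" + codes[s]
            heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
        return codes

    # Assumed counts of each kind of sub-expression recorded so far.
    counts = {"constant": 50, "application": 35, "quantifier": 15}
    codes = huffman_code(counts)
    total = sum(counts.values())
    for kind, code in codes.items():
        print(kind, code, "implied p =", 2.0 ** -len(code), "observed p =", counts[kind] / total)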

Distribution of numbers

No explicit representation of natural numbers is given. However natural numbers may be constructed by applying the successor function to 0, and then applying other arithmetic functions. A distribution of natural numbers is implied by this, based on the complexity of constructing each number.

Rational numbers are constructed by the division of natural numbers. The simplest representation has no common factors between the numerator and the denominator. This allows the probability distribution of natural numbers to be extended to rational numbers.

Probability and frequency

The probability of an event may be interpreted as the number of outcomes in which the statement is true divided by the total number of outcomes. If the outcomes form a continuum the frequency may need to be replaced with a measure.

Events are sets of outcomes. Statements may be related to events. A Boolean statement B about outcomes defines a set of outcomes b,

b = {x : B(x)}
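
A tiny Python sketch of this outcome-counting view, using two coin flips as an invented example: the probability of a statement is the fraction of possible worlds in which it is true.

    from itertools import product

    # Possible worlds: every assignment of outcomes to two coin flips.
    worlds = list(product(["heads", "tails"], repeat=2))

    def probability(statement):
        """Fraction of possible worlds in which the Boolean statement is true."""
        event = [w for w in worlds if statement(w)]   # the set of outcomes defined by the statement
        return len(event) / len(worlds)

    print(probability(lambda w: w[0] == "heads"))                      # 0.5
    print(probability(lambda w: "heads" in w))                         # 0.75
    print(probability(lambda w: w[0] == "heads" and w[1] == "heads"))  # 0.25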

Conditional probability

Each probability is always associated with the state of knowledge at a particular point in the argument. Probabilities before an inference are known as prior probabilities, and probabilities after are known as posterior probabilities.

Probability depends on the facts known. The truth of a fact limits the domain of outcomes to the outcomes consistent with the fact. Prior probabilities are the probabilities before a fact is known. Posterior probabilities are the probabilities after a fact is known. The posterior probabilities are said to be conditional on the fact. The probability that B is true given that A is true is written as,

P(B|A)

All probabilities are in some sense conditional. The prior probability of A is written P(A).

The frequentist approach applied to possible worlds

In the frequentist approach, probabilities are defined as the ratio of the number of outcomes within an event to the total number of outcomes. In the possible world model each possible world is an outcome, and statements about possible worlds define events. The probability of a statement A being true is the number of possible worlds where the statement is true divided by the total number of possible worlds,

P(A) = |{w : A(w)}| / |W|

where W is the set of all possible worlds.

For a conditional probability,

P(B|A) = |{w : A(w) ∧ B(w)}| / |{w : A(w)}|

so

P(B|A) = P(A ∧ B) / P(A)

Using symmetry this equation may be written out as Bayes' law,

P(A|B) = P(B|A) P(A) / P(B)

This law describes the relationship between prior and posterior probabilities when new facts are learnt.

Written as quantities of information, Bayes' theorem becomes,

L(A|B) = L(B|A) + L(A) - L(B)

Two statements A and B are said to be independent if knowing the truth of A does not change the probability of B. Mathematically this is,

P(B|A) = P(B)

then Bayes' theorem reduces to,

P(A|B) = P(A)

The law of total probability

For a set of mutually exclusive possibilities A_i, the sum of the posterior probabilities must be 1,

Σ_i P(A_i|B) = 1

Substituting using Bayes' theorem gives the law of total probability,

P(B) = Σ_i P(B|A_i) P(A_i)

This result is used to give the extended form of Bayes' theorem,

P(A_i|B) = P(B|A_i) P(A_i) / Σ_j P(B|A_j) P(A_j)

This is the usual form of Bayes' theorem used in practice, because it guarantees that the sum of all the posterior probabilities for the A_i is 1.
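
A short sketch of the extended form in use, with invented priors and likelihoods for three mutually exclusive possibilities; the denominator is the law of total probability, so the posteriors sum to 1.

    # Assumed priors P(A_i) and likelihoods P(B|A_i) for three mutually exclusive possibilities.
    priors      = [0.5, 0.3, 0.2]
    likelihoods = [0.1, 0.4, 0.8]

    evidence = sum(p * l for p, l in zip(priors, likelihoods))          # P(B), by the law of total probability
    posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

    print([round(p, 3) for p in posteriors])   # [0.152, 0.364, 0.485]
    print(sum(posteriors))                     # 1.0 (up to rounding)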

Alternate possibilities

For mutually exclusive possibilities, the probabilities add,

P(A ∨ B) = P(A) + P(B)  if  P(A ∧ B) = 0

Using

A ∨ B = (A ∧ B) ∨ (A ∧ ¬B) ∨ (¬A ∧ B)

then the alternatives

A ∧ B,  A ∧ ¬B,  ¬A ∧ B

are all mutually exclusive. Also,

(A ∧ B) ∨ (A ∧ ¬B) = A
(A ∧ B) ∨ (¬A ∧ B) = B

so, putting it all together,

P(A ∨ B) = P(A ∧ B) + P(A ∧ ¬B) + P(¬A ∧ B)
         = (P(A ∧ B) + P(A ∧ ¬B)) + (P(A ∧ B) + P(¬A ∧ B)) - P(A ∧ B)
         = P(A) + P(B) - P(A ∧ B)

Negation

As,

P(A ∨ ¬A) = 1  and  P(A ∧ ¬A) = 0

then

P(¬A) = 1 - P(A)

Implication and conditional probability

Implication is related to conditional probability by the following equation,

A → B  ⟺  P(B|A) = 1

Derivation,

A → B
⟺ P(A → B) = 1
⟺ P(¬A ∨ (A ∧ B)) = 1
⟺ P(¬A) + P(A ∧ B) = 1
⟺ P(A ∧ B) = P(A)
⟺ P(A ∧ B) / P(A) = 1
⟺ P(B|A) = 1

Bayesian hypothesis testing

Bayes' theorem may be used to estimate the probability of a hypothesis or theory H, given some facts F. The posterior probability of H is then,

P(H|F) = P(F|H) P(H) / P(F)

or in terms of information,

L(H|F) = L(F|H) + L(H) - L(F)

By assuming the hypothesis is true, a simpler representation of the statement F may be given. The length of the encoding of this simpler representation is L(F|H).

L(F|H) represents the amount of information needed to represent the facts F, if H is true. L(F) is the amount of information needed to represent F without the hypothesis H. The difference, L(F) - L(F|H), is how much the representation of the facts has been compressed by assuming that H is true. This is the evidence that the hypothesis H is true.

If P(H|F) is estimated from encoding lengths then the value obtained will not always be between 0 and 1. The value obtained is proportional to the probability, without being a good probability estimate. The number obtained is sometimes referred to as a relative probability, being how much more probable the theory is than not holding the theory.
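
A toy worked example of "evidence as compression" (the facts, the hypothesis, and the encoding lengths below are invented for illustration): the facts F are 200 coin flips that all came up heads, and the hypothesis H is that the coin always lands heads.

    import math

    n = 200                      # number of observed flips (invented example)

    l_f = n * 1.0                # L(F): with no hypothesis, each flip costs one bit to record
    l_f_given_h = math.log2(n)   # L(F|H): under H only the count of flips needs describing (roughly)

    evidence = l_f - l_f_given_h             # bits of compression gained by assuming H
    relative_probability = 2.0 ** evidence   # how much more probable H makes the facts

    print(round(evidence, 1), relative_probability)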

If a full set of mutually exclusive hypotheses that provide evidence is known, a proper estimate may be given for the prior probability P(F).

Set of hypotheses

Probabilities may be calculated from the extended form of Bayes' theorem. Given all mutually exclusive hypotheses H_i which give evidence, such that,

L(F|H_i) < L(F)

and also the hypothesis R, that none of the hypotheses is true, then,

P(H_i|F) = P(F|H_i) P(H_i) / (Σ_j P(F|H_j) P(H_j) + P(F|R) P(R))

In terms of information,

P(H_i|F) = 2^(-L(F|H_i) - L(H_i)) / (Σ_j 2^(-L(F|H_j) - L(H_j)) + 2^(-L(F|R) - L(R)))

In most situations it is a good approximation to assume that F is independent of R, which means P(F|R) = P(F), giving,

P(H_i|F) = P(F|H_i) P(H_i) / (Σ_j P(F|H_j) P(H_j) + P(F) P(R))

Boolean inductive inference

Abductive inference starts with a set of facts F, which is a statement (Boolean expression). Abductive reasoning is of the form,

T → F

A theory T implies the statement F. As the theory T is simpler than F, abduction says that there is a probability that the theory T is implied by F.

The theory T, also called an explanation of the condition F, is an answer to the ubiquitous factual "why" question. For example, for the condition F "apples fall", the question is "Why do apples fall?" The answer is a theory T that implies that apples fall.

Inductive inference is of the form,

All observed objects in a class C have a property P.
Therefore there is a probability that all objects in a class C have a property P.

In terms of abductive inference, "all objects in a class C have a property P" is a theory that implies the observed condition, "all observed objects in a class C have a property P".

So inductive inference is a special case of abductive inference. In common usage the term inductive inference is often used to refer to both abductive and inductive inference.

Generalization and specialization

Inductive inference is related to generalization. Generalizations may be formed from statements by replacing a specific value with membership of a category, or by replacing membership of a category with membership of a broader category. In deductive logic, generalization is a powerful method of generating new theories that may be true. In inductive inference generalization generates theories that have a probability of being true.

The opposite of generalization is specialization. Specialization is used in applying a general rule to a specific case. Specializations are created from generalizations by replacing membership of a category by a specific value, or by replacing a category with a sub category.

The Linnaean classification of living things and objects forms the basis for generalization and specialization. The ability to identify, recognize and classify is the basis for generalization. Perceiving the world as a collection of objects appears to be a key aspect of human intelligence. It is the object oriented model, in the non computer science sense.

The object oriented model is constructed from our perception. In particular, vision is based on the ability to compare two images and calculate how much information is needed to morph or map one image into another. Computer vision uses this mapping to construct 3D images from stereo image pairs.

Inductive logic programming is a means of constructing a theory that implies a condition. Plotkin's "relative least general generalization (rlgg)" approach constructs the simplest generalization consistent with the condition.

Newton's use of induction

Isaac Newton used inductive arguments in constructing his law of universal gravitation. Starting with the statement,

  • The center of an apple falls towards the center of the earth.

Generalizing by replacing apple with object, and earth with object, gives, in a two body system,

  • The center of an object falls towards the center of another object.

The theory explains all objects falling, so there is strong evidence for it. The second observation,

  • The planets appear to follow an elliptical path.

After some complicated mathematical calculus, it can be seen that if the acceleration follows the inverse square law then objects will follow an ellipse. So induction gives evidence for the inverse square law.

Using Galileo's observation that all objects fall at the same rate regardless of their own mass, the acceleration of each body is a vector directed towards the center of the other object, with a magnitude depending only on the other body's mass and the separation. Then, applying Newton's third law, the force between the two bodies must be proportional to the product of their masses.

Probabilities for inductive inference

Implication determines conditional probability as,

T → F  ⟹  P(F|T) = 1

So,

P(T|F) = P(T) / P(F)

This result may be used in the probabilities given for Bayesian hypothesis testing. For a single theory, H = T and,

P(T|F) = P(T) / P(F) = 2^(-L(T)) / 2^(-L(F))

or in terms of information, the relative probability is,

2^(L(F) - L(T))

Note that this estimate for P(T|F) is not a true probability. If L(T) < L(F) then the theory has evidence to support it. For a full set of theories T_i such that T_i → F, a proper, normalized estimate may be given by applying the extended form of Bayes' theorem, as in the previous section.

Derivations

Derivation of inductive probability

Make a list of all the shortest programs q_i that each produce a distinct infinite string of bits and satisfy the relation,

T_n(R(q_i)) = x

where R(q_i) is the result of running the program q_i, and T_n truncates the string after n bits.

The problem is to calculate the probability that the source is produced by the program q_i, given that the truncated source after n bits is x. This is represented by the conditional probability,

P(q_i | x)

Using the extended form of Bayes' theorem

The extended form relies on the law of total probability. This means that the q_i must be distinct possibilities, which is given by the condition that each produces a different infinite string. Also one of the conditions must be true. This must be true because, in the limit as n tends to infinity, there is always at least one program that produces x.

As the q_i are chosen so that T_n(R(q_i)) = x, each program is consistent with the observed bits, so P(x|q_i) = 1, and,

P(q_i|x) = P(q_i) / Σ_j P(q_j)

The a priori probability of the string being produced from the program, given no information about the string, is based on the size of the program,

P(q_i) = 2^(-L(q_i))

giving,

P(q_i|x) = 2^(-L(q_i)) / Σ_j 2^(-L(q_j))

Programs that are the same length as, or longer than, the length of x provide no predictive power. Separate them out, giving,

Then identify the two probabilities as,

But the prior probability that x is a random set of bits is 2^(-n). So,

The probability that the source is random, or unpredictable is,

A model for inductive inference

A model of how worlds are constructed is used in determining the probabilities of theories,

  • A random bit string is selected.
  • A condition is constructed from the bit string.
  • A world is constructed that is consistent with the condition.

If w is the bit string then the world is created such that the condition constructed from w is true. An intelligent agent has some facts about the world, represented by the bit string c, which gives the condition C.

The set of bit strings identical with any condition x is .

A theory is a simpler condition that explains (or implies) C. The set of all such theories is called T,

Applying Bayes' theorem

The extended form of Bayes' theorem may be applied,

where,

To apply Bayes' theorem the following must hold: the set of theories T is a partition of the event space.

For T to be a partition, no bit string may belong to two theories. To prove this, assume it can and derive a contradiction,

Secondly, prove that T includes all outcomes consistent with the condition. As all theories consistent with C are included, every such outcome must be in this set.

So Bayes' theorem may be applied as specified, giving,

Using the implication and conditional probability law, the definition of T implies,

The probability of each theory in T is given by,

so,

Finally the probabilities of the events may be identified with the probabilities of the condition which the outcomes in the event satisfy,

giving

This is the probability of the theory t after observing that the condition C holds.

Removing theories without predictive power

Theories that are less probable than the condition C have no predictive power. Separate them out giving,

The probability of the theories without predictive power on C is the same as the probability of C. So,

So the probability

and the probability of no prediction for C,

The probability of a condition was given as,

Bit strings for theories that are more complex than the bit string given to the agent as input have no predictive power. Their probabilities are better included in the random case. To implement this, a new definition is given as F in,

Using F, an improved version of the abductive probabilities is,

Same-sex marriage and the family

Lesbian couple with children

Concerns regarding same-sex marriage and the family are at the forefront of the controversies over legalization of same-sex marriage. In the United States, an estimated 1 million to 9 million children have at least one lesbian, gay, bi, trans, intersex, or queer parent. Concern for these children and others to come is the basis for both opposition to and support for marriage for LGBT couples.

Research and positions of professional scientific organizations

Scientific research has been consistent in showing that lesbian and gay parents are as fit and capable as heterosexual parents, and their children are as psychologically healthy and well-adjusted as children reared by heterosexual parents. According to scientific literature reviews published in prestigious peer-reviewed journals and statements of mainstream professional associations, there is no evidence to the contrary. The American Psychological Association reports that some studies suggest parenting skills of gays and lesbians might be "superior." Biblarz and Stacey state that while research has found that families headed by (at least) two parents are generally best for children, outcomes of more than two parents (as in some cooperative stepfamilies, intergenerational families, and coparenting alliances among lesbians and gay men) have not yet been studied.

United States

As noted by Professor Judith Stacey of New York University: “Rarely is there as much consensus in any area of social science as in the case of gay parenting, which is why the American Academy of Pediatrics and all of the major professional organizations with expertise in child welfare have issued reports and resolutions in support of gay and lesbian parental rights”. Among these mainstream organizations in the United States are the American Psychiatric Association, the National Association of Social Workers, Child Welfare League of America, the American Bar Association, the North American Council on Adoptable Children, the American Academy of Pediatrics, the American Psychoanalytic Association and the American Academy of Family Physicians.

In 2013, the American Academy of Pediatrics stated in Pediatrics:

On the basis of this comprehensive review of the literature regarding the development and adjustment of children whose parents are the same gender, as well as the existing evidence for the legal, social, and health benefits of marriage to children, the AAP concludes that it is in the best interests of children that they be able to partake in the security of permanent nurturing and care that comes with the civil marriage of their parents, without regard to their parents’ gender or sexual orientation.

In 2006, the American Psychological Association, American Psychiatric Association and National Association of Social Workers stated in an amicus brief presented to the Supreme Court of the State of California:

Although it is sometimes asserted in policy debates that heterosexual couples are inherently better parents than same-sex couples, or that the children of lesbian or gay parents fare worse than children raised by heterosexual parents, those assertions find no support in the scientific research literature. When comparing the outcomes of different forms of parenting, it is critically important to make appropriate comparisons. For example, differences resulting from the number of parents in a household cannot be attributed to the parents’ gender or sexual orientation. Research in households with heterosexual parents generally indicates that – all else being equal – children do better with two parenting figures rather than just one. The specific research studies typically cited in this regard do not address parents’ sexual orientation, however, and therefore do not permit any conclusions to be drawn about the consequences of having heterosexual versus nonheterosexual parents, or two parents who are of the same versus different genders. Indeed, the scientific research that has directly compared outcomes for children with gay and lesbian parents with outcomes for children with heterosexual parents has been remarkably consistent in showing that lesbian and gay parents are every bit as fit and capable as heterosexual parents, and their children are as psychologically healthy and well-adjusted as children reared by heterosexual parents. Amici emphasize that the abilities of gay and lesbian persons as parents and the positive outcomes for their children are not areas where credible scientific researchers disagree. Statements by the leading associations of experts in this area reflect professional consensus that children raised by lesbian or gay parents do not differ in any important respects from those raised by heterosexual parents. No credible empirical research suggests otherwise. Allowing same-sex couples to legally marry will not have any detrimental effect on children raised in heterosexual households, but it will benefit children being raised by same-sex couples.

Peer-reviewed studies indicate that no research supports the widely held conviction that the gender of parents matters for child well-being. The methodologies used in the major studies of same-sex parenting meet the standards for research in the field of developmental psychology and psychology generally and are considered reliable by members of the respective professions. A roundup of related research on Journalist's Resource, a project of the Joan Shorenstein Center on the Press, Politics and Public Policy at Harvard's John F. Kennedy School of Government, found few if any downsides to children being raised by a same-sex couple, and some positive effects.

Canada

The Canadian Psychological Association stated in 2004 and 2006:

Beliefs that gay and lesbian adults are not fit parents, or that the psychosocial development of the children of gay and lesbian parents is compromised, have no basis in science. Our position is based on a review representing approximately 50 empirical studies and at least another 50 articles and book chapters and does not rest on the results of any one study. A review of the psychological research into the well-being of children raised by same-sex and opposite-sex parents continues to indicate that there are no reliable differences in their mental health or social adjustment and that lesbian mothers and gay fathers are not less fit as parents than are their heterosexual counterparts. The opposition to marriage of same-sex couples, on the grounds that it fails to consider the needs or rights of children, does not consider the most relevant body of psychological research into this topic or draws inaccurate conclusions from it. Further, opposition to marriage of same-sex couples often incorrectly pre-supposes that, by preventing marriage of same-sex couples, no children will be born or raised within families where parents are of the same sex. Such an argument ignores the reality that children are, and will continue to be, born to and raised by parents who are married, those who are unmarried, those who are cohabitating, and those who are single – most of whom will be heterosexual, some of whom will be gay, and some of whom will be lesbian. Further, the literature (including the literature on which opponents to marriage of same-sex couples appear to rely) indicates that parents’ financial, psychological and physical well-being is enhanced by marriage and that children benefit from being raised by two parents within a legally-recognized union. As the CPA stated in 2003, the stressors encountered by gay and lesbian parents and their children are more likely the result of the way in which society treats them than because of any deficiencies in fitness to parent. The CPA recognizes and appreciates that persons and institutions are entitled to their opinions and positions on this issue. However, CPA is concerned that some are mis-interpreting the findings of psychological research to support their positions, when their positions are more accurately based on other systems of belief or values.

Australia

In 2007, the Australian Psychological Society stated: "The family studies literature indicates that it is family processes (such as the quality of parenting and relationships within the family) that contribute to determining children’s wellbeing and ‘outcomes’, rather than family structures, per se, such as the number, gender, sexuality and co-habitation status of parents. The research indicates that parenting practices and children's outcomes in families parented by lesbian and gay parents are likely to be at least as favourable as those in families of heterosexual parents, despite the reality that considerable legal discrimination and inequity remain significant challenges for these families. The main reason given (by lawmakers) for not allowing people to marry the person of their choice if that person is of the same gender has been the inaccurate assertion that this is in the best interest of children, and that children ‘need’ or ‘do better’ in a family with one parent of each gender. As the reviews, statements, and recommendations written by many expert and professional bodies indicate, this assertion is not supported by the family studies research, and in fact, the promotion of this notion, and the laws and public policies that embody it, are clearly counter to the well-being of children."

Many states and territories, with the exception of the Northern Territory and Western Australia, have laws that allow couples to register their domestic relationships, called de facto relationships. There is a concern regarding the rights accorded to family members in this type of relationship. The reason for this is that there is currently no Australian law covering the rights of these families except for a few domestic partner employment benefits. Families also do not have access to the set of benefits and mechanisms provided to married couples. For instance, same-sex couples usually face the burden-of-proof complexities required by institutions in order to avail themselves of their services, and this complicates the lives of family members in cases of interpersonal or family conflict, affecting their psychological well-being. This is demonstrated in the case of being considered a legal parent for two women who used in vitro fertilization (IVF) to have a child. The biological parent is the legal parent, whereas the other needs to undergo the ordeal of having to prove the existence of a relationship with the mother of the child.

There are also reports that the current debate on same-sex marriage results in the increasing discrimination against lesbian, gay, bisexual, transgender, intersex, and queer (LGBTIQ) people. In a statement by the National Mental Health Commission, it was stated that “LGBTIQ people have experienced damaging behaviour in their workplaces, communities and in social and traditional media" and that it is "alarmed about potential negative health impacts these debates are having on individuals, couples and families who face scrutiny and judgment.”

In U.S. federal and state law

In Anderson et al. v. King County, a case that challenged Washington's Defense of Marriage Act, the Washington Supreme Court ruled 5 to 4 that the law survived constitutional review. The majority concluded that the legislature had a rational basis, that is, it was entitled to believe, and to act on that belief, that only allowing opposite-sex marriages "furthers procreation". In response, a group of same-sex marriage advocates filed what became Initiative 957 which, if passed, would have made procreation a legal requirement for marriage in Washington State. Maryland's highest court used similar grounds to rule that it was permissible to confer the benefits of marriage only on opposite-sex couples.

In 1996, Congress passed the Defense of Marriage Act, which defines marriage in federal law as "a legal union between one man and one woman as husband and wife". The Congressional record, in a House Report (H.R. 104–664 at 33, 104th Congress, 2nd Session, 1996), states that procreation is key to the requirement that a valid marriage be a union of one man and one woman.

It has been suggested that Congress acted in anticipation of legal challenges to the Defense of Marriage Act that might rely on dicta in a 1965 Supreme Court ruling, Griswold v. Connecticut (381 U.S. 479), suggesting that procreation is not essential to marriage:

Marriage is a coming together for better or for worse, hopefully enduring, and intimate to the degree of being sacred. It is an association that promotes a way of life, not causes; a harmony in living, not political faiths; a bilateral loyalty, not commercial or social projects. Griswold v. Connecticut

In Conaway v. Deane (2003), the Maryland Court of Appeals ruled that the State has a legitimate interest in encouraging a family structure in which children are born. The court then refrained from deciding whether this interest was served by the status quo, leaving it to the other branches to decide. The Massachusetts Supreme Court concluded in Goodridge v. Department of Public Health that even if it were the case that children fare better when raised by opposite-sex parents, the argument against same-sex marriage is unsound because the state failed to show how banning same-sex marriages discouraged gay and lesbian individuals from forming families or how restricting marriage to heterosexual couples discouraged heterosexual individuals from having nonmarital children.

In June 2005, a New Jersey state appeals court, in the decision Lewis v. Harris, upheld a state law defining marriage as the union of one man and one woman, in part by accepting that the link between marriage and procreation, although perhaps not wise, was not irrational. However, in 2006, the New Jersey Supreme Court unanimously overruled that decision, requiring the state to make available to all couples in New Jersey the equal protection of family laws irrespective of the gender of the participants, though not necessarily the title of marriage.

In 2003, a three-judge panel of the Arizona Court of Appeals, in the decision Standhardt v. Superior Court (77 P.3d 451, 463–464) regarding Arizona's state marriage law, concluded that the petitioners had failed to prove that the State's prohibition of same-sex marriage is not rationally related to a legitimate state interest. It held that the State has a legitimate interest in encouraging procreation and child-rearing within the marital relationship, that limiting marriage to the union of one man and one woman is rationally related to that interest, and that even assuming the State's reasoning for prohibiting same-sex marriage was debatable, it was not 'arbitrary' or 'irrational'.

In 1971, the Supreme Court of Minnesota, in the decision Baker v. Nelson (191 N.W.2d 185), ruled that the state definition survived constitutional scrutiny. The case was appealed to the US Supreme Court, which refused to hear it for want of a substantial federal question.

In the 2010 US case Perry v. Schwarzenegger, the trial judge found that "[c]hildren raised by gay or lesbian parents are as likely as children raised by heterosexual parents to be healthy, successful and well-adjusted," and that this conclusion was "accepted beyond serious debate in the field of developmental psychology."

Controversy

There is debate over the impact of same-sex marriage upon families and children.

Social conservatives and other opponents of same-sex marriage may not see marriage as a legal construct of the state, but as a naturally occurring "pre-political institution" that the state must recognize; one such conservative voice, Jennifer Roback Morse, reasons that "government does not create marriages any more than government creates jobs." The article, Marriage and the Limits of Contract, argues that the definition proposed by same-sex marriage advocates changes the social importance of marriage from its natural function of reproduction into a mere legality or freedom to have sex. Dennis Prager, in arguing that marriage should be defined exclusively as the union of one woman and one man, claims that families provide the procreative foundation that is the chief building block of civilization. The focus of the argument is that relationships between same-sex couples should not be described as "marriages," and that a rationale for this is that the putative ability to have natural offspring should be a formal requirement for a couple to be able to marry.

Opponents of same-sex marriage, including the Church of Jesus Christ of Latter-day Saints, the United States Conference of Catholic Bishops, the Southern Baptist Convention, and the National Organization for Marriage, argue that children do best when raised by a mother and father, and that legalizing same-sex marriage is, therefore, contrary to the best interests of children. David Blankenhorn cites the United Nations Convention on the Rights of the Child, which says that a child has "the right to know and be cared for by his or her parents," in support of this argument (before he reversed his position on the issue). Some same-sex marriage opponents argue that having and raising children is the underlying purpose of marriage. The opponents of same-sex marriage assume that same-sex unions implicitly lack the everyday ability of opposite-sex couples to produce and raise offspring by natural means. They also argue that children raised by same-sex partners are disadvantaged in various ways and that same-sex unions thus cannot be recognized within the scope of "marriage." The argument that a child has the right to know and be cared for by his or her parents leaves a number of issues open to debate involving same-sex marriage, including infertile heterosexual couples or couples not wishing for children, as well as same-sex unions where a family exists with children from previous relationships, adoption, artificial insemination, surrogacy, or co-parenting. Social consequences are also heavily debated, such as whether marriage should be defined in terms of procreation.

In contrast, same-sex marriage advocates argue that by expanding marriage to gay and lesbian individuals, the state actually protects the rights of all married couples and of children raised by same-sex partners, while in no way affecting the rights of opposite-sex married couples and their children, natural or adopted. Some same-sex marriage supporters also claim that the historic definition of marriage, viewed as a license for sexual intercourse and a license to treat the wife as a possession of her husband, has already been changed by social progress; they cite as examples the legal equality men and women enjoy in modern marriage and the fact that it is no longer illegal to have sexual intercourse before marriage.

Some proponents of same-sex marriage argue that laws limiting civil marriage to opposite-sex couples are underinclusive because they do not prohibit marriages involving sterile opposite-sex couples or women past menopause; therefore, they take the view that the procreation argument cannot reasonably be used against same-sex marriages. Proponents also consider these laws restricting marriage to be unconstitutionally overinclusive, as gay and lesbian couples can have children either through natural or artificial means or by adoption. In 2002, in a leading Canadian same-sex marriage case, Halpern v. Canada (Attorney General), a Canadian court found that "excluding gays and lesbians from marriage disregards the needs, capacities, and circumstances of same-sex spouses and their children."

NARTH and American College of Pediatricians (a religious conservative organization; not to be confused with American Academy of Pediatrics) argue that mainstream health and mental health organizations have, in many cases, taken public positions on homosexuality and same-sex marriage that are based on their own social and political views rather than the available science. The American Psychological Association, on the other hand, considers positions of NARTH unscientific, and the Canadian Psychological Association has expressed concern that "some are mis-interpreting the findings of psychological research to support their positions, when their positions are more accurately based on other systems of belief or values." Views held by the American College of Pediatricians are also contrary to views of American Academy of Pediatrics and other medical and child welfare authorities which take the view that sexual orientation has no correlation with the ability to be a good parent and raise healthy and well-adjusted children.

Stanley Kurtz of the Hoover Institution contends that same-sex marriage separates the ideas of marriage and parenthood, thereby accelerating marital decline. Kurtz points to Scandinavia as an example of such a place, though he admits that in that case, other factors have also led to the decline of marriage.

Divorce rates

Internationally, the most comprehensive study to date, Nordic Bliss? Scandinavian Registered Partnerships and the Same-Sex Marriage Debate, appeared in the journal Issues in Legal Scholarship. It examined the effect of same-sex partnerships on opposite-sex marriage and divorce rates using over 15 years of data from the Scandinavian countries. The study, by researcher Darren Spedale, found that 15 years after Denmark had granted same-sex couples the rights of marriage, rates of opposite-sex marriage in those countries had gone up, and rates of opposite-sex divorce had gone down – contradicting the concept that same-sex marriages would have a negative effect on opposite-sex marriages.

A multi-method, multi-informant comparison of community samples of committed gay male and lesbian (30 participants each) couples with both committed (50 young engaged and 40 older married participants) and non-committed (109 exclusively dating) opposite-sex pairs was conducted in 2008. Specifically, in this study the quality of same- and opposite-sex relationships was examined at multiple levels of analysis via self-reports and partner reports, laboratory observations, and measures of physiological reactivity during dyadic interactions. Additionally, individuals in same-sex, engaged, and marital relationships were compared with one another on adult attachment security as assessed through the coherence of participants' narratives about their childhood experiences. Results indicated that individuals in committed same-sex relationships were generally not distinguishable from their committed opposite-sex counterparts.

Garbage collection (computer science)

From Wikipedia, the free encyclopedia
Stop-and-copy garbage collection in a Lisp architecture: memory is divided into working and free memory, and new objects are allocated in the former. When the working memory is full, garbage collection is performed: all data structures still in use are located by pointer tracing and copied into consecutive locations in free memory. The working memory contents are then discarded in favor of the compacted copy, and the roles of working and free memory are exchanged.

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

Garbage collection relieves the programmer from doing manual memory management, where the programmer specifies what objects to de-allocate and return to the memory system and when to do so. Other, similar techniques include stack allocation, region inference, and memory ownership, and combinations thereof. Garbage collection may take a significant proportion of a program's total processing time, and affect performance as a result.

Resources other than memory, such as network sockets, database handles, windows, file descriptors, and device descriptors, are not typically handled by garbage collection, but rather by other methods (e.g. destructors). Some such methods de-allocate memory also.

Overview

Many programming languages require garbage collection, either as part of the language specification (e.g., RPL, Java, C#, D, Go, and most scripting languages) or effectively for practical implementation (e.g., formal languages like lambda calculus). These are said to be garbage-collected languages. Other languages, such as C and C++, were designed for use with manual memory management, but have garbage-collected implementations available. Some languages, like Ada, Modula-3, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects. Still others, like D, are garbage-collected but allow the user to manually delete objects or even disable garbage collection entirely when speed is required.

Although many languages integrate GC into their compiler and runtime system, post-hoc GC systems also exist, such as Automatic Reference Counting (ARC). Some of these post-hoc GC systems do not require recompilation. Post-hoc GC is sometimes called litter collection, to distinguish it from ordinary GC.

Advantages

GC frees the programmer from manually de-allocating memory. This helps avoid some kinds of errors:

  • Dangling pointers, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use, with unpredictable results.
  • Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
  • Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion.

Disadvantages

GC uses computing resources to decide which memory to free. Therefore, the penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can impair program performance. A peer-reviewed paper from 2005 concluded that GC needs five times the memory to compensate for this overhead and to perform as fast as the same program using idealised explicit memory management. The comparison however is made to a program generated by inserting deallocation calls using an oracle, implemented by collecting traces from programs run under a profiler, and the program is only correct for one particular execution of the program. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing. The impact on performance was given by Apple as a reason for not adopting garbage collection in iOS, despite it being the most desired feature.

The moment when the garbage is actually collected can be unpredictable, resulting in stalls (pauses to shift/free memory) scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.

Strategies

Tracing

Tracing garbage collection is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.
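
As a rough illustration of the idea only (not any particular collector's implementation), the following C++ sketch traces a toy object graph from a set of roots, marks everything reachable, and sweeps the rest. The Obj type and the global heap and roots containers are invented for the example.

    #include <vector>
    #include <iostream>

    // Minimal mark-and-sweep sketch over a toy object graph. A real collector
    // works on the runtime's actual object layout and root set (stacks,
    // registers, globals); everything here is simplified for illustration.
    struct Obj {
        std::vector<Obj*> refs;  // outgoing references
        bool marked = false;
    };

    std::vector<Obj*> heap;      // every object ever allocated
    std::vector<Obj*> roots;     // objects directly reachable by the program

    Obj* allocate() {
        Obj* o = new Obj();
        heap.push_back(o);
        return o;
    }

    void mark(Obj* o) {
        if (o == nullptr || o->marked) return;
        o->marked = true;
        for (Obj* child : o->refs) mark(child);  // follow the reference chain
    }

    void sweep() {
        std::vector<Obj*> live;
        for (Obj* o : heap) {
            if (o->marked) { o->marked = false; live.push_back(o); }
            else delete o;                       // unreachable: collect it
        }
        heap = std::move(live);
    }

    void collect() {
        for (Obj* r : roots) mark(r);  // phase 1: trace from the roots
        sweep();                       // phase 2: reclaim everything unmarked
    }

    int main() {
        Obj* a = allocate();
        Obj* b = allocate();
        a->refs.push_back(b);   // a -> b
        roots.push_back(a);     // only a is a root
        allocate();             // unreachable garbage
        collect();
        std::cout << "live objects: " << heap.size() << "\n";  // prints 2
    }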

Reference counting

Reference counting garbage collection is where each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created, and decremented when a reference is destroyed. When the count reaches zero, the object's memory is reclaimed.
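
C++'s std::shared_ptr implements this scheme and exposes the count through use_count(), so a short sketch can show the count rising and falling as references are created and destroyed (the Widget type is invented for the example):

    #include <iostream>
    #include <memory>

    struct Widget {
        ~Widget() { std::cout << "Widget reclaimed\n"; }
    };

    int main() {
        std::shared_ptr<Widget> a = std::make_shared<Widget>();
        std::cout << a.use_count() << "\n";      // 1: one reference exists
        {
            std::shared_ptr<Widget> b = a;       // copying a reference increments the count
            std::cout << a.use_count() << "\n";  // 2
        }                                        // b destroyed: count decremented back to 1
        std::cout << a.use_count() << "\n";      // 1
        a.reset();                               // last reference destroyed: count hits zero
                                                 // and the Widget is reclaimed immediately
    }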

As with manual memory management, and unlike tracing garbage collection, reference counting guarantees that objects are destroyed as soon as their last reference is destroyed. It usually accesses only memory that is in CPU caches, in objects about to be freed, or directly pointed to by those, and thus tends not to have significant negative side effects on CPU cache and virtual memory operation.

There are a number of disadvantages to reference counting; these can generally be solved or mitigated by more sophisticated algorithms:

Cycles
If two or more objects refer to each other, they can create a cycle whereby neither will be collected as their mutual references never let their reference counts become zero. Some garbage collection systems using reference counting (like the one in CPython) use specific cycle-detecting algorithms to deal with this issue. Another strategy is to use weak references for the "backpointers" which create cycles. Under reference counting, a weak reference is similar to a weak reference under a tracing garbage collector. It is a special reference object whose existence does not increment the reference count of the referent object. Furthermore, a weak reference is safe in that when the referent object becomes garbage, any weak reference to it lapses, rather than being permitted to remain dangling, meaning that it turns into a predictable value, such as a null reference.
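
A minimal C++ sketch of the weak-backpointer strategy, using std::weak_ptr; the Node type is invented for the example.

    #include <memory>
    #include <iostream>

    // Two objects that point at each other. With two strong references the
    // cycle keeps both counts above zero forever; making the "backpointer"
    // weak breaks the cycle.
    struct Node {
        std::shared_ptr<Node> next;  // strong forward reference
        std::weak_ptr<Node> prev;    // weak backpointer: does not bump the count
        ~Node() { std::cout << "Node reclaimed\n"; }
    };

    int main() {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;   // a strongly references b
        b->prev = a;   // b weakly references a (no count increment)

        // When a and b go out of scope, both nodes are reclaimed. If prev were
        // a shared_ptr, the mutual strong references would keep both counts at
        // 1 and neither destructor would ever run.
        if (auto p = b->prev.lock()) {          // a weak reference must be upgraded
            std::cout << "a is still alive\n";  // before use, and yields null once
        }                                       // the referent has been collected
    }
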
Space overhead (reference count)
Reference counting requires space to be allocated for each object to store its reference count. The count may be stored adjacent to the object's memory or in a side table somewhere else, but in either case, every single reference-counted object requires additional storage for its reference count. Memory space with the size of an unsigned pointer is commonly used for this task, meaning that 32 or 64 bits of reference count storage must be allocated for each object. On some systems, it may be possible to mitigate this overhead by using a tagged pointer to store the reference count in unused areas of the object's memory. Often, an architecture does not actually allow programs to access the full range of memory addresses that could be stored in its native pointer size; a certain number of high bits in the address are either ignored or required to be zero. If an object reliably has a pointer at a certain location, the reference count can be stored in the unused bits of the pointer. For example, each object in Objective-C has a pointer to its class at the beginning of its memory; on the ARM64 architecture using iOS 7, 19 unused bits of this class pointer are used to store the object's reference count.
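
The following sketch only illustrates the bit-packing idea behind such tagged pointers; the split into a 19-bit count and 45-bit address is an assumption made for the example, loosely echoing the iOS figure above, and no real pointer is dereferenced.

    #include <cstdint>
    #include <cassert>

    // Illustrative only: pack a small reference count into the otherwise
    // unused high bits of a 64-bit pointer-sized word. A real runtime's
    // layout is fixed by its ABI; these constants are made up.
    constexpr int      kCountBits = 19;
    constexpr int      kAddrBits  = 64 - kCountBits;
    constexpr uint64_t kAddrMask  = (uint64_t(1) << kAddrBits) - 1;

    uint64_t pack(uint64_t addr, uint64_t count) {
        assert(addr == (addr & kAddrMask));  // address must fit in the low bits
        return (count << kAddrBits) | addr;
    }

    uint64_t address_of(uint64_t word) { return word & kAddrMask; }
    uint64_t count_of(uint64_t word)   { return word >> kAddrBits; }

    int main() {
        uint64_t word = pack(0x123456789ull, 3);                // fake address, count 3
        word = pack(address_of(word), count_of(word) + 1);      // "retain": count -> 4
        assert(count_of(word) == 4);
        assert(address_of(word) == 0x123456789ull);
    }
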
Speed overhead (increment/decrement)
In naive implementations, each assignment of a reference and each reference falling out of scope often require modifications of one or more reference counters. However, in a common case when a reference is copied from an outer scope variable into an inner scope variable, such that the lifetime of the inner variable is bounded by the lifetime of the outer one, the reference incrementing can be eliminated. The outer variable "owns" the reference. In the programming language C++, this technique is readily implemented and demonstrated with the use of const references. Reference counting in C++ is usually implemented using "smart pointers" whose constructors, destructors and assignment operators manage the references. A smart pointer can be passed by reference to a function, which avoids the need to copy-construct a new smart pointer (which would increase the reference count on entry into the function and decrease it on exit). Instead the function receives a reference to the smart pointer, which is produced inexpensively.

The Deutsch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, counting only references in the heap, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and registers that no other reference to it still exists.

A further substantial decrease in the overhead on counter updates can be obtained by update coalescing, introduced by Levanoni and Petrank. Consider a pointer that in a given interval of the execution is updated several times. It first points to an object O1, then to an object O2, and so forth, until at the end of the interval it points to some object On. A reference counting algorithm would typically execute rc(O1)--, rc(O2)++, rc(O2)--, rc(O3)++, rc(O3)--, ..., rc(On)++. But most of these updates are redundant. In order to have the reference count properly evaluated at the end of the interval, it is enough to perform rc(O1)-- and rc(On)++. Levanoni and Petrank measured an elimination of more than 99% of the counter updates in typical Java benchmarks.
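
A small C++ sketch of the borrowed-reference idea described above: passing the smart pointer by const reference lets the callee use the caller's reference without touching the count. The Widget type and function names are invented for the example.

    #include <memory>
    #include <cstdio>

    struct Widget { int value = 42; };

    // Passing the smart pointer by value copies it: the count is incremented
    // on entry and decremented on exit, two updates per call.
    int by_value(std::shared_ptr<Widget> w) { return w->value; }

    // Passing by const reference "borrows" the caller's reference instead.
    // The caller's pointer outlives the call, so no count update is needed;
    // this is the outer-scope-owns-the-reference pattern described above.
    int by_const_ref(const std::shared_ptr<Widget>& w) { return w->value; }

    int main() {
        auto w = std::make_shared<Widget>();
        std::printf("%d %d\n", by_value(w), by_const_ref(w));
    }
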
Requires atomicity
When used in a multithreaded environment, these modifications (increment and decrement) may need to be atomic operations such as compare-and-swap, at least for any objects which are shared, or potentially shared among multiple threads. Atomic operations are expensive on a multiprocessor, and even more expensive if they have to be emulated with software algorithms. It is possible to avoid this issue by adding per-thread or per-CPU reference counts and only accessing the global reference count when the local reference counts become or are no longer zero (or, alternatively, using a binary tree of reference counts, or even giving up deterministic destruction in exchange for not having a global reference count at all), but this adds significant memory overhead and thus tends to be only useful in special cases (it is used, for example, in the reference counting of Linux kernel modules). Update coalescing by Levanoni and Petrank can be used to eliminate all atomic operations from the write-barrier. Counters are never updated by the program threads in the course of program execution. They are only modified by the collector which executes as a single additional thread with no synchronization. This method can be used as a stop-the-world mechanism for parallel programs, and also with a concurrent reference counting collector.
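
A hedged sketch of what an atomic reference count looks like, assuming a hand-rolled counter rather than any particular library; retain/release and the Shared type are invented names.

    #include <atomic>
    #include <cstdio>

    // Minimal sketch of a thread-safe reference count for an object that may
    // be shared between threads. fetch_add/fetch_sub are the atomic
    // read-modify-write operations referred to above; they are noticeably
    // more expensive than the plain increments a single-threaded counter
    // could use.
    struct Shared {
        std::atomic<int> refs{1};  // starts owned by its creator
        ~Shared() { std::printf("Shared reclaimed\n"); }
    };

    void retain(Shared* s) {
        s->refs.fetch_add(1, std::memory_order_relaxed);
    }

    void release(Shared* s) {
        // The last decrement must synchronize with all earlier uses of the
        // object before it is deleted, hence acq_rel rather than relaxed.
        if (s->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete s;
    }

    int main() {
        Shared* s = new Shared();
        retain(s);   // count: 2
        release(s);  // count: 1
        release(s);  // count: 0 -> reclaimed
    }

std::shared_ptr's internal count is typically maintained with the same kind of atomic operations, which is exactly the per-update cost that the Deutsch-Bobrow and coalescing techniques above try to avoid.
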
Not real-time
Naive implementations of reference counting do not generally provide real-time behavior, because any pointer assignment can potentially cause a number of objects bounded only by total allocated memory size to be recursively freed while the thread is unable to perform other work. It is possible to avoid this issue by delegating the freeing of unreferenced objects to other threads, at the cost of extra overhead.
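
One possible shape of that mitigation, sketched with an invented FreeQueue type: dropped objects are handed to a background thread that performs the actual deallocation, so the mutator's pause per pointer assignment stays small.

    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <thread>

    // Sketch of deferring deallocation to a background thread: memory whose
    // last reference has been dropped is handed off rather than freed inline.
    class FreeQueue {
    public:
        void push(void* p) {
            { std::lock_guard<std::mutex> lock(m_); garbage_.push_back(p); }
            cv_.notify_one();
        }
        void run() {  // body of the background "freer" thread
            for (;;) {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !garbage_.empty() || done_; });
                if (garbage_.empty() && done_) return;
                void* p = garbage_.front();
                garbage_.pop_front();
                lock.unlock();
                ::operator delete(p);  // the expensive work happens off the
            }                          // mutator thread, bounding its pauses
        }
        void shutdown() {
            { std::lock_guard<std::mutex> lock(m_); done_ = true; }
            cv_.notify_one();
        }
    private:
        std::mutex m_;
        std::condition_variable cv_;
        std::deque<void*> garbage_;
        bool done_ = false;
    };

    int main() {
        FreeQueue q;
        std::thread freer(&FreeQueue::run, &q);
        q.push(::operator new(64));   // instead of freeing immediately,
        q.push(::operator new(128));  // hand the memory to the freer thread
        q.shutdown();
        freer.join();
    }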

Escape analysis

Escape analysis is a compile-time technique that can convert heap allocations to stack allocations, thereby reducing the amount of garbage collection to be done. This analysis determines whether an object allocated inside a function is accessible outside of it. If a function-local allocation is found to be accessible to another function or thread, the allocation is said to "escape" and cannot be done on the stack. Otherwise, the object may be allocated directly on the stack and released when the function returns, bypassing the heap and associated memory management costs.
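
C++ makes the stack-versus-heap choice explicit, so the following sketch only illustrates the distinction that escape analysis automates in garbage-collected languages; the Point type and function names are invented for the example.

    #include <memory>

    struct Point { int x, y; };

    // The Point is used only inside this function and never escapes, so it
    // can live on the stack and is released automatically when the function
    // returns. This is the case escape analysis detects in GC'd languages.
    int sum_local() {
        Point p{3, 4};  // stack allocation, no garbage collector involvement
        return p.x + p.y;
    }

    // The Point is returned to the caller, so it outlives the function (it
    // "escapes") and must be heap-allocated; in a GC'd language the collector
    // would later reclaim it once the caller drops the last reference.
    std::unique_ptr<Point> make_point() {
        return std::make_unique<Point>(Point{3, 4});  // heap allocation
    }

    int main() {
        int s = sum_local();
        auto p = make_point();
        return s + p->x;
    }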

Availability

Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In some languages lacking built in garbage collection, it can be added through a library, as with the Boehm garbage collector for C and C++.

Most functional programming languages, such as ML, Haskell, and APL, have garbage collection built in. Lisp is especially notable as both the first functional programming language and the first language to introduce garbage collection.

Other dynamic languages, such as Ruby, Julia, JavaScript, and ECMAScript, also tend to use GC (but not Perl 5 or PHP before version 5.3, which both use reference counting). Object-oriented programming languages such as Smalltalk, RPL and Java usually provide integrated garbage collection. Notable exceptions are C++ and Delphi, which have destructors.

BASIC

BASIC and Logo have often used garbage collection for variable-length data types, such as strings and lists, so as not to burden programmers with memory management details. On the Altair 8800, programs with many string variables and little string space could cause long pauses due to garbage collection. Similarly the Applesoft BASIC interpreter's garbage collection algorithm repeatedly scans the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in quadratic performance and pauses anywhere from a few seconds to a few minutes. A replacement garbage collector for Applesoft BASIC by Randy Wigginton identifies a group of strings in every pass over the heap, reducing collection time dramatically. BASIC.System, released with ProDOS in 1983, provides a windowing garbage collector for BASIC that is many times faster.

Objective-C

While Objective-C traditionally had no garbage collection, with the release of OS X 10.5 in 2007 Apple introduced garbage collection for Objective-C 2.0, using an in-house developed runtime collector. However, with the 2012 release of OS X 10.8, garbage collection was deprecated in favor of LLVM's automatic reference counting (ARC), which had been introduced with OS X 10.7. Furthermore, since May 2015 Apple has forbidden the use of garbage collection for new OS X applications in the App Store. For iOS, garbage collection was never introduced because of problems with application responsiveness and performance; instead, iOS uses ARC.

Limited environments

Garbage collection is rarely used on embedded or real-time systems because of the usual need for very tight control over the use of limited resources. However, garbage collectors compatible with many limited environments have been developed. The Microsoft .NET Micro Framework, .NET nanoFramework and Java Platform, Micro Edition are embedded software platforms that, like their larger cousins, include garbage collection.

Java

Garbage collectors available in Java JDKs include:

  • Serial collector
  • Parallel (throughput) collector
  • Concurrent Mark Sweep (CMS) collector, deprecated in JDK 9 and removed in JDK 14
  • Garbage-First (G1) collector, the default since JDK 9
  • Z Garbage Collector (ZGC)
  • Shenandoah
  • Epsilon, a no-op collector for workloads that never need memory reclaimed

Compile-time use

Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation.

This form of garbage collection has been studied in the Mercury programming language, and it saw greater usage with the introduction of LLVM's automatic reference counting (ARC) into Apple's ecosystem (iOS and OS X) in 2011.

Real-time systems

Incremental, concurrent, and real-time garbage collectors have been developed, for example by Henry Baker and by Henry Lieberman.

In Baker's algorithm, the allocation is done in either half of a single region of memory. When it becomes half full, a garbage collection is performed which moves the live objects into the other half and the remaining objects are implicitly deallocated. The running program (the 'mutator') has to check that any object it references is in the correct half, and if not move it across, while a background task is finding all of the objects.

Generational garbage collection schemes are based on the empirical observation that most objects die young. In generational garbage collection, two or more allocation regions (generations) are maintained and kept separate based on the age of the objects they contain. New objects are created in the "young" generation that is regularly collected, and when a generation is full, the objects that are still referenced from older regions are copied into the next oldest generation. Occasionally a full scan is performed.

Some high-level language computer architectures include hardware support for real-time garbage collection.

Most implementations of real-time garbage collectors use tracing. Such real-time garbage collectors meet hard real-time constraints when used with a real-time operating system.

Inequality (mathematics)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Inequality...