
Monday, December 4, 2023

Inductive reasoning

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Inductive_reasoning

Inductive reasoning is a method of reasoning in which a general principle is derived from a body of observations. It consists of making broad generalizations based on specific observations. Inductive reasoning is distinct from deductive reasoning, where the conclusion of a deductive argument is certain given the premises are correct; in contrast, the truth of the conclusion of an inductive argument is probable, based upon the evidence given.

Types

The types of inductive reasoning include generalization, prediction, statistical syllogism, argument from analogy, and causal inference.

Inductive generalization

A generalization (more accurately, an inductive generalization) proceeds from premises about a sample to a conclusion about the population. The observation obtained from this sample is projected onto the broader population.

The proportion Q of the sample has attribute A.
Therefore, the proportion Q of the population has attribute A.

For example, suppose there are 20 balls—either black or white—in an urn. To estimate their respective numbers, a sample of four balls is drawn: three are black and one is white. An inductive generalization is that there are 15 black and five white balls in the urn.
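The schema above, applied to the urn example, is a simple proportional projection. A minimal sketch, using exact fractions so the arithmetic stays visible:

```python
from fractions import Fraction

population_size = 20               # balls in the urn
sample = {"black": 3, "white": 1}  # the four-ball sample
sample_size = sum(sample.values())

# Project each sample proportion Q onto the whole population.
estimates = {
    color: Fraction(count, sample_size) * population_size
    for color, count in sample.items()
}
print(estimates)  # {'black': Fraction(15, 1), 'white': Fraction(5, 1)}
```

The projection simply assumes the sample proportion holds for the population, which is exactly why the strength of the inference depends on how the sample was drawn.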

How much the premises support the conclusion depends upon the number in the sample group, the number in the population, and the degree to which the sample represents the population (which may be achieved by taking a random sample). The greater the sample size relative to the population and the more closely the sample represents the population, the stronger the generalization is. The hasty generalization and the biased sample are generalization fallacies.

Statistical generalization

A statistical generalization is a type of inductive argument in which a conclusion about a population is inferred using a statistically-representative sample. For example:

Of a sizeable random sample of voters surveyed, 66% support Measure Z.
Therefore, approximately 66% of voters support Measure Z.

The measure is highly reliable within a well-defined margin of error provided the sample is large and random. It is readily quantifiable. Compare the preceding argument with the following. "Six of the ten people in my book club are Libertarians. Therefore, about 60% of people are Libertarians." The argument is weak because the sample is non-random and the sample size is very small.
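The "well-defined margin of error" can be sketched with the standard normal approximation for a sample proportion. The sample size of 1,000 and the 95% confidence level (z ≈ 1.96) are assumptions for illustration, since the text specifies neither:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Large random sample: the generalization is tight.
print(f"66% ± {margin_of_error(0.66, 1000):.1%}")  # roughly ± 3 points

# The ten-person book club: the same formula gives a huge margin,
# even before accounting for the non-random sampling.
print(f"60% ± {margin_of_error(0.60, 10):.1%}")    # roughly ± 30 points
```

Note the formula only quantifies sampling error; it cannot repair a biased, non-random sample.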

Statistical generalizations are also called statistical projections and sample projections.

Anecdotal generalization

An anecdotal generalization is a type of inductive argument in which a conclusion about a population is inferred using a non-statistical sample. In other words, the generalization is based on anecdotal evidence. For example:

So far, this year his son's Little League team has won 6 of 10 games.
Therefore, by season's end, they will have won about 60% of the games.

This inference is less reliable (and thus more likely to commit the fallacy of hasty generalization) than a statistical generalization, first, because the sample events are non-random, and second because it is not reducible to a mathematical expression. Statistically speaking, there is simply no way to know, measure and calculate the circumstances affecting performance that will occur in the future. On a philosophical level, the argument relies on the presupposition that the operation of future events will mirror the past. In other words, it takes for granted a uniformity of nature, an unproven principle that cannot be derived from the empirical data itself. Arguments that tacitly presuppose this uniformity are sometimes called Humean after the philosopher who was first to subject them to philosophical scrutiny.

Prediction

An inductive prediction draws a conclusion about a future, current, or past instance from a sample of other instances. Like an inductive generalization, an inductive prediction relies on a data set consisting of specific instances of a phenomenon. But rather than conclude with a general statement, the inductive prediction concludes with a specific statement about the probability that a single instance will (or will not) have an attribute shared (or not shared) by the other instances.

Proportion Q of observed members of group G have had attribute A.
Therefore, there is a probability corresponding to Q that other members of group G will have attribute A when next observed.

Statistical syllogism

A statistical syllogism proceeds from a generalization about a group to a conclusion about an individual.

Proportion Q of the known instances of population P has attribute A.
Individual I is another member of P.
Therefore, there is a probability corresponding to Q that I has A.

For example:

90% of graduates from Excelsior Preparatory school go on to university.
Bob is a graduate of Excelsior Preparatory school.
Therefore, Bob will go on to university.

This is a statistical syllogism.[10] Even though one cannot be sure Bob will attend university, the exact probability of that outcome (90%, given no further information) is fully assured. Two dicto simpliciter fallacies can occur in statistical syllogisms: "accident" and "converse accident".

Argument from analogy

The process of analogical inference involves noting the shared properties of two or more things and from this basis inferring that they also share some further property:

P and Q are similar with respect to properties a, b, and c.
Object P has been observed to have further property x.
Therefore, Q probably has property x also.

Analogical reasoning is very frequent in common sense, science, philosophy, law, and the humanities, but sometimes it is accepted only as an auxiliary method. A refined approach is case-based reasoning.

Mineral A and Mineral B are both igneous rocks often containing veins of quartz and are most commonly found in South America in areas of ancient volcanic activity.
Mineral A is also a soft stone suitable for carving into jewelry.
Therefore, mineral B is probably a soft stone suitable for carving into jewelry.

This is analogical induction, according to which things alike in certain ways are more prone to be alike in other ways. This form of induction was explored in detail by philosopher John Stuart Mill in his System of Logic, where he states, "[t]here can be no doubt that every resemblance [not known to be irrelevant] affords some degree of probability, beyond what would otherwise exist, in favor of the conclusion." See Mill's Methods.

Some thinkers contend that analogical induction is a subcategory of inductive generalization because it assumes a pre-established uniformity governing events. Analogical induction requires an auxiliary examination of the relevancy of the characteristics cited as common to the pair. In the preceding example, if a premise were added stating that both stones were mentioned in the records of early Spanish explorers, this common attribute is extraneous to the stones and does not contribute to their probable affinity.

A pitfall of analogy is that features can be cherry-picked: while objects may show striking similarities, two things juxtaposed may respectively possess other characteristics not identified in the analogy that are characteristics sharply dissimilar. Thus, analogy can mislead if not all relevant comparisons are made.

Causal inference

A causal inference draws a conclusion about a causal connection based on the conditions of the occurrence of an effect. Premises about the correlation of two things can indicate a causal relationship between them, but additional factors must be confirmed to establish the exact form of the causal relationship.

Methods

The two principal methods used to reach inductive generalizations are enumerative induction and eliminative induction.

Enumerative induction

Enumerative induction is an inductive method in which a generalization is constructed based on the number of instances that support it. The more supporting instances, the stronger the conclusion.

The most basic form of enumerative induction reasons from particular instances to all instances and is thus an unrestricted generalization. If one observes 100 swans, and all 100 were white, one might infer a universal categorical proposition of the form All swans are white. As this reasoning form's premises, even if true, do not entail the conclusion's truth, this is a form of inductive inference. The conclusion might be true, and might be thought probably true, yet it can be false. Questions regarding the justification and form of enumerative inductions have been central in philosophy of science, as enumerative induction has a pivotal role in the traditional model of the scientific method.

All life forms so far discovered are composed of cells.
Therefore, all life forms are composed of cells.

This is enumerative induction, also known as simple induction or simple predictive induction. It is a subcategory of inductive generalization. In everyday practice, this is perhaps the most common form of induction. For the preceding argument, the conclusion is tempting but makes a prediction well in excess of the evidence. First, it assumes that life forms observed until now can tell us how future cases will be: an appeal to uniformity. Second, the concluding "All" is a bold assertion. A single contrary instance foils the argument. And last, quantifying the level of probability in any mathematical form is problematic. By what standard do we measure our Earthly sample of known life against all (possible) life? Suppose we do discover some new organism, such as a microorganism floating in the mesosphere or found on an asteroid, and it is cellular. Does the addition of this corroborating evidence oblige us to raise our probability assessment for the subject proposition? It is generally deemed reasonable to answer this question "yes," and for a good many this "yes" is not only reasonable but incontrovertible. So then just how much should this new data change our probability assessment? Here, consensus melts away, and in its place arises a question about whether we can talk of probability coherently at all without numerical quantification.

All life forms so far discovered have been composed of cells.
Therefore, the next life form discovered will be composed of cells.

This is enumerative induction in its weak form. It truncates "all" to a mere single instance and, by making a far weaker claim, considerably strengthens the probability of its conclusion. Otherwise, it has the same shortcomings as the strong form: its sample population is non-random, and quantification methods are elusive.

Eliminative induction

Eliminative induction, also called variative induction, is an inductive method in which a generalization is constructed based on the variety of instances that support it. Unlike enumerative induction, eliminative induction reasons based on the various kinds of instances that support a conclusion, rather than the number of instances that support it. As the variety of instances increases, the more possible conclusions based on those instances can be identified as incompatible and eliminated. This, in turn, increases the strength of any conclusion that remains consistent with the various instances. This type of induction may use different methodologies such as quasi-experimentation, which tests and, where possible, eliminates rival hypotheses. Different evidential tests may also be employed to eliminate possibilities that are entertained.

Eliminative induction is crucial to the scientific method and is used to eliminate hypotheses that are inconsistent with observations and experiments. It focuses on possible causes instead of observed actual instances of causal connections.

History

Ancient philosophy

For a move from particular to universal, Aristotle in the 300s BCE used the Greek word epagogé, which Cicero translated into the Latin word inductio.

Aristotle and the Peripatetic School

Aristotle's Posterior Analytics covers the methods of inductive proof in natural philosophy and in the social sciences. The first book of Posterior Analytics describes the nature and science of demonstration and its elements, including definition, division, intuitive reason of first principles, particular and universal demonstration, affirmative and negative demonstration, and the difference between science and opinion.

Pyrrhonism

The ancient Pyrrhonists were the first Western philosophers to point out the problem of induction: that induction cannot, according to them, justify the acceptance of universal statements as true.

Ancient medicine

The Empiric school of ancient Greek medicine employed epilogism as a method of inference. 'Epilogism' is a theory-free method that looks at history through the accumulation of facts without major generalization and with consideration of the consequences of making causal claims. Epilogism is an inference that moves entirely within the domain of visible and evident things; it tries not to invoke unobservables.

The Dogmatic school of ancient Greek medicine employed analogismos as a method of inference. This method used analogy to reason from what was observed to unobservable forces.

Early modern philosophy

In 1620, the early modern philosopher Francis Bacon repudiated the value of mere experience and enumerative induction alone. His method of inductivism required that minute and many-varied observations uncovering the natural world's structure and causal relations be coupled with enumerative induction in order to have knowledge beyond the present scope of experience. Inductivism therefore required enumerative induction as a component.

David Hume

The empiricist David Hume's 1740 stance found enumerative induction to have no rational, let alone logical, basis; instead, induction was the product of instinct rather than reason, a custom of the mind and an everyday requirement to live. While observations, such as the motion of the sun, could be coupled with the principle of the uniformity of nature to produce conclusions that seemed certain, the problem of induction arose from the fact that the uniformity of nature is not a logically valid principle. It therefore cannot be defended as deductively rational. Nor can it be defended as inductively rational by appealing to the fact that the uniformity of nature has accurately described the past and will therefore likely describe the future, for that appeal is itself an inductive argument and hence circular: induction is the very thing that needs to be justified.

Since Hume first wrote about the dilemma between the invalidity of deductive arguments and the circularity of inductive arguments in support of the uniformity of nature, this supposed dichotomy between merely two modes of inference, deduction and induction, has been contested by the identification of a third mode of inference known as abduction, or abductive reasoning, first formulated and advanced by Charles Sanders Peirce in 1886, when he referred to it as "reasoning by hypothesis." Inference to the best explanation is often, though arguably, treated as synonymous with abduction, as it was first identified by Gilbert Harman in 1965, when he referred to it as "abductive reasoning"; yet his definition of abduction differs slightly from Peirce's. Regardless, if abduction is in fact a third mode of inference rationally independent from the other two, then either the uniformity of nature can be rationally justified through abduction, or Hume's dilemma is more of a trilemma. Hume was also skeptical of the application of enumerative induction and reason to reach certainty about unobservables, and especially of the inference of causality from the fact that modifying an aspect of a relationship prevents or produces a particular outcome.

Immanuel Kant

Awakened from "dogmatic slumber" by a German translation of Hume's work, Kant sought to explain the possibility of metaphysics. In 1781, Kant's Critique of Pure Reason introduced rationalism as a path toward knowledge distinct from empiricism. Kant sorted statements into two types. Analytic statements are true by virtue of the arrangement of their terms and meanings; thus analytic statements are tautologies, merely logical truths, true by necessity. Synthetic statements, by contrast, have meanings that refer to states of affairs, that is, contingencies. Against both rationalist philosophers like Descartes and Leibniz as well as against empiricist philosophers like Locke and Hume, Kant's Critique of Pure Reason is a sustained argument that in order to have knowledge we need both a contribution of our mind (concepts) as well as a contribution of our senses (intuitions). Knowledge proper is for Kant thus restricted to what we can possibly perceive (phenomena), whereas objects of mere thought ("things in themselves") are in principle unknowable due to the impossibility of ever perceiving them.

Reasoning that the mind must contain its own categories for organizing sense data, making experience of objects in space and time (phenomena) possible, Kant concluded that the uniformity of nature was an a priori truth. A class of synthetic statements that was not contingent but true by necessity was then synthetic a priori. Kant thus saved both metaphysics and Newton's law of universal gravitation. On the basis of the argument that what goes beyond our knowledge is "nothing to us," he discarded scientific realism. Kant's position that knowledge comes about by a cooperation of perception and our capacity to think (transcendental idealism) gave birth to the movement of German idealism. Hegel's absolute idealism subsequently flourished across continental Europe and England.

Late modern philosophy

Positivism, developed by Henri de Saint-Simon and promulgated in the 1830s by his former student Auguste Comte, was the first late modern philosophy of science. In the aftermath of the French Revolution, fearing society's ruin, Comte opposed metaphysics. Human knowledge had evolved from religion to metaphysics to science, said Comte, which had flowed from mathematics to astronomy to physics to chemistry to biology to sociology—in that order—describing increasingly intricate domains. All of society's knowledge had become scientific, with questions of theology and of metaphysics being unanswerable. Comte found enumerative induction reliable as a consequence of its grounding in available experience. He asserted the use of science, rather than metaphysical truth, as the correct method for the improvement of human society.

According to Comte, scientific method frames predictions, confirms them, and states laws—positive statements—irrefutable by theology or by metaphysics. Regarding experience as justifying enumerative induction by demonstrating the uniformity of nature, the British philosopher John Stuart Mill welcomed Comte's positivism, but thought scientific laws susceptible to recall or revision, and Mill also withheld support from Comte's Religion of Humanity. Comte was confident in treating scientific law as an irrefutable foundation for all knowledge, and believed that churches, honouring eminent scientists, ought to focus public mindset on altruism—a term Comte coined—to apply science for humankind's social welfare via sociology, Comte's leading science.

During the 1830s and 1840s, while Comte and Mill were the leading philosophers of science, William Whewell found enumerative induction not nearly as convincing, and, despite the dominance of inductivism, formulated "superinduction". Whewell argued that "the peculiar import of the term Induction" should be recognised: "there is some Conception superinduced upon the facts", that is, "the Invention of a new Conception in every inductive inference". The creation of Conceptions is easily overlooked and prior to Whewell was rarely recognised. Whewell explained:

"Although we bind together facts by superinducing upon them a new Conception, this Conception, once introduced and applied, is looked upon as inseparably connected with the facts, and necessarily implied in them. Having once had the phenomena bound together in their minds in virtue of the Conception, men can no longer easily restore them back to the detached and incoherent condition in which they were before they were thus combined."

These "superinduced" explanations may well be flawed, but their accuracy is suggested when they exhibit what Whewell termed consilience—that is, simultaneously predicting the inductive generalizations in multiple areas—a feat that, according to Whewell, can establish their truth. Perhaps to accommodate the prevailing view of science as inductivist method, Whewell devoted several chapters to "methods of induction" and sometimes used the phrase "logic of induction", despite the fact that induction lacks rules and cannot be trained.

In the 1870s, the originator of pragmatism, C S Peirce performed vast investigations that clarified the basis of deductive inference as a mathematical proof (as, independently, did Gottlob Frege). Peirce recognized induction but always insisted on a third type of inference that Peirce variously termed abduction or retroduction or hypothesis or presumption. Later philosophers termed Peirce's abduction, etc., Inference to the Best Explanation (IBE).

Contemporary philosophy

Bertrand Russell

Having highlighted Hume's problem of induction, John Maynard Keynes posed logical probability as its answer, or as near a solution as he could arrive at. Bertrand Russell found Keynes's Treatise on Probability the best examination of induction, and believed that if read with Jean Nicod's Le Problème logique de l'induction as well as R B Braithwaite's review of Keynes's work in the October 1925 issue of Mind, that would cover "most of what is known about induction", although the "subject is technical and difficult, involving a good deal of mathematics". Two decades later, Russell proposed enumerative induction as an "independent logical principle". Russell found:

"Hume's skepticism rests entirely upon his rejection of the principle of induction. The principle of induction, as applied to causation, says that, if A has been found very often accompanied or followed by B, then it is probable that on the next occasion on which A is observed, it will be accompanied or followed by B. If the principle is to be adequate, a sufficient number of instances must make the probability not far short of certainty. If this principle, or any other from which it can be deduced, is true, then the causal inferences which Hume rejects are valid, not indeed as giving certainty, but as giving a sufficient probability for practical purposes. If this principle is not true, every attempt to arrive at general scientific laws from particular observations is fallacious, and Hume's skepticism is inescapable for an empiricist. The principle itself cannot, of course, without circularity, be inferred from observed uniformities, since it is required to justify any such inference. It must, therefore, be, or be deduced from, an independent principle not based on experience. To this extent, Hume has proved that pure empiricism is not a sufficient basis for science. But if this one principle is admitted, everything else can proceed in accordance with the theory that all our knowledge is based on experience. It must be granted that this is a serious departure from pure empiricism, and that those who are not empiricists may ask why, if one departure is allowed, others are forbidden. These, however, are not questions directly raised by Hume's arguments. What these arguments prove—and I do not think the proof can be controverted—is that induction is an independent logical principle, incapable of being inferred either from experience or from other logical principles, and that without this principle, science is impossible."

Gilbert Harman

In a 1965 paper, Gilbert Harman explained that enumerative induction is not an autonomous phenomenon, but is simply a disguised consequence of Inference to the Best Explanation (IBE). IBE is otherwise synonymous with C S Peirce's abduction. Many philosophers of science espousing scientific realism have maintained that IBE is the way that scientists develop approximately true scientific theories about nature.

Comparison with deductive reasoning

Argument terminology

Inductive reasoning is a form of argument that—in contrast to deductive reasoning—allows for the possibility that a conclusion can be false, even if all of the premises are true. This difference between deductive and inductive reasoning is reflected in the terminology used to describe deductive and inductive arguments. In deductive reasoning, an argument is "valid" when, assuming the argument's premises are true, the conclusion must be true. If the argument is valid and the premises are true, then the argument is "sound". In contrast, in inductive reasoning, an argument's premises can never guarantee that the conclusion must be true; therefore, inductive arguments can never be valid or sound. Instead, an argument is "strong" when, assuming the argument's premises are true, the conclusion is probably true. If the argument is strong and the premises are true, then the argument is "cogent". Less formally, an inductive argument may be called "probable", "plausible", "likely", "reasonable", or "justified", but never "certain" or "necessary". Logic affords no bridge from the probable to the certain.

The futility of attaining certainty through some critical mass of probability can be illustrated with a coin-toss exercise. Suppose someone tests whether a coin is either a fair one or two-headed. They flip the coin ten times, and ten times it comes up heads. At this point, there is a strong reason to believe the coin is two-headed. After all, the chance of ten heads in a row is .000976: less than one in one thousand. Then, after 100 flips, every toss has come up heads. Now there is "virtual" certainty that the coin is two-headed. Still, one can neither logically nor empirically rule out that the next toss will produce tails. No matter how many times in a row it comes up heads, this remains the case. If one programmed a machine to flip a coin over and over continuously, at some point the result would be a string of 100 heads. In the fullness of time, all combinations will appear.

As for the slim prospect of getting ten out of ten heads from a fair coin (the outcome that made the coin appear biased), many may be surprised to learn that any specific sequence of ten heads or tails is equally unlikely (e.g., H-H-T-T-H-T-H-H-H-T), and yet some such sequence occurs in every trial of ten tosses. That means every particular result of ten tosses has the same probability as getting ten out of ten heads: 0.000976. Whatever sequence of heads and tails one records, that exact sequence had a chance of 0.000976.
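The coin-toss arithmetic above can be checked directly by enumerating all of the equally likely ten-toss sequences; a small sketch:

```python
from itertools import product

# Chance of ten heads in a row from a fair coin.
p_ten_heads = 0.5 ** 10
print(p_ten_heads)  # 0.0009765625, the .000976 cited above

# Every specific ten-toss sequence is exactly as unlikely:
# there are 2**10 = 1024 equally probable outcomes, and each
# trial of ten tosses realizes exactly one of them.
sequences = list(product("HT", repeat=10))
print(len(sequences))      # 1024
print(1 / len(sequences))  # 0.0009765625 again
```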

An argument is deductive when the conclusion is necessary given the premises. That is, the conclusion must be true if the premises are true.

If a deductive conclusion follows duly from its premises, then it is valid; otherwise, it is invalid (that an argument is invalid is not to say it is false; it may have a true conclusion, just not on account of the premises). An examination of the following examples will show that the relationship between premises and conclusion is such that the truth of the conclusion is already implicit in the premises. Bachelors are unmarried because we say they are; we have defined them so. Socrates is mortal because we have included him in a set of beings that are mortal. The conclusion for a valid deductive argument is already contained in the premises since its truth is strictly a matter of logical relations. It cannot say more than its premises. Inductive premises, on the other hand, draw their substance from fact and evidence, and the conclusion accordingly makes a factual claim or prediction. Its reliability varies proportionally with the evidence. Induction wants to reveal something new about the world. One could say that induction wants to say more than is contained in the premises.

To better see the difference between inductive and deductive arguments, consider that it would not make sense to say: "all rectangles so far examined have four right angles, so the next one I see will have four right angles." This would treat logical relations as something factual and discoverable, and thus variable and uncertain. Likewise, speaking deductively we may permissibly say: "All unicorns can fly; I have a unicorn named Charlie; thus Charlie can fly." This deductive argument is valid because the logical relations hold; we are not interested in their factual soundness.

Inductive reasoning is inherently uncertain. It only deals with the extent to which, given the premises, the conclusion is credible according to some theory of evidence. Examples include a many-valued logic, Dempster–Shafer theory, or probability theory with rules for inference such as Bayes' rule. Unlike deductive reasoning, it does not rely on universals holding over a closed domain of discourse to draw conclusions, so it can be applicable even in cases of epistemic uncertainty (technical issues with this may arise however; for example, the second axiom of probability is a closed-world assumption).
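As one concrete instance of probability theory with Bayes' rule as the inference rule, the earlier coin-toss exercise can be recast as a Bayesian update. The 50/50 prior over the two hypotheses is an assumption made purely for illustration:

```python
def posterior_fair(prior_fair, heads_in_a_row):
    """P(coin is fair | every toss so far was heads), by Bayes' rule.

    Two hypotheses only: fair (P(heads) = 0.5) and two-headed (P(heads) = 1).
    """
    lik_fair = 0.5 ** heads_in_a_row  # P(evidence | fair)
    lik_two_headed = 1.0              # P(evidence | two-headed)
    evidence = prior_fair * lik_fair + (1 - prior_fair) * lik_two_headed
    return prior_fair * lik_fair / evidence

print(posterior_fair(0.5, 10))   # ~0.00098: strong evidence, not certainty
print(posterior_fair(0.5, 100))  # ~7.9e-31: "virtual" certainty, never zero
```

However many heads are observed, the posterior probability that the coin is fair shrinks toward zero but never reaches it, which mirrors the point that induction yields credibility, not certainty.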

Another crucial difference between these two types of argument is that deductive certainty is impossible in non-axiomatic systems such as reality, leaving inductive reasoning as the primary route to (probabilistic) knowledge of such systems.

Given that "if A is true then that would cause B, C, and D to be true", an example of deduction would be "A is true therefore we can deduce that B, C, and D are true". An example of induction would be "B, C, and D are observed to be true therefore A might be true". A is a reasonable explanation for B, C, and D being true.

For example:

A large enough asteroid impact would create a very large crater and cause a severe impact winter that could drive the non-avian dinosaurs to extinction.
We observe that there is a very large crater in the Gulf of Mexico dating to very near the time of the extinction of the non-avian dinosaurs.
Therefore, it is possible that this impact could explain why the non-avian dinosaurs became extinct.

Note, however, that the asteroid explanation for the mass extinction is not necessarily correct. Other events with the potential to affect global climate also coincide with the extinction of the non-avian dinosaurs, for example the release of volcanic gases (particularly sulfur dioxide) during the formation of the Deccan Traps in India.

Another example of an inductive argument:

All biological life forms that we know of depend on liquid water to exist.
Therefore, if we discover a new biological life form, it will probably depend on liquid water to exist.

This argument could have been made every time a new biological life form was found, and would have been correct every time; however, it is still possible that in the future a biological life form not requiring liquid water could be discovered. As a result, the argument may be stated less formally as:

All biological life forms that we know of depend on liquid water to exist.
Therefore, all biological life probably depends on liquid water to exist.

A classical example of an incorrect inductive argument was presented by John Vickers:

All of the swans we have seen are white.
Therefore, we know that all swans are white.

The correct conclusion would be: we expect all swans to be white.

Succinctly put: deduction is about certainty/necessity; induction is about probability. Any single assertion will answer to one of these two criteria. Another approach to the analysis of reasoning is that of modal logic, which deals with the distinction between the necessary and the possible in a way not concerned with probabilities among things deemed possible.

The philosophical definition of inductive reasoning is more nuanced than a simple progression from particular/individual instances to broader generalizations. Rather, the premises of an inductive logical argument indicate some degree of support (inductive probability) for the conclusion but do not entail it; that is, they suggest truth but do not ensure it. In this manner, there is the possibility of moving from general statements to individual instances (for example, statistical syllogisms).

Note that the definition of inductive reasoning described here differs from mathematical induction, which, in fact, is a form of deductive reasoning. Mathematical induction is used to provide strict proofs of the properties of recursively defined sets. The deductive nature of mathematical induction derives from its basis in a non-finite number of cases, in contrast with the finite number of cases involved in an enumerative induction procedure like proof by exhaustion. Both mathematical induction and proof by exhaustion are examples of complete induction. Complete induction is a masked type of deductive reasoning.

Problem of induction

Although philosophers at least as far back as the Pyrrhonist philosopher Sextus Empiricus have pointed out the unsoundness of inductive reasoning, the classic philosophical critique of the problem of induction was given by the Scottish philosopher David Hume. Although the use of inductive reasoning demonstrates considerable success, the justification for its application has been questionable. Recognizing this, Hume highlighted the fact that our mind often draws conclusions from relatively limited experiences that appear correct but which are actually far from certain. In deduction, the truth value of the conclusion is based on the truth of the premise. In induction, however, the dependence of the conclusion on the premise is always uncertain. For example, let us assume that all ravens are black. The fact that there are numerous black ravens supports the assumption. Our assumption, however, becomes invalid once it is discovered that there are white ravens. Therefore, the general rule "all ravens are black" is not the kind of statement that can ever be certain. Hume further argued that it is impossible to justify inductive reasoning: this is because it cannot be justified deductively, so our only option is to justify it inductively. Since this argument is circular, with the help of Hume's fork he concluded that our use of induction is unjustifiable.

Hume nevertheless stated that even if induction were proved unreliable, we would still have to rely on it. So instead of a position of severe skepticism, Hume advocated a practical skepticism based on common sense, where the inevitability of induction is accepted. Bertrand Russell illustrated Hume's skepticism in a story about a chicken, fed every morning without fail, who following the laws of induction concluded that this feeding would always continue, until his throat was eventually cut by the farmer.

In 1963, Karl Popper wrote, "Induction, i.e. inference based on many observations, is a myth. It is neither a psychological fact, nor a fact of ordinary life, nor one of scientific procedure." Popper's 1972 book Objective Knowledge—whose first chapter is devoted to the problem of induction—opens, "I think I have solved a major philosophical problem: the problem of induction". In Popper's schema, enumerative induction is "a kind of optical illusion" cast by the steps of conjecture and refutation during a problem shift. An imaginative leap, the tentative solution is improvised, lacking inductive rules to guide it. The resulting, unrestricted generalization is deductive, an entailed consequence of all explanatory considerations. Controversy continued, however, with Popper's putative solution not generally accepted.

Donald A. Gillies argues that rules of inferences related to inductive reasoning are overwhelmingly absent from science, and describes most scientific inferences as "involv[ing] conjectures thought up by human ingenuity and creativity, and by no means inferred in any mechanical fashion, or according to precisely specified rules." Gillies also provides a rare counterexample "in the machine learning programs of AI."

Biases

Inductive reasoning is also known as hypothesis construction because any conclusions made are based on current knowledge and predictions. As with deductive arguments, biases can distort the proper application of inductive argument, thereby preventing the reasoner from forming the most logical conclusion based on the clues. Examples of these biases include the availability heuristic, confirmation bias, and the predictable-world bias.

The availability heuristic causes the reasoner to depend primarily upon information that is readily available. People have a tendency to rely on information that is easily accessible in the world around them. For example, in surveys, when people are asked to estimate the percentage of people who died from various causes, most respondents choose the causes that have been most prevalent in the media such as terrorism, murders, and airplane accidents, rather than causes such as disease and traffic accidents, which have been technically "less accessible" to the individual since they are not emphasized as heavily in the world around them.

Confirmation bias is based on the natural tendency to confirm rather than deny a hypothesis. Research has demonstrated that people are inclined to seek solutions to problems that are more consistent with known hypotheses rather than attempt to refute those hypotheses. Often, in experiments, subjects will ask questions that seek answers that fit established hypotheses, thus confirming these hypotheses. For example, if it is hypothesized that Sally is a sociable individual, subjects will naturally seek to confirm the premise by asking questions that would produce answers confirming that Sally is, in fact, a sociable individual.

The predictable-world bias revolves around the inclination to perceive order where it has not been proved to exist, either at all or at a particular level of abstraction. Gambling, for example, is one of the most popular examples of predictable-world bias. Gamblers often begin to think that they see simple and obvious patterns in the outcomes and therefore believe that they are able to predict outcomes based on what they have witnessed. In reality, however, the outcomes of these games are difficult to predict and highly complex in nature. In general, people tend to seek some type of simplistic order to explain or justify their beliefs and experiences, and it is often difficult for them to realise that their perceptions of order may be entirely different from the truth.

Bayesian inference

As a logic of induction rather than a theory of belief, Bayesian inference does not determine which beliefs are a priori rational, but rather determines how we should rationally change the beliefs we have when presented with evidence. We begin by committing to a prior probability for a hypothesis based on logic or previous experience and, when faced with evidence, we adjust the strength of our belief in that hypothesis in a precise manner using Bayesian logic.
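The update rule described above can be sketched with a toy calculation. The numbers here are illustrative assumptions, not figures from the article: we commit to a prior for the hypothesis "all ravens are black" and adjust it with each new black raven observed.

```python
# Toy Bayesian update: rationally revising belief in a hypothesis H given
# evidence E, using P(H | E) = P(E | H) P(H) / P(E).
# All probabilities below are illustrative assumptions.

def bayes_update(prior, likelihood, likelihood_alt):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    evidence = likelihood * prior + likelihood_alt * (1 - prior)
    return likelihood * prior / evidence

# H: "all ravens are black". E: the next observed raven is black.
# If H is true, a black raven is certain; if not, assume it is still likely (0.9).
belief = 0.5                  # prior commitment before any observation
for _ in range(10):           # ten black ravens observed in a row
    belief = bayes_update(belief, likelihood=1.0, likelihood_alt=0.9)
print(round(belief, 3))       # → 0.741: stronger, but never certain
```

Note that the belief approaches but never reaches 1, mirroring the point above that induction yields probability rather than certainty: a single white raven (likelihood 0 under H) would drop the posterior to 0.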

Inductive inference

Around 1960, Ray Solomonoff founded the theory of universal inductive inference, a theory of prediction based on observations, for example, predicting the next symbol based upon a given series of symbols. This is a formal inductive framework that combines algorithmic information theory with the Bayesian framework. Universal inductive inference is based on solid philosophical foundations, and can be considered as a mathematically formalized Occam's razor. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity.
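True Solomonoff induction sums over all computer programs and is uncomputable, but its core idea can be illustrated with a drastically simplified sketch: restrict the hypothesis class to repeating patterns, use a 2^(−description length) prior as a stand-in for algorithmic probability (shorter hypotheses get more weight, a formalized Occam's razor), discard hypotheses inconsistent with the observations, and predict by weighted vote. Everything here is a toy of our own construction, not Solomonoff's actual formalism.

```python
# Toy Solomonoff-style sequence prediction over a tiny hypothesis class:
# "repeat this pattern forever", with prior weight 2**(-pattern length).
from itertools import product

def predict_next(history, alphabet="01", max_period=4):
    """Predict the next symbol by summing 2**(-length) prior weight over
    all periodic hypotheses consistent with the observed history."""
    weights = {}  # candidate next symbol -> accumulated weight
    for period in range(1, max_period + 1):
        for pattern in product(alphabet, repeat=period):
            prior = 2.0 ** (-period)  # shorter description = higher prior
            if all(history[i] == pattern[i % period] for i in range(len(history))):
                nxt = pattern[len(history) % period]
                weights[nxt] = weights.get(nxt, 0.0) + prior
    return max(weights, key=weights.get)

print(predict_next("010101"))  # → 0: the shortest consistent pattern is "01"
```

The shortest consistent pattern ("01") dominates the vote, which is exactly the Occam's-razor behavior the theory formalizes.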

Mitochondrial Eve

From Wikipedia, the free encyclopedia
 
Haplogroup L
Possible time of origin: c. 100–230 kya
Possible place of origin: East Africa
Ancestor: n/a
Descendants: mitochondrial macro-haplogroups L0, L1, and L5
Defining mutations: None

In human genetics, the Mitochondrial Eve (also mt-Eve, mt-MRCA) is the matrilineal most recent common ancestor (MRCA) of all living humans. In other words, she is defined as the most recent woman from whom all living humans descend in an unbroken line purely through their mothers and through the mothers of those mothers, back until all lines converge on one woman.

In terms of mitochondrial haplogroups, the mt-MRCA is situated at the divergence of macro-haplogroup L into L0 and L1–6. As of 2013, estimates of the age of this split centered around 155,000 years ago, consistent with a date later than the speciation of Homo sapiens but earlier than the recent out-of-Africa dispersal.

The male analog to the "Mitochondrial Eve" is the "Y-chromosomal Adam" (or Y-MRCA), the individual from whom all living humans are patrilineally descended. As the identity of both matrilineal and patrilineal MRCAs is dependent on genealogical history (pedigree collapse), they need not have lived at the same time. As of 2015, estimates of the age of the Y-MRCA range around 200,000 to 300,000 years ago, roughly consistent with the emergence of anatomically modern humans.

The name "Mitochondrial Eve" alludes to the biblical Eve, which has led to repeated misrepresentations and misconceptions in journalistic accounts of the topic. Popular science presentations of the topic usually point out such possible misconceptions by emphasizing that the position of the mt-MRCA is not fixed in time (it moves forward as mitochondrial DNA (mtDNA) lineages become extinct), and that she was not a "first woman", not the only living female of her time, and not the first member of a "new species".

History

Early research

Early research using molecular clock methods was done during the late 1970s to early 1980s. Allan Wilson, Mark Stoneking, Rebecca L. Cann and Wesley Brown found that mutation in human mtDNA was unexpectedly fast, at 0.02 substitutions per base (1%) in a million years, which is 5–10 times faster than in nuclear DNA. Related work allowed for an analysis of the evolutionary relationships among gorillas, chimpanzees (common chimpanzee and bonobo) and humans. With data from 21 human individuals, Brown published the first estimate on the age of the mt-MRCA at 180,000 years ago in 1980. A statistical analysis published in 1982 was taken as evidence for recent African origin (a hypothesis which at the time was competing with Asian origin of H. sapiens).

1987 publication

By 1985, data from the mtDNA of 145 women of different populations, and of two cell lines, HeLa and GM 3043, derived from an African American and a !Kung respectively, were available. After more than 40 revisions of the draft, the manuscript was submitted to Nature in late 1985 or early 1986 and published on 1 January 1987. The published conclusion was that all current human mtDNA originated from a single population from Africa, at the time dated to between 140,000 and 200,000 years ago.

The dating for "Eve" was a blow to the multiregional hypothesis, which was debated at the time, and a boost to the theory of the recent origin model.

Cann, Stoneking and Wilson did not use the term "Mitochondrial Eve" or even the name "Eve" in their original paper. It was, however, used by Cann in an article entitled "In Search of Eve" in the September–October 1987 issue of The Sciences. It also appears in the October 1987 article in Science by Roger Lewin, headlined "The Unmasking of Mitochondrial Eve." The biblical connotation was very clear from the start; the accompanying research news in Nature had the title "Out of the garden of Eden."

Wilson himself preferred the term "Lucky Mother" and thought the use of the name Eve "regrettable." But the concept of Eve caught on with the public and was repeated in a Newsweek cover story (11 January 1988 issue featured a depiction of Adam and Eve on the cover, with the title "The Search for Adam and Eve"), and a cover story in Time on 26 January 1987.

Criticism and later research

Shortly after the 1987 publication, criticism of its methodology and secondary conclusions was published. Both the dating of mt-Eve and the relevance of the age of the purely matrilineal descent for population replacement were subjects of controversy during the 1990s; Alan Templeton (1997) asserted that the study did "not support the hypothesis of a recent African origin for all of humanity following a split between Africans and non-Africans 100,000 years ago" and also did "not support the hypothesis of a recent global replacement of humans coming out of Africa."

The placement by Cann, Stoneking and Wilson (1987) of a relatively small population of humans in sub-Saharan Africa was consistent with the hypothesis of Cann (1982) and lent considerable support to the "recent out-of-Africa" scenario.

In 1999, Krings et al. eliminated problems in molecular clocking postulated by Nei (1992) when it was found that the mtDNA sequence for the same region of a Neanderthal specimen was substantially different from the MRCA relative to any human sequence.

In 1997, Parsons et al. (1997) published a study of mtDNA mutation rates in a single, well-documented family (the Romanov family of Russian royalty). In this study, they calculated a mutation rate upwards of twenty times higher than previous results.

Although the original research did have analytical limitations, the estimate on the age of the mt-MRCA has proven robust. More recent age estimates have remained consistent with the 140–200 kya estimate published in 1987: A 2013 estimate dated Mitochondrial Eve to about 160 kya (within the reserved estimate of the original research) and Out of Africa II to about 95 kya. Another 2013 study (based on genome sequencing of 69 people from 9 different populations) reported the age of Mitochondrial Eve between 99 and 148 kya and that of the Y-MRCA between 120 and 156 kya.

Female and mitochondrial ancestry

Through random drift or selection, the female lineage will trace back to a single female, such as Mitochondrial Eve. In this example over five generations, colors represent extinct matrilineal lines and black represents the matrilineal line descended from the mtDNA MRCA.

Without a DNA sample, it is not possible to reconstruct the complete genetic makeup (genome) of any individual who died very long ago. By analysing descendants' DNA, however, scientists can estimate parts of ancestral genomes. Mitochondrial DNA (mtDNA, the DNA located in mitochondria, distinct from the DNA in the cell's nucleus) and Y-chromosome DNA are commonly used to trace ancestry in this manner. mtDNA is generally passed unmixed from mothers to children of both sexes, along the maternal line, or matrilineally. Matrilineal descent goes back through mothers, to their mothers, until all female lineages converge.

Branches are identified by one or more unique markers which give a mitochondrial "DNA signature" or "haplotype" (e.g. the CRS is a haplotype). Each marker is a DNA base-pair that has resulted from an SNP mutation. Scientists sort mitochondrial DNA results into more or less related groups, with more or less recent common ancestors. This leads to the construction of a DNA family tree where the branches are in biological terms clades, and the common ancestors such as Mitochondrial Eve sit at branching points in this tree. Major branches are said to define a haplogroup (e.g. CRS belongs to haplogroup H), and large branches containing several haplogroups are called "macro-haplogroups".

Simplified human mitochondrial phylogeny

The mitochondrial clade which Mitochondrial Eve defines is the species Homo sapiens sapiens itself, or at least the current population or "chronospecies" as it exists today. In principle, earlier Eves can also be defined going beyond the species, for example one who is ancestral to both modern humanity and Neanderthals, or, further back, an "Eve" ancestral to all members of genus Homo and chimpanzees in genus Pan. According to current nomenclature, Mitochondrial Eve's haplogroup was within mitochondrial haplogroup L because this macro-haplogroup contains all surviving human mitochondrial lineages today, and she must predate the emergence of L0.

The variation of mitochondrial DNA between different people can be used to estimate the time back to a common ancestor, such as Mitochondrial Eve. This works because, along any particular line of descent, mitochondrial DNA accumulates mutations at a rate of approximately one every 3,500 years. A certain number of these new variants will survive into modern times and be identifiable as distinct lineages. At the same time some branches, including even very old ones, come to an end when the last family in a distinct branch has no daughters.
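The molecular-clock arithmetic above can be made concrete. In this back-of-envelope sketch we assume roughly one mtDNA mutation per lineage every 3,500 years (the figure quoted above); the mutation count in the example is invented for illustration, not real data.

```python
# Back-of-envelope molecular clock for mtDNA lineages.
# Assumption: about one mutation per lineage every 3,500 years (figure from
# the text); the 90-mutation example below is purely illustrative.

YEARS_PER_MUTATION = 3_500

def tmrca_years(differences):
    """Estimate time to the most recent common ancestor of two lineages
    that differ by `differences` mutations: since both lineages have been
    accumulating changes, each contributed roughly half the differences."""
    return (differences / 2) * YEARS_PER_MUTATION

# Two lineages 90 mutations apart → 45 mutations per lineage since the MRCA.
print(tmrca_years(90))  # → 157500.0 years, the order of magnitude of mt-Eve estimates
```

Real analyses are far more involved (calibrated rates, selection, rate variation along the molecule), but this captures why more accumulated variation implies a deeper common ancestor.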

Mitochondrial Eve is the most recent common matrilineal ancestor for all modern humans. Whenever one of the two most ancient branch lines dies out (by producing only non-matrilinear descendants at that time), the MRCA will move to a more recent female ancestor, always the most recent mother to have more than one daughter with living maternal line descendants alive today. The number of mutations that can be found distinguishing modern people is determined by two criteria: first and most obviously, the time back to her, but second and less obviously by the varying rates at which new branches have come into existence and old branches have become extinct. By looking at the number of mutations which have been accumulated in different branches of this family tree, and looking at which geographical regions have the widest range of least related branches, the region where Eve lived can be proposed.
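The drift process described above, by which all matrilines eventually trace back to one woman, can be sketched with a small simulation (a Wright–Fisher-style toy of our own devising; population size and seed are arbitrary):

```python
# Toy simulation of matrilineal coalescence under pure random drift:
# each generation, every woman's mother is drawn uniformly at random from
# the previous generation, and we count generations until only one
# founding matriline survives.
import random

def generations_until_one_matriline(n_women=100, seed=1):
    rng = random.Random(seed)
    lineages = list(range(n_women))  # each founder starts her own matriline
    generations = 0
    while len(set(lineages)) > 1:    # more than one founding line survives
        lineages = [lineages[rng.randrange(n_women)] for _ in range(n_women)]
        generations += 1
    return generations

print(generations_until_one_matriline())  # typically on the order of 2 * n_women
```

Run forward, lineages go extinct one by one until a single label remains; read backward, this is exactly the convergence of all maternal lines on a single mt-MRCA, and it also shows why the title-holder moves forward in time as old branches die out.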

Popular reception and misconceptions

Newsweek reported on Mitochondrial Eve based on the Cann et al. study in January 1988, under a heading of "Scientists Explore a Controversial Theory About Man's Origins". The edition sold a record number of copies.

The popular name "mitochondrial Eve", of 1980s coinage, has contributed to a number of popular misconceptions. At first, the announcement of a "mitochondrial Eve" was even greeted with endorsement from young earth creationists, who viewed the theory as a validation of the biblical creation story.

Due to such misunderstandings, authors of popular science publications since the 1990s have been emphatic in pointing out that the name is merely a popular convention, and that the mt-MRCA was not in any way the "first woman". Her position is purely a result of the later genealogical history of human populations: as matrilineal lineages die out, the position of the mt-MRCA keeps moving forward to younger individuals over time.

In River Out of Eden (1995), Richard Dawkins discussed human ancestry in the context of a "river of genes", including an explanation of the concept of Mitochondrial Eve. The Seven Daughters of Eve (2002) presented the topic of human mitochondrial genetics to a general audience. The Real Eve: Modern Man's Journey Out of Africa by Stephen Oppenheimer (2003) was adapted into a Discovery Channel documentary.

Not the only woman

One common misconception surrounding Mitochondrial Eve is that since all women alive today descended in a direct unbroken female line from her, she must have been the only woman alive at the time. However, nuclear DNA studies indicate that the effective population size of ancient humans never dropped below tens of thousands. Other women living during Eve's time may have descendants alive today, but not in a direct female line.

Not a fixed individual over time

The definition of Mitochondrial Eve is fixed, but the woman in prehistory who fits it can change. That is, not only can our knowledge of when and where Mitochondrial Eve lived be revised by new discoveries, but the actual Mitochondrial Eve can change whenever a mother-daughter line comes to an end. It follows from the definition that Mitochondrial Eve had at least two daughters, each with an unbroken female lineage that has survived to the present day. In every generation mitochondrial lineages end, when a woman with unique mtDNA dies with no daughters. When the mitochondrial lineages of all but one daughter of Mitochondrial Eve die out, the title of "Mitochondrial Eve" shifts forward from the remaining daughter through her matrilineal descendants, until the first descendant is reached who had two or more daughters who together have all living humans as their matrilineal descendants. Once a lineage has died out, it is irretrievably lost; this mechanism can thus only shift the title of "Mitochondrial Eve" forward in time.

Because mtDNA mapping of humans is very incomplete, the discovery of living mtDNA lines which predate our current concept of "Mitochondrial Eve" could result in the title moving to an earlier woman. This happened to her male counterpart, "Y-chromosomal Adam," when an older Y line, haplogroup A-00, was discovered.

Not necessarily a contemporary of "Y-chromosomal Adam"

Sometimes Mitochondrial Eve is assumed to have lived at the same time as Y-chromosomal Adam (from whom all living males are descended patrilineally), and perhaps even met and mated with him. Even if this were true, which is currently regarded as highly unlikely, this would only be a coincidence. Like Mitochondrial "Eve", Y-chromosomal "Adam" probably lived in Africa. A recent study (March 2013) concluded however that "Eve" lived much later than "Adam" – some 140,000 years later. (Earlier studies considered, conversely, that "Eve" lived earlier than "Adam".) More recent studies indicate that Mitochondrial Eve and Y-chromosomal Adam may indeed have lived around the same time.

Not the most recent ancestor shared by all humans

Mitochondrial Eve is the most recent common matrilineal ancestor, not the most recent common ancestor. Since mtDNA is inherited maternally and recombination is either rare or absent, it is relatively easy to track the ancestry of the lineages back to an MRCA; however, this MRCA is valid only when discussing mitochondrial DNA. An approximate sequence from newest to oldest can list various important points in the ancestry of modern human populations:

  • The human MRCA. The time period in which the human MRCA lived is unknown. Rohde et al. put forth a "rough guess" that the MRCA could have existed 5000 years ago; however, the authors state that this estimate is "extremely tentative, and the model contains several obvious sources of error, as it was motivated more by considerations of theoretical insight and tractability than by realism." Just a few thousand years before the most recent single ancestor shared by all living humans was the time at which all humans who were then alive either left no descendants alive today or were common ancestors of all humans alive today. However, such a late date is difficult to reconcile with the geographical spread of our species and the consequent isolation of different groups from each other. For example, it is generally accepted that the indigenous population of Tasmania was isolated from all other humans between the rise in sea level after the last ice age some 8000 years ago and the arrival of Europeans. Estimates of the MRCA of even closely related human populations have been much more than 5000 years ago.
  • The identical ancestors point, the most recent time at which "each present-day human has exactly the same set of genealogical ancestors" alive. This point is far more recent than when Mitochondrial Eve was proposed to have lived.
  • Mitochondrial Eve, the most recent female-line common ancestor of all living people.
  • "Y-chromosomal Adam", the most recent male-line common ancestor of all living people.

Intron

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Intron

An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.

Introns are found in the genes of most organisms and many viruses and they can be located in both protein-coding genes and genes that function as RNA (noncoding genes). There are four main types of introns: tRNA introns, group I introns, group II introns, and spliceosomal introns (see below). Introns are rare in Bacteria and Archaea (prokaryotes), but most eukaryotic genes contain multiple spliceosomal introns.

Discovery and etymology

Introns were first discovered in protein-coding genes of adenovirus, and were subsequently identified in genes encoding transfer RNA and ribosomal RNA genes. Introns are now known to occur within a wide variety of genes throughout organisms, bacteria, and viruses within all of the biological kingdoms.

The fact that genes were split or interrupted by introns was discovered independently in 1977 by Phillip Allen Sharp and Richard J. Roberts, for which they shared the Nobel Prize in Physiology or Medicine in 1993. The term intron was introduced by American biochemist Walter Gilbert:

"The notion of the cistron [i.e., gene] ... must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger – which I suggest we call introns (for intragenic regions) – alternating with regions which will be expressed – exons." (Gilbert 1978)

The term intron also refers to intracistron, i.e., an additional piece of DNA that arises within a cistron.

Although introns are sometimes called intervening sequences, the term "intervening sequence" can refer to any of several families of internal nucleic acid sequences that are not present in the final gene product, including inteins, untranslated regions (UTR), and nucleotides removed by RNA editing, in addition to introns.

Distribution

The frequency of introns within different genomes is observed to vary widely across the spectrum of biological organisms. For example, introns are extremely common within the nuclear genome of jawed vertebrates (e.g. humans, mice, and pufferfish (fugu)), where protein-coding genes almost always contain multiple introns, while introns are rare within the nuclear genes of some eukaryotic microorganisms, for example baker's/brewer's yeast (Saccharomyces cerevisiae). In contrast, the mitochondrial genomes of vertebrates are entirely devoid of introns, while those of eukaryotic microorganisms may contain many introns.

Simple illustration of an unspliced mRNA precursor, with two introns and three exons (top). After the introns have been removed via splicing, the mature mRNA sequence is ready for translation (bottom).

A particularly extreme case is the Drosophila dhc7 gene, which contains a ≥3.6 megabase (Mb) intron that takes roughly three days to transcribe. At the other extreme, a 2015 study suggests that the shortest known metazoan intron is 30 base pairs (bp) long, belonging to the human MST1L gene. The shortest known introns overall belong to the heterotrich ciliates, such as Stentor coeruleus, in which most (>95%) introns are 15 or 16 bp long.

Classification

Splicing of all intron-containing RNA molecules is superficially similar, as described above. However, different types of introns were identified through the examination of intron structure by DNA sequence analysis, together with genetic and biochemical analysis of RNA splicing reactions. At least four distinct classes of introns have been identified:

  • Introns in nuclear protein-coding genes that are removed by spliceosomes (spliceosomal introns)
  • Introns in nuclear and archaeal transfer RNA genes that are removed by proteins (tRNA introns)
  • Self-splicing group I introns that are removed by RNA catalysis
  • Self-splicing group II introns that are removed by RNA catalysis

Group III introns are proposed to be a fifth family, but little is known about the biochemical apparatus that mediates their splicing. They appear to be related to group II introns, and possibly to spliceosomal introns.

Spliceosomal introns

Nuclear pre-mRNA introns (spliceosomal introns) are characterized by specific intron sequences located at the boundaries between introns and exons. These sequences are recognized by spliceosomal RNA molecules when the splicing reactions are initiated. In addition, they contain a branch point, a particular nucleotide sequence near the 3' end of the intron that becomes covalently linked to the 5' end of the intron during the splicing process, generating a branched (lariat) intron. Apart from these three short conserved elements, nuclear pre-mRNA intron sequences are highly variable. Nuclear pre-mRNA introns are often much longer than their surrounding exons.

tRNA introns

Transfer RNA introns that depend upon proteins for removal occur at a specific location within the anticodon loop of unspliced tRNA precursors, and are removed by a tRNA splicing endonuclease. The exons are then linked together by a second protein, the tRNA splicing ligase. Note that self-splicing introns are also sometimes found within tRNA genes.

Group I and group II introns

Group I and group II introns are found in genes encoding proteins (messenger RNA), transfer RNA and ribosomal RNA in a very wide range of living organisms. Following transcription into RNA, group I and group II introns make extensive internal interactions that allow them to fold into a specific, complex three-dimensional architecture. These complex architectures allow some group I and group II introns to be self-splicing; that is, the intron-containing RNA molecule can rearrange its own covalent structure so as to precisely remove the intron and link the exons together in the correct order. In some cases, particular intron-binding proteins assist the intron in folding into the three-dimensional structure that is necessary for self-splicing activity. Group I and group II introns are distinguished by different sets of internal conserved sequences and folded structures, and by their splicing chemistry: splicing of RNA molecules containing group II introns generates branched (lariat) introns, like those of spliceosomal RNAs, while group I introns use a non-encoded guanosine nucleotide (typically GTP) to initiate splicing, adding it to the 5' end of the excised intron.

On the accuracy of splicing

The spliceosome is a very complex structure containing up to one hundred proteins and five different RNAs. The substrate of the reaction is a long RNA molecule and the transesterification reactions catalyzed by the spliceosome require the bringing together of sites that may be thousands of nucleotides apart. All biochemical reactions are associated with known error rates and the more complicated the reaction the higher the error rate. Therefore, it is not surprising that the splicing reaction catalyzed by the spliceosome has a significant error rate even though there are spliceosome accessory factors that suppress the accidental cleavage of cryptic splice sites.

Under ideal circumstances, the splicing reaction is likely to be 99.999% accurate (an error rate of 10⁻⁵) and the correct exons will be joined and the correct intron will be deleted. However, these ideal conditions require very close matches to the best splice site sequences and the absence of any competing cryptic splice site sequences within the introns, and those conditions are rarely met in large eukaryotic genes that may cover more than 40 kilobase pairs. Recent studies have shown that the actual error rate can be considerably higher than 10⁻⁵ and may be as high as 2% or 3% errors (an error rate of 2–3 × 10⁻²) per gene. Additional studies suggest that the error rate is no less than 0.1% per intron. This relatively high level of splicing errors explains why most splice variants are rapidly degraded by nonsense-mediated decay.
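The per-gene and per-intron figures quoted above are easy to relate with simple arithmetic. Assuming (our assumption) that each intron is mis-spliced independently, a transcript is entirely correct only if every one of its introns is removed correctly:

```python
# If each intron is spliced incorrectly with independent probability e,
# a transcript with n introns is entirely correct with probability (1-e)**n.
# The 0.1%-per-intron rate comes from the text; independence and the
# 8-intron example gene are our illustrative assumptions.

def fraction_misspliced(error_per_intron, n_introns):
    """Fraction of transcripts carrying at least one splicing error."""
    return 1 - (1 - error_per_intron) ** n_introns

# e = 0.1% per intron, a gene with 8 introns:
print(round(fraction_misspliced(0.001, 8), 4))  # → 0.008, i.e. ~0.8% of transcripts
```

With dozens of introns, or the higher per-intron rates the recent studies suggest, the per-gene error fraction quickly reaches the percent range quoted above.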

The presence of sloppy binding sites within genes causes splicing errors and it may seem strange that these sites haven't been eliminated by natural selection. The argument for their persistence is similar to the argument for junk DNA.

Although mutations which create or disrupt binding sites may be slightly deleterious, the large number of possible such mutations makes it inevitable that some will reach fixation in a population. This is particularly relevant in species, such as humans, with relatively small long-term effective population sizes. It is plausible, then, that the human genome carries a substantial load of suboptimal sequences which cause the generation of aberrant transcript isoforms. In this study, we present direct evidence that this is indeed the case.

While the catalytic reaction may be accurate enough for effective processing most of the time, the overall error rate may be partly limited by the fidelity of transcription, because transcription errors will introduce mutations that create cryptic splice sites. In addition, the transcription error rate of 10⁻⁵–10⁻⁶ is high enough that one in every 25,000 transcribed exons will have an incorporation error in one of the splice sites, leading to a skipped intron or a skipped exon. Almost all multi-exon genes will produce incorrectly spliced transcripts, but the frequency of this background noise will depend on the size of the genes, the number of introns, and the quality of the splice site sequences.
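The one-in-25,000 figure is consistent with a simple calculation. The breakdown below is an assumption, not spelled out in the text: roughly four transcribed nucleotide positions per exon that are critical for splice-site recognition, combined with the upper end (10⁻⁵) of the quoted transcription error rate.

```python
# Rough consistency check of the "one error per 25,000 exons" figure.
# Assumptions (not stated explicitly in the text): ~4 critical splice-site
# nucleotides per exon, per-nucleotide transcription error rate of 1e-5.

critical_positions = 4
per_nt_error_rate = 1e-5

errors_per_exon = critical_positions * per_nt_error_rate  # 4e-5 per exon
exons_per_error = 1 / errors_per_exon
print(round(exons_per_error))  # 25000
```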

In some cases, splice variants are produced by mutations in the gene (DNA). These can be SNPs that create a cryptic splice site or mutate a functional site. They can also be somatic cell mutations that affect splicing in a particular tissue or cell line. When the mutant allele is in a heterozygous state, this results in the production of two abundant splice variants: one functional and one non-functional. In the homozygous state, the mutant alleles may cause a genetic disease, such as the hemophilia found in descendants of Queen Victoria, where a mutation in one of the introns of a blood clotting factor gene creates a cryptic 3' splice site, resulting in aberrant splicing. A significant fraction of human deaths from disease may be caused by mutations that interfere with normal splicing, mostly by creating cryptic splice sites.

Incorrectly spliced transcripts can easily be detected, and their sequences entered into online databases. They are usually described as "alternatively spliced" transcripts, which can be confusing because the term does not distinguish between real, biologically relevant alternative splicing and processing noise due to splicing errors. One of the central issues in the field of alternative splicing is working out the differences between these two possibilities. Many scientists have argued that the null hypothesis should be splicing noise, putting the burden of proof on those who claim biologically relevant alternative splicing. According to those scientists, the claim of function must be accompanied by convincing evidence that multiple functional products are produced from the same gene.

Biological functions and evolution

While introns do not encode protein products, they are integral to gene expression regulation. Some introns themselves encode functional RNAs through further processing after splicing to generate noncoding RNA molecules. Alternative splicing is widely used to generate multiple proteins from a single gene. Furthermore, some introns play essential roles in a wide range of gene expression regulatory functions such as nonsense-mediated decay and mRNA export.

After the initial discovery of introns in protein-coding genes of the eukaryotic nucleus, there was significant debate as to whether introns in modern-day organisms were inherited from a common ancient ancestor (the introns-early hypothesis) or appeared in genes rather recently in the evolutionary process (the introns-late hypothesis). A third theory is that the spliceosome and the intron-exon structure of genes are a relic of the RNA world (the introns-first hypothesis). There is still considerable debate about which of these hypotheses is most correct, but the popular consensus at the moment is that, following the formation of the first eukaryotic cell, group II introns from the bacterial endosymbiont invaded the host genome. In the beginning, these self-splicing introns excised themselves from the mRNA precursor, but over time some of them lost that ability and their excision had to be aided in trans by other group II introns. Eventually, a number of specific trans-acting introns evolved, and these became the precursors to the snRNAs of the spliceosome. The efficiency of splicing was improved by association with stabilizing proteins to form the primitive spliceosome.

Early studies of genomic DNA sequences from a wide range of organisms showed that the intron-exon structure of homologous genes in different organisms can vary widely. More recent studies of entire eukaryotic genomes have shown that the lengths and density (introns/gene) of introns vary considerably between related species. For example, while the human genome contains an average of 8.4 introns/gene (139,418 in the genome), the unicellular fungus Encephalitozoon cuniculi contains only 0.0075 introns/gene (15 introns in the genome). Since eukaryotes arose from a common ancestor (common descent), there must have been extensive gain or loss of introns during evolutionary time. This process is thought to be subject to selection, with a tendency towards intron gain in larger species due to their smaller population sizes, and the converse in smaller (particularly unicellular) species. Biological factors also influence which genes in a genome lose or accumulate introns.
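The density figures above imply the underlying gene counts, which can be checked directly. This is a simple consistency calculation on the numbers already quoted, not additional data:

```python
# Gene counts implied by the intron-density figures quoted above.
human_introns, human_density = 139_418, 8.4
cuniculi_introns, cuniculi_density = 15, 0.0075

human_genes = human_introns / human_density           # ~16,600 genes
cuniculi_genes = cuniculi_introns / cuniculi_density  # 2,000 genes
print(round(human_genes), round(cuniculi_genes))
```

The human figure (~16,600) is plausibly restricted to the intron-bearing gene set used in the underlying study, which is smaller than the usual ~20,000 protein-coding gene estimate.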

Alternative splicing of exons within a gene after intron excision acts to introduce greater variability of protein sequences translated from a single gene, allowing multiple related proteins to be generated from a single gene and a single precursor mRNA transcript. The control of alternative RNA splicing is performed by a complex network of signaling molecules that respond to a wide range of intracellular and extracellular signals.

Introns contain several short sequences that are important for efficient splicing, such as acceptor and donor sites at either end of the intron as well as a branch point site, which are required for proper splicing by the spliceosome. Some introns are known to enhance the expression of the gene that they are contained in by a process known as intron-mediated enhancement (IME).

Actively transcribed regions of DNA frequently form R-loops that are vulnerable to DNA damage. In highly expressed yeast genes, introns inhibit R-loop formation and the occurrence of DNA damage. Genome-wide analysis in both yeast and humans revealed that intron-containing genes have decreased R-loop levels and decreased DNA damage compared to intronless genes of similar expression. Insertion of an intron within an R-loop prone gene can also suppress R-loop formation and recombination. Bonnet et al. (2017) speculated that the function of introns in maintaining genetic stability may explain their evolutionary maintenance at certain locations, particularly in highly expressed genes.

Starvation adaptation

The physical presence of introns promotes cellular resistance to starvation via intron-enhanced repression of ribosomal protein genes in nutrient-sensing pathways.

As mobile genetic elements

Introns may be lost or gained over evolutionary time, as shown by many comparative studies of orthologous genes. Subsequent analyses have identified thousands of examples of intron loss and gain events, and it has been proposed that the emergence of eukaryotes, or the initial stages of eukaryotic evolution, involved an intron invasion. Two definitive mechanisms of intron loss, reverse transcriptase-mediated intron loss (RTMIL) and genomic deletion, have been identified and are known to occur. The definitive mechanisms of intron gain, however, remain elusive and controversial. At least seven mechanisms of intron gain have been reported thus far: intron transposition, transposon insertion, tandem genomic duplication, intron transfer, intron gain during double-strand break repair (DSBR), insertion of a group II intron, and intronization. In theory, it should be easiest to deduce the origin of recently gained introns, due to the lack of host-induced mutations, yet even recently gained introns often cannot be attributed to any of the aforementioned mechanisms. These findings raise the question of whether the proposed mechanisms fail to describe the mechanistic origin of many novel introns because they are not accurate mechanisms of intron gain, or whether other, yet-to-be-discovered processes generate novel introns.

In intron transposition, the most commonly purported intron gain mechanism, a spliced intron is thought to reverse-splice into either its own mRNA or another mRNA at a previously intron-less position. This intron-containing mRNA is then reverse transcribed, and the resulting intron-containing cDNA may cause intron gain via complete or partial recombination with its original genomic locus. Transposon insertion can also result in intron creation: when a transposon inserts into the sequence AGGT, the sequence is duplicated on each side of the transposon, and the insertion can intronize the transposon without disrupting the coding sequence. It is not yet understood why these elements are spliced, whether by chance or by some preferential action of the transposon. In tandem genomic duplication, because the consensus donor and acceptor splice sites both closely resemble AGGT, the tandem duplication of an exonic segment harboring an AGGT sequence generates two potential splice sites. When recognized by the spliceosome, the sequence between the original and duplicated AGGT is spliced out, creating an intron without altering the coding sequence of the gene. Double-strand break repair via non-homologous end joining was recently identified as a source of intron gain when researchers found short direct repeats flanking 43% of gained introns in Daphnia. These numbers must be compared to the number of conserved introns flanked by repeats in other organisms, though, to establish statistical significance. For group II intron insertion, the retrohoming of a group II intron into a nuclear gene has been proposed to cause recent spliceosomal intron gain.
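The tandem-duplication mechanism can be illustrated with a toy sequence. The segment below is hypothetical, and splice sites are reduced to the bare GT…AG consensus for illustration; real spliceosomal recognition involves longer motifs and a branch point.

```python
# Toy illustration of intron gain by tandem genomic duplication.
# A hypothetical exonic segment containing the AGGT motif is duplicated;
# the first AGGT supplies a donor site (intron starts at GT) and the
# second supplies an acceptor site (intron ends at AG).

segment = "CCTAGGTTCA"        # hypothetical exonic segment with AGGT
dup = segment + segment       # tandem duplication

donor = dup.index("AGGT") + 2            # GT of the first copy
acceptor = dup.index("AGGT", donor) + 2  # position just after AG of the second copy

intron = dup[donor:acceptor]             # "GTTCACCTAG": starts GT, ends AG
spliced = dup[:donor] + dup[acceptor:]   # exons rejoined by the spliceosome

assert intron.startswith("GT") and intron.endswith("AG")
assert spliced == segment   # coding sequence is restored after splicing
```

Splicing out everything between the duplicated motifs regenerates exactly the original exonic sequence, which is why this mechanism can create an intron without changing the encoded peptide.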

Intron transfer has been hypothesized to result in intron gain when a paralog or pseudogene gains an intron and then transfers this intron via recombination to an intron-absent location in its sister paralog. Intronization is the process by which mutations create novel introns from formerly exonic sequence. Thus, unlike other proposed mechanisms of intron gain, this mechanism does not require the insertion or generation of DNA to create a novel intron.

The only hypothesized mechanism of recent intron gain lacking any direct evidence is group II intron insertion, which, when demonstrated in vivo, abolishes gene expression. Group II introns are therefore likely the presumed ancestors of spliceosomal introns, acting as site-specific retroelements, and are no longer responsible for intron gain. Tandem genomic duplication is the only proposed mechanism with supporting in vivo experimental evidence: a short intragenic tandem duplication can insert a novel intron into a protein-coding gene, leaving the corresponding peptide sequence unchanged. Extensive indirect evidence also supports the idea that tandem genomic duplication is a prevalent mechanism of intron gain. Testing the other proposed mechanisms in vivo, particularly intron gain during DSBR, intron transfer, and intronization, is possible and would be needed to solidify them as actual mechanisms of intron gain. Further genomic analyses, especially when executed at the population level, may then quantify the relative contribution of each mechanism, possibly identifying species-specific biases that may shed light on the varied rates of intron gain amongst different species.
