A Medley of Potpourri: Sep 25, 2023

Monday, September 25, 2023

Maximum parsimony (phylogenetics)

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Maximum_parsimony_(phylogenetics)

In phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes (or minimizes the cost of differentially weighted character-state changes). Under the maximum-parsimony criterion, the optimal tree will minimize the amount of homoplasy (i.e., convergent evolution, parallel evolution, and evolutionary reversals). In other words, under this criterion, the shortest possible tree that explains the data is considered best. Some of the basic ideas behind maximum parsimony were presented by James S. Farris in 1970 and Walter M. Fitch in 1971.

Maximum parsimony is an intuitive and simple criterion, and it is popular for this reason. However, although it is easy to score a phylogenetic tree (by counting the number of character-state changes), there is no algorithm to quickly generate the most-parsimonious tree. Instead, the most-parsimonious tree must be sought in "tree space" (i.e., amongst all possible trees). For a small number of taxa (i.e., fewer than nine) it is possible to do an exhaustive search, in which every possible tree is scored, and the best one is selected. For nine to twenty taxa, it will generally be preferable to use branch-and-bound, which is also guaranteed to return the best tree. For greater numbers of taxa, a heuristic search must be performed.

Because the most-parsimonious tree is always the shortest possible tree, this means that—in comparison to a hypothetical "true" tree that actually describes the unknown evolutionary history of the organisms under study—the "best" tree according to the maximum-parsimony criterion will often underestimate the actual evolutionary change that could have occurred. In addition, maximum parsimony is not statistically consistent. That is, it is not guaranteed to produce the true tree with high probability, given sufficient data. As demonstrated in 1978 by Joe Felsenstein, maximum parsimony can be inconsistent under certain conditions, such as long-branch attraction. Of course, any phylogenetic algorithm could also be statistically inconsistent if the model it employs to estimate the preferred tree does not accurately match the way that evolution occurred in that clade. This is unknowable. Therefore, while statistical consistency is an interesting theoretical property, it lies outside the realm of testability, and is irrelevant to empirical phylogenetic studies.

Alternate characterization and rationale

In phylogenetics, parsimony is mostly interpreted as favoring the trees that minimize the amount of evolutionary change required (see for example ). Alternatively, phylogenetic parsimony can be characterized as favoring the trees that maximize explanatory power by minimizing the number of observed similarities that cannot be explained by inheritance and common descent. Minimization of required evolutionary change on the one hand and maximization of observed similarities that can be explained as homology on the other may result in different preferred trees when some observed features are not applicable in some groups that are included in the tree, and the latter can be seen as the more general approach.

While evolution is not an inherently parsimonious process, centuries of scientific experience lend support to the aforementioned principle of parsimony (Occam's razor). Namely, the supposition of a simpler, more parsimonious chain of events is preferable to the supposition of a more complicated, less parsimonious chain of events. Hence, parsimony (sensu lato) is typically sought in inferring phylogenetic trees, and in scientific explanation generally.

In detail

Parsimony is part of a class of character-based tree estimation methods which use a matrix of discrete phylogenetic characters and character states to infer one or more optimal phylogenetic trees for a set of taxa, commonly a set of species or reproductively isolated populations of a single species. These methods operate by evaluating candidate phylogenetic trees according to an explicit optimality criterion; the tree with the most favorable score is taken as the best hypothesis of the phylogenetic relationships of the included taxa. Maximum parsimony is used with most kinds of phylogenetic data; until recently, it was the only widely used character-based tree estimation method used for morphological data.

Inferring phylogenies is not a trivial problem. A huge number of possible phylogenetic trees exist for any reasonably sized set of taxa; for example, a mere ten species gives over two million possible unrooted trees. These possibilities must be searched to find a tree that best fits the data according to the optimality criterion. However, the data themselves do not lead to a simple, arithmetic solution to the problem. Ideally, we would expect the distribution of whatever evolutionary characters (such as phenotypic traits or alleles) to directly follow the branching pattern of evolution. Thus we could say that if two organisms possess a shared character, they should be more closely related to each other than to a third organism that lacks this character (provided that character was not present in the last common ancestor of all three, in which case it would be a symplesiomorphy). We would predict that bats and monkeys are more closely related to each other than either is to an elephant, because male bats and monkeys possess external testicles, which elephants lack. However, we cannot say that bats and monkeys are more closely related to one another than they are to whales, though the two have external testicles absent in whales, because we believe that the males in the last common ancestral species of the three had external testicles.

However, the phenomena of convergent evolution, parallel evolution, and evolutionary reversals (collectively termed homoplasy) add an unpleasant wrinkle to the problem of inferring phylogeny. For a number of reasons, two organisms can possess a trait inferred to have not been present in their last common ancestor: If we naively took the presence of this trait as evidence of a relationship, we would infer an incorrect tree. Empirical phylogenetic data may include substantial homoplasy, with different parts of the data suggesting sometimes very different relationships. Methods used to estimate phylogenetic trees are explicitly intended to resolve the conflict within the data by picking the phylogenetic tree that is the best fit to all the data overall, accepting that some data simply will not fit. It is often mistakenly believed that parsimony assumes that convergence is rare; in fact, even convergently derived characters have some value in maximum-parsimony-based phylogenetic analyses, and the prevalence of convergence does not systematically affect the outcome of parsimony-based methods.

Data that do not fit a tree perfectly are not simply "noise", they can contain relevant phylogenetic signal in some parts of a tree, even if they conflict with the tree overall. In the whale example given above, the lack of external testicles in whales is homoplastic: It reflects a return to the condition inferred to have been present in ancient ancestors of mammals, whose testicles were internal. This inferred similarity between whales and ancient mammal ancestors is in conflict with the tree we accept based on the weight of other characters, since it implies that the mammals with external testicles should form a group excluding whales. However, among the whales, the reversal to internal testicles actually correctly associates the various types of whales (including dolphins and porpoises) into the group Cetacea. Still, the determination of the best-fitting tree—and thus which data do not fit the tree—is a complex process. Maximum parsimony is one method developed to do this.

Character data

The input data used in a maximum parsimony analysis is in the form of "characters" for a range of taxa. There is no generally agreed-upon definition of a phylogenetic character, but operationally a character can be thought of as an attribute, an axis along which taxa are observed to vary. These attributes can be physical (morphological), molecular, genetic, physiological, or behavioral. The only widespread agreement on characters seems to be that variation used for character analysis should reflect heritable variation. Whether it must be directly heritable, or whether indirect inheritance (e.g., learned behaviors) is acceptable, is not entirely resolved.

Each character is divided into discrete character states, into which the variations observed are classified. Character states are often formulated as descriptors, describing the condition of the character substrate. For example, the character "eye color" might have the states "blue" and "brown." Characters can have two or more states (they can have only one, but these characters lend nothing to a maximum parsimony analysis, and are often excluded).

Coding characters for phylogenetic analysis is not an exact science, and there are numerous complicating issues. Typically, taxa are scored with the same state if they are more similar to one another in that particular attribute than each is to taxa scored with a different state. This is not straightforward when character states are not clearly delineated or when they fail to capture all of the possible variation in a character. How would one score the previously mentioned character for a taxon (or individual) with hazel eyes? Or green? As noted above, character coding is generally based on similarity: Hazel and green eyes might be lumped with blue because they are more similar to that color (being light), and the character could be then recoded as "eye color: light; dark." Alternatively, there can be multi-state characters, such as "eye color: brown; hazel, blue; green."

Ambiguities in character state delineation and scoring can be a major source of confusion, dispute, and error in phylogenetic analysis using character data. Note that, in the above example, "eyes: present; absent" is also a possible character, which creates issues because "eye color" is not applicable if eyes are not present. For such situations, a "?" ("unknown") is scored, although sometimes "X" or "-" (the latter usually in sequence data) are used to distinguish cases where a character cannot be scored from a case where the state is simply unknown. Current implementations of maximum parsimony generally treat unknown values in the same manner: the reasons the data are unknown have no particular effect on analysis. Effectively, the program treats a ? as if it held the state that would involve the fewest extra steps in the tree (see below), although this is not an explicit step in the algorithm.

Genetic data are particularly amenable to character-based phylogenetic methods such as maximum parsimony because protein and nucleotide sequences are naturally discrete: A particular position in a nucleotide sequence can be either adenine, cytosine, guanine, or thymine / uracil, or a sequence gap; a position (residue) in a protein sequence will be one of the basic amino acids or a sequence gap. Thus, character scoring is rarely ambiguous, except in cases where sequencing methods fail to produce a definitive assignment for a particular sequence position. Sequence gaps are sometimes treated as characters, although there is no consensus on how they should be coded.

Characters can be treated as unordered or ordered. For a binary (two-state) character, this makes little difference. For a multi-state character, unordered characters can be thought of as having an equal "cost" (in terms of number of "evolutionary events") to change from any one state to any other; complementarily, they do not require passing through intermediate states. Ordered characters have a particular sequence in which the states must occur through evolution, such that going between some states requires passing through an intermediate. This can be thought of complementarily as having different costs to pass between different pairs of states. In the eye-color example above, it is possible to leave it unordered, which imposes the same evolutionary "cost" to go from brown-blue, green-blue, green-hazel, etc. Alternatively, it could be ordered brown-hazel-green-blue; this would normally imply that it would cost two evolutionary events to go from brown-green, three from brown-blue, but only one from brown-hazel. This can also be thought of as requiring eyes to evolve through a "hazel stage" to get from brown to green, and a "green stage" to get from hazel to blue, etc. For many characters, it is not obvious if and how they should be ordered. On the contrary, for characters that represent discretization of an underlying continuous variable, like shape, size, and ratio characters, ordering is logical, and simulations have shown that this improves ability to recover correct clades, while decreasing the recovering of erroneous clades.

There is a lively debate on the utility and appropriateness of character ordering, but no consensus. Some authorities order characters when there is a clear logical, ontogenetic, or evolutionary transition among the states (for example, "legs: short; medium; long"). Some accept only some of these criteria. Some run an unordered analysis, and order characters that show a clear order of transition in the resulting tree (which practice might be accused of circular reasoning). Some authorities refuse to order characters at all, suggesting that it biases an analysis to require evolutionary transitions to follow a particular path.

It is also possible to apply differential weighting to individual characters. This is usually done relative to a "cost" of 1. Thus, some characters might be seen as more likely to reflect the true evolutionary relationships among taxa, and thus they might be weighted at a value 2 or more; changes in these characters would then count as two evolutionary "steps" rather than one when calculating tree scores (see below). There has been much discussion in the past about character weighting. Most authorities now weight all characters equally, although exceptions are common. For example, allele frequency data is sometimes pooled in bins and scored as an ordered character. In these cases, the character itself is often downweighted so that small changes in allele frequencies count less than major changes in other characters. Also, the third codon position in a coding nucleotide sequence is particularly labile, and is sometimes downweighted, or given a weight of 0, on the assumption that it is more likely to exhibit homoplasy. In some cases, repeated analyses are run, with characters reweighted in inverse proportion to the degree of homoplasy discovered in the previous analysis (termed successive weighting); this is another technique that might be considered circular reasoning.

Character state changes can also be weighted individually. This is often done for nucleotide sequence data; it has been empirically determined that certain base changes (A-C, A-T, G-C, G-T, and the reverse changes) occur much less often than others (A-G, C-T, and their reverse changes). These changes are therefore often weighted more. As shown above in the discussion of character ordering, ordered characters can be thought of as a form of character state weighting.

Some systematists prefer to exclude characters known to be, or suspected to be, highly homoplastic or that have a large number of unknown entries ("?"). As noted below, theoretical and simulation work has demonstrated that this is likely to sacrifice accuracy rather than improve it. This is also the case with characters that are variable in the terminal taxa: theoretical, congruence, and simulation studies have all demonstrated that such polymorphic characters contain significant phylogenetic information.

Taxon sampling

The time required for a parsimony analysis (or any phylogenetic analysis) is proportional to the number of taxa (and characters) included in the analysis. Also, because more taxa require more branches to be estimated, more uncertainty may be expected in large analyses. Because data collection costs in time and money often scale directly with the number of taxa included, most analyses include only a fraction of the taxa that could have been sampled. Indeed, some authors have contended that four taxa (the minimum required to produce a meaningful unrooted tree) are all that is necessary for accurate phylogenetic analysis, and that more characters are more valuable than more taxa in phylogenetics. This has led to a raging controversy about taxon sampling.

Empirical, theoretical, and simulation studies have led to a number of dramatic demonstrations of the importance of adequate taxon sampling. Most of these can be summarized by a simple observation: a phylogenetic data matrix has dimensions of characters times taxa. Doubling the number of taxa doubles the amount of information in a matrix just as surely as doubling the number of characters. Each taxon represents a new sample for every character, but, more importantly, it (usually) represents a new combination of character states. These character states can not only determine where that taxon is placed on the tree, they can inform the entire analysis, possibly causing different relationships among the remaining taxa to be favored by changing estimates of the pattern of character changes.

The most disturbing weakness of parsimony analysis, that of long-branch attraction (see below) is particularly pronounced with poor taxon sampling, especially in the four-taxon case. This is a well-understood case in which additional character sampling may not improve the quality of the estimate. As taxa are added, they often break up long branches (especially in the case of fossils), effectively improving the estimation of character state changes along them. Because of the richness of information added by taxon sampling, it is even possible to produce highly accurate estimates of phylogenies with hundreds of taxa using only a few thousand characters.

Although many studies have been performed, there is still much work to be done on taxon sampling strategies. Because of advances in computer performance, and the reduced cost and increased automation of molecular sequencing, sample sizes overall are on the rise, and studies addressing the relationships of hundreds of taxa (or other terminal entities, such as genes) are becoming common. Of course, this is not to say that adding characters is not also useful; the number of characters is increasing as well.

Some systematists prefer to exclude taxa based on the number of unknown character entries ("?") they exhibit, or because they tend to "jump around" the tree in analyses (i.e., they are "wildcards"). As noted below, theoretical and simulation work has demonstrated that this is likely to sacrifice accuracy rather than improve it. Although these taxa may generate more most-parsimonious trees (see below), methods such as agreement subtrees and reduced consensus can still extract information on the relationships of interest.

It has been observed that inclusion of more taxa tends to lower overall support values (bootstrap percentages or decay indices, see below). The cause of this is clear: as additional taxa are added to a tree, they subdivide the branches to which they attach, and thus dilute the information that supports that branch. While support for individual branches is reduced, support for the overall relationships is actually increased. Consider analysis that produces the following tree: (fish, (lizard, (whale, (cat, monkey)))). Adding a rat and a walrus will probably reduce the support for the (whale, (cat, monkey)) clade, because the rat and the walrus may fall within this clade, or outside of the clade, and since these five animals are all relatively closely related, there should be more uncertainty about their relationships. Within error, it may be impossible to determine any of these animals' relationships relative to one another. However, the rat and the walrus will probably add character data that cements the grouping any two of these mammals exclusive of the fish or the lizard; where the initial analysis might have been misled, say, by the presence of fins in the fish and the whale, the presence of the walrus, with blubber and fins like a whale but whiskers like a cat and a rat, firmly ties the whale to the mammals.

To cope with this problem, agreement subtrees, reduced consensus, and double-decay analysis seek to identify supported relationships (in the form of "n-taxon statements," such as the four-taxon statement "(fish, (lizard, (cat, whale)))") rather than whole trees. If the goal of an analysis is a resolved tree, as is the case for comparative phylogenetics, these methods cannot solve the problem. However, if the tree estimate is so poorly supported, the results of any analysis derived from the tree will probably be too suspect to use anyway.

Analysis

A maximum parsimony analysis runs in a very straightforward fashion. Trees are scored according to the degree to which they imply a parsimonious distribution of the character data. The most parsimonious tree for the dataset represents the preferred hypothesis of relationships among the taxa in the analysis.

Trees are scored (evaluated) by using a simple algorithm to determine how many "steps" (evolutionary transitions) are required to explain the distribution of each character. A step is, in essence, a change from one character state to another, although with ordered characters some transitions require more than one step. Contrary to popular belief, the algorithm does not explicitly assign particular character states to nodes (branch junctions) on a tree: the fewest steps can involve multiple, equally costly assignments and distributions of evolutionary transitions. What is optimized is the total number of changes.

There are many more possible phylogenetic trees than can be searched exhaustively for more than eight taxa or so. A number of algorithms are therefore used to search among the possible trees. Many of these involve taking an initial tree (usually the favored tree from the last iteration of the algorithm), and perturbing it to see if the change produces a higher score.

The trees resulting from parsimony search are unrooted: They show all the possible relationships of the included taxa, but they lack any statement on relative times of divergence. A particular branch is chosen to root the tree by the user. This branch is then taken to be outside all the other branches of the tree, which together form a monophyletic group. This imparts a sense of relative time to the tree. Incorrect choice of a root can result in incorrect relationships on the tree, even if the tree is itself correct in its unrooted form.

Parsimony analysis often returns a number of equally most-parsimonious trees (MPTs). A large number of MPTs is often seen as an analytical failure, and is widely believed to be related to the number of missing entries ("?") in the dataset, characters showing too much homoplasy, or the presence of topologically labile "wildcard" taxa (which may have many missing entries). Numerous methods have been proposed to reduce the number of MPTs, including removing characters or taxa with large amounts of missing data before analysis, removing or downweighting highly homoplastic characters (successive weighting) or removing wildcard taxa (the phylogenetic trunk method) a posteriori and then reanalyzing the data.

Numerous theoretical and simulation studies have demonstrated that highly homoplastic characters, characters and taxa with abundant missing data, and "wildcard" taxa contribute to the analysis. Although excluding characters or taxa may appear to improve resolution, the resulting tree is based on less data, and is therefore a less reliable estimate of the phylogeny (unless the characters or taxa are non informative, see safe taxonomic reduction). Today's general consensus is that having multiple MPTs is a valid analytical result; it simply indicates that there is insufficient data to resolve the tree completely. In many cases, there is substantial common structure in the MPTs, and differences are slight and involve uncertainty in the placement of a few taxa. There are a number of methods for summarizing the relationships within this set, including consensus trees, which show common relationships among all the taxa, and pruned agreement subtrees, which show common structure by temporarily pruning "wildcard" taxa from every tree until they all agree. Reduced consensus takes this one step further, by showing all subtrees (and therefore all relationships) supported by the input trees.

Even if multiple MPTs are returned, parsimony analysis still basically produces a point-estimate, lacking confidence intervals of any sort. This has often been levelled as a criticism, since there is certainly error in estimating the most-parsimonious tree, and the method does not inherently include any means of establishing how sensitive its conclusions are to this error. Several methods have been used to assess support.

Jackknifing and bootstrapping, well-known statistical resampling procedures, have been employed with parsimony analysis. The jackknife, which involves resampling without replacement ("leave-one-out") can be employed on characters or taxa; interpretation may become complicated in the latter case, because the variable of interest is the tree, and comparison of trees with different taxa is not straightforward. The bootstrap, resampling with replacement (sample x items randomly out of a sample of size x, but items can be picked multiple times), is only used on characters, because adding duplicate taxa does not change the result of a parsimony analysis. The bootstrap is much more commonly employed in phylogenetics (as elsewhere); both methods involve an arbitrary but large number of repeated iterations involving perturbation of the original data followed by analysis. The resulting MPTs from each analysis are pooled, and the results are usually presented on a 50% Majority Rule Consensus tree, with individual branches (or nodes) labelled with the percentage of bootstrap MPTs in which they appear. This "bootstrap percentage" (which is not a P-value, as is sometimes claimed) is used as a measure of support. Technically, it is supposed to be a measure of repeatability, the probability that that branch (node, clade) would be recovered if the taxa were sampled again. Experimental tests with viral phylogenies suggest that the bootstrap percentage is not a good estimator of repeatability for phylogenetics, but it is a reasonable estimator of accuracy.^{[citation needed]} In fact, it has been shown that the bootstrap percentage, as an estimator of accuracy, is biased, and that this bias results on average in an underestimate of confidence (such that as little as 70% support might really indicate up to 95% confidence). However, the direction of bias cannot be ascertained in individual cases, so assuming that high values bootstrap support indicate even higher confidence is unwarranted.

Another means of assessing support is Bremer support, or the decay index which is a parameter of a given data set, rather than an estimate based on pseudoreplicated subsamples, as are the bootstrap and jackknife procedures described above. Bremer support (also known as branch support) is simply the difference in number of steps between the score of the MPT(s), and the score of the most parsimonious tree that does not contain a particular clade (node, branch). It can be thought of as the number of steps you have to add to lose that clade; implicitly, it is meant to suggest how great the error in the estimate of the score of the MPT must be for the clade to no longer be supported by the analysis, although this is not necessarily what it does. Branch support values are often fairly low for modestly-sized data sets (one or two steps being typical), but they often appear to be proportional to bootstrap percentages. As data matrices become larger, branch support values often continue to increase as bootstrap values plateau at 100%. Thus, for large data matrices, branch support values may provide a more informative means to compare support for strongly-supported branches. However, interpretation of decay values is not straightforward, and they seem to be preferred by authors with philosophical objections to the bootstrap (although many morphological systematists, especially paleontologists, report both). Double-decay analysis is a decay counterpart to reduced consensus that evaluates the decay index for all possible subtree relationships (n-taxon statements) within a tree.

Problems with maximum parsimony phylogenetic inference

An example of long branch attraction. If branches A & C have a high number of substitutions in the "true tree" (assumed, never actually known except in simulations), then parsimony might interpret parallel changes as synapomorphies and group A and C together.

Maximum parsimony is an epistemologically straightforward approach that makes few mechanistic assumptions, and is popular for this reason. However, it may not be statistically consistent under certain circumstances. Consistency, here meaning the monotonic convergence on the correct answer with the addition of more data, is a desirable property of statistical methods. As demonstrated in 1978 by Joe Felsenstein, maximum parsimony can be inconsistent under certain conditions. The category of situations in which this is known to occur is called long branch attraction, and occurs, for example, where there are long branches (a high level of substitutions) for two characters (A & C), but short branches for another two (B & D). A and B diverged from a common ancestor, as did C and D. Of course, to know that a method is giving you the wrong answer, you would need to know what the correct answer is. This is generally not the case in science. For this reason, some view statistical consistency as irrelevant to empirical phylogenetic questions.

Assume for simplicity that we are considering a single binary character (it can either be + or -). Because the distance from B to D is small, in the vast majority of all cases, B and D will be the same. Here, we will assume that they are both + (+ and - are assigned arbitrarily and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be -, in which case only one character is different, and we cannot learn anything, as all trees have the same length. Similarly, A can be - and C can be +. The only remaining possibility is that A and C are both -. In this case, however, the evidence suggests that A and C group together, and B and D together. As a consequence, if the "true tree" is a tree of this type, the more data we collect (i.e. the more characters we study), the more the evidence will support the wrong tree. Of course, except in mathematical simulations, we never know what the "true tree" is. Thus, unless we are able to devise a model that is guaranteed to accurately recover the "true tree," any other optimality criterion or weighting scheme could also, in principle, be statistically inconsistent. The bottom line is, that while statistical inconsistency is an interesting theoretical issue, it is empirically a purely metaphysical concern, outside the realm of empirical testing. Any method could be inconsistent, and there is no way to know for certain whether it is, or not. It is for this reason that many systematists characterize their phylogenetic results as hypotheses of relationship.

Another complication with maximum parsimony, and other optimaltiy-criterion based phylogenetic methods, is that finding the shortest tree is an NP-hard problem. The only currently available, efficient way of obtaining a solution, given an arbitrarily large set of taxa, is by using heuristic methods which do not guarantee that the shortest tree will be recovered. These methods employ hill-climbing algorithms to progressively approach the best tree. However, it has been shown that there can be "tree islands" of suboptimal solutions, and the analysis can become trapped in these local optima. Thus, complex, flexible heuristics are required to ensure that tree space has been adequately explored. Several heuristics are available, including nearest neighbor interchange (NNI), tree bisection reconnection (TBR), and the parsimony ratchet.

Criticism

It has been asserted that a major problem, especially for paleontology, is that maximum parsimony assumes that the only way two species can share the same nucleotide at the same position is if they are genetically related. This asserts that phylogenetic applications of parsimony assume that all similarity is homologous (other interpretations, such as the assertion that two organisms might not be related at all, are nonsensical). This is emphatically not the case: as with any form of character-based phylogeny estimation, parsimony is used to test the homologous nature of similarities by finding the phylogenetic tree which best accounts for all of the similarities.

It is often stated that parsimony is not relevant to phylogenetic inference because "evolution is not parsimonious." In most cases, there is no explicit alternative proposed; if no alternative is available, any statistical method is preferable to none at all. Additionally, it is not clear what would be meant if the statement "evolution is parsimonious" were in fact true. This could be taken to mean that more character changes may have occurred historically than are predicted using the parsimony criterion. Because parsimony phylogeny estimation reconstructs the minimum number of changes necessary to explain a tree, this is quite possible. However, it has been shown through simulation studies, testing with known in vitro viral phylogenies, and congruence with other methods, that the accuracy of parsimony is in most cases not compromised by this. Parsimony analysis uses the number of character changes on trees to choose the best tree, but it does not require that exactly that many changes, and no more, produced the tree. As long as the changes that have not been accounted for are randomly distributed over the tree (a reasonable null expectation), the result should not be biased. In practice, the technique is robust: maximum parsimony exhibits minimal bias as a result of choosing the tree with the fewest changes.

An analogy can be drawn with choosing among contractors based on their initial (nonbinding) estimate of the cost of a job. The actual finished cost is very likely to be higher than the estimate. Despite this, choosing the contractor who furnished the lowest estimate should theoretically result in the lowest final project cost. This is because, in the absence of other data, we would assume that all of the relevant contractors have the same risk of cost overruns. In practice, of course, unscrupulous business practices may bias this result; in phylogenetics, too, some particular phylogenetic problems (for example, long branch attraction, described above) may potentially bias results. In both cases, however, there is no way to tell if the result is going to be biased, or the degree to which it will be biased, based on the estimate itself. With parsimony too, there is no way to tell that the data are positively misleading, without comparison to other evidence.

Parsimony is often characterized as implicitly adopting the position that evolutionary change is rare, or that homoplasy (convergence and reversal) is minimal in evolution. This is not entirely true: parsimony minimizes the number of convergences and reversals that are assumed by the preferred tree, but this may result in a relatively large number of such homoplastic events. It would be more appropriate to say that parsimony assumes only the minimum amount of change implied by the data. As above, this does not require that these were the only changes that occurred; it simply does not infer changes for which there is no evidence. The shorthand for describing this, to paraphrase Farris is that "parsimony minimizes assumed homoplasies, it does not assume that homoplasy is minimal."

Recent simulation studies suggest that parsimony may be less accurate than trees built using Bayesian approaches for morphological data, potentially due to overprecision, although this has been disputed. Studies using novel simulation methods have demonstrated that differences between inference methods result from the search strategy and consensus method employed, rather than the optimization used. Also, analyses of 38 molecular and 86 morphological empirical datasets have shown that the common mechanism assumed by the evolutionary models used in model-based phylogenetics apply to most molecular, but few morphological datasets. This finding validates the use of model-based phylogenetics for molecular data, but suggests that for morphological data, parsimony remains advantageous, at least until more sophisticated models become available for phenotypic data.

Alternatives

There are several other methods for inferring phylogenies based on discrete character data, including maximum likelihood and Bayesian inference. Each offers potential advantages and disadvantages. In practice, these methods tend to favor trees that are very similar to the most parsimonious tree(s) for the same dataset; however, they allow for complex modelling of evolutionary processes, and as classes of methods are statistically consistent and are not susceptible to long-branch attraction. Note, however, that the performance of likelihood and Bayesian methods are dependent on the quality of the particular model of evolution employed; an incorrect model can produce a biased result - just like parsimony. In addition, they are still quite computationally slow relative to parsimony methods, sometimes requiring weeks to run large datasets. Most of these methods have particularly avid proponents and detractors; parsimony especially has been advocated as philosophically superior (most notably by ardent cladists). One area where parsimony still holds much sway is in the analysis of morphological data, because—until recently—stochastic models of character change were not available for non-molecular data, and they are still not widely implemented. Parsimony has also recently been shown to be more likely to recover the true tree in the face of profound changes in evolutionary ("model") parameters (e.g., the rate of evolutionary change) within a tree.

Distance matrices can also be used to generate phylogenetic trees. Non-parametric distance methods were originally applied to phenetic data using a matrix of pairwise distances and reconciled to produce a tree. The distance matrix can come from a number of different sources, including immunological distance, morphometric analysis, and genetic distances. For phylogenetic character data, raw distance values can be calculated by simply counting the number of pairwise differences in character states (Manhattan distance) or by applying a model of evolution. Notably, distance methods also allow use of data that may not be easily converted to character data, such as DNA-DNA hybridization assays. Today, distance-based methods are often frowned upon because phylogenetically-informative data can be lost when converting characters to distances. There are a number of distance-matrix methods and optimality criteria, of which the minimum evolution criterion is most closely related to maximum parsimony.

Minimum Evolution

From among the distance methods, there exists a phylogenetic estimation criterion, known as Minimum Evolution (ME), that shares with maximum-parsimony the aspect of searching for the phylogeny that has the shortest total sum of branch lengths.

A subtle difference distinguishes the maximum-parsimony criterion from the ME criterion: while maximum-parsimony is based on an abductive heuristic, i.e., the plausibility of the simplest evolutionary hypothesis of taxa with respect to the more complex ones, the ME criterion is based on Kidd and Sgaramella-Zonta's conjectures (proven true 22 years later by Rzhetsky and Nei) stating that if the evolutionary distances from taxa were unbiased estimates of the true evolutionary distances then the true phylogeny of taxa would have a length shorter than any other alternative phylogeny compatible with those distances. Rzhetsky and Nei's results set the ME criterion free from the Occam's razor principle and confer it a solid theoretical and quantitative basis.

Bomber

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Bomber

A bomber is a military combat aircraft designed to attack ground and naval targets by dropping air-to-ground weaponry (such as bombs), launching torpedoes, or deploying air-launched cruise missiles. The first use of bombs dropped from an aircraft occurred in the Italo-Turkish War, with the first major deployments coming in the First World War and Second World War by all major airforces causing devastating damage to cities, towns, and rural areas. The first purpose built bombers were the Italian Caproni Ca 30 and British Bristol T.B.8, both of 1913. Some bombers were decorated with nose art or victory markings.

There are two major classifications of bomber: strategic and tactical. Strategic bombing is done by heavy bombers primarily designed for long-range bombing missions against strategic targets to diminish the enemy's ability to wage war by limiting access to resources through crippling infrastructure or reducing industrial output. Tactical bombing is aimed at countering enemy military activity and in supporting offensive operations, and is typically assigned to smaller aircraft operating at shorter ranges, typically near the troops on the ground or against enemy shipping.

During WWII with engine power as a major limitation, combined with the desire for accuracy and other operational factors, bomber designs tended to be tailored to specific roles. Early in the Cold War however, bombers were the only means of carrying nuclear weapons to enemy targets, and held the role of deterrence. With the advent of guided air-to-air missiles, bombers needed to avoid interception. High-speed and high-altitude flying became a means of evading detection and attack. With the advent of ICBMs the role of the bomber was brought to a more tactical focus in close air support roles, and a focus on stealth technology for strategic bombers.

Classification

Strategic

Strategic bombing is done by heavy bombers primarily designed for long-range bombing missions against strategic targets such as supply bases, bridges, factories, shipyards, and cities themselves, to diminish the enemy's ability to wage war by limiting access to resources through crippling infrastructure or reducing industrial output. Current examples include the strategic nuclear-armed bombers: B-2 Spirit, B-52 Stratofortress, Tupolev Tu-95 'Bear', Tupolev Tu-22M 'Backfire' and Tupolev Tu-160 "Blackjack"; historically notable examples are the: Gotha G.IV, Avro Lancaster, Heinkel He 111, Junkers Ju 88, Boeing B-17 Flying Fortress, Consolidated B-24 Liberator, Boeing B-29 Superfortress, and Tupolev Tu-16 'Badger'.

Tactical

Tactical bombing, aimed at countering enemy military activity and in supporting offensive operations, is typically assigned to smaller aircraft operating at shorter ranges, typically near the troops on the ground or against enemy shipping. This role is filled by tactical bomber class, which crosses and blurs with various other aircraft categories: light bombers, medium bombers, dive bombers, interdictors, fighter-bombers, attack aircraft, multirole combat aircraft, and others.

Current examples: Xian JH-7, Dassault Mirage 2000D, and the Panavia Tornado IDS
Historical examples: Ilyushin Il-2 Shturmovik, Junkers Ju 87 Stuka, Republic P-47 Thunderbolt, Hawker Typhoon and Mikoyan MiG-27.

History

The first use of an air-dropped bomb (actually four hand grenades specially manufactured by the Italian naval arsenal) was carried out by Italian Second Lieutenant Giulio Gavotti on 1 November 1911 during the Italo-Turkish war in Libya – although his plane was not designed for the task of bombing, and his improvised attacks on Ottoman positions had little impact. These picric acid-filled steel spheres were nicknamed "ballerinas" from the fluttering fabric ribbons attached.

Early bombers

On 16 October 1912, Bulgarian observer Prodan Tarakchiev dropped two of those bombs on the Turkish railway station of Karağaç (near the besieged Edirne) from an Albatros F.2 aircraft piloted by Radul Milkov, during the First Balkan War. This is deemed to be the first use of an aircraft as a bomber.

The first heavier-than-air aircraft purposely designed for bombing were the Italian Caproni Ca 30 and British Bristol T.B.8, both of 1913. The Bristol T.B.8 was an early British single engined biplane built by the Bristol Aeroplane Company. They were fitted with a prismatic Bombsight in the front cockpit and a cylindrical bomb carrier in the lower forward fuselage capable of carrying twelve 10 lb (4.5 kg) bombs, which could be dropped singly or as a salvo as required.

The aircraft was purchased for use both by the Royal Naval Air Service and the Royal Flying Corps (RFC), and three T.B.8s, that were being displayed in Paris during December 1913 fitted with bombing equipment, were sent to France following the outbreak of war. Under the command of Charles Rumney Samson, a bombing attack on German gun batteries at Middelkerke, Belgium was executed on 25 November 1914.

The dirigible, or airship, was developed in the early 20th century. Early airships were prone to disaster, but slowly the airship became more dependable, with a more rigid structure and stronger skin. Prior to the outbreak of war, Zeppelins, a larger and more streamlined form of airship designed by German Count Ferdinand von Zeppelin, were outfitted to carry bombs to attack targets at long range. These were the first long range, strategic bombers. Although the German air arm was strong, with a total of 123 airships by the end of the war, they were vulnerable to attack and engine failure, as well as navigational issues. German airships inflicted little damage on all 51 raids, with 557 Britons killed and 1,358 injured. The German Navy lost 53 of its 73 airships, and the German Army lost 26 of its 50 ships.

The Caproni Ca 30 was built by Gianni Caproni in Italy. It was a twin-boom biplane with three 67 kW (80 hp) Gnome rotary engines and first flew in October 1914. Test flights revealed power to be insufficient and the engine layout unworkable, and Caproni soon adopted a more conventional approach installing three 81 kW (110 hp) Fiat A.10s. The improved design was bought by the Italian Army and it was delivered in quantity from August 1915.

While mainly used as a trainer, Avro 504s were also briefly used as bombers at the start of the First World War by the Royal Naval Air Service (RNAS) when they were used for raids on the German airship sheds.

Strategic bombing

Bombing raids and interdiction operations were mainly carried out by French and British forces during the War as the German air arm was forced to concentrate its resources on a defensive strategy. Notably, bombing campaigns formed a part of the British offensive at the Battle of Neuve Chapelle in 1915, with Royal Flying Corps squadrons attacking German railway stations in an attempt to hinder the logistical supply of the German army. The early, improvised attempts at bombing that characterized the early part of the war slowly gave way to a more organized and systematic approach to strategic and tactical bombing, pioneered by various air power strategists of the Entente, especially Major Hugh Trenchard; he was the first to advocate that there should be "... sustained [strategic bombing] attacks with a view to interrupting the enemy's railway communications ... in conjunction with the main operations of the Allied Armies."

When the war started, bombing was very crude (hand-held bombs were thrown over the side) yet by the end of the war long-range bombers equipped with complex mechanical bombing computers were being built, designed to carry large loads to destroy enemy industrial targets. The most important bombers used in World War I were the French Breguet 14, British de Havilland DH-4, German Albatros C.III and Russian Sikorsky Ilya Muromets. The Russian Sikorsky Ilya Muromets, was the first four-engine bomber to equip a dedicated strategic bombing unit during World War I. This heavy bomber was unrivaled in the early stages of the war, as the Central Powers had no comparable aircraft until much later.

Long range bombing raids were carried out at night by multi-engine biplanes such as the Gotha G.IV (whose name was synonymous with all multi-engine German bombers) and later the Handley Page Type O; the majority of bombing was done by single-engined biplanes with one or two crew members flying short distances to attack enemy lines and immediate hinterland. As the effectiveness of a bomber was dependent on the weight and accuracy of its bomb load, ever larger bombers were developed starting in World War I, while considerable money was spent developing suitable bombsights.

World War II

With engine power as a major limitation, combined with the desire for accuracy and other operational factors, bomber designs tended to be tailored to specific roles. By the start of the war this included:

dive bomber – specially strengthened for vertical diving attacks for greater accuracy
light bomber, medium bomber and heavy bomber – subjective definitions based on size and/or payload capacity
torpedo bomber – specialized aircraft armed with torpedoes
ground attack aircraft – aircraft used against targets on a battlefield such as troop or tank concentrations
night bomber – specially equipped to operate at night when opposing defences are limited
maritime patrol – long range bombers that were used against enemy shipping, particularly submarines
fighter-bomber – a modified fighter aircraft used as a light bomber

Bombers of this era were not intended to attack other aircraft although most were fitted with defensive weapons. World War II saw the beginning of the widespread use of high speed bombers which began to minimize defensive weaponry in order to attain higher speed. Some smaller designs were used as the basis for night fighters. A number of fighters, such as the Hawker Hurricane were used as ground attack aircraft, replacing earlier conventional light bombers that proved unable to defend themselves while carrying a useful bomb load.

Cold War

At the start of the Cold War, bombers were the only means of carrying nuclear weapons to enemy targets, and had the role of deterrence. With the advent of guided air-to-air missiles, bombers needed to avoid interception. High-speed and high-altitude flying became a means of evading detection and attack. Designs such as the English Electric Canberra could fly faster or higher than contemporary fighters. When surface-to-air missiles became capable of hitting high-flying bombers, bombers were flown at low altitudes to evade radar detection and interception.

Once "stand off" nuclear weapon designs were developed, bombers did not need to pass over the target to make an attack; they could fire and turn away to escape the blast. Nuclear strike aircraft were generally finished in bare metal or anti-flash white to minimize absorption of thermal radiation from the flash of a nuclear explosion. The need to drop conventional bombs remained in conflicts with non-nuclear powers, such as the Vietnam War or Malayan Emergency.

The development of large strategic bombers stagnated in the later part of the Cold War because of spiraling costs and the development of the Intercontinental ballistic missile (ICBM) – which was felt to have similar deterrent value while being impossible to intercept. Because of this, the United States Air Force XB-70 Valkyrie program was cancelled in the early 1960s; the later B-1B Lancer and B-2 Spirit aircraft entered service only after protracted political and development problems. Their high cost meant that few were built and the 1950s-designed B-52s are projected to remain in use until the 2040s. Similarly, the Soviet Union used the intermediate-range Tu-22M 'Backfire' in the 1970s, but their Mach 3 bomber project stalled. The Mach 2 Tu-160 'Blackjack' was built only in tiny numbers, leaving the 1950s Tupolev Tu-16 and Tu-95 'Bear' heavy bombers to continue being used into the 21st century.

The British strategic bombing force largely came to an end when the V bomber force was phased out; the last of which left service in 1983. The French Mirage IV bomber version was retired in 1996, although the Mirage 2000N and the Rafale have taken on this role. The only other nation that fields strategic bombing forces is China, which has a number of Xian H-6s.

Modern era

Currently, only the United States Air Force, the Russian Aerospace Forces' Long-Range Aviation command, and China's People's Liberation Army Air Force operate strategic heavy bombers. Other air forces have transitioned away from dedicated bombers in favor of multirole combat aircraft.

At present, these air forces are each developing stealth replacements for their legacy bomber fleets, the USAF with the Northrop Grumman B-21, the Russian Aerospace Forces with the PAK DA, and the PLAAF with the Xian H-20. As of 2021, the B-21 is expected to enter service by 2026–2027. The B-21 would be capable of loitering near target areas for extended periods of time.

Other uses

Occasionally, military aircraft have been used to bomb ice jams with limited success as part of an effort to clear them. In 2018, the Swedish Air Force dropped bombs on a forest fire, snuffing out flames with the aid of the blast waves. The fires had been raging in an area contaminated with unexploded ordnance, rendering them difficult to extinguish for firefighters.

Limit inferior and limit superior

From Wikipedia, the free encyclopedia

In mathematics, the limit inferior and limit superior of a sequence can be thought of as limiting (that is, eventual and extreme) bounds on the sequence. They can be thought of in a similar fashion for a function (see limit of a function). For a set, they are the infimum and supremum of the set's limit points, respectively. In general, when there are multiple objects around which a sequence, function, or set accumulates, the inferior and superior limits extract the smallest and largest of them; the type of object and the measure of size is context-dependent, but the notion of extreme limits is invariant. Limit inferior is also called infimum limit, limit infimum, liminf, inferior limit, lower limit, or inner limit; limit superior is also known as supremum limit, limit supremum, limsup, superior limit, upper limit, or outer limit.

The limit inferior of a sequence $(x_{n})$ is denoted by

\underset{n \to \infty}{lim inf} x_{n} or \underset{n \to \infty}{\lim_{_}} x_{n},

and the limit superior of a sequence

(x_{n})

is denoted by

\underset{n \to \infty}{lim sup} x_{n} or \underset{n \to \infty}{\lim^{¯}} x_{n} .

Definition for sequences

The limit inferior of a sequence (x_n) is defined by

\underset{n \to \infty}{lim inf} x_{n} := lim_{n \to \infty} (inf_{m \geq n} x_{m})

\underset{n \to \infty}{lim inf} x_{n} := sup_{n \geq 0} inf_{m \geq n} x_{m} = sup {inf {x_{m} : m \geq n} : n \geq 0} .

Similarly, the limit superior of (x_n) is defined by

\underset{n \to \infty}{lim sup} x_{n} := lim_{n \to \infty} (sup_{m \geq n} x_{m})

\underset{n \to \infty}{lim sup} x_{n} := inf_{n \geq 0} sup_{m \geq n} x_{m} = inf {sup {x_{m} : m \geq n} : n \geq 0} .

Alternatively, the notations $\underset{n \to \infty}{\lim_{_}} x_{n} := \underset{n \to \infty}{lim inf} x_{n}$ and $\underset{n \to \infty}{\lim^{¯}} x_{n} := \underset{n \to \infty}{lim sup} x_{n}$ are sometimes used.

The limits superior and inferior can equivalently be defined using the concept of subsequential limits of the sequence $(x_{n})$ . An element $ξ$ of the extended real numbers $\bar{R}$ is a subsequential limit of $(x_{n})$ if there exists a strictly increasing sequence of natural numbers $(n_{k})$ such that $ξ = lim_{k \to \infty} x_{n_{k}}$ . If $E \subseteq \bar{R}$ is the set of all subsequential limits of $(x_{n})$ , then

\underset{n \to \infty}{lim sup} x_{n} = sup E

and

\underset{n \to \infty}{lim inf} x_{n} = inf E .

If the terms in the sequence are real numbers, the limit superior and limit inferior always exist, as the real numbers together with ±∞ (i.e. the extended real number line) are complete. More generally, these definitions make sense in any partially ordered set, provided the suprema and infima exist, such as in a complete lattice.

Whenever the ordinary limit exists, the limit inferior and limit superior are both equal to it; therefore, each can be considered a generalization of the ordinary limit which is primarily interesting in cases where the limit does not exist. Whenever lim inf x_n and lim sup x_n both exist, we have

\underset{n \to \infty}{lim inf} x_{n} \leq \underset{n \to \infty}{lim sup} x_{n} .

The limits inferior and superior are related to big-O notation in that they bound a sequence only "in the limit"; the sequence may exceed the bound. However, with big-O notation the sequence can only exceed the bound in a finite prefix of the sequence, whereas the limit superior of a sequence like e⁻ⁿ may actually be less than all elements of the sequence. The only promise made is that some tail of the sequence can be bounded above by the limit superior plus an arbitrarily small positive constant, and bounded below by the limit inferior minus an arbitrarily small positive constant.

The limit superior and limit inferior of a sequence are a special case of those of a function (see below).

The case of sequences of real numbers

In mathematical analysis, limit superior and limit inferior are important tools for studying sequences of real numbers. Since the supremum and infimum of an unbounded set of real numbers may not exist (the reals are not a complete lattice), it is convenient to consider sequences in the affinely extended real number system: we add the positive and negative infinities to the real line to give the complete totally ordered set [−∞,∞], which is a complete lattice.

Interpretation

Consider a sequence $(x_{n})$ consisting of real numbers. Assume that the limit superior and limit inferior are real numbers (so, not infinite).

The limit superior of $x_{n}$ is the smallest real number $b$ such that, for any positive real number $ε$ , there exists a natural number $N$ such that $x_{n} < b + ε$ for all $n > N$ . In other words, any number larger than the limit superior is an eventual upper bound for the sequence. Only a finite number of elements of the sequence are greater than $b + ε$ .
The limit inferior of $x_{n}$ is the largest real number $b$ such that, for any positive real number $ε$ , there exists a natural number $N$ such that $x_{n} > b - ε$ for all $n > N$ . In other words, any number below the limit inferior is an eventual lower bound for the sequence. Only a finite number of elements of the sequence are less than $b - ε$ .

Properties

The relationship of limit inferior and limit superior for sequences of real numbers is as follows:

\underset{n \to \infty}{lim sup} (- x_{n}) = - \underset{n \to \infty}{lim inf} x_{n}

As mentioned earlier, it is convenient to extend $R$ to $[- \infty, \infty] .$ Then, $(x_{n})$ in $[- \infty, \infty]$ converges if and only if

\underset{n \to \infty}{lim inf} x_{n} = \underset{n \to \infty}{lim sup} x_{n}

in which case

lim_{n \to \infty} x_{n}

is equal to their common value. (Note that when working just in

R,

convergence to

- \infty

\infty

would not be considered as convergence.) Since the limit inferior is at most the limit superior, the following conditions hold

\begin{aligned} \underset{n \to \infty}{lim inf} x_{n} & = \infty & implies lim_{n \to \infty} x_{n} = \infty, \\ \underset{n \to \infty}{lim sup} x_{n} & = - \infty & implies lim_{n \to \infty} x_{n} = - \infty . \end{aligned}

If $I = \underset{n \to \infty}{lim inf} x_{n}$ and $S = \underset{n \to \infty}{lim sup} x_{n}$ , then the interval $[I, S]$ need not contain any of the numbers $x_{n},$ but every slight enlargement $[I - ϵ, S + ϵ],$ for arbitrarily small $ϵ > 0,$ will contain $x_{n}$ for all but finitely many indices $n .$ In fact, the interval $[I, S]$ is the smallest closed interval with this property. We can formalize this property like this: there exist subsequences $x_{k_{n}}$ and $x_{h_{n}}$ of $x_{n}$ (where $k_{n}$ and $h_{n}$ are increasing) for which we have

\underset{n \to \infty}{lim inf} x_{n} + ϵ > x_{h_{n}} x_{k_{n}} > \underset{n \to \infty}{lim sup} x_{n} - ϵ

On the other hand, there exists a $n_{0} \in N$ so that for all $n \geq n_{0}$

\underset{n \to \infty}{lim inf} x_{n} - ϵ < x_{n} < \underset{n \to \infty}{lim sup} x_{n} + ϵ

To recapitulate:

If $Λ$ is greater than the limit superior, there are at most finitely many $x_{n}$ greater than $Λ;$ if it is less, there are infinitely many.
If $λ$ is less than the limit inferior, there are at most finitely many $x_{n}$ less than $λ;$ if it is greater, there are infinitely many.

Conversely, it can also be shown that:

If there are infinitely many $x_{n}$ greater than or equal to $Λ$ , then $Λ$ is lesser than or equal to the limit supremum; if there are only finitely many $x_{n}$ greater than $Λ$ , then $Λ$ is greater than or equal to the limit supremum.
If there are infinitely many $x_{n}$ lesser than or equal to $λ$ , then $λ$ is greater than or equal to the limit inferior; if there are only finitely many $x_{n}$ lesser than $λ$ , then $λ$ is lesser than or equal to the limit inferior.

In general,

inf_{n} x_{n} \leq \underset{n \to \infty}{lim inf} x_{n} \leq \underset{n \to \infty}{lim sup} x_{n} \leq sup_{n} x_{n} .

The liminf and limsup of a sequence are respectively the smallest and greatest cluster points.

For any two sequences of real numbers $(a_{n}), (b_{n}),$ the limit superior satisfies subadditivity whenever the right side of the inequality is defined (that is, not $\infty - \infty$ or $- \infty + \infty$ ): $\underset{n \to \infty}{lim sup} (a_{n} + b_{n}) \leq \underset{n \to \infty}{lim sup} a_{n} + \underset{n \to \infty}{lim sup} b_{n} .$

Analogously, the limit inferior satisfies superadditivity:

\underset{n \to \infty}{lim inf} (a_{n} + b_{n}) \geq \underset{n \to \infty}{lim inf} a_{n} + \underset{n \to \infty}{lim inf} b_{n} .

In the particular case that one of the sequences actually converges, say

a_{n} \to a,

then the inequalities above become equalities (with

\underset{n \to \infty}{lim sup} a_{n}

\underset{n \to \infty}{lim inf} a_{n}

being replaced by

a

For any two sequences of non-negative real numbers $(a_{n}), (b_{n}),$ the inequalities $\underset{n \to \infty}{lim sup} (a_{n} b_{n}) \leq (\underset{n \to \infty}{lim sup} a_{n}) (\underset{n \to \infty}{lim sup} b_{n})$ and $\underset{n \to \infty}{lim inf} (a_{n} b_{n}) \geq (\underset{n \to \infty}{lim inf} a_{n}) (\underset{n \to \infty}{lim inf} b_{n})$

hold whenever the right-hand side is not of the form $0 \cdot \infty .$

If $lim_{n \to \infty} a_{n} = A$ exists (including the case $A = + \infty$ ) and $B = \underset{n \to \infty}{lim sup} b_{n},$ then $\underset{n \to \infty}{lim sup} (a_{n} b_{n}) = A B$ provided that $A B$ is not of the form $0 \cdot \infty .$

Examples

As an example, consider the sequence given by the sine function: $x_{n} = \sin (n) .$ Using the fact that π is irrational, it follows that $\underset{n \to \infty}{lim inf} x_{n} = - 1$ and $\underset{n \to \infty}{lim sup} x_{n} = + 1.$ (This is because the sequence ${1, 2, 3, \dots}$ is equidistributed mod 2π, a consequence of the equidistribution theorem.)

An example from number theory is $\underset{n \to \infty}{lim inf} (p_{n + 1} - p_{n}),$ where $p_{n}$ is the $n$ -th prime number.

The value of this limit inferior is conjectured to be 2 – this is the twin prime conjecture – but as of April 2014 has only been proven to be less than or equal to 246. The corresponding limit superior is

+ \infty

, because there are arbitrarily large gaps between consecutive primes.

Real-valued functions

Assume that a function is defined from a subset of the real numbers to the real numbers. As in the case for sequences, the limit inferior and limit superior are always well-defined if we allow the values +∞ and −∞; in fact, if both agree then the limit exists and is equal to their common value (again possibly including the infinities). For example, given $f (x) = \sin (1 / x)$ , we have $\underset{x \to 0}{lim sup} f (x) = 1$ and $\underset{x \to 0}{lim inf} f (x) = - 1$ . The difference between the two is a rough measure of how "wildly" the function oscillates, and in observation of this fact, it is called the oscillation of f at 0. This idea of oscillation is sufficient to, for example, characterize Riemann-integrable functions as continuous except on a set of measure zero. Note that points of nonzero oscillation (i.e., points at which f is "badly behaved") are discontinuities which, unless they make up a set of zero, are confined to a negligible set.

Functions from topological spaces to complete lattices

Functions from metric spaces

There is a notion of limsup and liminf for functions defined on a metric space whose relationship to limits of real-valued functions mirrors that of the relation between the limsup, liminf, and the limit of a real sequence. Take a metric space $X$ , a subspace $E$ contained in $X$ , and a function $f : E \to R$ . Define, for any limit point $a$ of $E$ ,

\underset{x \to a}{lim sup} f (x) = lim_{ε \to 0} (sup {f (x) : x \in E \cap B (a, ε) ∖ {a}})

and

\underset{x \to a}{lim inf} f (x) = lim_{ε \to 0} (inf {f (x) : x \in E \cap B (a, ε) ∖ {a}})

where $B (a, ε)$ denotes the metric ball of radius $ε$ about $a$ .

Note that as ε shrinks, the supremum of the function over the ball is monotone decreasing, so we have

\underset{x \to a}{lim sup} f (x) = inf_{ε > 0} (sup {f (x) : x \in E \cap B (a, ε) ∖ {a}})

and similarly

\underset{x \to a}{lim inf} f (x) = sup_{ε > 0} (inf {f (x) : x \in E \cap B (a, ε) ∖ {a}}) .

Functions from topological spaces

This finally motivates the definitions for general topological spaces. Take X, E and a as before, but now let X be a topological space. In this case, we replace metric balls with neighborhoods:

\underset{x \to a}{lim sup} f (x) = inf {sup {f (x) : x \in E \cap U ∖ {a}} : U o p e n, a \in U, E \cap U ∖ {a} \neq \emptyset}

\underset{x \to a}{lim inf} f (x) = sup {inf {f (x) : x \in E \cap U ∖ {a}} : U o p e n, a \in U, E \cap U ∖ {a} \neq \emptyset}

(there is a way to write the formula using "lim" using nets and the neighborhood filter). This version is often useful in discussions of semi-continuity which crop up in analysis quite often. An interesting note is that this version subsumes the sequential version by considering sequences as functions from the natural numbers as a topological subspace of the extended real line, into the space (the closure of N in [−∞,∞], the extended real number line, is N ∪ {∞}.)

Sequences of sets

The power set ℘(X) of a set X is a complete lattice that is ordered by set inclusion, and so the supremum and infimum of any set of subsets (in terms of set inclusion) always exist. In particular, every subset Y of X is bounded above by X and below by the empty set ∅ because ∅ ⊆ Y ⊆ X. Hence, it is possible (and sometimes useful) to consider superior and inferior limits of sequences in ℘(X) (i.e., sequences of subsets of X).

There are two common ways to define the limit of sequences of sets. In both cases:

The sequence accumulates around sets of points rather than single points themselves. That is, because each element of the sequence is itself a set, there exist accumulation sets that are somehow nearby to infinitely many elements of the sequence.
The supremum/superior/outer limit is a set that joins these accumulation sets together. That is, it is the union of all of the accumulation sets. When ordering by set inclusion, the supremum limit is the least upper bound on the set of accumulation points because it contains each of them. Hence, it is the supremum of the limit points.
The infimum/inferior/inner limit is a set where all of these accumulation sets meet. That is, it is the intersection of all of the accumulation sets. When ordering by set inclusion, the infimum limit is the greatest lower bound on the set of accumulation points because it is contained in each of them. Hence, it is the infimum of the limit points.
Because ordering is by set inclusion, then the outer limit will always contain the inner limit (i.e., lim inf X_n ⊆ lim sup X_n). Hence, when considering the convergence of a sequence of sets, it generally suffices to consider the convergence of the outer limit of that sequence.

The difference between the two definitions involves how the topology (i.e., how to quantify separation) is defined. In fact, the second definition is identical to the first when the discrete metric is used to induce the topology on X.

General set convergence

A sequence of sets in a metrizable space $X$ approaches a limiting set when the elements of each member of the sequence approach the elements of the limiting set. In particular, if $(X_{n})$ is a sequence of subsets of $X,$ then:

$lim sup X_{n},$ which is also called the outer limit, consists of those elements which are limits of points in $X_{n}$ taken from (countably) infinitely many $n .$ That is, $x \in lim sup X_{n}$ if and only if there exists a sequence of points $(x_{k})$ and a subsequence $(X_{n_{k}})$ of $(X_{n})$ such that $x_{k} \in X_{n_{k}}$ and $lim_{k \to \infty} x_{k} = x .$
$lim inf X_{n},$ which is also called the inner limit, consists of those elements which are limits of points in $X_{n}$ for all but finitely many $n$ (that is, cofinitely many $n$ ). That is, $x \in lim inf X_{n}$ if and only if there exists a sequence of points $(x_{k})$ such that $x_{k} \in X_{k}$ and $lim_{k \to \infty} x_{k} = x .$

The limit $lim X_{n}$ exists if and only if $lim inf X_{n}$ and $lim sup X_{n}$ agree, in which case $lim X_{n} = lim sup X_{n} = lim inf X_{n} .$ The outer and inner limits should not be confused with the set-theoretic limits superior and inferior, as the latter sets are not sensitive to the topological structure of the space.

Special case: discrete metric

This is the definition used in measure theory and probability. Further discussion and examples from the set-theoretic point of view, as opposed to the topological point of view discussed below, are at set-theoretic limit.

By this definition, a sequence of sets approaches a limiting set when the limiting set includes elements which are in all except finitely many sets of the sequence and does not include elements which are in all except finitely many complements of sets of the sequence. That is, this case specializes the general definition when the topology on set X is induced from the discrete metric.

Specifically, for points x, y ∈ X, the discrete metric is defined by

d (x, y) := {\begin{cases} 0 & if x = y, \\ 1 & if x \neq y, \end{cases}

under which a sequence of points (x_k) converges to point x ∈ X if and only if x_k = x for all but finitely many k. Therefore, if the limit set exists it contains the points and only the points which are in all except finitely many of the sets of the sequence. Since convergence in the discrete metric is the strictest form of convergence (i.e., requires the most), this definition of a limit set is the strictest possible.

If (X_n) is a sequence of subsets of X, then the following always exist:

lim sup X_n consists of elements of X which belong to X_n for infinitely many n (see countably infinite). That is, x ∈ lim sup X_n if and only if there exists a subsequence (X_{n_k}) of (X_n) such that x ∈ X_{n_k} for all k.
lim inf X_n consists of elements of X which belong to X_n for all except finitely many n (i.e., for cofinitely many n). That is, x ∈ lim inf X_n if and only if there exists some m > 0 such that x ∈ X_n for all n > m.

Observe that x ∈ lim sup X_n if and only if x ∉ lim inf X_n^c.

lim X_n exists if and only if lim inf X_n and lim sup X_n agree, in which case lim X_n = lim sup X_n = lim inf X_n.

In this sense, the sequence has a limit so long as every point in X either appears in all except finitely many X_n or appears in all except finitely many X_n^c.

Using the standard parlance of set theory, set inclusion provides a partial ordering on the collection of all subsets of X that allows set intersection to generate a greatest lower bound and set union to generate a least upper bound. Thus, the infimum or meet of a collection of subsets is the greatest lower bound while the supremum or join is the least upper bound. In this context, the inner limit, lim inf X_n, is the largest meeting of tails of the sequence, and the outer limit, lim sup X_n, is the smallest joining of tails of the sequence. The following makes this precise.

Let I_n be the meet of the n^th tail of the sequence. That is,

\begin{aligned} I_{n} & = inf {X_{m} : m \in {n, n + 1, n + 2, \dots}} \\ = ⋂_{m = n}^{\infty} X_{m} = X_{n} \cap X_{n + 1} \cap X_{n + 2} \cap \dots . \end{aligned}

The sequence (I_n) is non-decreasing (i.e. I_n ⊆ I_n+1) because each I_n+1 is the intersection of fewer sets than I_n. The least upper bound on this sequence of meets of tails is

\begin{aligned} \underset{n \to \infty}{lim inf} X_{n} & = sup {inf {X_{m} : m \in {n, n + 1, \dots}} : n \in {1, 2, \dots}} \\ = ⋃_{n = 1}^{\infty} (⋂_{m = n}^{\infty} X_{m}) . \end{aligned}

So the limit infimum contains all subsets which are lower bounds for all but finitely many sets of the sequence.

Similarly, let J_n be the join of the n^th tail of the sequence. That is,

\begin{aligned} J_{n} & = sup {X_{m} : m \in {n, n + 1, n + 2, \dots}} \\ = ⋃_{m = n}^{\infty} X_{m} = X_{n} \cup X_{n + 1} \cup X_{n + 2} \cup \dots . \end{aligned}

The sequence (J_n) is non-increasing (i.e. J_n ⊇ J_n+1) because each J_n+1 is the union of fewer sets than J_n. The greatest lower bound on this sequence of joins of tails is

\begin{aligned} \underset{n \to \infty}{lim sup} X_{n} & = inf {sup {X_{m} : m \in {n, n + 1, \dots}} : n \in {1, 2, \dots}} \\ = ⋂_{n = 1}^{\infty} (⋃_{m = n}^{\infty} X_{m}) . \end{aligned}

So the limit supremum is contained in all subsets which are upper bounds for all but finitely many sets of the sequence.

Examples

The following are several set convergence examples. They have been broken into sections with respect to the metric used to induce the topology on set X.

Using the discrete metric

The Borel–Cantelli lemma is an example application of these constructs.

Using either the discrete metric or the Euclidean metric

Consider the set X = {0,1} and the sequence of subsets:

(X_{n}) = ({0}, {1}, {0}, {1}, {0}, {1}, \dots) .

The "odd" and "even" elements of this sequence form two subsequences, ({0}, {0}, {0}, ...) and ({1}, {1}, {1}, ...), which have limit points 0 and 1, respectively, and so the outer or superior limit is the set {0,1} of these two points. However, there are no limit points that can be taken from the (X_n) sequence as a whole, and so the interior or inferior limit is the empty set { }. That is,

lim sup X_n = {0,1}
lim inf X_n = { }

However, for (Y_n) = ({0}, {0}, {0}, ...) and (Z_n) = ({1}, {1}, {1}, ...):

lim sup Y_n = lim inf Y_n = lim Y_n = {0}
lim sup Z_n = lim inf Z_n = lim Z_n = {1}

Consider the set X = {50, 20, −100, −25, 0, 1} and the sequence of subsets:

(X_{n}) = ({50}, {20}, {- 100}, {- 25}, {0}, {1}, {0}, {1}, {0}, {1}, \dots) .

As in the previous two examples,

lim sup X_n = {0,1}
lim inf X_n = { }

That is, the four elements that do not match the pattern do not affect the lim inf and lim sup because there are only finitely many of them. In fact, these elements could be placed anywhere in the sequence. So long as the tails of the sequence are maintained, the outer and inner limits will be unchanged. The related concepts of essential inner and outer limits, which use the essential supremum and essential infimum, provide an important modification that "squashes" countably many (rather than just finitely many) interstitial additions.

Using the Euclidean metric

Consider the sequence of subsets of rational numbers:

(X_{n}) = ({0}, {1}, {1 / 2}, {1 / 2}, {2 / 3}, {1 / 3}, {3 / 4}, {1 / 4}, \dots) .

The "odd" and "even" elements of this sequence form two subsequences, ({0}, {1/2}, {2/3}, {3/4}, ...) and ({1}, {1/2}, {1/3}, {1/4}, ...), which have limit points 1 and 0, respectively, and so the outer or superior limit is the set {0,1} of these two points. However, there are no limit points that can be taken from the (X_n) sequence as a whole, and so the interior or inferior limit is the empty set { }. So, as in the previous example,

lim sup X_n = {0,1}
lim inf X_n = { }

However, for (Y_n) = ({0}, {1/2}, {2/3}, {3/4}, ...) and (Z_n) = ({1}, {1/2}, {1/3}, {1/4}, ...):

lim sup Y_n = lim inf Y_n = lim Y_n = {1}
lim sup Z_n = lim inf Z_n = lim Z_n = {0}

In each of these four cases, the elements of the limiting sets are not elements of any of the sets from the original sequence.

The Ω limit (i.e., limit set) of a solution to a dynamic system is the outer limit of solution trajectories of the system. Because trajectories become closer and closer to this limit set, the tails of these trajectories converge to the limit set.

For example, an LTI system that is the cascade connection of several stable systems with an undamped second-order LTI system (i.e., zero damping ratio) will oscillate endlessly after being perturbed (e.g., an ideal bell after being struck). Hence, if the position and velocity of this system are plotted against each other, trajectories will approach a circle in the state space. This circle, which is the Ω limit set of the system, is the outer limit of solution trajectories of the system. The circle represents the locus of a trajectory corresponding to a pure sinusoidal tone output; that is, the system output approaches/approximates a pure tone.

Generalized definitions

The above definitions are inadequate for many technical applications. In fact, the definitions above are specializations of the following definitions.

Definition for a set

The limit inferior of a set X ⊆ Y is the infimum of all of the limit points of the set. That is,

lim inf X := inf {x \in Y : x is a limit point of X}

Similarly, the limit superior of X is the supremum of all of the limit points of the set. That is,

lim sup X := sup {x \in Y : x is a limit point of X}

Note that the set X needs to be defined as a subset of a partially ordered set Y that is also a topological space in order for these definitions to make sense. Moreover, it has to be a complete lattice so that the suprema and infima always exist. In that case every set has a limit superior and a limit inferior. Also note that the limit inferior and the limit superior of a set do not have to be elements of the set.

Definition for filter bases

Take a topological space X and a filter base B in that space. The set of all cluster points for that filter base is given by

⋂ {{\bar{B}}_{0} : B_{0} \in B}

where ${\bar{B}}_{0}$ is the closure of $B_{0}$ . This is clearly a closed set and is similar to the set of limit points of a set. Assume that X is also a partially ordered set. The limit superior of the filter base B is defined as

lim sup B := sup ⋂ {{\bar{B}}_{0} : B_{0} \in B}

when that supremum exists. When X has a total order, is a complete lattice and has the order topology,

lim sup B = inf {sup B_{0} : B_{0} \in B} .

Similarly, the limit inferior of the filter base B is defined as

lim inf B := inf ⋂ {{\bar{B}}_{0} : B_{0} \in B}

when that infimum exists; if X is totally ordered, is a complete lattice, and has the order topology, then

lim inf B = sup {inf B_{0} : B_{0} \in B} .

If the limit inferior and limit superior agree, then there must be exactly one cluster point and the limit of the filter base is equal to this unique cluster point.

Specialization for sequences and nets

Note that filter bases are generalizations of nets, which are generalizations of sequences. Therefore, these definitions give the limit inferior and limit superior of any net (and thus any sequence) as well. For example, take topological space $X$ and the net $(x_{α})_{α \in A}$ , where $(A, \leq)$ is a directed set and $x_{α} \in X$ for all $α \in A$ . The filter base ("of tails") generated by this net is $B$ defined by

B := {{x_{α} : α_{0} \leq α} : α_{0} \in A} .

Therefore, the limit inferior and limit superior of the net are equal to the limit superior and limit inferior of $B$ respectively. Similarly, for topological space $X$ , take the sequence $(x_{n})$ where $x_{n} \in X$ for any $n \in N$ . The filter base ("of tails") generated by this sequence is $C$ defined by

C := {{x_{n} : n_{0} \leq n} : n_{0} \in N} .

Therefore, the limit inferior and limit superior of the sequence are equal to the limit superior and limit inferior of $C$ respectively.

Search This Blog

Monday, September 25, 2023

Maximum parsimony (phylogenetics)

Alternate characterization and rationale

In detail

Character data

Taxon sampling

Analysis

Problems with maximum parsimony phylogenetic inference

Criticism

Alternatives

Minimum Evolution

Bomber

Classification

Strategic

Tactical

History

Early bombers

Strategic bombing

World War II

Cold War

Modern era

Other uses

Limit inferior and limit superior

Definition for sequences

The case of sequences of real numbers

Interpretation

Properties

Examples

Real-valued functions

Functions from topological spaces to complete lattices

Functions from metric spaces

Functions from topological spaces

Sequences of sets

General set convergence

Special case: discrete metric

Examples

Generalized definitions

Definition for a set

Definition for filter bases

Specialization for sequences and nets

Roman mythology