Simulated Large Hadron Collider CMS particle detector data depicting a Higgs boson produced by colliding protons decaying into hadron jets and electrons
In theoretical physics, the hierarchy problem is the problem concerning the large discrepancy between aspects of the weak force and gravity. There is no scientific consensus on why, for example, the weak force is 10^24 times stronger than gravity.
Technical definition
A hierarchy problem occurs when the fundamental value of some physical parameter, such as a coupling constant or a mass, in some Lagrangian
is vastly different from its effective value, which is the value that
gets measured in an experiment. This happens because the effective value
is related to the fundamental value by a prescription known as renormalization,
which applies corrections to it. Typically the renormalized value of
parameters are close to their fundamental values, but in some cases, it
appears that there has been a delicate cancellation between the
fundamental quantity and the quantum corrections. Hierarchy problems are
related to fine-tuning problems and problems of naturalness. Over the past decade many scientists have argued that the hierarchy problem is a specific application of Bayesian statistics.
Studying renormalization
in hierarchy problems is difficult, because such quantum corrections
are usually power-law divergent, which means that the shortest-distance
physics is most important. Because we do not know the precise details
of the shortest-distance theory of physics,
we cannot even address how this delicate cancellation between two large
terms occurs. Therefore, researchers are led to postulate new physical
phenomena that resolve hierarchy problems without fine-tuning.
Overview
Suppose a physics model requires four parameters which allow it to
produce a very high-quality working model, generating predictions of
some aspect of our physical universe. Suppose we find through
experiments that the parameters have values: 1.2, 1.31, 0.9 and
404,331,557,902,116,024,553,602,703,216.58 (roughly 4×10^29).
Scientists might wonder how such figures arise, but they might be especially curious about a theory where three values are close to one and the fourth is so different; in other words, about the huge disproportion we seem to find between the first three parameters and the fourth. We might also wonder: if one force is so much weaker than the others that it needs a factor of 4×10^29 to relate it to them in terms of effects, how did our universe come to be so exactly balanced when its forces emerged? In current particle physics,
the differences between some parameters are much larger than this, so
the question is even more noteworthy.
One answer given by philosophers is the anthropic principle.
If the universe came to exist by chance, and perhaps vast numbers of
other universes exist or have existed, then life capable of physics
experiments only arose in universes that by chance had very balanced
forces. All of the universes where the forces were not balanced didn't
develop life capable of asking this question. So if lifeforms like human beings
are aware and capable of asking such a question, humans must have
arisen in a universe having balanced forces, however rare that might be.
A second possible answer is that there is a deeper understanding
of physics that we currently do not possess. There might be parameters
that we can derive physical constants from that have less unbalanced
values, or there might be a model with fewer parameters.
Examples in particle physics
The Higgs mass
In particle physics, the most important hierarchy problem is the question of why the weak force is 10^24 times as strong as gravity. Both of these forces involve constants of nature, the Fermi constant for the weak force and the Newtonian constant of gravitation for gravity. Furthermore, if the Standard Model
is used to calculate the quantum corrections to Fermi's constant, it
appears that Fermi's constant is surprisingly large and is expected to
be closer to Newton's constant unless there is a delicate cancellation
between the bare value of Fermi's constant and the quantum corrections
to it.
More technically, the question is why the Higgs boson is so much lighter than the Planck mass (or the grand unification energy,
or a heavy neutrino mass scale): one would expect that the large
quantum contributions to the square of the Higgs boson mass would
inevitably make the mass huge, comparable to the scale at which new
physics appears unless there is an incredible fine-tuning cancellation between the quadratic radiative corrections and the bare mass.
The problem cannot even be formulated in the strict context of
the Standard Model, for the Higgs mass cannot be calculated. In a sense,
the problem amounts to the worry that a future theory of fundamental
particles, in which the Higgs boson mass will be calculable, should not
have excessive fine-tunings.
Theoretical solutions
There have been many proposed solutions by many physicists.
UV/IR mixing
In 2019, a pair of researchers proposed that IR/UV mixing resulting in the breakdown of the effective quantum field theory could resolve the hierarchy problem. In 2021, another group of researchers showed that UV/IR mixing could resolve the hierarchy problem in string theory.
Supersymmetry
Some physicists believe that one may solve the hierarchy problem via supersymmetry.
Supersymmetry can explain how a tiny Higgs mass can be protected from
quantum corrections. Supersymmetry removes the power-law divergences of
the radiative corrections to the Higgs mass and solves the hierarchy
problem as long as the supersymmetric particles are light enough to
satisfy the Barbieri–Giudice criterion. This still leaves open the mu problem, however. The tenets of supersymmetry are being tested at the LHC, although no evidence has been found so far for supersymmetry.
Each particle that couples to the Higgs field has an associated Yukawa coupling λ_f. The coupling with the Higgs field for fermions gives an interaction term −λ_f ψ̄Hψ, with ψ being the Dirac field and H the Higgs field.
Also, the mass of a fermion is proportional to its Yukawa coupling,
meaning that the Higgs boson will couple most to the most massive
particle. This means that the most significant corrections to the Higgs
mass will originate from the heaviest particles, most prominently the
top quark. By applying the Feynman rules, one gets the quantum corrections to the Higgs mass squared from a fermion to be:

Δm_H² = −(|λ_f|² / 8π²) [Λ_UV² + ...]

Here Λ_UV is called the ultraviolet cutoff and is the scale up to which the Standard Model is valid. If we take this scale to be the Planck scale, then the correction diverges quadratically. However, suppose there existed two complex scalars (taken to be spin 0) such that:

λ_S = |λ_f|²

(the couplings to the Higgs are exactly the same).

Then by the Feynman rules, the correction (from both scalars) is:

Δm_H² = 2 × (λ_S / 16π²) [Λ_UV² + ...]
(Note that the contribution here is positive. This is because of the
spin-statistics theorem, which means that fermions will have a negative
contribution and bosons a positive contribution. This fact is
exploited.)
This gives a total contribution to the Higgs mass to be zero if we include both the fermionic and bosonic particles. Supersymmetry is an extension of this that creates 'superpartners' for all Standard Model particles.
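As a rough numerical check (a minimal Python sketch, assuming an order-one top Yukawa coupling and a Planck-scale cutoff, and keeping only the quadratically divergent pieces quoted above), the negative fermion loop and the positive contribution of the two scalars cancel exactly when λ_S = |λ_f|²:

```python
import math

# Sketch: quadratically divergent one-loop corrections to the Higgs mass squared,
# keeping only the Lambda^2 terms shown above. All numbers are illustrative.
lambda_f = 1.0                  # top-quark Yukawa coupling, roughly 1
lambda_S = abs(lambda_f) ** 2   # scalar coupling chosen as in the supersymmetric case
Lambda_UV = 1.22e19             # cutoff taken at the Planck scale, in GeV

# Fermion loop: negative, proportional to |lambda_f|^2 Lambda^2 / (8 pi^2)
delta_m2_fermion = -abs(lambda_f) ** 2 / (8 * math.pi ** 2) * Lambda_UV ** 2

# Two complex scalars: positive, each proportional to lambda_S Lambda^2 / (16 pi^2)
delta_m2_scalars = 2 * lambda_S / (16 * math.pi ** 2) * Lambda_UV ** 2

print(f"fermion contribution: {delta_m2_fermion:+.3e} GeV^2")
print(f"scalar contribution : {delta_m2_scalars:+.3e} GeV^2")
print(f"sum                 : {delta_m2_fermion + delta_m2_scalars:+.3e} GeV^2")
# The sum vanishes (up to floating-point rounding), illustrating the cancellation.
```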
Conformal
Without supersymmetry, a solution to the hierarchy problem has been proposed using just the Standard Model.
The idea can be traced back to the fact that the term in the Higgs
field that produces the uncontrolled quadratic correction upon
renormalization is the quadratic one. If the Higgs field had no mass
term, then no hierarchy problem arises. But by missing a quadratic term
in the Higgs field, one must find a way to recover the breaking of
electroweak symmetry through a non-null vacuum expectation value. This
can be obtained using the Weinberg–Coleman mechanism
with terms in the Higgs potential arising from quantum corrections.
Mass obtained in this way is far too small with respect to what is seen
in accelerator facilities and so a conformal Standard Model needs more
than one Higgs particle. This proposal was put forward in 2006 by Krzysztof Antoni Meissner and Hermann Nicolai and is currently under scrutiny. But if no further excitation is observed beyond the one seen so far at the LHC, this model would have to be abandoned.
Extra dimensions
No experimental or observational evidence of extra dimensions has been officially reported. Analyses of results from the Large Hadron Collider severely constrain theories with large extra dimensions.
However, extra dimensions could explain why gravity is so weak, and why the expansion of the universe is faster than expected.
If we live in a 3+1 dimensional world, then we calculate the gravitational force via Gauss's law for gravity:

g(r) = −G m e_r / r²    (1)

which is simply Newton's law of gravitation. Note that Newton's constant G can be rewritten in terms of the Planck mass, G = 1/M_Pl² in natural units.

If we extend this idea to δ extra dimensions, then we get:

g(r) = −m e_r / (M_Pl(3+1+δ)^(2+δ) r^(2+δ))    (2)

where M_Pl(3+1+δ) is the (3+1+δ)-dimensional Planck mass. However, we are assuming that these extra dimensions are the same size as the normal 3+1 dimensions. Let us say that the extra dimensions are of size n, much smaller than the ordinary dimensions. If we let r ≪ n, then we get (2). However, if we let r ≫ n, then we get our usual Newton's law. When r ≫ n, the flux in the extra dimensions becomes a constant, because there is no extra room for gravitational flux to flow through. Thus the flux will be proportional to n^δ, because this is the flux in the extra dimensions. The formula is:

g(r) = −m e_r / (M_Pl(3+1+δ)^(2+δ) n^δ r²)

Equating this with the usual four-dimensional force −m e_r / (M_Pl² r²) gives:

M_Pl² = M_Pl(3+1+δ)^(2+δ) n^δ
Thus the fundamental Planck mass (the extra-dimensional one) could
actually be small, meaning that gravity is actually strong, but this
must be compensated by the number of the extra dimensions and their
size. Physically, this means that gravity is weak because there is a
loss of flux to the extra dimensions.
This section is adapted from "Quantum Field Theory in a Nutshell" by A. Zee.
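To get a feel for the scales involved, the matching relation derived above, M_Pl² = M_Pl(3+1+δ)^(2+δ) n^δ, can be inverted for the required size n of the extra dimensions. A minimal Python sketch, writing M_* for the higher-dimensional Planck mass and assuming (purely for illustration) that it sits near 1 TeV:

```python
# Sketch: size of the extra dimensions implied by M_Pl^2 = M_*^(2+d) * n^d,
# assuming a TeV-scale fundamental mass M_*. Values are illustrative only.
M_PL = 1.22e19      # four-dimensional Planck mass, GeV
M_STAR = 1.0e3      # assumed fundamental (higher-dimensional) scale, GeV (~1 TeV)
HBARC = 1.973e-16   # hbar*c in GeV*m, used to convert 1/GeV into metres

for d in range(1, 7):                                         # number of extra dimensions
    n_inv_gev = (M_PL ** 2 / M_STAR ** (2 + d)) ** (1.0 / d)  # size in GeV^-1
    n_metres = n_inv_gev * HBARC
    print(f"d = {d}: required size ~ {n_metres:.2e} m")
# d = 1 gives a solar-system-sized dimension (excluded), while d = 2 gives
# roughly a millimetre, the classic estimate quoted for the ADD scenario.
```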
In 1998 Nima Arkani-Hamed, Savas Dimopoulos, and Gia Dvali proposed the ADD model, also known as the model with large extra dimensions, an alternative scenario to explain the weakness of gravity relative to the other forces. This theory requires that the fields of the Standard Model are confined to a four-dimensional membrane, while gravity propagates in several additional spatial dimensions that are large compared to the Planck scale.
In 1998–99 Merab Gogberashvili published on arXiv
(and subsequently in peer-reviewed journals) a number of articles where
he showed that if the Universe is considered as a thin shell (a
mathematical synonym
for "brane") expanding in 5-dimensional space then it is possible to
obtain one scale for particle theory corresponding to the 5-dimensional cosmological constant and Universe thickness, and thus to solve the hierarchy problem. It was also shown that four-dimensionality of the Universe is the result of stability requirement since the extra component of the Einstein field equations giving the localized solution for matter fields coincides with one of the conditions of stability.
Subsequently, the closely related Randall–Sundrum scenarios were proposed, which offered their own solution to the hierarchy problem.
In physical cosmology, current observations in favor of an accelerating universe imply the existence of a tiny, but nonzero cosmological constant. This problem, called the cosmological constant problem,
is a hierarchy problem very similar to that of the Higgs boson mass
problem, since the cosmological constant is also very sensitive to
quantum corrections, but it is complicated by the necessary involvement
of general relativity in the problem. Proposed solutions to the cosmological constant problem include modifying and/or extending gravity, adding matter with unvanishing pressure, and UV/IR mixing in the Standard Model and gravity. Some physicists have resorted to anthropic reasoning to solve the cosmological constant problem, but it is disputed whether anthropic reasoning is scientific.
In phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes (or minimizes the cost of differentially weighted character-state changes) is preferred.
Under the maximum-parsimony criterion, the optimal tree will minimize
the amount of homoplasy (i.e., convergent evolution, parallel evolution, and evolutionary reversals).
In other words, under this criterion, the shortest possible tree that
explains the data is considered best. Some of the basic ideas behind
maximum parsimony were presented by James S. Farris in 1970 and Walter M. Fitch in 1971.
Maximum parsimony is an intuitive and simple criterion, and it is popular for this reason. However, although it is easy to score a phylogenetic tree (by counting the number of character-state changes), there is no algorithm to quickly generate
the most-parsimonious tree. Instead, the most-parsimonious tree must be
sought in "tree space" (i.e., amongst all possible trees). For a small
number of taxa (i.e., fewer than nine) it is possible to do an exhaustive search,
in which every possible tree is scored, and the best one is selected.
For nine to twenty taxa, it will generally be preferable to use branch-and-bound, which is also guaranteed to return the best tree. For greater numbers of taxa, a heuristic search must be performed.
Because the most-parsimonious tree is always the shortest
possible tree, this means that—in comparison to a hypothetical "true"
tree that actually describes the unknown evolutionary history of the
organisms under study—the "best" tree according to the maximum-parsimony
criterion will often underestimate the actual evolutionary change that
could have occurred. In addition, maximum parsimony is not statistically
consistent. That is, it is not guaranteed to produce the true tree with
high probability, given sufficient data. As demonstrated in 1978 by Joe Felsenstein, maximum parsimony can be inconsistent under certain conditions, such as long-branch attraction.
Of course, any phylogenetic algorithm could also be statistically
inconsistent if the model it employs to estimate the preferred tree does
not accurately match the way that evolution occurred in that clade.
This is unknowable. Therefore, while statistical consistency is an
interesting theoretical property, it lies outside the realm of
testability, and is irrelevant to empirical phylogenetic studies.
Alternate characterization and rationale
In
phylogenetics, parsimony is mostly interpreted as favoring the trees
that minimize the amount of evolutionary change required.
Alternatively, phylogenetic parsimony can be characterized as favoring
the trees that maximize explanatory power by minimizing the number of
observed similarities that cannot be explained by inheritance and common
descent.
Minimization of required evolutionary change on the one hand and
maximization of observed similarities that can be explained as homology
on the other may result in different preferred trees when some observed
features are not applicable in some groups that are included in the
tree, and the latter can be seen as the more general approach.
While evolution is not an inherently parsimonious process,
centuries of scientific experience lend support to the aforementioned
principle of parsimony (Occam's razor).
Namely, the supposition of a simpler, more parsimonious chain of
events is preferable to the supposition of a more complicated, less
parsimonious chain of events. Hence, parsimony (sensu lato) is typically sought in inferring phylogenetic trees, and in scientific explanation generally.
In detail
Parsimony is part of a class of character-based tree estimation methods which use a matrix of discrete phylogenetic characters and character states to infer one or more optimal phylogenetic trees for a set of taxa, commonly a set of species or reproductively isolated populations of a single species. These methods operate by evaluating candidate phylogenetic trees according to an explicit optimality criterion;
the tree with the most favorable score is taken as the best hypothesis
of the phylogenetic relationships of the included taxa. Maximum
parsimony is used with most kinds of phylogenetic data; until recently,
it was the only widely used character-based tree estimation method used
for morphological data.
Inferring phylogenies is not a trivial problem. A huge number of
possible phylogenetic trees exist for any reasonably sized set of taxa;
for example, a mere ten species gives over two million possible unrooted
trees. These possibilities must be searched to find a tree that best
fits the data according to the optimality criterion. However, the data
themselves do not lead to a simple, arithmetic solution to the problem.
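The size of this search space follows a simple double-factorial formula: for n taxa there are (2n − 5)!! distinct fully resolved unrooted trees. A short Python sketch of the arithmetic:

```python
# Sketch: the number of distinct fully resolved (binary) unrooted trees for n taxa
# is the double factorial (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
def num_unrooted_trees(n: int) -> int:
    count = 1
    for k in range(3, 2 * n - 4, 2):   # product of the odd numbers up to 2n - 5
        count *= k
    return count

for n in (4, 6, 8, 10, 12, 20):
    print(f"{n:2d} taxa: {num_unrooted_trees(n):,} unrooted trees")
# Ten taxa already give 2,027,025 trees, the "over two million" mentioned above.
```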
Ideally, we would expect the distribution of whatever evolutionary
characters (such as phenotypic traits or alleles)
to directly follow the branching pattern of evolution. Thus we could
say that if two organisms possess a shared character, they should be
more closely related to each other than to a third organism that lacks
this character (provided that character was not present in the last
common ancestor of all three, in which case it would be a symplesiomorphy).
We would predict that bats and monkeys are more closely related to each
other than either is to an elephant, because male bats and monkeys
possess external testicles,
which elephants lack. However, we cannot say that bats and monkeys are
more closely related to one another than they are to whales, though the
two have external testicles absent in whales, because we believe that
the males in the last common ancestral species of the three had external
testicles.
However, the phenomena of convergent evolution, parallel evolution, and evolutionary reversals (collectively termed homoplasy)
add an unpleasant wrinkle to the problem of inferring phylogeny. For a
number of reasons, two organisms can possess a trait inferred to have
not been present in their last common ancestor: If we naively took the
presence of this trait as evidence of a relationship, we would infer an
incorrect tree. Empirical phylogenetic data may include substantial
homoplasy, with different parts of the data suggesting sometimes very
different relationships. Methods used to estimate phylogenetic trees are
explicitly intended to resolve the conflict within the data by picking
the phylogenetic tree that is the best fit to all the data overall,
accepting that some data simply will not fit. It is often mistakenly
believed that parsimony assumes that convergence is rare; in fact, even
convergently derived characters have some value in
maximum-parsimony-based phylogenetic analyses, and the prevalence of
convergence does not systematically affect the outcome of
parsimony-based methods.
Data that do not fit a tree perfectly are not simply "noise",
they can contain relevant phylogenetic signal in some parts of a tree,
even if they conflict with the tree overall. In the whale example given
above, the lack of external testicles in whales is homoplastic: It
reflects a return to the condition inferred to have been present in
ancient ancestors of mammals, whose testicles were internal. This
inferred similarity between whales and ancient mammal ancestors is in
conflict with the tree we accept based on the weight of other
characters, since it implies that the mammals with external testicles
should form a group excluding whales. However, among the whales, the
reversal to internal testicles actually correctly associates the various
types of whales (including dolphins and porpoises) into the group Cetacea.
Still, the determination of the best-fitting tree—and thus which data
do not fit the tree—is a complex process. Maximum parsimony is one
method developed to do this.
Character data
The
input data used in a maximum parsimony analysis is in the form of
"characters" for a range of taxa. There is no generally agreed-upon
definition of a phylogenetic character, but operationally a character
can be thought of as an attribute, an axis along which taxa are observed
to vary. These attributes can be physical (morphological), molecular,
genetic, physiological, or behavioral. The only widespread agreement on
characters seems to be that variation used for character analysis should
reflect heritable variation.
Whether it must be directly heritable, or whether indirect inheritance
(e.g., learned behaviors) is acceptable, is not entirely resolved.
Each character is divided into discrete character states,
into which the variations observed are classified. Character states are
often formulated as descriptors, describing the condition of the
character substrate. For example, the character "eye color" might have
the states "blue" and "brown." Characters can have two or more states
(they can have only one, but these characters lend nothing to a maximum
parsimony analysis, and are often excluded).
Coding characters for phylogenetic analysis is not an exact
science, and there are numerous complicating issues. Typically, taxa are
scored with the same state if they are more similar to one another in
that particular attribute than each is to taxa scored with a different
state. This is not straightforward when character states are not clearly
delineated or when they fail to capture all of the possible variation
in a character. How would one score the previously mentioned character
for a taxon (or individual) with hazel eyes? Or green? As noted above,
character coding is generally based on similarity: Hazel and green eyes
might be lumped with blue because they are more similar to that color
(being light), and the character could be then recoded as "eye color:
light; dark." Alternatively, there can be multi-state characters, such
as "eye color: brown; hazel, blue; green."
Ambiguities in character state delineation and scoring can be a
major source of confusion, dispute, and error in phylogenetic analysis
using character data. Note that, in the above example, "eyes: present;
absent" is also a possible character, which creates issues because "eye
color" is not applicable if eyes are not present. For such situations, a
"?" ("unknown") is scored, although sometimes "X" or "-" (the latter
usually in sequence
data) are used to distinguish cases where a character cannot be scored
from a case where the state is simply unknown. Current implementations
of maximum parsimony generally treat unknown values in the same manner:
the reasons the data are unknown have no particular effect on analysis.
Effectively, the program treats a ? as if it held the state that would
involve the fewest extra steps in the tree (see below), although this is
not an explicit step in the algorithm.
Genetic data are particularly amenable to character-based
phylogenetic methods such as maximum parsimony because protein and
nucleotide sequences are naturally discrete: A particular position in a nucleotide sequence can be either adenine, cytosine, guanine, or thymine / uracil, or a sequence gap; a position (residue) in a protein sequence will be one of the basic amino acids or a sequence gap. Thus, character scoring is rarely ambiguous, except in cases where sequencing
methods fail to produce a definitive assignment for a particular
sequence position. Sequence gaps are sometimes treated as characters,
although there is no consensus on how they should be coded.
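A minimal sketch (Python, with made-up taxon names and sequences) of how such a character matrix might be represented, using "?" for an unknown state and "-" for a sequence gap as described above:

```python
# Sketch: a tiny character matrix with taxa as rows and aligned nucleotide
# positions as characters. Taxon names and sequences are invented for illustration.
matrix = {
    "TaxonA": "ACGT-ACGTA",
    "TaxonB": "ACGTTACGTA",
    "TaxonC": "ACGTTACGGA",
    "TaxonD": "AC?TTACGGA",   # "?" marks an unknown state
}

n_chars = len(next(iter(matrix.values())))
for i in range(n_chars):
    # Collect observed states at this position, ignoring unknowns and gaps.
    states = {seq[i] for seq in matrix.values() if seq[i] not in "?-"}
    if len(states) > 1:
        print(f"character {i}: variable, states {sorted(states)}")
```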
Characters can be treated as unordered or ordered. For a binary
(two-state) character, this makes little difference. For a multi-state
character, unordered characters can be thought of as having an equal
"cost" (in terms of number of "evolutionary events") to change from any
one state to any other; complementarily, they do not require passing
through intermediate states. Ordered characters have a particular
sequence in which the states must occur through evolution, such that
going between some states requires passing through an intermediate. This
can be thought of complementarily as having different costs to pass
between different pairs of states. In the eye-color example above, it is
possible to leave it unordered, which imposes the same evolutionary
"cost" to go from brown-blue, green-blue, green-hazel, etc.
Alternatively, it could be ordered brown-hazel-green-blue; this would
normally imply that it would cost two evolutionary events to go from
brown-green, three from brown-blue, but only one from brown-hazel. This
can also be thought of as requiring eyes to evolve through a "hazel
stage" to get from brown to green, and a "green stage" to get from hazel
to blue, etc. For many characters, it is not obvious if and how they
should be ordered. In contrast, for characters that represent the discretization of an underlying continuous variable, like shape, size, and ratio characters, ordering is logical,
and simulations have shown that this improves ability to recover
correct clades, while decreasing the recovering of erroneous clades.
There is a lively debate on the utility and appropriateness of
character ordering, but no consensus. Some authorities order characters
when there is a clear logical, ontogenetic,
or evolutionary transition among the states (for example, "legs: short;
medium; long"). Some accept only some of these criteria. Some run an
unordered analysis, and order characters that show a clear order of
transition in the resulting tree (a practice that might be accused of circular reasoning).
Some authorities refuse to order characters at all, suggesting that it
biases an analysis to require evolutionary transitions to follow a
particular path.
It is also possible to apply differential weighting to individual
characters. This is usually done relative to a "cost" of 1. Thus, some
characters might be seen as more likely to reflect the true evolutionary
relationships among taxa, and thus they might be weighted at a value 2
or more; changes in these characters would then count as two
evolutionary "steps" rather than one when calculating tree scores (see
below). There has been much discussion in the past about character
weighting. Most authorities now weight all characters equally, although
exceptions are common. For example, allele frequency
data is sometimes pooled in bins and scored as an ordered character. In
these cases, the character itself is often downweighted so that small
changes in allele frequencies count less than major changes in other
characters. Also, the third codon position in a coding nucleotide sequence
is particularly labile, and is sometimes downweighted, or given a
weight of 0, on the assumption that it is more likely to exhibit
homoplasy. In some cases, repeated analyses are run, with characters
reweighted in inverse proportion to the degree of homoplasy discovered in the previous analysis (termed successive weighting); this is another technique that might be considered circular reasoning.
Character state changes can also be weighted individually. This is often done for nucleotide sequence
data; it has been empirically determined that certain base changes
(A-C, A-T, G-C, G-T, and the reverse changes) occur much less often than
others (A-G, C-T, and their reverse changes). These changes are
therefore often weighted more. As shown above in the discussion of
character ordering, ordered characters can be thought of as a form of
character state weighting.
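As an illustration of state-change weighting, here is a minimal Python sketch of a step-cost function that charges transversions twice as much as transitions; the 2:1 ratio is purely illustrative, not a recommended weighting:

```python
# Sketch: differential weighting of nucleotide state changes. Transitions (A<->G,
# C<->T) cost 1 step, transversions cost 2. The costs are illustrative only.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def step_cost(state1: str, state2: str) -> int:
    if state1 == state2:
        return 0
    return 1 if frozenset((state1, state2)) in TRANSITIONS else 2

print(step_cost("A", "G"))   # 1 (transition)
print(step_cost("A", "C"))   # 2 (transversion)
```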
Some systematists prefer to exclude characters known to be, or
suspected to be, highly homoplastic or that have a large number of
unknown entries ("?"). As noted below, theoretical and simulation work
has demonstrated that this is likely to sacrifice accuracy rather than
improve it. This is also the case with characters that are variable in
the terminal taxa: theoretical, congruence, and simulation studies have
all demonstrated that such polymorphic characters contain significant
phylogenetic information.
Taxon sampling
The time required for a parsimony analysis (or any phylogenetic analysis) is proportional to the number of taxa
(and characters) included in the analysis. Also, because more taxa
require more branches to be estimated, more uncertainty may be expected
in large analyses. Because data collection costs in time and money often
scale directly with the number of taxa included, most analyses include
only a fraction of the taxa that could have been sampled. Indeed, some
authors have contended that four taxa (the minimum required to produce a
meaningful unrooted tree) are all that is necessary for accurate
phylogenetic analysis, and that more characters are more valuable than
more taxa in phylogenetics. This has led to a raging controversy about
taxon sampling.
Empirical, theoretical, and simulation studies have led to a
number of dramatic demonstrations of the importance of adequate taxon
sampling. Most of these can be summarized by a simple observation: a
phylogenetic data matrix has dimensions of characters times taxa.
Doubling the number of taxa doubles the amount of information in a
matrix just as surely as doubling the number of characters. Each taxon
represents a new sample for every character, but, more importantly, it
(usually) represents a new combination of character states. These
character states can not only determine where that taxon is placed on
the tree, they can inform the entire analysis, possibly causing
different relationships among the remaining taxa to be favored by
changing estimates of the pattern of character changes.
The most disturbing weakness of parsimony analysis, that of long-branch attraction
(see below) is particularly pronounced with poor taxon sampling,
especially in the four-taxon case. This is a well-understood case in
which additional character sampling may not improve the quality of the
estimate. As taxa are added, they often break up long branches
(especially in the case of fossils), effectively improving the
estimation of character state changes along them. Because of the
richness of information added by taxon sampling, it is even possible to
produce highly accurate estimates of phylogenies with hundreds of taxa
using only a few thousand characters.
Although many studies have been performed, there is still much
work to be done on taxon sampling strategies. Because of advances in
computer performance, and the reduced cost and increased automation of
molecular sequencing, sample sizes overall are on the rise, and studies
addressing the relationships of hundreds of taxa (or other terminal
entities, such as genes) are becoming common. Of course, this is not to
say that adding characters is not also useful; the number of characters
is increasing as well.
Some systematists prefer to exclude taxa based on the number of
unknown character entries ("?") they exhibit, or because they tend to
"jump around" the tree in analyses (i.e., they are "wildcards"). As
noted below, theoretical and simulation work has demonstrated that this
is likely to sacrifice accuracy rather than improve it. Although these
taxa may generate more most-parsimonious trees (see below), methods such
as agreement subtrees and reduced consensus can still extract
information on the relationships of interest.
It has been observed that inclusion of more taxa tends to lower overall support values (bootstrap
percentages or decay indices, see below). The cause of this is clear:
as additional taxa are added to a tree, they subdivide the branches to
which they attach, and thus dilute the information that supports that
branch. While support for individual branches is reduced, support for
the overall relationships is actually increased. Consider analysis that
produces the following tree: (fish, (lizard, (whale, (cat, monkey)))).
Adding a rat and a walrus will probably reduce the support for the
(whale, (cat, monkey)) clade, because the rat and the walrus may fall
within this clade, or outside of the clade, and since these five animals
are all relatively closely related, there should be more uncertainty
about their relationships. Within error, it may be impossible to
determine any of these animals' relationships relative to one another.
However, the rat and the walrus will probably add character data that
cements the grouping of any two of these mammals to the exclusion of the fish or
the lizard; where the initial analysis might have been misled, say, by
the presence of fins in the fish and the whale, the presence of the
walrus, with blubber and fins like a whale but whiskers like a cat and a
rat, firmly ties the whale to the mammals.
To cope with this problem, agreement subtrees, reduced consensus, and double-decay analysis
seek to identify supported relationships (in the form of "n-taxon
statements," such as the four-taxon statement "(fish, (lizard, (cat,
whale)))") rather than whole trees. If the goal of an analysis is a
resolved tree, as is the case for comparative phylogenetics,
these methods cannot solve the problem. However, if the tree estimate
is so poorly supported, the results of any analysis derived from the
tree will probably be too suspect to use anyway.
Analysis
A
maximum parsimony analysis runs in a very straightforward fashion. Trees
are scored according to the degree to which they imply a parsimonious
distribution of the character data. The most parsimonious tree for the
dataset represents the preferred hypothesis of relationships among the
taxa in the analysis.
Trees are scored (evaluated) by using a simple algorithm to
determine how many "steps" (evolutionary transitions) are required to
explain the distribution of each character. A step is, in essence, a
change from one character state to another, although with ordered
characters some transitions require more than one step. Contrary to
popular belief, the algorithm does not explicitly assign particular
character states to nodes (branch junctions) on a tree: the fewest steps
can involve multiple, equally costly assignments and distributions of
evolutionary transitions. What is optimized is the total number of
changes.
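One standard way of counting these steps for a single unordered character is the procedure introduced by Fitch (1971). A minimal Python sketch, assuming the tree is written as nested tuples of taxon names:

```python
# Sketch: minimum number of state changes for one unordered character on a
# rooted binary tree, following Fitch's (1971) procedure. Note that only the
# step count is needed; no particular ancestral states are fixed.
def fitch_steps(tree, states):
    """Return (possible_state_set, steps) for the subtree `tree`."""
    if isinstance(tree, str):                       # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    set_l, steps_l = fitch_steps(left, states)
    set_r, steps_r = fitch_steps(right, states)
    common = set_l & set_r
    if common:                                      # intersection: no extra step
        return common, steps_l + steps_r
    return set_l | set_r, steps_l + steps_r + 1     # union: one extra step

# Example: four taxa scored for one binary character.
tree = (("A", "B"), ("C", "D"))
states = {"A": "0", "B": "0", "C": "1", "D": "1"}
print(fitch_steps(tree, states)[1])   # 1 step on this tree
```

The tree's total score is simply this count summed over all characters, with any character or step weights applied.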
There are many more possible phylogenetic trees
than can be searched exhaustively for more than eight taxa or so. A
number of algorithms are therefore used to search among the possible
trees. Many of these involve taking an initial tree (usually the favored
tree from the last iteration of the algorithm), and perturbing it to
see if the change produces a higher score.
The trees resulting from parsimony search are unrooted: They show
all the possible relationships of the included taxa, but they lack any
statement on relative times of divergence. A particular branch is chosen
to root the tree by the user. This branch is then taken to be outside
all the other branches of the tree, which together form a monophyletic
group. This imparts a sense of relative time to the tree. Incorrect
choice of a root can result in incorrect relationships on the tree, even
if the tree is itself correct in its unrooted form.
Parsimony analysis often returns a number of equally
most-parsimonious trees (MPTs). A large number of MPTs is often seen as
an analytical failure, and is widely believed to be related to the
number of missing entries ("?") in the dataset, characters showing too
much homoplasy, or the presence of topologically labile "wildcard" taxa
(which may have many missing entries). Numerous methods have been
proposed to reduce the number of MPTs, including removing characters or
taxa with large amounts of missing data before analysis, removing or
downweighting highly homoplastic characters (successive weighting) or removing wildcard taxa (the phylogenetic trunk method) a posteriori and then reanalyzing the data.
Numerous theoretical and simulation studies have demonstrated
that highly homoplastic characters, characters and taxa with abundant
missing data, and "wildcard" taxa contribute to the analysis. Although
excluding characters or taxa may appear to improve resolution, the
resulting tree is based on less data, and is therefore a less reliable
estimate of the phylogeny (unless the characters or taxa are non-informative; see safe taxonomic reduction).
Today's general consensus is that having multiple MPTs is a valid
analytical result; it simply indicates that there is insufficient data
to resolve the tree completely. In many cases, there is substantial
common structure in the MPTs, and differences are slight and involve
uncertainty in the placement of a few taxa. There are a number of
methods for summarizing the relationships within this set, including consensus trees, which show common relationships among all the taxa, and pruned agreement subtrees, which show common structure by temporarily pruning "wildcard" taxa from every tree until they all agree. Reduced consensus takes this one step further, by showing all subtrees (and therefore all relationships) supported by the input trees.
Even if multiple MPTs are returned, parsimony analysis still basically produces a point-estimate, lacking confidence intervals
of any sort. This has often been levelled as a criticism, since there
is certainly error in estimating the most-parsimonious tree, and the
method does not inherently include any means of establishing how
sensitive its conclusions are to this error. Several methods have been
used to assess support.
Jackknifing and bootstrapping, well-known statistical resampling
procedures, have been employed with parsimony analysis. The jackknife,
which involves resampling without replacement ("leave-one-out") can be
employed on characters or taxa; interpretation may become complicated in
the latter case, because the variable of interest is the tree, and
comparison of trees with different taxa is not straightforward. The
bootstrap, resampling with replacement (sample x items randomly out of a
sample of size x, but items can be picked multiple times), is only used
on characters, because adding duplicate taxa does not change the result
of a parsimony analysis. The bootstrap is much more commonly employed
in phylogenetics (as elsewhere); both methods involve an arbitrary but
large number of repeated iterations involving perturbation of the
original data followed by analysis. The resulting MPTs from each
analysis are pooled, and the results are usually presented on a 50% Majority Rule Consensus
tree, with individual branches (or nodes) labelled with the percentage
of bootstrap MPTs in which they appear. This "bootstrap percentage"
(which is not a P-value,
as is sometimes claimed) is used as a measure of support. Technically,
it is supposed to be a measure of repeatability, the probability that
that branch (node, clade) would be recovered if the taxa were sampled
again. Experimental tests with viral phylogenies suggest that the
bootstrap percentage is not a good estimator of repeatability for
phylogenetics, but it is a reasonable estimator of accuracy.
In fact, it has been shown that the bootstrap percentage, as an
estimator of accuracy, is biased, and that this bias results on average
in an underestimate of confidence (such that as little as 70% support
might really indicate up to 95% confidence). However, the direction of
bias cannot be ascertained in individual cases, so assuming that high bootstrap support values indicate even higher confidence is unwarranted.
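A minimal Python sketch of the character-resampling step behind the bootstrap described above; the tree search itself is represented by a hypothetical placeholder `find_mpt`, which is not a real library function:

```python
import random

# Sketch: nonparametric bootstrap over characters. Columns of the data matrix
# are resampled with replacement to the original length; each pseudoreplicate
# would then be analysed and the resulting trees pooled.
def bootstrap_matrices(matrix, replicates=100, seed=42):
    rng = random.Random(seed)
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    for _ in range(replicates):
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]   # sample with replacement
        yield {t: "".join(matrix[t][c] for c in cols) for t in taxa}

# Hypothetical usage (find_mpt stands in for whatever tree search is used):
# trees = [find_mpt(pseudo) for pseudo in bootstrap_matrices(matrix)]
```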
Another means of assessing support is Bremer support, or the decay index, which is a parameter of a given data set, rather than an estimate based
on pseudoreplicated subsamples, as are the bootstrap and jackknife
procedures described above. Bremer support (also known as branch
support) is simply the difference in number of steps between the score
of the MPT(s), and the score of the most parsimonious tree that does not
contain a particular clade (node, branch). It can be thought of as the
number of steps you have to add to lose that clade; implicitly, it is
meant to suggest how great the error in the estimate of the score of the
MPT must be for the clade to no longer be supported by the analysis,
although this is not necessarily what it does. Branch support values are
often fairly low for modestly-sized data sets (one or two steps being
typical), but they often appear to be proportional to bootstrap
percentages. As data matrices become larger, branch support values often
continue to increase as bootstrap values plateau at 100%. Thus, for
large data matrices, branch support values may provide a more
informative means to compare support for strongly-supported branches.
However, interpretation of decay values is not straightforward, and
they seem to be preferred by authors with philosophical objections to
the bootstrap (although many morphological systematists, especially
paleontologists, report both). Double-decay analysis is a decay counterpart to reduced consensus that evaluates the decay index for all possible subtree relationships (n-taxon statements) within a tree.
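The arithmetic behind Bremer support is just a difference of tree lengths; a trivial sketch with made-up scores:

```python
# Sketch: Bremer (branch) support as the extra steps needed to lose a clade.
# The two scores would come from an unconstrained search and a search
# constrained to exclude the clade of interest; the numbers here are invented.
def bremer_support(mpt_score: int, best_score_without_clade: int) -> int:
    return best_score_without_clade - mpt_score

print(bremer_support(120, 123))   # 3: three extra steps are needed to lose the clade
```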
Problems with maximum parsimony phylogenetic inference
Maximum parsimony is an epistemologically straightforward approach
that makes few mechanistic assumptions, and is popular for this reason.
However, it may not be statistically consistent
under certain circumstances. Consistency, here meaning the monotonic
convergence on the correct answer with the addition of more data, is a
desirable property of statistical methods. As demonstrated in 1978 by Joe Felsenstein,
maximum parsimony can be inconsistent under certain conditions. The
category of situations in which this is known to occur is called long branch attraction,
and occurs, for example, where there are long branches (a high level of
substitutions) for two taxa (A & C), but short branches for
another two (B & D). A and B diverged from a common ancestor, as did
C and D. Of course, to know that a method is giving you the wrong
answer, you would need to know what the correct answer is. This is
generally not the case in science. For this reason, some view
statistical consistency as irrelevant to empirical phylogenetic
questions.
Assume for simplicity that we are considering a single binary
character (it can either be + or -). Because the distance from B to D is
small, in the vast majority of all cases, B and D will be the same.
Here, we will assume that they are both + (+ and - are assigned
arbitrarily and swapping them is only a matter of definition). If this
is the case, there are four remaining possibilities. A and C can both be
+, in which case all taxa are the same and all the trees have the same
length. A can be + and C can be -, in which case only one character is
different, and we cannot learn anything, as all trees have the same
length. Similarly, A can be - and C can be +. The only remaining
possibility is that A and C are both -. In this case, however, the
evidence suggests that A and C group together, and B and D together. As a
consequence, if the "true tree" is a tree of this type, the more data
we collect (i.e. the more characters we study), the more the evidence
will support the wrong tree. Of course, except in mathematical
simulations, we never know what the "true tree" is. Thus, unless we are
able to devise a model that is guaranteed to accurately recover the
"true tree," any other optimality criterion or weighting scheme could
also, in principle, be statistically inconsistent. The bottom line is,
that while statistical inconsistency is an interesting theoretical
issue, it is empirically a purely metaphysical concern, outside the
realm of empirical testing. Any method could be inconsistent, and there
is no way to know for certain whether it is, or not. It is for this
reason that many systematists characterize their phylogenetic results as
hypotheses of relationship.
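A toy Monte Carlo sketch (Python, with invented change probabilities) of the four-taxon situation just described, counting which parsimony-informative site patterns accumulate when A and C sit on long branches:

```python
import random

# Sketch: simulate binary characters on the unrooted tree ((A,B),(C,D)), with a
# high change probability p on the branches to A and C and a low probability q
# on the branches to B, D and the internal branch. Branch lengths are illustrative.
def simulate_counts(n_chars=100_000, p=0.4, q=0.05, seed=1):
    rng = random.Random(seed)
    flip = lambda state, prob: 1 - state if rng.random() < prob else state
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_chars):
        node1 = 0                   # state at one end of the internal branch
        node2 = flip(node1, q)      # state at the other end
        a, b = flip(node1, p), flip(node1, q)
        c, d = flip(node2, p), flip(node2, q)
        if a == b and c == d and a != c:
            support["AB|CD"] += 1   # pattern favouring the true grouping
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1   # pattern favouring the long-branch grouping
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return support

print(simulate_counts())
# With these branch lengths the spurious grouping AC|BD collects far more
# parsimony-informative characters than the true grouping AB|CD, so adding
# characters only strengthens support for the wrong tree.
```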
Another complication with maximum parsimony, and other
optimality-criterion based phylogenetic methods, is that finding the
shortest tree is an NP-hard problem.
The only currently available, efficient way of obtaining a solution,
given an arbitrarily large set of taxa, is by using heuristic methods
which do not guarantee that the shortest tree will be recovered. These
methods employ hill-climbing algorithms
to progressively approach the best tree. However, it has been shown
that there can be "tree islands" of suboptimal solutions, and the
analysis can become trapped in these local optima.
Thus, complex, flexible heuristics are required to ensure that tree
space has been adequately explored. Several heuristics are available,
including nearest neighbor interchange (NNI), tree bisection reconnection (TBR), and the parsimony ratchet.
Criticism
It has been asserted that a major problem, especially for paleontology,
is that maximum parsimony assumes that the only way two species can
share the same nucleotide at the same position is if they are
genetically related. This asserts that phylogenetic applications of parsimony assume that all similarity is homologous (other interpretations, such as the assertion that two organisms might not
be related at all, are nonsensical). This is emphatically not the case:
as with any form of character-based phylogeny estimation, parsimony is
used to test the homologous nature of similarities by finding the
phylogenetic tree which best accounts for all of the similarities.
It is often stated that parsimony is not relevant to phylogenetic inference because "evolution is not parsimonious."
In most cases, there is no explicit alternative proposed; if no
alternative is available, any statistical method is preferable to none
at all. Additionally, it is not clear what would be meant if the
statement "evolution is parsimonious" were in fact true. This could be
taken to mean that more character changes may have occurred historically
than are predicted using the parsimony criterion. Because parsimony
phylogeny estimation reconstructs the minimum number of changes
necessary to explain a tree, this is quite possible. However, it has
been shown through simulation studies, testing with known in vitro
viral phylogenies, and congruence with other methods, that the accuracy
of parsimony is in most cases not compromised by this. Parsimony
analysis uses the number of character changes on trees to choose the
best tree, but it does not require that exactly that many changes, and
no more, produced the tree. As long as the changes that have not been
accounted for are randomly distributed over the tree (a reasonable null
expectation), the result should not be biased. In practice, the
technique is robust: maximum parsimony exhibits minimal bias as a result
of choosing the tree with the fewest changes.
An analogy can be drawn with choosing among contractors based on
their initial (nonbinding) estimate of the cost of a job. The actual
finished cost is very likely to be higher than the estimate. Despite
this, choosing the contractor who furnished the lowest estimate should
theoretically result in the lowest final project cost. This is because,
in the absence of other data, we would assume that all of the relevant
contractors have the same risk of cost overruns. In practice, of course,
unscrupulous business practices may bias this result; in phylogenetics,
too, some particular phylogenetic problems (for example, long branch attraction,
described above) may potentially bias results. In both cases, however,
there is no way to tell if the result is going to be biased, or the
degree to which it will be biased, based on the estimate itself. With
parsimony too, there is no way to tell that the data are positively
misleading, without comparison to other evidence.
Parsimony is often characterized as implicitly adopting the
position that evolutionary change is rare, or that homoplasy
(convergence and reversal) is minimal in evolution. This is not entirely
true: parsimony minimizes the number of convergences and reversals that
are assumed by the preferred tree, but this may result in a relatively
large number of such homoplastic events. It would be more appropriate to
say that parsimony assumes only the minimum amount of change implied by
the data. As above, this does not require that these were the only
changes that occurred; it simply does not infer changes for which there
is no evidence. The shorthand for describing this, to paraphrase Farris, is that "parsimony minimizes assumed homoplasies; it does not assume that homoplasy is minimal."
Recent simulation studies suggest that parsimony may be less
accurate than trees built using Bayesian approaches for morphological
data, potentially due to overprecision, although this has been disputed.
Studies using novel simulation methods have demonstrated that
differences between inference methods result from the search strategy
and consensus method employed, rather than the optimization used.
Also, analyses of 38 molecular and 86 morphological empirical datasets
have shown that the common mechanism assumed by the evolutionary models
used in model-based phylogenetics apply to most molecular, but few
morphological datasets.
This finding validates the use of model-based phylogenetics for
molecular data, but suggests that for morphological data, parsimony
remains advantageous, at least until more sophisticated models become
available for phenotypic data.
There are several other methods for inferring phylogenies based on discrete character data, including maximum likelihood and Bayesian inference.
Each offers potential advantages and disadvantages. In practice, these
methods tend to favor trees that are very similar to the most
parsimonious tree(s) for the same dataset; however, they allow for complex modelling of evolutionary processes, and as classes of methods they are statistically consistent and not susceptible to long-branch attraction. Note, however, that the performance of likelihood and Bayesian methods is dependent on the quality of the particular model of evolution employed; an incorrect model can produce a biased result, just like
parsimony. In addition, they are still quite computationally slow
relative to parsimony methods, sometimes requiring weeks to run large
datasets. Most of these methods have particularly avid proponents and
detractors; parsimony especially has been advocated as philosophically
superior (most notably by ardent cladists).
One area where parsimony still holds much sway is in the analysis of
morphological data, because—until recently—stochastic models of
character change were not available for non-molecular data, and they are
still not widely implemented. Parsimony has also recently been shown to
be more likely to recover the true tree in the face of profound changes
in evolutionary ("model") parameters (e.g., the rate of evolutionary
change) within a tree.
Distance matrices can also be used to generate phylogenetic trees. Non-parametric distance methods were originally applied to phenetic data using a matrix of pairwise distances and reconciled to produce a tree. The distance matrix can come from a number of different sources, including immunological distance, morphometric analysis, and genetic distances.
For phylogenetic character data, raw distance values can be calculated
by simply counting the number of pairwise differences in character
states (Manhattan distance)
or by applying a model of evolution. Notably, distance methods also
allow use of data that may not be easily converted to character data,
such as DNA-DNA hybridization
assays. Today, distance-based methods are often frowned upon because
phylogenetically-informative data can be lost when converting characters
to distances. There are a number of distance-matrix methods and
optimality criteria, of which the minimum evolution criterion is most closely related to maximum parsimony.
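A minimal Python sketch of the raw pairwise (Manhattan) distance mentioned above, counted as the number of differing character states; skipping positions with "?" or "-" is an assumption made here for simplicity, not a universal convention:

```python
# Sketch: raw distance between two taxa as the count of positions at which
# their character states differ, ignoring unknowns ("?") and gaps ("-").
def raw_distance(seq1: str, seq2: str) -> int:
    return sum(
        1
        for a, b in zip(seq1, seq2)
        if a != b and a not in "?-" and b not in "?-"
    )

print(raw_distance("ACGTTACGTA", "ACGTTACGGA"))   # 1 differing position
```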
From among the distance methods, there exists a phylogenetic estimation criterion, known as Minimum Evolution
(ME), that shares with maximum-parsimony the aspect of searching for
the phylogeny that has the shortest total sum of branch lengths.
A subtle difference distinguishes the maximum-parsimony criterion
from the ME criterion: while maximum-parsimony is based on an abductive
heuristic, i.e., the plausibility of the simplest evolutionary
hypothesis of taxa with respect to the more complex ones, the ME
criterion is based on Kidd and Sgaramella-Zonta's conjectures (proven
true 22 years later by Rzhetsky and Nei)
stating that if the evolutionary distances from taxa were unbiased
estimates of the true evolutionary distances then the true phylogeny of
taxa would have a length shorter than any other alternative phylogeny
compatible with those distances. Rzhetsky and Nei's results set the ME
criterion free from the Occam's razor principle and confer on it a solid theoretical and quantitative basis.
There are two major classifications of bomber: strategic and tactical. Strategic bombing is done by heavy bombers primarily designed for long-range bombing missions against strategic targets
to diminish the enemy's ability to wage war by limiting access to
resources through crippling infrastructure or reducing industrial
output. Tactical bombing is aimed at countering enemy military activity
and in supporting offensive operations, and is typically assigned to
smaller aircraft operating at shorter ranges, typically near the troops
on the ground or against enemy shipping.
During World War II, with engine power as a major limitation, combined with the desire for accuracy and other operational factors, bomber designs tended to be tailored to specific roles. Early in the Cold War, however, bombers were the only means of carrying nuclear weapons to enemy targets, and held the role of deterrence.
With the advent of guided air-to-air missiles, bombers needed to avoid
interception. High-speed and high-altitude flying became a means of
evading detection and attack. With the advent of ICBMs the role of the bomber was brought to a more tactical focus in close air support roles, and a focus on stealth technology for strategic bombers.
The
first use of an air-dropped bomb (actually four hand grenades specially
manufactured by the Italian naval arsenal) was carried out by Italian
Second Lieutenant Giulio Gavotti on 1 November 1911 during the Italo-Turkish war in Libya
– although his plane was not designed for the task of bombing, and his
improvised attacks on Ottoman positions had little impact. These picric acid-filled steel spheres were nicknamed "ballerinas" from the fluttering fabric ribbons attached.
Early bombers
On 16 October 1912, Bulgarian observer Prodan Tarakchiev dropped two of those bombs on the Turkish railway station of Karağaç (near the besieged Edirne) from an Albatros F.2 aircraft piloted by Radul Milkov, during the First Balkan War. This is deemed to be the first use of an aircraft as a bomber.
The first heavier-than-air aircraft purposely designed for bombing were the Italian Caproni Ca 30 and British Bristol T.B.8, both of 1913. The Bristol T.B.8 was an early British single-engined biplane built by the Bristol Aeroplane Company. They were fitted with a prismatic bombsight in the front cockpit
and a cylindrical bomb carrier in the lower forward fuselage capable of
carrying twelve 10 lb (4.5 kg) bombs, which could be dropped singly or
as a salvo as required.
The aircraft was purchased for use both by the Royal Naval Air Service and the Royal Flying Corps (RFC), and three T.B.8s that were being displayed in Paris during December 1913, fitted with bombing equipment, were sent to France following the outbreak of war. Under the command of Charles Rumney Samson, a bombing attack on German gun batteries at Middelkerke, Belgium was executed on 25 November 1914.
The dirigible, or airship, was developed in the early 20th
century. Early airships were prone to disaster, but slowly the airship
became more dependable, with a more rigid structure and stronger skin.
Prior to the outbreak of war, Zeppelins, a larger and more streamlined form of airship designed by German Count Ferdinand von Zeppelin,
were outfitted to carry bombs to attack targets at long range. These
were the first long range, strategic bombers. Although the German air
arm was strong, with a total of 123 airships by the end of the war, they
were vulnerable to attack and engine failure, as well as navigational
issues. German airships inflicted little damage on all 51 raids, with
557 Britons killed and 1,358 injured. The German Navy lost 53 of its 73
airships, and the German Army lost 26 of its 50 ships.
The Caproni Ca 30 was built by Gianni Caproni in Italy. It was a twin-boom biplane with three 67 kW (80 hp) Gnome rotary engines and first flew in October 1914.
Test flights revealed power to be insufficient and the engine layout
unworkable, and Caproni soon adopted a more conventional approach
installing three 81 kW (110 hp) Fiat A.10s. The improved design was bought by the Italian Army and it was delivered in quantity from August 1915.
While mainly used as a trainer, Avro 504s were also briefly used as bombers at the start of the First World War by the Royal Naval Air Service (RNAS) when they were used for raids on the German airship sheds.
Strategic bombing
Bombing raids and interdiction operations were mainly carried out by French and British forces during the War as the German air arm
was forced to concentrate its resources on a defensive strategy.
Notably, bombing campaigns formed a part of the British offensive at the
Battle of Neuve Chapelle in 1915, with Royal Flying Corps squadrons attacking German railway
stations in an attempt to hinder the logistical supply of the German army.
The early, improvised attempts at bombing that characterized the early
part of the war slowly gave way to a more organized and systematic
approach to strategic and tactical bombing, pioneered by various air
power strategists of the Entente, especially Major Hugh Trenchard;
he was the first to advocate that there should be "... sustained
[strategic bombing] attacks with a view to interrupting the enemy's
railway communications ... in conjunction with the main operations of
the Allied Armies."
When the war started, bombing was very crude (hand-held bombs
were thrown over the side) yet by the end of the war long-range bombers
equipped with complex mechanical bombing computers were being built,
designed to carry large loads to destroy enemy industrial targets. The
most important bombers used in World War I were the French Breguet 14, British de Havilland DH-4, German Albatros C.III and Russian
Sikorsky Ilya Muromets. The Russian Sikorsky Ilya Muromets was the first four-engine bomber to equip a dedicated strategic bombing unit during World War I.
This heavy bomber was unrivaled in the early stages of the war, as the
Central Powers had no comparable aircraft until much later.
Long range bombing raids were carried out at night by multi-engine biplanes such as the Gotha G.IV (whose name was synonymous with all multi-engine German bombers) and later the Handley Page Type O;
the majority of bombing was done by single-engined biplanes with one or
two crew members flying short distances to attack enemy lines and
immediate hinterland. As the effectiveness of a bomber was dependent on
the weight and accuracy of its bomb load, ever larger bombers were
developed starting in World War I, while considerable money was spent
developing suitable bombsights.
World War II
With
engine power as a major limitation, combined with the desire for
accuracy and other operational factors, bomber designs tended to be
tailored to specific roles. By the start of the war this included:
dive bomber – specially strengthened for vertical diving attacks for greater accuracy
torpedo bomber – specialized aircraft armed with torpedoes
ground attack aircraft – aircraft used against targets on a battlefield such as troop or tank concentrations
night bomber – specially equipped to operate at night when opposing defences are limited
maritime patrol – long range bombers that were used against enemy shipping, particularly submarines
fighter-bomber – a modified fighter aircraft used as a light bomber
Bombers of this era were not intended to attack other aircraft
although most were fitted with defensive weapons. World War II saw the
beginning of the widespread use of high speed bombers which began to
minimize defensive weaponry in order to attain higher speed. Some
smaller designs were used as the basis for night fighters. A number of fighters, such as the Hawker Hurricane,
were used as ground attack aircraft, replacing earlier conventional
light bombers that proved unable to defend themselves while carrying a
useful bomb load.
Cold War
At the start of the Cold War, bombers were the only means of carrying nuclear weapons to enemy targets, and had the role of deterrence.
With the advent of guided air-to-air missiles, bombers needed to avoid
interception. High-speed and high-altitude flying became a means of
evading detection and attack. Designs such as the English Electric Canberra
could fly faster or higher than contemporary fighters. When
surface-to-air missiles became capable of hitting high-flying bombers,
bombers were flown at low altitudes to evade radar detection and
interception.
Once "stand off" nuclear weapon designs were developed, bombers
did not need to pass over the target to make an attack; they could fire
and turn away to escape the blast. Nuclear strike aircraft were
generally finished in bare metal or anti-flash white to minimize absorption of thermal radiation from the flash of a nuclear explosion. The need to drop conventional bombs remained in conflicts with non-nuclear powers, such as the Vietnam War or Malayan Emergency.
The development of large strategic bombers stagnated in the later
part of the Cold War because of spiraling costs and the development of
the Intercontinental ballistic missile
(ICBM) – which was felt to have similar deterrent value while being
impossible to intercept. Because of this, the United States Air Force XB-70 Valkyrie program was cancelled in the early 1960s; the later B-1B Lancer and B-2 Spirit
aircraft entered service only after protracted political and
development problems. Their high cost meant that few were built and the
1950s-designed B-52s are projected to remain in use until the 2040s.
Similarly, the Soviet Union used the intermediate-range Tu-22M 'Backfire' in the 1970s, but their Mach 3 bomber project stalled. The Mach 2 Tu-160 'Blackjack' was built only in tiny numbers, leaving the 1950s Tupolev Tu-16 and Tu-95 'Bear' heavy bombers to continue being used into the 21st century.
The British strategic bombing force largely came to an end when the V bomber force was phased out, the last of which left service in 1983. The French Mirage IV
bomber version was retired in 1996, although the Mirage 2000N and the
Rafale have taken on this role. The only other nation that fields
strategic bombing forces is China, which has a number of Xian H-6s.
At present, these air forces are each developing stealth replacements for their legacy bomber fleets, the USAF with the Northrop Grumman B-21, the Russian Aerospace Forces with the PAK DA, and the PLAAF with the Xian H-20. As of 2021, the B-21 is expected to enter service by 2026–2027. The B-21 would be capable of loitering near target areas for extended periods of time.
Other uses
Occasionally, military aircraft have been used to bomb ice jams with limited success as part of an effort to clear them. In 2018, the Swedish Air Force
dropped bombs on a forest fire, snuffing out flames with the aid of the
blast waves. The fires had been raging in an area contaminated with unexploded ordnance, making them difficult for firefighters to extinguish.