The human mitochondrial molecular clock is the rate at which mutations have been accumulating in the mitochondrial genome of hominids during the course of human evolution.
The archeological record of human activity from early periods in human
prehistory is relatively limited and its interpretation has been
controversial. Because of the uncertainties from the archeological
record, scientists have turned to molecular dating techniques in order
to refine the timeline of human evolution. A major goal of scientists in
the field is to develop an accurate hominid mitochondrial molecular
clock which could then be used to confidently date events that occurred
during the course of human evolution.
Estimates of the mutation rate of human mitochondrial DNA (mtDNA) vary greatly depending on the available data and the method used for estimation. The two main methods of estimation, phylogeny based methods and pedigree based methods, have produced mutation rates that differ by almost an order of magnitude. Current research has been focused on resolving the high variability obtained from different rate estimates.
Rate variability
A
major assumption of the molecular clock theory is that mutations within
a particular genetic system occur at a statistically uniform rate and
this uniform rate can be used for dating genetic events. In practice the
assumption of a single uniform rate is an oversimplification. Though a
single mutation rate is often applied, it is often a composite or an
average of several different mutation rates. Many factors influence observed mutation rates and these factors include the type of samples, the region of the genome studied and the time period covered.
Actual vs. observed rates
The rate at which mutations occur during reproduction, the germline mutation
rate, is thought to be higher than all observed mutation rates, because
not all mutations are successfully passed down to subsequent
generations.
mtDNA is only passed down along the matrilineal line, and therefore
mutations passed down to sons are lost. Random genetic drift may also
cause the loss of mutations. For these reasons, the actual mutation rate
will not be equivalent to the mutation rate observed from a population
sample.
Population size
Population dynamics are believed to influence observed mutation rates. When a population is expanding, more germline mutations
are preserved in the population. As a result, observed mutation rates
tend to increase in an expanding population. When populations contract,
as in a population bottleneck,
more germline mutations are lost. Population bottlenecks thus tend to
slow down observed mutation rates. Since the emergence of the species
Homo sapiens about 200,000 years ago, human populations have expanded
from a few thousand individuals living in Africa to over 6.5 billion all
over the world. However, the expansion has not been uniform, so the
history of human populations may consist of both bottlenecks and
expansions.
Structural variability
The
mutation rate across the mitochondrial genome is not uniformly
distributed. Certain regions of the genome are known to mutate more
rapidly than others. The hypervariable regions are known to be highly polymorphic relative to other parts of the genome.
The rate at which mutations accumulate in coding and non-coding regions of the genome also differs, as mutations in the coding region are subject to purifying selection. For this reason, some studies avoid coding region or non-synonymous mutations when calibrating the molecular clock. Loogvali et al. (2009)
consider only synonymous mutations; they have recalibrated the
molecular clock of human mtDNA as 7990 years per synonymous mutation
over the mitochondrial genome.
Soares et al. (2009)
consider both coding and non-coding region mutations to arrive at a
single mutation rate, but apply a correction factor to account for
selection in the coding region.
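As a rough illustration of how such a calibration is applied, the sketch below converts a mean count of synonymous mutations into an age, using the figure of 7990 years per synonymous mutation quoted above. The function name and the example count are hypothetical, and the simple linear conversion ignores the confidence intervals that a real analysis would report.

```python
# Calibration quoted above from Loogvali et al. (2009).
YEARS_PER_SYNONYMOUS_MUTATION = 7990


def age_from_synonymous_mutations(mean_mutations: float) -> float:
    """Convert a mean number of synonymous mutations to years (linear clock)."""
    if mean_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    return mean_mutations * YEARS_PER_SYNONYMOUS_MUTATION


# Hypothetical example: members of a clade average 5 synonymous
# mutations relative to the clade's ancestral sequence.
print(age_from_synonymous_mutations(5))
```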
Temporal variability
The
mutation rate has been observed to vary with time. Mutation rates
within the human species are faster than those observed along the
human-ape lineage. The mutation rate is also thought to be faster in
recent times, since the beginning of the Holocene 11,000 years ago.
Parallel mutations and saturation
Parallel mutation (sometimes referred to as homoplasy) or convergent evolution occurs when the same mutation arises independently at the same site in the genome in separate lineages.
Saturation
occurs when a single site experiences multiple mutations. Parallel
mutations and saturation result in the underestimation of the mutation
rate because they are likely to be overlooked.
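The effect of saturation can be illustrated with the Jukes-Cantor correction, a standard textbook model that is not drawn from the studies discussed here and is used only as an assumption for this sketch. It treats all substitutions as equally likely and converts the observed proportion of differing sites into an estimate of the actual number of substitutions per site. The gap between the two grows as divergence increases, which is why uncorrected counts underestimate the rate.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site given observed proportion p of differing sites.

    Assumes the Jukes-Cantor model (equal rates for all substitution types).
    The correction is undefined for p >= 0.75, where sites are fully saturated.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Illustrative proportions only; these are not values from any study.
for p in (0.01, 0.10, 0.30, 0.60):
    d = jukes_cantor_distance(p)
    print(f"observed {p:.2f} -> corrected {d:.3f} substitutions per site")
```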
Heteroplasmy
Individuals affected by heteroplasmy
have a mixture of mtDNA types, some with new mutations and some
without. The new mutations may or may not be passed down to subsequent
generations. Thus the presence of heteroplasmic individuals in a sample
may complicate the calculation of mutation rates.
Methods
Pedigree based
Pedigree
methods estimate the mutation rate by comparing the mtDNA sequences of a
sample of parent/offspring pairs or analyzing mtDNA sequences of
individuals from a deep-rooted genealogy. The number of new mutations in
the sample is counted and divided by the total number of
parent-to-child DNA transmission events to arrive at a mutation rate.
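The pedigree calculation described above can be sketched as follows. The counts, the number of sites and the generation time are hypothetical values chosen only for illustration; the conversion to a per-site, per-year rate is an added assumption, since the text only defines the per-transmission rate.

```python
def pedigree_rate(new_mutations: int, transmissions: int) -> float:
    """New mutations per parent-to-child transmission event."""
    if transmissions <= 0:
        raise ValueError("need at least one transmission event")
    return new_mutations / transmissions


def per_site_per_year(rate_per_transmission: float, sites: int,
                      generation_years: float) -> float:
    """Convert a per-transmission rate to substitutions per site per year."""
    return rate_per_transmission / (sites * generation_years)


# Hypothetical sample: 2 new mutations seen across 500 transmissions,
# in a sequenced segment of 1000 sites, assuming 25 years per generation.
rate = pedigree_rate(2, 500)
print(rate)
print(per_site_per_year(rate, sites=1000, generation_years=25))
```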
Phylogeny based
Phylogeny
based methods estimate the mutation rate by first reconstructing the haplotype of
the most recent common ancestor (MRCA) of a sample of two or more
genetic lineages. A requirement is that the time to the most recent
common ancestor (TMRCA)
of the sample of lineages must already be known from other independent
sources, usually the archeological record. The average number of
mutations that have accumulated since the MRCA
is then computed and divided by the TMRCA to arrive at the mutation
rate. The human mutation rate is usually estimated by comparing the
sequences of modern humans and chimpanzees and then reconstructing the
ancestral haplotype of the chimpanzee-human common ancestor. According
to the paleontological record, the last common ancestor of humans and chimpanzees may
have lived around 6 million years ago.
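A minimal sketch of the phylogeny based calculation follows, assuming the ancestral haplotype has already been reconstructed and the mutations on each lineage counted. The lineage counts, the TMRCA and the number of sites are hypothetical inputs, and the division by the number of sites is an added step to express the result per site.

```python
def phylogeny_rate(mutations_per_lineage: list[float], tmrca_years: float,
                   sites: int) -> float:
    """Average mutations since the MRCA, divided by the TMRCA and site count."""
    if not mutations_per_lineage:
        raise ValueError("need at least one lineage")
    mean_mutations = sum(mutations_per_lineage) / len(mutations_per_lineage)
    return mean_mutations / (tmrca_years * sites)


# Hypothetical example: three lineages carrying 48, 52 and 50 mutations
# relative to the reconstructed MRCA, an independently dated TMRCA of
# 200,000 years, and 16,553 aligned sites.
print(phylogeny_rate([48, 52, 50], tmrca_years=200_000, sites=16_553))
```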
Pedigree vs. phylogeny comparison
Rates
obtained by pedigree methods are about 10 times faster than those
obtained by phylogenetic methods. Several factors acting together may be
responsible for this difference. As pedigree methods record mutations
in living subjects, the mutation rates from pedigree studies are closer
to the germline mutation rate. Pedigree studies use genealogies that are
only a few generations deep whereas phylogeny based methods use
timescales that are thousands or millions of years deep. According to
Henn et al. (2009), phylogeny based methods take into account events that
occur over long time scales and are thus less affected by stochastic
fluctuations. Howell et al. (2003) suggest that selection, saturation,
parallel mutations and genetic drift are responsible for the differences
observed between pedigree based methods and phylogeny based methods.
Estimating based on AMH archaeology
Study | Sequence type | TAnchor (location) | Referencing method (correction method)
Cann, Stoneking & Wilson (1987) | Restriction fragments | 40, 30, and 12 Ka (Australia, New Guinea, New World) | Archaeologically defined migrations matched with estimated sequence divergence rates
Endicott & Ho (2008) | Genomic | 40 to 55 Ka (Papua New Guinea); 14.5 to 21.5 Ka (Haps H1 and H3) | PNG following Haplogroup P
Anatomically modern humans (AMH) spread out of Africa and over a large
area of Eurasia and left artifacts along the northern coast of
Southwest, South, Southeast and East Asia. Cann, Stoneking & Wilson (1987) did not rely on a predicted TCHLCA to estimate single-nucleotide polymorphism
(SNP) rates. Instead, they used evidence of colonization in Southeast
Asia and Oceania to estimate mutation rates. In addition, they used restriction fragment length polymorphism (RFLP)
technology to examine differences between DNA sequences. Using these techniques, this group came up with a TMRCA
of 140,000 to 290,000 years. Cann et al. (1987) estimated the TMRCA of
humans to be approximately 210 ky. This differs by only 9% from the more recent estimate of Soares
et al. (2009), which used a 7-million-year chimpanzee-human mtDNA MRCA. The agreement
is relatively close considering the wide confidence
range of both estimates and the calls for a more ancient TCHLCA.
Endicott & Ho (2008)
reevaluated the predicted migrations globally and compared them
to the actual evidence. This group used the coding regions of the sequences.
They postulate that the molecular clock based on chimp-human
comparisons is not reliable, particularly for dating recent
migrations, such as the founding migrations into Europe, Australia, and the
Americas. With this technique, the group came up with a TMRCA of 82,000 to 134,000 years.
Estimating based on CHLCA
Because
chimps and humans share a matrilineal ancestor, establishing the
geological age of that last ancestor allows the estimation of the
mutation rate. The chimp-human last common ancestor (CHLCA) is frequently applied as an anchor for mt-TMRCA studies, with ages between 4 and 13 million years cited in the literature.
This is one source of variation in the time estimates. The other
weakness is the non-clocklike accumulation of SNPs, which would tend to make
more recent branches look older than they actually are.
Region | Subregion (or site within codon) | SNP rate (per site per year)
Control region | HVR I | 1.6 × 10⁻⁷
Control region | HVR II | 2.3 × 10⁻⁷
Control region | remaining | 1.5 × 10⁻⁸
Protein-coding | 1st and 2nd codon positions | 8.8 × 10⁻⁹
Protein-coding | 3rd codon position | 1.9 × 10⁻⁸
DNA encoding rRNA (rDNA) | | 8.2 × 10⁻⁹
DNA encoding tRNA (tDNA) | | 6.9 × 10⁻⁹
other | | 2.4 × 10⁻⁸
TCHLCA assumed 6.5 Ma, relative rate to 1st & 2nd codons
These two sources of error may balance each other or amplify each other depending on the direction of the TCHLCA
error. There are two major reasons why this method is widely employed.
First, the pedigree based rates are inappropriate for estimates over very
long periods of time. Second, while the archaeology anchored rates
represent the intermediate range, archaeological evidence for human
colonization often dates to well after the colonization itself. For example,
colonization of Eurasia from west to east is believed to have occurred
along the Indian Ocean. However, the oldest archaeological sites that
also demonstrate anatomically modern humans (AMH) are in China and
Australia, and are greater than 42,000 years in age. The oldest Indian
site with AMH remains is only 34,000 years old, although another site with AMH
compatible archaeology is in excess of 76,000 years in age. Therefore, application of the anchor is a subjective interpretation of when humans were first present.
A simple measure of the sequence divergence
between humans and chimps can be found by counting the SNPs. Given
that the mitogenome is about 16553 base pairs in length (each base pair
which can be aligned with known references is called a site), the formula is:
rate = SNPs / (2 × 16553 × TCHLCA)
The '2' in the denominator
is derived from the 2 lineages, human and chimpanzee, that split from
the CHLCA. Ideally it represents the accumulation of mutations on both
lineages but at different positions (SNPs). As long as the number of SNPs
observed approximates the number of mutations, this formula works well.
However, at rapidly evolving sites mutations are obscured by saturation
effects. Sorting positions within the mitogenome by rate and
compensating for saturation are alternative approaches.
Because the TCHLCA is subject to change with more
paleontological information, the equation described above allows the
comparison of TMRCA from different studies.
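The relation described above, in which the SNP count is divided by twice the number of sites and by the TCHLCA, can be sketched as follows. The SNP count is a hypothetical input, and the rescaling helper assumes that the rate scales inversely with the assumed TCHLCA, which follows from the formula when the SNP count is held fixed.

```python
MITOGENOME_SITES = 16553  # approximate mitogenome length given in the text


def rate_from_divergence(snps: float, tchlca_years: float,
                         sites: int = MITOGENOME_SITES) -> float:
    """Substitutions per site per year; the 2 accounts for both lineages."""
    return snps / (2 * sites * tchlca_years)


def rescale_rate(rate: float, old_tchlca: float, new_tchlca: float) -> float:
    """Re-express a rate under a different assumed TCHLCA (same SNP count)."""
    return rate * old_tchlca / new_tchlca


# Hypothetical count of 1500 human-chimpanzee SNPs, with a 6 Ma TCHLCA.
rate_6ma = rate_from_divergence(1500, 6.0e6)
print(rate_6ma)
# The same data under a 7 Ma TCHLCA gives a proportionally slower rate.
print(rescale_rate(rate_6ma, 6.0e6, 7.0e6))
```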
Study | Sequence type | TCHLCA (sorting time) | Referencing method (correction method)
Vigilant et al. (1991) | HVR | 4 to 6 Ma | CH transversions (15:1 transition:transversion)
Ingman et al. (2000) | genomic (not HVR) | 5 Ma | CH genomic comparison
Endicott & Ho (2008) | genomic (not HVR) | 5 to 7.5 Ma | CH (relaxed rate, rate-class defined)
Gonder et al. (2007) | genomic (not HVR) | 6.0 Ma (+ 0.5 Ma) | CH (rate class defined)
Mishmar et al. (2003) | genomic (not HVR) | 6.5 Ma (+ 0.5 Ma) | CH (rate class defined)
Soares et al. (2009) | genomic | 6.5 Ma (+ 0.5 Ma) | CHLCA anchored (examined selection by Ka/(Ks + k))
CH = chimpanzee to human, LCA = last common ancestor
Early, HVR, sequence-based methods
To overcome the effects of saturation, HVR analysis relied on the transversional distance between humans and chimpanzees. A transition
to transversion ratio was applied to this distance to estimate sequence
divergence in the HVR between chimpanzees and humans, and divided by an
assumed TCHLCA of 4 to 6 million years.
Based on 26.4 substitutions between chimpanzee and human and 15:1
ratio, the estimated 396 transitions over 610 base-pairs demonstrated
sequence divergence of 69.2% (rate * TCHLCA of 0.369), producing divergence rates of roughly 11.5% to 17.3% per million years.
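The arithmetic behind these figures can be checked with a short sketch. It assumes that the 69.2% divergence counts the estimated transitions plus the observed transversions over the 610 sites, which is one reading of the text that reproduces the quoted percentages; the variable and function names are illustrative.

```python
TRANSVERSIONS = 26.4   # observed chimpanzee-human transversions
TS_TV_RATIO = 15       # assumed transition:transversion ratio
HVR_SITES = 610        # base pairs compared

# Estimated transitions implied by the ratio (about 396).
transitions = TRANSVERSIONS * TS_TV_RATIO
# Total estimated divergence per site (about 0.692).
divergence = (transitions + TRANSVERSIONS) / HVR_SITES


def divergence_per_myr(total_divergence: float, tchlca_myr: float) -> float:
    """Sequence divergence per million years for an assumed TCHLCA."""
    return total_divergence / tchlca_myr


print(f"divergence: {divergence:.1%}")
for tchlca in (6, 4):
    print(f"TCHLCA {tchlca} Ma: {divergence_per_myr(divergence, tchlca):.1%} per million years")
```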
Vigilant et al. (1991)
also estimated the sequence divergence rate for the sites in the
rapidly evolving HVR I and HVR II regions. As noted in the table above,
the rate of evolution is so high that site saturation occurs in direct
chimpanzee and human comparisons. Consequently, this study used
transversions, which evolve at a slower rate than the more common
transition polymorphisms. Comparing chimp and human mitogenomes, they
noted 26.4 transversions within the HVR regions; however, they made no
correction for saturation. As more HVR sequence was obtained following
this study, it was noted that the dinucleotide site CRS:16181-16182
experienced numerous transversions in parsimony analysis, many of which
were considered to be sequencing errors. However, the sequencing of the Feldhofer I Neanderthal revealed that there was also a transversion between humans and Neanderthals at this site. In addition, Soares et al. (2009)
noted three sites at which recurrent transversions had occurred in
human lineages, two of which are in HVR I: 16265 (12 occurrences) and
16318 (8 occurrences).
Therefore, 26.4 transversions was an underestimate of the likely number
of transversion events. The 1991 study also used a
transition-to-transversion ratio of 15:1 taken from a study of Old World monkeys.
However, examination of chimp and gorilla HVR reveals a ratio that is
lower, and the examination of humans places the ratio at 34:1.
Therefore, this study underestimated the level of sequence divergence
between chimpanzee and human. The estimated sequence divergence of
0.738 per site (including transversions) is significantly lower than the ~2.5
per site suggested by Soares et al. (2009). These two errors would
result in an overestimate of the human mitochondrial TMRCA. However,
the authors failed to detect the basal L0 lineage in the analysis and also
failed to detect recurrent transitions in many lineages, both of which lead
to an underestimate of the TMRCA. Also, Vigilant et al. (1991) used a more recent
CHLCA anchor of 4 to 6 million years.
Coding region sequence based methods
Partial coding region sequence originally supplemented HVR studies
because complete coding region sequence was uncommon. There were
suspicions that the HVR studies had missed major branches, based on some
earlier RFLP and coding region studies. Ingman et al. (2000) was the first study to compare genomic sequences for coalescence analysis. Coding region sequence discriminated the M and N haplogroups and the L0 and L1
macrohaplogroups. Because the genomic DNA sequencing resolved the two
deepest branches, it improved some aspects of estimating the TMRCA over HVR
sequence alone. Excluding the D-loop and using a 5-million-year TCHLCA, Ingman et al. (2000) estimated the mutation rate to be 1.70 × 10⁻⁸ per site per year (rate * TCHLCA = 0.085, 15,435 sites).
However, coding region DNA has come under question because coding
sequences are either under purifying selection to maintain structure
and function, or under regional selection to evolve new capacities.
The problem with mutations in the coding region has been described as
follows: mutations occurring in the coding region that are not lethal to the mitochondria can persist but are selected against
in the host; over a few generations these will persist, but over
thousands of generations they are slowly pruned from the population,
leaving SNPs.
However, over thousands of generations regionally selective mutations
may not be distinguishable from these transient coding region mutations.
The problem with rare mutations in human mitogenomes is significant
enough to have prompted a half-dozen recent studies on the matter.
Ingman et al. (2000) estimated the rate of evolution of the non-D-loop region at 1.7 × 10⁻⁸
per year per site, based on 53 non-identical genomic sequences
that overrepresented Africa in a global sample. Despite this
over-representation, the resolution of the L0 subbranches was lacking,
and one other deep L1 branch has since been found. Despite these limitations,
the sampling was adequate for the hallmark study. Today, L0 is
restricted to African populations, whereas L1 is the ancestral
haplogroup of all non-Africans, as well as most Africans. Mitochondrial
Eve's sequence can be approximated by comparing a sequence from L0 with a
sequence from L1 and reconciling the mutations in L0 and L1. The mtDNA
sequences of contemporary human populations will generally differ from
Mitochondrial Eve's sequence by about 50 mutations. Mutation rates were not classified according to site (other than excluding the HVR regions). The TCHLCA of 5 Ma used in the 2000 study was also lower than the values used in the most recent studies.
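To see how much the choice of TCHLCA matters for this estimate, the sketch below re-expresses the Ingman et al. (2000) rate under a later anchor. It assumes the rate scales inversely with the assumed TCHLCA while the underlying divergence stays fixed; the 6.5 Ma target is taken from the later studies listed in the table above, and the helper name is illustrative.

```python
INGMAN_RATE = 1.70e-8    # substitutions per site per year, non-D-loop region
INGMAN_TCHLCA = 5.0e6    # years, as assumed in the 2000 study


def rescale_rate(rate: float, old_tchlca: float, new_tchlca: float) -> float:
    """Re-express a rate under a different TCHLCA, holding divergence fixed."""
    return rate * old_tchlca / new_tchlca


# Divergence per lineage implied by the published rate and anchor.
per_lineage_divergence = INGMAN_RATE * INGMAN_TCHLCA
print(per_lineage_divergence)

# The same divergence spread over a 6.5 Ma TCHLCA gives a slower rate.
rate_at_6_5_ma = rescale_rate(INGMAN_RATE, INGMAN_TCHLCA, 6.5e6)
print(rate_at_6_5_ma)
```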
Estimates from ancient DNA
Since
it has become possible to sequence large numbers of ancient
mitogenomes, several studies have estimated the mitochondrial mutation
rate by measuring how many more mutations on average have accumulated in
modern (or later) genomes compared to ancient (or earlier) ones
descending from the same phylogenetic node. These studies have obtained
similar results. Central estimates for the whole chromosome, in
substitutions per site per year, are 2.47 × 10⁻⁸, 2.14 × 10⁻⁸, 2.53 × 10⁻⁸ and 2.74 × 10⁻⁸.
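The principle behind these estimates, and the spread of the four central values quoted above, can be sketched as follows. The `tip_dated_rate` helper and its example inputs are hypothetical simplifications; published analyses use full phylogenetic models rather than a single pairwise comparison.

```python
from statistics import mean

# Central estimates quoted above, in substitutions per site per year.
CENTRAL_ESTIMATES = [2.47e-8, 2.14e-8, 2.53e-8, 2.74e-8]


def tip_dated_rate(extra_mutations: float, sites: int,
                   years_elapsed: float) -> float:
    """Rate from the extra mutations a later genome carries over an earlier one."""
    return extra_mutations / (sites * years_elapsed)


average_estimate = mean(CENTRAL_ESTIMATES)
spread = max(CENTRAL_ESTIMATES) / min(CENTRAL_ESTIMATES)
print(average_estimate, spread)

# Hypothetical example: a modern genome carries 15 more mutations than a
# 40,000-year-old genome from the same node, over 16,553 sites.
print(tip_dated_rate(15, sites=16_553, years_elapsed=40_000))
```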
Inter-comparing rates and studies
Molecular clocking of mitochondrial DNA has been criticized because of its inconsistent molecular clock. A retrospective analysis of any pioneering process will reveal inadequacies. With mitochondrial DNA, the inadequacies are ignorance of rate variation and overconfidence concerning the TCHLCA
of 5 Ma. Lack of historical perspective might explain the second issue;
the problem of rate variation is something that could only be resolved
by the massive study of mitochondria that followed. The number of HVR
sequences accumulated from 1987 to 2000 increased by
orders of magnitude. Soares et al. (2009)
used 2196 mitogenomic sequences and uncovered 10,683 substitution
events within these sequences. Eleven of the 16560 sites in the mitogenome
produced more than 11% of all the substitutions, with statistically
significant rate variation within the 11 sites.
They argue that there is a neutral-site mutation rate which is an order of
magnitude slower than the rate observed for the fastest site, CRS 16519.
Consequently, purifying selection aside, the rate of mutation itself
varies between sites, with a few sites much more likely to undergo new
mutations than others.
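The degree of concentration implied by these figures can be worked out directly. The sketch below treats the 11% share as a lower bound, as stated in the text, so the results are minimum values; the variable names are illustrative.

```python
TOTAL_SITES = 16560           # sites in the mitogenome, as quoted above
HOT_SITES = 11                # fastest-evolving sites
TOTAL_SUBSTITUTIONS = 10683   # substitution events found by Soares et al. (2009)
HOT_SHARE = 0.11              # lower bound on the share of substitutions at hot sites

# Fraction of the genome occupied by the hot sites (well under 0.1%).
site_fraction = HOT_SITES / TOTAL_SITES
# Minimum number of substitution events falling on those sites.
min_hot_substitutions = HOT_SHARE * TOTAL_SUBSTITUTIONS
# Minimum enrichment relative to a uniform spread across all sites.
min_enrichment = HOT_SHARE / site_fraction

print(f"{site_fraction:.4%} of sites")
print(f"at least {min_hot_substitutions:.0f} substitutions")
print(f"at least {min_enrichment:.0f}-fold enrichment")
```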
Soares et al. (2009) noted two spans of DNA, CRS 2651-2700 and
3028-3082, that had no SNPs within the 2196 mitogenomic sequences.