A Medley of Potpourri: Human mitochondrial molecular clock

Tuesday, August 18, 2020

Human mitochondrial molecular clock

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Human_mitochondrial_molecular_clock

The human mitochondrial molecular clock is the rate at which mutations have been accumulating in the mitochondrial genome of hominids during the course of human evolution. The archeological record of human activity from early periods in human prehistory is relatively limited and its interpretation has been controversial. Because of the uncertainties from the archeological record, scientists have turned to molecular dating techniques in order to refine the timeline of human evolution. A major goal of scientists in the field is to develop an accurate hominid mitochondrial molecular clock which could then be used to confidently date events that occurred during the course of human evolution.

Estimates of the mutation rate of human mitochondrial DNA (mtDNA) vary greatly depending on the available data and the method used for estimation. The two main methods of estimation, phylogeny based methods and pedigree based methods, have produced mutation rates that differ by almost an order of magnitude. Current research has been focused on resolving the high variability obtained from different rate estimates.

Rate variability

A major assumption of the molecular clock theory is that mutations within a particular genetic system occur at a statistically uniform rate and this uniform rate can be used for dating genetic events. In practice the assumption of a single uniform rate is an oversimplification. Though a single mutation rate is often applied, it is often a composite or an average of several different mutation rates. Many factors influence observed mutation rates and these factors include the type of samples, the region of the genome studied and the time period covered.

Actual vs. observed rates

The rate at which mutations occur during reproduction, the germline mutation rate, is thought to be higher than all observed mutation rates, because not all mutations are successfully passed down to subsequent generations. mtDNA is passed down only along the maternal line, and therefore mutations passed down to sons are lost. Random genetic drift may also cause the loss of mutations. For these reasons, the actual mutation rate will not be equivalent to the mutation rate observed from a population sample.

Population size

Population dynamics are believed to influence observed mutation rates. When a population is expanding, more germline mutations are preserved in the population. As a result, observed mutation rates tend to increase in an expanding population. When populations contract, as in a population bottleneck, more germline mutations are lost. Population bottlenecks thus tend to slow down observed mutation rates. Since the emergence of the species Homo sapiens about 200,000 years ago, human populations have expanded from a few thousand individuals living in Africa to over 6.5 billion all over the world. However, the expansion has not been uniform, so the history of human populations may consist of both bottlenecks and expansions.

Structural variability

The mutation rate is not uniformly distributed across the mitochondrial genome. Certain regions of the genome are known to mutate more rapidly than others. The hypervariable regions are known to be highly polymorphic relative to other parts of the genome.

The rate at which mutations accumulate in coding and non-coding regions of the genome also differs, as mutations in the coding region are subject to purifying selection. For this reason, some studies avoid coding region or non-synonymous mutations when calibrating the molecular clock. Loogvali et al. (2009) consider only synonymous mutations; they have recalibrated the molecular clock of human mtDNA as 7990 years per synonymous mutation over the mitochondrial genome. Soares et al. (2009) consider both coding and non-coding region mutations to arrive at a single mutation rate, but apply a correction factor to account for selection in the coding region.
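
The synonymous-mutation calibration quoted above can be applied as a simple multiplication. The sketch below is only an illustration: the function name and the count of 10 mutations are hypothetical, and the conversion assumes a strictly linear clock.

```python
# Calibration quoted in the text for Loogvali et al. (2009):
# one synonymous mutation per 7990 years over the mitochondrial genome.
YEARS_PER_SYNONYMOUS_MUTATION = 7990


def age_from_synonymous_mutations(mean_mutations: float) -> float:
    """Convert an average count of synonymous mutations into years.

    Assumes a strictly linear clock; the function name is illustrative only.
    """
    if mean_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    return mean_mutations * YEARS_PER_SYNONYMOUS_MUTATION


# Hypothetical input: lineages that average 10 synonymous mutations
# away from their common ancestor.
print(age_from_synonymous_mutations(10))
```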

Temporal variability

The mutation rate has been observed to vary with time. Mutation rates within the human species are faster than those observed along the human-ape lineage. The mutation rate is also thought to be faster in recent times, since the beginning of the Holocene 11,000 years ago.

Parallel mutations and saturation

Parallel mutation (sometimes referred to as homoplasy) or convergent evolution occurs when the same mutation arises independently at the same site in the genome in separate lineages. Saturation occurs when a single site experiences multiple mutations. Parallel mutations and saturation result in the underestimation of the mutation rate because they are likely to be overlooked.

Heteroplasmy

Individuals affected by heteroplasmy have a mixture of mtDNA types, some with new mutations and some without. The new mutations may or may not be passed down to subsequent generations. Thus the presence of heteroplasmic individuals in a sample may complicate the calculation of mutation rates.

Methods

Pedigree based

Pedigree methods estimate the mutation rate by comparing the mtDNA sequences of a sample of parent/offspring pairs or analyzing mtDNA sequences of individuals from a deep-rooted genealogy. The number of new mutations in the sample is counted and divided by the total number of parent-to-child DNA transmission events to arrive at a mutation rate.
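
The pedigree calculation described above amounts to a single division. A minimal sketch follows, with a hypothetical function name and made-up counts that do not come from any study:

```python
def pedigree_rate(new_mutations: int, transmissions: int) -> float:
    """Mutations per parent-to-child transmission event.

    new_mutations: new mutations counted in the sample.
    transmissions: total parent-to-child mtDNA transmission events.
    """
    if transmissions <= 0:
        raise ValueError("need at least one transmission event")
    return new_mutations / transmissions


# Hypothetical sample: 3 new mutations observed over 300 transmissions.
rate = pedigree_rate(3, 300)
print(f"{rate:.4f} mutations per transmission")
```

Converting this per-generation figure into a per-year rate would additionally require an assumed generation time, which the text does not specify.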

Phylogeny based

Phylogeny based methods estimate the mutation rate by first reconstructing the haplotype of the most recent common ancestor (MRCA) of a sample of two or more genetic lineages. A requirement is that the time to the most recent common ancestor (TMRCA) of the sample of lineages must already be known from other independent sources, usually the archeological record. The average number of mutations that have accumulated since the MRCA is then computed and divided by the TMRCA to arrive at the mutation rate. The human mutation rate is usually estimated by comparing the sequences of modern humans and chimpanzees and then reconstructing the ancestral haplotype of the chimpanzee-human common ancestor. According to the paleontological record, the last common ancestor of humans and chimpanzees may have lived around 6 million years ago.
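
The phylogeny based calculation can be sketched in the same way: average the mutations each lineage has accumulated since the MRCA, then divide by the independently known TMRCA. The function name and all input numbers below are hypothetical.

```python
def phylogeny_rate(mutations_per_lineage: list[float], tmrca_years: float) -> float:
    """Mutations per year, averaged over lineages descending from one MRCA.

    mutations_per_lineage: mutations separating each sampled lineage
        from the reconstructed MRCA haplotype.
    tmrca_years: time to the MRCA, known from an independent source.
    """
    if not mutations_per_lineage or tmrca_years <= 0:
        raise ValueError("need at least one lineage and a positive TMRCA")
    mean_mutations = sum(mutations_per_lineage) / len(mutations_per_lineage)
    return mean_mutations / tmrca_years


# Hypothetical sample: three lineages, MRCA dated to 100,000 years ago.
rate = phylogeny_rate([18, 22, 20], 100_000)
print(f"{rate:.1e} mutations per year over the sequenced region")
```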

Pedigree vs. phylogeny comparison

Rates obtained by pedigree methods are about 10 times faster than those obtained by phylogenetic methods. Several factors acting together may be responsible for this difference. As pedigree methods record mutations in living subjects, the mutation rates from pedigree studies are closer to the germline mutation rate. Pedigree studies use genealogies that are only a few generations deep, whereas phylogeny based methods use timescales that are thousands or millions of years deep. According to Henn et al. (2009), phylogeny based methods take into account events that occur over long time scales and are thus less affected by stochastic fluctuations. Howell et al. (2003) suggest that selection, saturation, parallel mutations and genetic drift are responsible for the differences observed between pedigree based methods and phylogeny based methods.

Estimating based on AMH archaeology

Methods/parameters for archaeologically estimated dates of mitochondrial Eve
Study	Sequence type	T_Anchor (location)	Referencing method (correction method)
Cann, Stoneking & Wilson (1987)	Restriction fragments	40, 30, and 12 Ka (Australia, New Guinea, New World)	archaeologically defined migrations matched with estimated sequence divergence rates
Endicott & Ho (2008)	Genomic	40 to 55 Ka (Papua New Guinea); 14.5 to 21.5 Ka (Haps H1 and H3)	PNG following Haplogroup P

Anatomically modern humans (AMH) spread out of Africa and over a large area of Eurasia and left artifacts along the northern coast of Southwest, South, Southeast and East Asia. Cann, Stoneking & Wilson (1987) did not rely on a predicted T_CHLCA to estimate single-nucleotide polymorphism (SNP) rates. Instead, they used evidence of colonization in Southeast Asia and Oceania to estimate mutation rates. In addition, they used RFLP (restriction fragment length polymorphism) technology to examine differences between DNA. Using these techniques, this group came up with a T_MRCA of 140,000 to 290,000 years. Cann et al. (1987) estimated the TMRCA of humans to be approximately 210,000 years, and the most recent estimate, that of Soares et al. (2009) (using a 7 million year chimpanzee-human mtDNA MRCA), differs from it by only 9%. This is relatively close considering the wide confidence ranges of both estimates and the calls for a more ancient T_CHLCA.

Endicott & Ho (2008) have reevaluated the predicted migrations globally and compared those to the actual evidence. This group used the coding regions of sequences. They postulate that the molecular clock based on chimp-human comparisons is not reliable, particularly in predicting recent migrations, such as founding migrations into Europe, Australia, and the Americas. With this technique, this group came up with a T_MRCA of 82,000 to 134,000 years.

Estimating based on CHLCA

Because chimps and humans share a matrilineal ancestor, establishing the geological age of that last ancestor allows the estimation of the mutation rate. The chimp-human last common ancestor (CHLCA) is frequently applied as an anchor for mt-T_MRCA studies, with ranges between 4 and 13 million years cited in the literature. This is one source of variation in the time estimates. The other weakness is the non-clocklike accumulation of SNPs, which would tend to make more recent branches look older than they actually are.

SNP rates as described by Soares et al. (2009)
Region(s)	Subregion (or site within codon)	SNP rate (per site per year)
Control region	HVR I	1.6 × 10⁻⁷
Control region	HVR II	2.3 × 10⁻⁷
Control region	remaining	1.5 × 10⁻⁸
Protein-coding	(1st and 2nd)	8.8 × 10⁻⁹
Protein-coding	(3rd)	1.9 × 10⁻⁸
DNA encoding rRNA (rDNA)		8.2 × 10⁻⁹
DNA encoding tRNA (tDNA)		6.9 × 10⁻⁹
other		2.4 × 10⁻⁸
T_CHLCA assumed 6.5 Ma, relative rate to 1st & 2nd codons

These two sources may balance each other or amplify each other depending on the direction of the T_CHLCA error. There are two major reasons why this method is widely employed. First, the pedigree based rates are inappropriate for estimates over very long periods of time. Second, while the archaeology anchored rates represent the intermediate range, archaeological evidence for human colonization often dates to well after colonization. For example, colonization of Eurasia from west to east is believed to have occurred along the Indian Ocean. However, the oldest archaeological sites that also demonstrate anatomically modern humans (AMH) are in China and Australia, greater than 42,000 years in age. Yet the oldest Indian site with AMH remains dates to 34,000 years, while another site with AMH compatible archaeology is in excess of 76,000 years in age. Therefore, application of the anchor is a subjective interpretation of when humans were first present.

A simple measure of the sequence divergence between humans and chimps can be found by observing the SNPs. Given that the mitogenome is about 16553 base pairs in length (each base pair that can be aligned with known references is called a site), the formula is:

rate = \frac{SNPs}{2 \times T_{CHLCA} \times 16553}

The '2' in the denominator is derived from the 2 lineages, human and chimpanzee, that split from the CHLCA. Ideally it represents the accumulation of mutations on both lineages but in different positions (SNPs). As long as the number of SNPs observed approximates the number of mutations, this formula works well. However, at rapidly evolving sites mutations are obscured by saturation effects. Sorting positions within the mitogenome by rate and compensating for saturation are alternative approaches.

Because the T_CHLCA is subject to change with more paleontological information, the equation described above allows the comparison of TMRCA from different studies.
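
As a rough illustration, the formula and the comparison it enables can be written out as below. The SNP count and TMRCA are hypothetical inputs, the function names are illustrative, and the rescaling assumes that everything except T_CHLCA is held fixed, so that the rate scales as 1/T_CHLCA and a TMRCA derived from it scales in proportion to T_CHLCA.

```python
ALIGNED_SITES = 16553  # number of alignable sites quoted in the text


def chlca_rate(snps: float, t_chlca_years: float, sites: int = ALIGNED_SITES) -> float:
    """Substitutions per site per year from a human-chimpanzee SNP count.

    The factor 2 accounts for the two lineages descending from the CHLCA.
    """
    return snps / (2 * t_chlca_years * sites)


def rescale_tmrca(tmrca_years: float, t_chlca_old: float, t_chlca_new: float) -> float:
    """Re-express a TMRCA under a different assumed T_CHLCA.

    Assumes the rate is inversely proportional to T_CHLCA and nothing
    else in the original study changes.
    """
    return tmrca_years * t_chlca_new / t_chlca_old


# Hypothetical inputs, for illustration only.
rate = chlca_rate(snps=3000, t_chlca_years=6.5e6)
print(f"{rate:.2e} substitutions per site per year")

# A study that reported 170,000 years using a 5 Ma anchor,
# restated for a 6.5 Ma anchor.
print(rescale_tmrca(170_000, t_chlca_old=5e6, t_chlca_new=6.5e6))
```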

Methods/parameters for estimating date of mitochondrial Eve
Study	Sequence type	T_CHLCA (sorting time)	Referencing method (correction method)
Vigilant et al. (1991)	HVR	4 to 6 Ma	CH transversions, (15:1 transition:transversion)
Ingman et al. (2000)	genomic (not HVR)	5 Ma	CH genomic comparison
Endicott & Ho (2008)	genomic (not HVR)	5 to 7.5 Ma	CH (relaxed rate, rate-class defined)
Gonder et al. (2007)	genomic (not HVR)	6.0 Ma (+ 0.5 Ma)	CH (rate class defined)
Mishmar et al. (2003)	genomic (not HVR)	6.5 Ma (+ 0.5 Ma)	CH (rate class defined)
Soares et al. (2009)	genomic	6.5 Ma (+ 0.5 Ma)	CHLCA anchored, (Examined selection by Ka/(Ks + k))
Chimpanzee to Human = CH, LCA = last common ancestor

Early, HVR, sequence-based methods

To overcome the effects of saturation, HVR analysis relied on the transversional distance between humans and chimpanzees. A transition-to-transversion ratio was applied to this distance to estimate sequence divergence in the HVR between chimpanzees and humans, and the result was divided by an assumed T_CHLCA of 4 to 6 million years. Based on 26.4 transversions between chimpanzee and human and a 15:1 ratio, the estimated 396 transitions over 610 base pairs gave a sequence divergence of 69.2% (rate * T_CHLCA of 0.369), producing divergence rates of roughly 11.5% to 17.3% per million years.
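
The arithmetic behind the 69.2% divergence and the 11.5% to 17.3% per million year range can be retraced as follows. This is a sketch that assumes the divergence counts the 26.4 transversions together with the 396 inferred transitions; the variable names are illustrative.

```python
TRANSVERSIONS = 26.4        # chimpanzee-human transversions in the HVR
TS_TV_RATIO = 15            # assumed transition:transversion ratio (15:1)
HVR_SITES = 610             # base pairs compared
T_CHLCA_RANGE_MYR = (4, 6)  # assumed anchor range, in millions of years

# Transitions inferred from the transversion count.
transitions = TRANSVERSIONS * TS_TV_RATIO

# Total divergence per site, counting both kinds of substitution.
divergence = (transitions + TRANSVERSIONS) / HVR_SITES

# Divergence per million years at each end of the anchor range.
rate_slow = divergence / max(T_CHLCA_RANGE_MYR)
rate_fast = divergence / min(T_CHLCA_RANGE_MYR)

print(f"inferred transitions: {transitions:.0f}")
print(f"sequence divergence: {divergence:.1%}")
print(f"divergence rate: {rate_slow:.1%} to {rate_fast:.1%} per million years")
```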

HVR is exceptionally prone to saturation, leading to the underestimation of the SNP rate when comparing very distantly related lineages.

Vigilant et al. (1991) also estimated the sequence divergence rate for the sites in the rapidly evolving HVR I and HVR II regions. As noted in the table above, the rate of evolution is so high that site saturation occurs in direct chimpanzee and human comparisons. Consequently, this study used transversions, which evolve at a slower rate than the more common transition polymorphisms. Comparing chimp and human mitogenomes, they noted 26.4 transversions within the HVR regions but made no correction for saturation. As more HVR sequence was obtained following this study, it was noted that the dinucleotide site CRS:16181-16182 experienced numerous transversions in parsimony analysis, many of which were considered to be sequencing errors. However, the sequencing of the Feldhofer I Neanderthal revealed that there was also a transversion between humans and Neanderthals at this site. In addition, Soares et al. (2009) noted three sites in which recurrent transversions had occurred in human lineages, two of which are in HVR I: 16265 (12 occurrences) and 16318 (8 occurrences). Therefore, 26.4 transversions was an underestimate of the likely number of transversion events. The 1991 study also used a transition-to-transversion ratio of 15:1 taken from the study of Old World monkeys. However, examination of chimp and gorilla HVR reveals a ratio that is lower, and the examination of humans places the ratio at 34:1. Therefore, this study underestimated the level of sequence divergence between chimpanzee and human. The estimated sequence divergence of 0.738 per site (which includes transversions) is significantly lower than the ~2.5 per site suggested by Soares et al. (2009). These two errors would result in an overestimate of the human mitochondrial TMRCA. However, the study failed to detect the basal L0 lineage and also failed to detect recurrent transitions in many lineages, both of which lead to an underestimate of the TMRCA. Also, Vigilant et al. (1991) used a more recent CHLCA anchor of 4 to 6 million years.

Coding region sequence based methods

[Figure: African mtDNA haplogroups (L0d, L0k, L0f, L0b, L0a, L1b, L1c, L3, L4)]

Partial coding region sequence originally supplemented HVR studies because complete coding region sequence was uncommon. There were suspicions that the HVR studies had missed major branches, based on some earlier RFLP and coding region studies. Ingman et al. (2000) was the first study to compare genomic sequences for coalescence analysis. Coding region sequence discriminated M and N haplogroups and L0 and L1 macrohaplogroups. Because the genomic DNA sequencing resolved the two deepest branches, it improved some aspects of estimating TMRCA over HVR sequence alone. Excluding the D-loop and using a 5-million-year T_CHLCA, Ingman et al. (2000) estimated the mutation rate to be 1.70 × 10⁻⁸ per site per year (rate * T_CHLCA = 0.085, 15,435 sites).
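
The quoted rate follows directly from the rate * T_CHLCA product and the 5 Ma anchor. The sketch below also shows how the same product would translate under other anchor values; the 6.5 Ma and 7.5 Ma alternatives are illustrative choices, not figures used by Ingman et al. (2000), and the function name is hypothetical.

```python
DIVERGENCE_PER_SITE = 0.085  # rate * T_CHLCA quoted for Ingman et al. (2000)


def rate_for_anchor(t_chlca_years: float) -> float:
    """Substitutions per site per year implied by a given T_CHLCA."""
    return DIVERGENCE_PER_SITE / t_chlca_years


# 5 Ma is the anchor used in the study; the other values are illustrative.
for t_ma in (5.0, 6.5, 7.5):
    rate = rate_for_anchor(t_ma * 1e6)
    print(f"T_CHLCA = {t_ma} Ma -> {rate:.2e} per site per year")
```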

However, coding region DNA has come under question because coding sequences are either under purifying selection to maintain structure and function, or under regional selection to evolve new capacities. The problem with mutations in the coding region has been described as follows: mutations in the coding region that are not lethal to the mitochondria can persist even though they are selected against in the host; over a few generations these will persist, but over thousands of generations they are slowly pruned from the population, leaving SNPs. However, over thousands of generations regionally selective mutations may not be distinguishable from these transient coding region mutations. The problem with rare mutations in human mitogenomes is significant enough to have prompted a half-dozen recent studies on the matter.

Ingman et al. (2000) estimated the rate of evolution of the non-D-loop region to be 1.7 × 10⁻⁸ per year per site, based on 53 non-identical genomic sequences that overrepresented Africa in a global sample. Despite this over-representation, the resolution of the L0 subbranches was lacking, and one other deep L1 branch has since been found. Despite these limitations, the sampling was adequate for this landmark study. Today, L0 is restricted to African populations, whereas L1 is the ancestral haplogroup of all non-Africans, as well as most Africans. Mitochondrial Eve's sequence can be approximated by comparing a sequence from L0 with a sequence from L1 and reconciling the mutations in the two. The mtDNA sequences of contemporary human populations will generally differ from Mitochondrial Eve's sequence by about 50 mutations. Mutation rates were not classified according to site (other than excluding the HVR regions). The T_CHLCA of 5 Ma used in the 2000 study was also lower than the values used in the most recent studies.

Estimates from ancient DNA

Since it has become possible to sequence large numbers of ancient mitogenomes, several studies have estimated the mitochondrial mutation rate by measuring how many more mutations on average have accumulated in modern (or later) genomes compared to ancient (or earlier) ones descending from the same phylogenetic node. These studies have obtained similar results: central estimates for the whole chromosome, in substitutions per site per year: 2.47 × 10⁻⁸; 2.14 × 10⁻⁸; 2.53 × 10⁻⁸; and 2.74 × 10⁻⁸.
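
To show how close the four central estimates are, they can be summarized as below. This is a minimal sketch using only the values listed above; it ignores the confidence intervals of the individual studies, which are not given in the text.

```python
from statistics import mean

# Central estimates quoted above, in substitutions per site per year.
ancient_dna_rates = [2.47e-8, 2.14e-8, 2.53e-8, 2.74e-8]

average_rate = mean(ancient_dna_rates)
spread = max(ancient_dna_rates) / min(ancient_dna_rates)

print(f"mean of central estimates: {average_rate:.2e}")
print(f"highest estimate is {spread:.2f} times the lowest")
```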

Inter-comparing rates and studies

Molecular clocking of mitochondrial DNA has been criticized because of its inconsistent molecular clock. A retrospective analysis of any pioneering process will reveal inadequacies. With mitochondrial DNA, the inadequacies are the argument from ignorance of rate variation and overconfidence concerning the T_CHLCA of 5 Ma. Lack of historical perspective might explain the second issue; the problem of rate variation is something that could only be resolved by the massive study of mitochondria that followed. The number of HVR sequences accumulated from 1987 to 2000 increased by orders of magnitude. Soares et al. (2009) used 2196 mitogenomic sequences and uncovered 10,683 substitution events within these sequences. Eleven of the 16560 sites in the mitogenome produced more than 11% of all the substitutions, with statistically significant rate variation among the 11 sites. They argue that there is a neutral-site mutation rate which is an order of magnitude slower than the rate observed for the fastest site, CRS 16519. Consequently, purifying selection aside, the rate of mutation itself varies between sites, with a few sites much more likely to undergo new mutations relative to others. Soares et al. (2009) noted two spans of DNA, CRS 2651-2700 and 3028-3082, that had no SNPs within the 2196 mitogenomic sequences.
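
The degree of concentration implied by these figures can be worked out directly. The sketch below treats the 11% share as a lower bound, as stated above, and compares it with the share those 11 sites would produce if substitutions were spread evenly over all sites; the variable names are illustrative.

```python
TOTAL_SITES = 16560          # sites in the mitogenome, as quoted above
HOT_SITES = 11               # fast-evolving sites
MIN_SHARE = 0.11             # they produce more than 11% of substitutions
TOTAL_SUBSTITUTIONS = 10683  # substitution events found by Soares et al. (2009)

# Share of the genome occupied by the 11 sites.
site_fraction = HOT_SITES / TOTAL_SITES

# Lower bound on the substitutions falling at those sites.
min_hot_substitutions = MIN_SHARE * TOTAL_SUBSTITUTIONS

# How many times more substitutions than a uniform spread would give.
enrichment = MIN_SHARE / site_fraction

print(f"{site_fraction:.3%} of sites carry more than {MIN_SHARE:.0%} of substitutions")
print(f"at least {min_hot_substitutions:.0f} substitution events at these sites")
print(f"more than {enrichment:.0f} times the uniform expectation")
```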
