A Medley of Potpourri

Tuesday, August 18, 2020

Human mitochondrial genetics

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Human_mitochondrial_genetics

Human mitochondrial DNA
The 16,569 bp long human mitochondrial genome with the protein-coding, ribosomal RNA, and transfer RNA genes.
Features
Length (bp)	16,569
No. of genes	13 (coding genes), 24 (non-coding genes)
Type	Mitochondrial DNA
Full DNA sequences
RefSeq	NC_012920 (FASTA)
GenBank	J01415 (FASTA)

Human mitochondrial genetics is the study of the genetics of human mitochondrial DNA (the DNA contained in human mitochondria). The human mitochondrial genome is the entirety of hereditary information contained in human mitochondria. Mitochondria are small structures in cells that generate energy for the cell to use, and are hence referred to as the "powerhouses" of the cell.

Mitochondrial DNA (mtDNA) is not transmitted through nuclear DNA (nDNA). In humans, as in most multicellular organisms, mitochondrial DNA is inherited only from the mother's ovum. There are theories, however, that paternal mtDNA transmission in humans can occur under certain circumstances.

Mitochondrial inheritance is therefore non-Mendelian, as Mendelian inheritance presumes that half the genetic material of a fertilized egg (zygote) derives from each parent.

Eighty percent of mitochondrial DNA codes for mitochondrial RNA, and therefore most mitochondrial DNA mutations lead to functional problems, which may be manifested as muscle disorders (myopathies).

Because they provide 30 molecules of ATP per glucose molecule in contrast to the 2 ATP molecules produced by glycolysis, mitochondria are essential to all higher organisms for sustaining life. The mitochondrial diseases are genetic disorders carried in mitochondrial DNA, or nuclear DNA coding for mitochondrial components. Slight problems with any one of the numerous enzymes used by the mitochondria can be devastating to the cell, and in turn, to the organism.

Quantity

In humans, mitochondrial DNA (mtDNA) forms closed circular molecules that contain 16,569 DNA base pairs, with each such molecule normally containing a full set of the mitochondrial genes. Each human mitochondrion contains, on average, approximately 5 such mtDNA molecules, with the quantity ranging between 1 and 15. Each human cell contains approximately 100 mitochondria, giving a total number of mtDNA molecules per human cell of approximately 500.
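
The per-cell total quoted above is the product of the two averages. A minimal sketch of that arithmetic follows; the function and parameter names are illustrative, and the defaults are the approximate averages given in the text.

```python
GENOME_LENGTH_BP = 16_569  # length of one human mtDNA molecule, from the text


def mtdna_copies_per_cell(mitochondria_per_cell=100, copies_per_mitochondrion=5):
    """Approximate number of mtDNA molecules in one cell."""
    return mitochondria_per_cell * copies_per_mitochondrion


def mtdna_base_pairs_per_cell(copies):
    """Total mitochondrial base pairs carried by that many mtDNA molecules."""
    return copies * GENOME_LENGTH_BP


copies = mtdna_copies_per_cell()              # 100 * 5 = 500
low = mtdna_copies_per_cell(100, 1)           # 1 copy per mitochondrion
high = mtdna_copies_per_cell(100, 15)         # 15 copies per mitochondrion
total_bp = mtdna_base_pairs_per_cell(copies)  # 500 * 16,569 = 8,284,500
```

With the quoted range of 1 to 15 molecules per mitochondrion, the same calculation gives roughly 100 to 1,500 mtDNA molecules per cell.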

Inheritance patterns

Mitochondrial inheritance patterns

The reason for maternal inheritance in mitochondrial DNA is that when the sperm enters the egg cell, it discards its middle part, which contains its mitochondria, so that only its head with the nucleus penetrates the egg cell.

Because mitochondrial diseases (diseases due to malfunction of mitochondria) can be inherited both maternally and through chromosomal inheritance, the way in which they are passed on from generation to generation can vary greatly depending on the disease. Mitochondrial genetic mutations that occur in the nuclear DNA can occur in any of the chromosomes (depending on the species). Mutations inherited through the chromosomes can be autosomal dominant or recessive and can also be sex-linked dominant or recessive. Chromosomal inheritance follows normal Mendelian laws, despite the fact that the phenotype of the disease may be masked.

Because of the complex ways in which mitochondrial and nuclear DNA "communicate" and interact, even seemingly simple inheritance is hard to diagnose. A mutation in chromosomal DNA may change a protein that regulates (increases or decreases) the production of another certain protein in the mitochondria or the cytoplasm; this may lead to slight, if any, noticeable symptoms. On the other hand, some devastating mtDNA mutations are easy to diagnose because of their widespread damage to muscular, neural, and/or hepatic tissues (among other high-energy and metabolism-dependent tissues) and because they are present in the mother and all the offspring.

The number of affected mtDNA molecules inherited by a specific offspring can vary greatly because:

the mitochondria within the fertilized oocyte are what the new life will have to begin with (in terms of mtDNA),
the number of affected mitochondria varies from cell to cell (in this case, from one fertilized oocyte to another), depending both on the number inherited from the mother cell and on environmental factors that may favor mutant or wildtype mitochondrial DNA,
the number of mtDNA molecules per mitochondrion varies from around two to ten.

It is possible, even in twin births, for one baby to receive more than half mutant mtDNA molecules while the other twin may receive only a tiny fraction of mutant mtDNA molecules with respect to wildtype (depending on how the twins divide from each other and how many mutant mitochondria happen to be on each side of the division). In a few cases, some mitochondria or a mitochondrion from the sperm cell enters the oocyte but paternal mitochondria are actively decomposed.
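
The twin example can be illustrated with a toy random-sampling model. This is only a sketch under stated assumptions: each inherited mtDNA molecule is treated as an independent draw from the mother's pool, and the 30% mutant fraction, the 20 sampled molecules, and the function name are hypothetical values chosen for illustration, not figures from the text.

```python
import random


def inherited_mutant_fraction(maternal_mutant_fraction, molecules_sampled, rng):
    """Draw mtDNA molecules at random and return the mutant share of the draw."""
    mutant = sum(
        1 for _ in range(molecules_sampled)
        if rng.random() < maternal_mutant_fraction
    )
    return mutant / molecules_sampled


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
# Hypothetical numbers: 30% mutant mtDNA in the mother, 20 molecules sampled.
twin_a = inherited_mutant_fraction(0.30, 20, rng)
twin_b = inherited_mutant_fraction(0.30, 20, rng)
print(f"twin A: {twin_a:.0%} mutant, twin B: {twin_b:.0%} mutant")
```

Because so few molecules are drawn, two draws from the same pool can end up with quite different mutant fractions, which is the effect the paragraph describes.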

Genes

Electron transport chain, and humanin

It was originally incorrectly believed that the mitochondrial genome contained only 13 protein-coding genes, all of them encoding proteins of the electron transport chain. However, in 2001, a 14th biologically active protein called humanin was discovered, and was found to be encoded by the mitochondrial gene MT-RNR2 which also encodes part of the mitochondrial ribosome (made out of RNA):

Complex number	Category	Genes	Positions in the mitogenome	Strand
I	NADH dehydrogenase
		MT-ND1	3,307–4,262	L
		MT-ND2	4,470–5,511	L
		MT-ND3	10,059–10,404	L
		MT-ND4L	10,470–10,766	L
		MT-ND4	10,760–12,137 (overlap with MT-ND4L)	L
		MT-ND5	12,337–14,148	L
		MT-ND6	14,149–14,673	H
III	Coenzyme Q - cytochrome c reductase / Cytochrome b	MT-CYB	14,747–15,887	L
IV	Cytochrome c oxidase	MT-CO1	5,904–7,445	L
		MT-CO2	7,586–8,269	L
		MT-CO3	9,207–9,990	L
V	ATP synthase	MT-ATP6	8,527–9,207 (overlap with MT-ATP8)	L
V	ATP synthase	MT-ATP8	8,366–8,572	L
—	Humanin	MT-RNR2	—	—

Unlike the other proteins, humanin does not remain in the mitochondria, and interacts with the rest of the cell and cellular receptors. Humanin can protect brain cells by inhibiting apoptosis. Despite its name, versions of humanin also exist in other animals, such as rattin in rats.

rRNA

The following genes encode rRNAs:

Subunit	rRNA	Genes	Positions in the mitogenome	Strand
Small (SSU)	12S	MT-RNR1	648–1,601	L
Large (LSU)	16S	MT-RNR2	1,671–3,229	L

tRNA

The following genes encode tRNAs:

Amino Acid	3-Letter	1-Letter	MT DNA	Positions	Strand
Alanine	Ala	A	MT-TA	5,587–5,655	H
Arginine	Arg	R	MT-TR	10,405–10,469	L
Asparagine	Asn	N	MT-TN	5,657–5,729	H
Aspartic acid	Asp	D	MT-TD	7,518–7,585	L
Cysteine	Cys	C	MT-TC	5,761–5,826	H
Glutamic acid	Glu	E	MT-TE	14,674–14,742	H
Glutamine	Gln	Q	MT-TQ	4,329–4,400	H
Glycine	Gly	G	MT-TG	9,991–10,058	L
Histidine	His	H	MT-TH	12,138–12,206	L
Isoleucine	Ile	I	MT-TI	4,263–4,331	L
Leucine	Leu (UUR)	L	MT-TL1	3,230–3,304	L
Leucine	Leu (CUN)	L	MT-TL2	12,266–12,336	L
Lysine	Lys	K	MT-TK	8,295–8,364	L
Methionine	Met	M	MT-TM	4,402–4,469	L
Phenylalanine	Phe	F	MT-TF	577–647	L
Proline	Pro	P	MT-TP	15,956–16,023	H
Serine	Ser (UCN)	S	MT-TS1	7,446–7,514	H
Serine	Ser (AGY)	S	MT-TS2	12,207–12,265	L
Threonine	Thr	T	MT-TT	15,888–15,953	L
Tryptophan	Trp	W	MT-TW	5,512–5,579	L
Tyrosine	Tyr	Y	MT-TY	5,826–5,891	H
Valine	Val	V	MT-TV	1,602–1,670	L

Location of genes

Mitochondrial DNA traditionally had the two strands of DNA designated the heavy and the light strand, due to their buoyant densities during separation in cesium chloride gradients, which was found to be related to the relative G+T nucleotide content of the strand. However, confusion over the labeling of these strands is widespread, and appears to originate with an identification of the majority coding strand as the heavy strand in one influential article in 1999. In humans, the light strand of mtDNA carries 28 genes and the heavy strand of mtDNA carries only 9 genes. Eight of the 9 genes on the heavy strand code for mitochondrial tRNA molecules. Human mtDNA consists of 16,569 nucleotide pairs. The entire molecule is regulated by only one regulatory region, which contains the origins of replication of both heavy and light strands. The entire human mitochondrial DNA molecule has been mapped.

Genetic code variants

The genetic code is, for the most part, universal, with few exceptions: mitochondrial genetics includes some of these. For most organisms the "stop codons" are "UAA", "UAG", and "UGA". In vertebrate mitochondria "AGA" and "AGG" are also stop codons, but not "UGA", which codes for tryptophan instead. "AUA" codes for isoleucine in most organisms but for methionine in vertebrate mitochondrial mRNA.
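
The reassignments above can be expressed as a small override table on top of the standard code. The sketch below covers only the codons named in this section; the dictionary and function names are illustrative, and the standard-code assignment of AGA and AGG to arginine is general knowledge rather than something stated in the text.

```python
# Standard-code meaning of the codons discussed above (RNA alphabet).
STANDARD_CODE = {
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
    "AGA": "Arg", "AGG": "Arg",
    "AUA": "Ile",
}

# Reassignments in vertebrate mitochondria, as listed in the text.
VERTEBRATE_MITO_OVERRIDES = {
    "AGA": "Stop", "AGG": "Stop",
    "UGA": "Trp",
    "AUA": "Met",
}


def decode(codon, mitochondrial=False):
    """Return the meaning of a codon under the standard or mitochondrial code."""
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA spelling
    if mitochondrial and codon in VERTEBRATE_MITO_OVERRIDES:
        return VERTEBRATE_MITO_OVERRIDES[codon]
    if codon not in STANDARD_CODE:
        raise ValueError(f"codon {codon!r} is outside this illustrative table")
    return STANDARD_CODE[codon]


for codon in ("UGA", "AGA", "AGG", "AUA", "UAA"):
    print(codon, decode(codon), decode(codon, mitochondrial=True))
```

A full translator would need all 64 codons; the override pattern stays the same.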

There are many other variations among the codes used by other mitochondrial m/tRNA, which happened not to be harmful to their organisms, and which can be used as a tool (along with other mutations among the mtDNA/RNA of different species) to determine relative proximity of common ancestry of related species. (The more closely related two species are, the more mtDNA/RNA mutations will be the same in their mitochondrial genomes.)

Using these techniques, it is estimated that the first mitochondria arose around 1.5 billion years ago. A generally accepted hypothesis is that mitochondria originated as an aerobic prokaryote in a symbiotic relationship within an anaerobic eukaryote.

Replication, repair, transcription, and translation

Mitochondrial replication is controlled by nuclear genes and is specifically suited to make as many mitochondria as that particular cell needs at the time.

Mitochondrial transcription in humans is initiated from three promoters, H1, H2, and L (heavy strand 1, heavy strand 2, and light strand promoters). The H2 promoter transcribes almost the entire heavy strand and the L promoter transcribes the entire light strand. The H1 promoter causes the transcription of the two mitochondrial rRNA molecules.

When transcription takes place on the heavy strand a polycistronic transcript is created. The light strand produces either small transcripts, which can be used as primers, or one long transcript. The production of primers occurs by processing of light strand transcripts with the Mitochondrial RNase MRP (Mitochondrial RNA Processing). The requirement of transcription to produce primers links the process of transcription to mtDNA replication. Full length transcripts are cut into functional tRNA, rRNA, and mRNA molecules.

The process of transcription initiation in mitochondria involves three types of proteins: the mitochondrial RNA polymerase (POLRMT), mitochondrial transcription factor A (TFAM), and mitochondrial transcription factors B1 and B2 (TFB1M, TFB2M). POLRMT, TFAM, and TFB1M or TFB2M assemble at the mitochondrial promoters and begin transcription. The actual molecular events that are involved in initiation are unknown, but these factors make up the basal transcription machinery and have been shown to function in vitro. Mitochondrial translation is still not very well understood. In vitro translations have still not been successful, probably due to the difficulty of isolating sufficient mt mRNA, functional mt rRNA, and possibly because of the complicated changes that the mRNA undergoes before it is translated.

Mitochondrial DNA polymerase

The Mitochondrial DNA Polymerase (Pol gamma, encoded by the POLG gene) is used in the copying of mtDNA during replication. Because the two (heavy and light) strands on the circular mtDNA molecule have different origins of replication, it replicates in a D-loop mode. One strand begins to replicate first, displacing the other strand. This continues until replication reaches the origin of replication on the other strand, at which point the other strand begins replicating in the opposite direction. This results in two new mtDNA molecules. Each mitochondrion has several copies of the mtDNA molecule and the number of mtDNA molecules is a limiting factor in mitochondrial fission. After the mitochondrion has enough mtDNA, membrane area, and membrane proteins, it can undergo fission (very similar to that which bacteria use) to become two mitochondria. Evidence suggests that mitochondria can also undergo fusion and exchange (in a form of crossover) genetic material among each other. Mitochondria sometimes form large matrices in which fusion, fission, and protein exchanges are constantly occurring. mtDNA shared among mitochondria (despite the fact that they can undergo fusion).

Damage and transcription error

Mitochondrial DNA is susceptible to damage from free oxygen radicals generated by mistakes that occur during the production of ATP through the electron transport chain. These mistakes can be caused by genetic disorders, cancer, and temperature variations. The radicals can damage mtDNA molecules or change them, making it hard for mitochondrial polymerase to replicate them. Both cases can lead to deletions, rearrangements, and other mutations. Recent evidence has suggested that mitochondria have enzymes that proofread mtDNA and fix mutations that may occur due to free radicals. It is believed that a DNA recombinase found in mammalian cells is also involved in a repairing recombination process. Deletions and mutations due to free radicals have been associated with the aging process. It is believed that radicals cause mutations which lead to mutant proteins, which in turn lead to more radicals. This process takes many years and is associated with some aging processes involved in oxygen-dependent tissues such as brain, heart, muscle, and kidney. Auto-enhancing processes such as these are possible causes of degenerative diseases including Parkinson's, Alzheimer's, and coronary artery disease.

Chromosomally mediated mtDNA replication errors

Because mitochondrial growth and fission are mediated by the nuclear DNA, mutations in nuclear DNA can have a wide array of effects on mtDNA replication. Despite the fact that the loci for some of these mutations have been found on human chromosomes, specific genes and proteins involved have not yet been isolated. Mitochondria need a certain protein to undergo fission. If this protein (generated by the nucleus) is not present, the mitochondria grow but they do not divide. This leads to giant, inefficient mitochondria. Mistakes in chromosomal genes or their products can also affect mitochondrial replication more directly by inhibiting mitochondrial polymerase and can even cause mutations in the mtDNA directly and indirectly. Indirect mutations are most often caused by radicals created by defective proteins made from nuclear DNA.

Mitochondrial diseases

Contribution of mitochondrial versus nuclear genome

In total, the mitochondrion hosts about 3000 different types of proteins, but only about 13 of them are coded on the mitochondrial DNA. Most of the 3000 types of proteins are involved in a variety of processes other than ATP production, such as porphyrin synthesis. Only about 3% of them are ATP production proteins. This means most of the genetic information coding for the protein makeup of mitochondria is in chromosomal DNA and is involved in processes other than ATP synthesis. This increases the chances that a mutation that will affect a mitochondrion will occur in chromosomal DNA, which is inherited in a Mendelian pattern. Another result is that a chromosomal mutation will affect a specific tissue due to its specific needs, whether those may be high energy requirements or a need for the catabolism or anabolism of a specific neurotransmitter or nucleic acid. Because several copies of the mitochondrial genome are carried by each mitochondrion (2–10 in humans), mitochondrial disorders can be inherited maternally, through mtDNA mutations present in mitochondria inside the oocyte before fertilization, or (as stated above) through mutations in the chromosomes.

Presentation

Mitochondrial diseases range in severity from asymptomatic to fatal, and are most commonly due to inherited rather than acquired mutations of mitochondrial DNA. A given mitochondrial mutation can cause various diseases depending on the severity of the problem in the mitochondria and the tissue the affected mitochondria are in. Conversely, several different mutations may present themselves as the same disease. This almost patient-specific characterization of mitochondrial diseases (see Personalized medicine) makes them very hard to accurately recognize, diagnose and trace. Some diseases are observable at or even before birth (many causing death) while others do not show themselves until late adulthood (late-onset disorders). This is because the number of mutant versus wildtype mitochondria varies between cells and tissues, and is continuously changing. Because cells have multiple mitochondria, different mitochondria in the same cell can have different variations of the mtDNA. This condition is referred to as heteroplasmy. When a certain tissue reaches a certain ratio of mutant versus wildtype mitochondria, a disease will present itself. The ratio varies from person to person and tissue to tissue (depending on its specific energy, oxygen, and metabolism requirements, and the effects of the specific mutation). Mitochondrial diseases are numerous and varied. Apart from diseases caused by abnormalities in mitochondrial DNA, many diseases are suspected to be associated in part with mitochondrial dysfunction, such as diabetes mellitus, forms of cancer and cardiovascular disease, lactic acidosis, specific forms of myopathy, osteoporosis, Alzheimer's disease, Parkinson's disease, stroke, and male infertility. Mitochondrial dysfunction is also believed to play a role in the aging process.

Use in forensics

Human mtDNA can also be used to help identify individuals. Forensic laboratories occasionally use mtDNA comparison to identify human remains, and especially to identify older unidentified skeletal remains. Although unlike nuclear DNA, mtDNA is not specific to one individual, it can be used in combination with other evidence (anthropological evidence, circumstantial evidence, and the like) to establish identification. mtDNA is also used to exclude possible matches between missing persons and unidentified remains. Many researchers believe that mtDNA is better suited to identification of older skeletal remains than nuclear DNA because the greater number of copies of mtDNA per cell increases the chance of obtaining a useful sample, and because a match with a living relative is possible even if numerous maternal generations separate the two.

Examples

American outlaw Jesse James's remains were identified using a comparison between mtDNA extracted from his remains and the mtDNA of the son of the female-line great-granddaughter of his sister.

Similarly, the remains of Alexandra Feodorovna (Alix of Hesse), last Empress of Russia, and her children were identified by comparison of their mitochondrial DNA with that of Prince Philip, Duke of Edinburgh, whose maternal grandmother was Alexandra's sister Victoria of Hesse.

Similarly, to identify the remains of Emperor Nicholas II, his mitochondrial DNA was compared with that of James Carnegie, 3rd Duke of Fife, whose maternal great-grandmother Alexandra of Denmark (Queen Alexandra) was the sister of Nicholas II's mother, Dagmar of Denmark (Empress Maria Feodorovna).

The remains of King Richard III were similarly identified using mtDNA comparison.

Human mitochondrial molecular clock

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Human_mitochondrial_molecular_clock

The human mitochondrial molecular clock is the rate at which mutations have been accumulating in the mitochondrial genome of hominids during the course of human evolution. The archeological record of human activity from early periods in human prehistory is relatively limited and its interpretation has been controversial. Because of the uncertainties from the archeological record, scientists have turned to molecular dating techniques in order to refine the timeline of human evolution. A major goal of scientists in the field is to develop an accurate hominid mitochondrial molecular clock which could then be used to confidently date events that occurred during the course of human evolution.

Estimates of the mutation rate of human mitochondrial DNA (mtDNA) vary greatly depending on the available data and the method used for estimation. The two main methods of estimation, phylogeny based methods and pedigree based methods, have produced mutation rates that differ by almost an order of magnitude. Current research has been focused on resolving the high variability obtained from different rate estimates.

Rate variability

A major assumption of the molecular clock theory is that mutations within a particular genetic system occur at a statistically uniform rate and this uniform rate can be used for dating genetic events. In practice the assumption of a single uniform rate is an oversimplification. Though a single mutation rate is often applied, it is often a composite or an average of several different mutation rates. Many factors influence observed mutation rates and these factors include the type of samples, the region of the genome studied and the time period covered.

Actual vs. observed rates

The rate at which mutations occur during reproduction, the germline mutation rate, is thought to be higher than all observed mutation rates, because not all mutations are successfully passed down to subsequent generations. mtDNA is only passed down along the matrilineal line, and therefore mutations passed down to sons are lost. Random genetic drift may also cause the loss of mutations. For these reasons, the actual mutation rate will not be equivalent to the mutation rate observed from a population sample.

Population size

Population dynamics are believed to influence observed mutation rates. When a population is expanding, more germline mutations are preserved in the population. As a result, observed mutation rates tend to increase in an expanding population. When populations contract, as in a population bottleneck, more germline mutations are lost. Population bottlenecks thus tend to slow down observed mutation rates. Since the emergence of the species Homo sapiens about 200,000 years ago, human populations have expanded from a few thousand individuals living in Africa to over 6.5 billion all over the world. However, the expansion has not been uniform, so the history of human populations may consist of both bottlenecks and expansions.

Structural variability

The mutation rate across the mitochondrial genome is not uniformly distributed. Certain regions of the genome are known to mutate more rapidly than others. The hypervariable regions are known to be highly polymorphic relative to other parts of the genome.

The rate at which mutations accumulate in coding and non-coding regions of the genome also differs, as mutations in the coding region are subject to purifying selection. For this reason, some studies avoid coding region or non-synonymous mutations when calibrating the molecular clock. Loogvali et al. (2009) consider only synonymous mutations; they recalibrated the molecular clock of human mtDNA as 7990 years per synonymous mutation over the mitochondrial genome. Soares et al. (2009) consider both coding and non-coding region mutations to arrive at a single mutation rate, but apply a correction factor to account for selection in the coding region.
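
A calibration expressed as years per mutation converts a mutation count directly into an age. The sketch below applies the 7990-year figure quoted above; the function name and the example count of 10 synonymous mutations are hypothetical.

```python
YEARS_PER_SYNONYMOUS_MUTATION = 7990  # Loogvali et al. (2009), as quoted above


def estimated_age_years(synonymous_mutations):
    """Convert a count of synonymous mutations into an approximate age."""
    if synonymous_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    return synonymous_mutations * YEARS_PER_SYNONYMOUS_MUTATION


# Hypothetical lineage separated from its ancestor by 10 synonymous mutations.
age = estimated_age_years(10)  # 10 * 7990 = 79,900 years
```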

Temporal variability

The mutation rate has been observed to vary with time. Mutation rates within the human species are faster than those observed along the human-ape lineage. The mutation rate is also thought to be faster in recent times, since the beginning of the Holocene 11,000 years ago.

Parallel mutations and saturation

Parallel mutation (sometimes referred to as homoplasy) or convergent evolution occurs when the same mutation arises independently at the same site in the genome in separate lineages. Saturation occurs when a single site experiences multiple mutations. Parallel mutations and saturation result in the underestimation of the mutation rate because they are likely to be overlooked.

Heteroplasmy

Individuals affected by heteroplasmy have a mixture of mtDNA types, some with new mutations and some without. The new mutations may or may not be passed down to subsequent generations. Thus the presence of heteroplasmic individuals in a sample may complicate the calculation of mutation rates.

Methods

Pedigree based

Pedigree methods estimate the mutation rate by comparing the mtDNA sequences of a sample of parent/offspring pairs or analyzing mtDNA sequences of individuals from a deep-rooted genealogy. The number of new mutations in the sample is counted and divided by the total number of parent-to-child DNA transmission events to arrive at a mutation rate.
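
The pedigree calculation described above is a single division. A minimal sketch follows; the function name and the example counts (2 new mutations in 400 transmissions) are hypothetical and not taken from any study.

```python
def pedigree_mutation_rate(new_mutations, transmission_events):
    """New mutations per parent-to-child mtDNA transmission."""
    if transmission_events <= 0:
        raise ValueError("need at least one transmission event")
    return new_mutations / transmission_events


# Hypothetical sample: 2 new mutations seen across 400 transmissions.
rate = pedigree_mutation_rate(2, 400)  # 0.005 mutations per transmission
```

Converting this per-generation figure into a per-year rate would additionally require an assumed generation time.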

Phylogeny based

Phylogeny based methods estimate the mutation rate by first reconstructing the haplotype of the most recent common ancestor (MRCA) of a sample of two or more genetic lineages. A requirement is that the time to the most recent common ancestor (TMRCA) of the sample of lineages must already be known from other independent sources, usually the archeological record. The average number of mutations that have accumulated since the MRCA is then computed and divided by the TMRCA to arrive at the mutation rate. The human mutation rate is usually estimated by comparing the sequences of modern humans and chimpanzees and then reconstructing the ancestral haplotype of the chimpanzee-human common ancestor. According to the paleontological record, the last common ancestor of humans and chimpanzees may have lived around 6 million years ago.
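
The phylogeny based calculation can be sketched in the same way: average the mutations accumulated on each lineage since the MRCA, then divide by the independently known TMRCA. The function name and the per-lineage counts below are hypothetical; only the 6-million-year figure comes from the text.

```python
def phylogeny_mutation_rate(mutations_per_lineage, tmrca_years):
    """Average mutations accumulated since the MRCA, per year."""
    if not mutations_per_lineage:
        raise ValueError("need at least one lineage")
    if tmrca_years <= 0:
        raise ValueError("TMRCA must be positive")
    mean_mutations = sum(mutations_per_lineage) / len(mutations_per_lineage)
    return mean_mutations / tmrca_years


# Hypothetical counts for three lineages, with a TMRCA of 6 million years.
rate = phylogeny_mutation_rate([10, 14, 12], 6_000_000)  # 12 / 6e6 = 2e-6 per year
```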

Pedigree vs. phylogeny comparison

Rates obtained by pedigree methods are about 10 times faster than those obtained by phylogenetic methods. Several factors acting together may be responsible for this difference. As pedigree methods record mutations in living subjects, the mutation rates from pedigree studies are closer to the germline mutation rate. Pedigree studies use genealogies that are only a few generations deep, whereas phylogeny based methods use timescales that are thousands or millions of years deep. According to Henn et al. (2009), phylogeny based methods take into account events that occur over long time scales and are thus less affected by stochastic fluctuations. Howell et al. (2003) suggest that selection, saturation, parallel mutations and genetic drift are responsible for the differences observed between pedigree based methods and phylogeny based methods.

Estimating based on AMH archaeology

Methods/parameters for archaeologically estimated dates of mitochondrial Eve
Study	Sequence type	T_Anchor (location)	Referencing method (correction method)
Cann, Stoneking & Wilson (1987)	Restriction fragments	40, 30, and 12 Ka (Australia, New Guinea, New World)	archaeologically defined migrations matched with estimated sequence divergence rates
Endicott & Ho (2008)	Genomic	40 to 55 Ka (Papua New Guinea) 14.5 to 21.5 Ka (Haps H1 and H3)	PNG following Haplogroup P

Anatomically modern humans (AMH) spread out of Africa and over a large area of Eurasia, leaving artifacts along the northern coast of Southwest, South, Southeast and East Asia. Cann, Stoneking & Wilson (1987) did not rely on a predicted T_CHLCA to estimate single-nucleotide polymorphism (SNP) rates. Instead, they used evidence of colonization in Southeast Asia and Oceania to estimate mutation rates. In addition, they used RFLP (restriction fragment length polymorphism) technology to examine differences between DNA sequences. Using these techniques, this group came up with a T_MRCA of 140,000 to 290,000 years. Cann et al. (1987) estimated the TMRCA of humans to be approximately 210 ky, and the most recent estimate, by Soares et al. (2009) (using a 7 million year chimpanzee-human mtDNA MRCA), differs by only 9%, which is relatively close considering the wide confidence range for both estimates and the calls for a more ancient T_CHLCA.

Endicott & Ho (2008) have reevaluated the predicted migrations globally and compared those to the actual evidence. This group used the coding regions of sequences. They postulate that the molecular clock based on chimp-human comparisons is not reliable, particularly in predicting recent migrations, such as founding migrations into Europe, Australia, and the Americas. With this technique, this group came up with a T_MRCA of 82,000 to 134,000 years.

Estimating based on CHLCA

Because chimps and humans share a matrilineal ancestor, establishing the geological age of that last ancestor allows the estimation of the mutation rate. The chimp-human last common ancestor (CHLCA) is frequently applied as an anchor for mt-T_MRCA studies, with ranges between 4 and 13 million years cited in the literature. This is one source of variation in the time estimates. The other weakness is the non-clocklike accumulation of SNPs, which would tend to make more recent branches look older than they actually are.

SNP rates as described by Soares et al. (2009)
Region(s)	Subregion (or site within codon)	SNP rate (per site per year)
Control region	HVR I	1.6 × 10⁻⁷
	HVR II	2.3 × 10⁻⁷
	remaining	1.5 × 10⁻⁸
Protein- coding	(1st and 2nd)	8.8 × 10⁻⁹
Protein- coding	(3rd)	1.9 × 10⁻⁸
DNA encoding rRNA (rDNA)		8.2 × 10⁻⁹
DNA encoding tRNA (tDNA)		6.9 × 10⁻⁹
other		2.4 × 10⁻⁸
T_CHLCA assumed 6.5 Ma, relative rate to 1st & 2nd codons

These two sources may balance each other or amplify each other depending on the direction of the T_CHLCA error. There are two major reasons why this method is widely employed. First the pedigree based rates are inappropriate for estimates for very long periods of time. Second, while the archaeology anchored rates represent the intermediate range, archaeological evidence for human colonization often occurs well after colonization. For example, colonization of Eurasia from west to east is believed to have occurred along the Indian Ocean. However, the oldest archaeological sites that also demonstrate anatomically modern humans (AMH) are in China and Australia, greater than 42,000 years in age. However the oldest Indian site with AMH remains is from 34,000 years, and another site with AMH compatible archaeology is in excess of 76,000 years in age. Therefore, application of the anchor is a subjective interpretation of when humans were first present.

A simple measure of the sequence divergence between humans and chimps can be found by counting the SNPs. Given that the mitogenome is about 16553 base pairs in length (each base pair that can be aligned with known references is called a site), the formula is:

rate = \frac{SNPs}{2 \times T_{CHLCA} \times 16553}

The '2' in the denominator is derived from the 2 lineages, human and chimpanzee, that split from the CHLCA. Ideally it represents the accumulation of mutations on both lineages but in different positions (SNPs). As long as the number of SNPs observed approximates the number of mutations, this formula works well. However, at rapidly evolving sites mutations are obscured by saturation effects. Sorting positions within the mitogenome by rate and compensating for saturation are alternative approaches.

Because the T_CHLCA is subject to change with more paleontological information, the equation described above allows the comparison of TMRCA from different studies.
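
The formula and the rescaling idea can be sketched as follows. The function names and the example SNP count are hypothetical. The rescaling helper assumes that a TMRCA estimate scales linearly with the assumed T_CHLCA, which follows from the formula (the rate is inversely proportional to T_CHLCA) only when no other corrections are applied.

```python
ALIGNED_SITES = 16553  # alignable sites in the mitogenome, from the text


def snp_rate(snps, t_chlca_years, sites=ALIGNED_SITES):
    """SNPs per site per year; the factor 2 covers both lineages."""
    return snps / (2 * t_chlca_years * sites)


def rescale_tmrca(tmrca_years, t_chlca_old, t_chlca_new):
    """Re-express a TMRCA under a different assumed T_CHLCA (linear assumption)."""
    return tmrca_years * t_chlca_new / t_chlca_old


# Hypothetical count of 1,000 human-chimp SNPs with T_CHLCA = 6.5 Ma.
rate = snp_rate(1000, 6_500_000)

# A 200,000-year TMRCA anchored at 6.5 Ma, re-anchored at 7 Ma.
rescaled = rescale_tmrca(200_000, 6_500_000, 7_000_000)
```

Doubling the assumed T_CHLCA halves the rate and, under the linear assumption, doubles any TMRCA derived from it.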

Methods/parameters for estimating date of mitochondrial Eve
Study	Sequence type	T_CHLCA (sorting time)	Referencing method (correction method)
Vigilant et al. (1991)	HVR	4 to 6 Ma	CH transversions, (15:1 transition:transversion)
Ingman et al. (2000)	genomic (not HVR)	5 Ma	CH genomic comparison
Endicott & Ho (2008)	genomic (not HVR)	5 to 7.5 Ma	CH (relaxed rate, rate-class defined)
Gonder et al. (2007)	genomic (not HVR)	6.0 Ma (+ 0.5 Ma)	CH (rate class defined)
Mishmar et al. (2003)	genomic (not HVR)	6.5 Ma (+ 0.5 Ma)	CH (rate class defined)
Soares et al. (2009)	genomic	6.5Ma (+ 0.5 Ma)	CHLCA anchored, (Examined selection by Ka/(Ks + k))
Chimpanzee to Human = CH, LCA = last common ancestor

Early, HVR, sequence-based methods

To overcome the effects of saturation, HVR analysis relied on the transversional distance between humans and chimpanzees. A transition to transversion ratio was applied to this distance to estimate sequence divergence in the HVR between chimpanzees and humans, and divided by an assumed T_CHLCA of 4 to 6 million years. Based on 26.4 substitutions between chimpanzee and human and 15:1 ratio, the estimated 396 transitions over 610 base-pairs demonstrated sequence divergence of 69.2% (rate * T_CHLCA of 0.369), producing divergence rates of roughly 11.5% to 17.3% per million years.
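The arithmetic behind these figures can be retraced as follows. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative.

```python
transversions = 26.4   # human-chimpanzee transversions in the HVR
ts_tv_ratio = 15       # assumed transition:transversion ratio (15:1)
hvr_sites = 610        # base pairs compared

# Transitions are not counted directly; they are inferred from the ratio.
transitions = transversions * ts_tv_ratio

# Total substitutions per site (transitions plus transversions).
divergence = (transitions + transversions) / hvr_sites

# Divergence per million years at each end of the assumed T_CHLCA range.
rate_at_6_ma = divergence / 6
rate_at_4_ma = divergence / 4

print(f"transitions: {transitions:.0f}")
print(f"divergence: {divergence:.1%}")
print(f"rate: {rate_at_6_ma:.1%} to {rate_at_4_ma:.1%} per million years")
```

The spread between the two rates comes entirely from the uncertainty in T_CHLCA, since the substitution count is the same in both cases.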

HVR is exceptionally prone to saturation, leading to the underestimation of the SNP rate when comparing very distantly related lineages.

Vigilant et al. (1991) also estimated the sequence divergence rate for the sites in the rapidly evolving HVR I and HVR II regions. As noted in the table above, the rate of evolution is so high that site saturation occurs in direct chimpanzee and human comparisons. Consequently, this study used transversions, which evolve at a slower rate than the more common transition polymorphisms. Comparing chimpanzee and human mitogenomes, they noted 26.4 transversions within the HVR regions, but they made no correction for saturation. As more HVR sequence was obtained following this study, it was noted that the dinucleotide site CRS:16181-16182 experienced numerous transversions in parsimony analysis; many of these were considered to be sequencing errors. However, the sequencing of the Feldhofer I Neanderthal revealed that there was also a transversion between humans and Neanderthals at this site. In addition, Soares et al. (2009) noted three sites at which recurrent transversions had occurred in human lineages, two of which are in HVR I: 16265 (12 occurrences) and 16318 (8 occurrences). Therefore, 26.4 transversions was an underestimate of the likely number of transversion events. The 1991 study also used a transition-to-transversion ratio of 15:1 taken from a study of Old World monkeys. However, examination of chimpanzee and gorilla HVR reveals a rate that is lower, and the examination of humans places the rate at 34:1. Therefore, this study underestimated the level of sequence divergence between chimpanzee and human. The estimated sequence divergence of 0.738 per site (includes transversions) is significantly lower than the ~2.5 per site suggested by Soares et al. (2009). These two errors would result in an overestimate of the human mitochondrial TMRCA. However, they failed to detect the basal L0 lineage in the analysis and also failed to detect recurrent transitions in many lineages, both of which lead to underestimation of the TMRCA. Also, Vigilant et al. (1991) used a more recent CHLCA anchor of 4 to 6 million years.

Coding region sequence based methods

African mtDNA haplogroups: L0d, L0k, L0f, L0b, L0a, L1b, L1c, L3, L4

Partial coding region sequence originally supplemented HVR studies because complete coding region sequence was uncommon. There were suspicions that the HVR studies had missed major branches, based on some earlier RFLP and coding region studies. Ingman et al. (2000) was the first study to compare genomic sequences for coalescence analysis. Coding region sequence discriminated the M and N haplogroups and the L0 and L1 macrohaplogroups. Because the genomic DNA sequencing resolved the two deepest branches, it improved some aspects of estimating the TMRCA over HVR sequence alone. Excluding the D-loop and using a 5-million-year T_CHLCA, Ingman et al. (2000) estimated the mutation rate to be 1.70 × 10⁻⁸ per site per year (rate * T_CHLCA = 0.085, 15,435 sites).
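The product rate * T_CHLCA is what makes rates from different studies comparable. The sketch below reproduces the 0.085 figure and then rescales the rate to a later anchor. The helper `rescale_rate` is hypothetical and assumes the rate simply scales inversely with the chosen T_CHLCA, which ignores the rate-class and selection corrections applied in later studies.

```python
rate = 1.70e-8     # substitutions per site per year (Ingman et al. 2000)
t_chlca = 5.0e6    # anchor used in that study, in years

# Per-site divergence on one lineage since the CHLCA.
per_site_divergence = rate * t_chlca


def rescale_rate(rate, old_t_chlca, new_t_chlca):
    """Re-express a rate under a different T_CHLCA, holding rate * T_CHLCA fixed."""
    return rate * old_t_chlca / new_t_chlca


# Illustration only: the same data re-anchored at 6.5 Ma.
rescaled = rescale_rate(rate, t_chlca, 6.5e6)
print(f"rate * T_CHLCA = {per_site_divergence:.3f}")
print(f"rate at 6.5 Ma anchor = {rescaled:.2e} per site per year")
```

An older anchor lowers the inferred rate, which in turn pushes TMRCA estimates further back in time.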

However, coding region DNA has come under question because coding sequences are either under purifying selection to maintain structure and function, or under regional selection to evolve new capacities. The problem with mutations in the coding region has been described as such: mutations occurring in the coding region that are not lethal to the mitochondria can persist but are negatively selective to the host; over a few generations these will persist, but over thousands of generations these slowly are pruned from the population, leaving SNPs. However, over thousands of generations regionally selective mutations may not be discriminated from these transient coding region mutations. The problem with rare mutations in the human mitogenomes is significant enough to prompt a half-dozen recent studies on the matter.

Ingman et al. (2000) estimated the evolution rate of the non-D-loop region at 1.7 × 10⁻⁸ per year per site, based on 53 non-identical genomic sequences that overrepresented Africa in a global sample. Despite this over-representation, the resolution of the L0 subbranches was lacking, and one other deep L1 branch has since been found. Despite these limitations, the sampling was adequate for this landmark study. Today, L0 is restricted to African populations, whereas L1 is the ancestral haplogroup of all non-Africans, as well as most Africans. Mitochondrial Eve's sequence can be approximated by comparing a sequence from L0 with a sequence from L1 and reconciling the mutations between them. The mtDNA sequences of contemporary human populations will generally differ from Mitochondrial Eve's sequence by about 50 mutations. Mutation rates were not classified according to site (other than excluding the HVR regions). The T_CHLCA of 5 Ma used in the 2000 study was also lower than values used in the most recent studies.

Estimates from ancient DNA

Since it has become possible to sequence large numbers of ancient mitogenomes, several studies have estimated the mitochondrial mutation rate by measuring how many more mutations on average have accumulated in modern (or later) genomes compared to ancient (or earlier) ones descending from the same phylogenetic node. These studies have obtained similar results: central estimates for the whole chromosome, in substitutions per site per year: 2.47 × 10⁻⁸; 2.14 × 10⁻⁸; 2.53 × 10⁻⁸; and 2.74 × 10⁻⁸.
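To give a feel for what these rates mean, the sketch below averages the four central estimates and converts the result into an expected waiting time per substitution across the whole 16,569 bp genome. Averaging central estimates is a crude summary chosen here for illustration, not a method used by the studies themselves.

```python
from statistics import mean

# Central estimates quoted above, in substitutions per site per year.
rates = [2.47e-8, 2.14e-8, 2.53e-8, 2.74e-8]
genome_length = 16569  # base pairs in the human mitochondrial genome

mean_rate = mean(rates)
spread = max(rates) / min(rates)  # how far apart the estimates are

# Expected substitutions per genome per year, and the implied waiting time.
per_genome_per_year = mean_rate * genome_length
years_per_substitution = 1 / per_genome_per_year

print(f"mean rate: {mean_rate:.2e} per site per year")
print(f"highest/lowest estimate: {spread:.2f}")
print(f"about one substitution every {years_per_substitution:.0f} years per lineage")
```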

Inter-comparing rates and studies

Molecular clocking of mitochondrial DNA has been criticized because of its inconsistent molecular clock. A retrospective analysis of any pioneering process will reveal inadequacies. With mitochondrial DNA, the inadequacies are the argument from ignorance of rate variation and overconfidence concerning the T_CHLCA of 5 Ma. Lack of historical perspective might explain the second issue; the problem of rate variation is something that could only be resolved by the massive study of mitochondria that followed. The number of HVR sequences accumulated from 1987 to 2000 increased by orders of magnitude. Soares et al. (2009) used 2196 mitogenomic sequences and uncovered 10,683 substitution events within these sequences. Eleven of 16560 sites in the mitogenome produced greater than 11% of all the substitutions, with statistically significant rate variation within the 11 sites. They argue that there is a neutral-site mutation rate which is an order of magnitude slower than the rate observed for the fastest site, CRS 16519. Consequently, purifying selection aside, the rate of mutation itself varies between sites, with a few sites much more likely to undergo new mutations relative to others. Soares et al. (2009) noted two spans of DNA, CRS 2651-2700 and 3028-3082, that had no SNPs within the 2196 mitogenomic sequences.
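The degree of concentration in the Soares et al. (2009) figures can be made explicit with a short calculation. This is a sketch using only the numbers quoted above; because the source says "greater than 11%", the results are lower bounds.

```python
total_sites = 16560
hot_sites = 11
substitutions = 10683
hot_share = 0.11  # lower bound on the share of substitutions at the 11 sites

# Share of the mitogenome that the 11 fast sites represent.
share_of_sites = hot_sites / total_sites

# How over-represented those sites are relative to a uniform rate.
enrichment = hot_share / share_of_sites

# Minimum number of substitution events attributable to the 11 sites.
min_hot_substitutions = hot_share * substitutions

print(f"{share_of_sites:.3%} of sites carry more than {hot_share:.0%} of substitutions")
print(f"enrichment over a uniform rate: at least {enrichment:.0f}-fold")
print(f"at least {min_hot_substitutions:.0f} of {substitutions} events")
```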

DNA repair

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/DNA_repair

DNA damage resulting in multiple broken chromosomes

DNA repair is a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as radiation can cause DNA damage, resulting in as many as 1 million individual molecular lesions per cell per day. Many of these lesions cause structural damage to the DNA molecule and can alter or eliminate the cell's ability to transcribe the gene that the affected DNA encodes. Other lesions induce potentially harmful mutations in the cell's genome, which affect the survival of its daughter cells after it undergoes mitosis. As a consequence, the DNA repair process is constantly active as it responds to damage in the DNA structure. When normal repair processes fail, and when cellular apoptosis does not occur, irreparable DNA damage may occur, including double-strand breaks and DNA crosslinkages (interstrand crosslinks or ICLs). This can eventually lead to malignant tumors, or cancer, as per the two-hit hypothesis.

The rate of DNA repair is dependent on many factors, including the cell type, the age of the cell, and the extracellular environment. A cell that has accumulated a large amount of DNA damage, or one that no longer effectively repairs damage incurred to its DNA, can enter one of three possible states:

an irreversible state of dormancy, known as senescence
cell suicide, also known as apoptosis or programmed cell death
unregulated cell division, which can lead to the formation of a tumor that is cancerous

The DNA repair ability of a cell is vital to the integrity of its genome and thus to the normal functionality of that organism. Many genes that were initially shown to influence life span have turned out to be involved in DNA damage repair and protection.

Paul Modrich talks about himself and his work in DNA repair.

The 2015 Nobel Prize in Chemistry was awarded to Tomas Lindahl, Paul Modrich, and Aziz Sancar for their work on the molecular mechanisms of DNA repair processes.

DNA damage

DNA damage, due to environmental factors and normal metabolic processes inside the cell, occurs at a rate of 10,000 to 1,000,000 molecular lesions per cell per day. While this constitutes only about 0.00017% to 0.017% of the human genome's approximately 6 billion bases (3 billion base pairs), unrepaired lesions in critical genes (such as tumor suppressor genes) can impede a cell's ability to carry out its function, appreciably increase the likelihood of tumor formation, and contribute to tumor heterogeneity.
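The share of the genome affected each day follows directly from the lesion counts and the genome size. The sketch below assumes, for simplicity, that each lesion affects exactly one base; the function name `lesion_percent` is illustrative.

```python
GENOME_BASES = 6_000_000_000  # approximately 6 billion bases per diploid cell


def lesion_percent(lesions_per_day, bases=GENOME_BASES):
    """Percentage of bases damaged per day, assuming one base per lesion."""
    return 100 * lesions_per_day / bases


low = lesion_percent(10_000)       # lower end of the quoted range
high = lesion_percent(1_000_000)   # upper end of the quoted range

print(f"lower bound: {low:.6f}% of bases per day")
print(f"upper bound: {high:.4f}% of bases per day")
```

Even the upper bound is a tiny fraction of the genome, which is why the location of unrepaired lesions matters more than their raw number.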

The vast majority of DNA damage affects the primary structure of the double helix; that is, the bases themselves are chemically modified. These modifications can in turn disrupt the molecules' regular helical structure by introducing non-native chemical bonds or bulky adducts that do not fit in the standard double helix. Unlike proteins and RNA, DNA usually lacks tertiary structure and therefore damage or disturbance does not occur at that level. DNA is, however, supercoiled and wound around "packaging" proteins called histones (in eukaryotes), and both superstructures are vulnerable to the effects of DNA damage.

Sources

DNA damage can be subdivided into two main types:

endogenous damage such as attack by reactive oxygen species produced from normal metabolic byproducts (spontaneous mutation), especially the process of oxidative deamination
1. also includes replication errors
exogenous damage caused by external agents such as
1. ultraviolet [UV 200–400 nm] radiation from the sun or artificial light sources
2. other radiation frequencies, including x-rays and gamma rays
3. hydrolysis or thermal disruption
4. certain plant toxins
5. human-made mutagenic chemicals, especially aromatic compounds that act as DNA intercalating agents
6. viruses

The replication of damaged DNA before cell division can lead to the incorporation of wrong bases opposite damaged ones. Daughter cells that inherit these wrong bases carry mutations from which the original DNA sequence is unrecoverable (except in the rare case of a back mutation, for example, through gene conversion).

Types

There are several types of damage to DNA due to endogenous cellular processes:

oxidation of bases [e.g. 8-oxo-7,8-dihydroguanine (8-oxoG)] and generation of DNA strand interruptions from reactive oxygen species
alkylation of bases (usually methylation), such as formation of 7-methylguanosine, 1-methyladenine, 6-O-methylguanine
hydrolysis of bases, such as deamination, depurination, and depyrimidination
bulky adduct formation (e.g., benzo[a]pyrene diol epoxide-dG adduct, aristolactam I-dA adduct)
mismatch of bases, due to errors in DNA replication, in which the wrong DNA base is stitched into place in a newly forming DNA strand, or a DNA base is skipped over or mistakenly inserted
monoadduct damage, caused by a change in a single nitrogenous base of DNA
diadduct damage

Damage caused by exogenous agents comes in many forms. Some examples are:

UV-B light causes crosslinking between adjacent cytosine and thymine bases creating pyrimidine dimers. This is called direct DNA damage.
UV-A light creates mostly free radicals. The damage caused by free radicals is called indirect DNA damage.
Ionizing radiation such as that created by radioactive decay or in cosmic rays causes breaks in DNA strands. Intermediate-level ionizing radiation may induce irreparable DNA damage (leading to replicational and transcriptional errors needed for neoplasia or may trigger viral interactions) leading to premature aging and cancer.
Thermal disruption at elevated temperature increases the rate of depurination (loss of purine bases from the DNA backbone) and single-strand breaks. For example, hydrolytic depurination is seen in the thermophilic bacteria, which grow in hot springs at 40–80 °C. The rate of depurination (300 purine residues per genome per generation) is too high in these species to be repaired by normal repair machinery, hence a possibility of an adaptive response cannot be ruled out.
Industrial chemicals such as vinyl chloride and hydrogen peroxide, and environmental chemicals such as polycyclic aromatic hydrocarbons found in smoke, soot and tar create a huge diversity of DNA adducts: ethenobases, oxidized bases, alkylated phosphotriesters and crosslinking of DNA, just to name a few.

UV damage, alkylation/methylation, X-ray damage and oxidative damage are examples of induced damage. Spontaneous damage can include the loss of a base, deamination, sugar ring puckering and tautomeric shift. Constitutive (spontaneous) DNA damage caused by endogenous oxidants can be detected as a low level of histone H2AX phosphorylation in untreated cells.

Nuclear versus mitochondrial

In human cells, and eukaryotic cells in general, DNA is found in two cellular locations – inside the nucleus and inside the mitochondria. Nuclear DNA (nDNA) exists as chromatin during non-replicative stages of the cell cycle and is condensed into aggregate structures known as chromosomes during cell division. In either state the DNA is highly compacted and wound up around bead-like proteins called histones. Whenever a cell needs to express the genetic information encoded in its nDNA the required chromosomal region is unravelled, genes located therein are expressed, and then the region is condensed back to its resting conformation. Mitochondrial DNA (mtDNA) is located inside mitochondria organelles, exists in multiple copies, and is also tightly associated with a number of proteins to form a complex known as the nucleoid. Inside mitochondria, reactive oxygen species (ROS), or free radicals, byproducts of the constant production of adenosine triphosphate (ATP) via oxidative phosphorylation, create a highly oxidative environment that is known to damage mtDNA. A critical enzyme in counteracting the toxicity of these species is superoxide dismutase, which is present in both the mitochondria and cytoplasm of eukaryotic cells.

Senescence and apoptosis

Senescence, an irreversible process in which the cell no longer divides, is a protective response to the shortening of the chromosome ends. The telomeres are long regions of repetitive noncoding DNA that cap chromosomes and undergo partial degradation each time a cell undergoes division (see Hayflick limit). In contrast, quiescence is a reversible state of cellular dormancy that is unrelated to genome damage. Senescence in cells may serve as a functional alternative to apoptosis in cases where the physical presence of a cell for spatial reasons is required by the organism, which serves as a "last resort" mechanism to prevent a cell with damaged DNA from replicating inappropriately in the absence of pro-growth cellular signaling. Unregulated cell division can lead to the formation of a tumor, which is potentially lethal to an organism. Therefore, the induction of senescence and apoptosis is considered to be part of a strategy of protection against cancer.

Mutation

It is important to distinguish between DNA damage and mutation, the two major types of error in DNA. DNA damage and mutation are fundamentally different. Damage results in physical abnormalities in the DNA, such as single- and double-strand breaks, 8-hydroxydeoxyguanosine residues, and polycyclic aromatic hydrocarbon adducts. DNA damage can be recognized by enzymes, and thus can be correctly repaired if redundant information, such as the undamaged sequence in the complementary DNA strand or in a homologous chromosome, is available for copying. If a cell retains DNA damage, transcription of a gene can be prevented, and thus translation into a protein will also be blocked. Replication may also be blocked or the cell may die.

In contrast to DNA damage, a mutation is a change in the base sequence of the DNA. A mutation cannot be recognized by enzymes once the base change is present in both DNA strands, and thus a mutation cannot be repaired. At the cellular level, mutations can cause alterations in protein function and regulation. Mutations are replicated when the cell replicates. In a population of cells, mutant cells will increase or decrease in frequency according to the effects of the mutation on the ability of the cell to survive and reproduce.

Although distinctly different from each other, DNA damage and mutation are related because DNA damage often causes errors of DNA synthesis during replication or repair; these errors are a major source of mutation.

Given these properties of DNA damage and mutation, it can be seen that DNA damage is a special problem in non-dividing or slowly-dividing cells, where unrepaired damage will tend to accumulate over time. On the other hand, in rapidly-dividing cells, unrepaired DNA damage that does not kill the cell by blocking replication will tend to cause replication errors and thus mutation. The great majority of mutations that are not neutral in their effect are deleterious to a cell's survival. Thus, in a population of cells composing a tissue with replicating cells, mutant cells will tend to be lost. However, infrequent mutations that provide a survival advantage will tend to clonally expand at the expense of neighboring cells in the tissue. This advantage to the cell is disadvantageous to the whole organism, because such mutant cells can give rise to cancer. Thus, DNA damage in frequently dividing cells, because it gives rise to mutations, is a prominent cause of cancer. In contrast, DNA damage in infrequently-dividing cells is likely a prominent cause of aging.

Mechanisms

Cells cannot function if DNA damage corrupts the integrity and accessibility of essential information in the genome (but cells remain superficially functional when non-essential genes are missing or damaged). Depending on the type of damage inflicted on the DNA's double helical structure, a variety of repair strategies have evolved to restore lost information. If possible, cells use the unmodified complementary strand of the DNA or the sister chromatid as a template to recover the original information. Without access to a template, cells use an error-prone recovery mechanism known as translesion synthesis as a last resort.

Damage to DNA alters the spatial configuration of the helix, and such alterations can be detected by the cell. Once damage is localized, specific DNA repair molecules bind at or near the site of damage, inducing other molecules to bind and form a complex that enables the actual repair to take place.

Direct reversal

Cells are known to eliminate three types of damage to their DNA by chemically reversing it. These mechanisms do not require a template, since the types of damage they counteract can occur in only one of the four bases. Such direct reversal mechanisms are specific to the type of damage incurred and do not involve breakage of the phosphodiester backbone. The formation of pyrimidine dimers upon irradiation with UV light results in an abnormal covalent bond between adjacent pyrimidine bases. The photoreactivation process directly reverses this damage by the action of the enzyme photolyase, whose activation is obligately dependent on energy absorbed from blue/UV light (300–500 nm wavelength) to promote catalysis. Photolyase, an old enzyme present in bacteria, fungi, and most animals, no longer functions in humans, who instead use nucleotide excision repair to repair damage from UV irradiation. Another type of damage, methylation of guanine bases, is directly reversed by the protein methyl guanine methyl transferase (MGMT), the bacterial equivalent of which is called ogt. This is an expensive process because each MGMT molecule can be used only once; that is, the reaction is stoichiometric rather than catalytic. A generalized response to methylating agents in bacteria is known as the adaptive response and confers a level of resistance to alkylating agents upon sustained exposure by upregulation of alkylation repair enzymes. The third type of DNA damage reversed by cells is certain methylations of the bases cytosine and adenine.

Single-strand damage

Structure of the base-excision repair enzyme uracil-DNA glycosylase excising a hydrolytically-produced uracil residue from DNA. The uracil residue is shown in yellow.

When only one of the two strands of a double helix has a defect, the other strand can be used as a template to guide the correction of the damaged strand. In order to repair damage to one of the two paired molecules of DNA, there exist a number of excision repair mechanisms that remove the damaged nucleotide and replace it with an undamaged nucleotide complementary to that found in the undamaged DNA strand.

Base excision repair (BER): damaged single bases or nucleotides are most commonly repaired by removing the base or the nucleotide involved and then inserting the correct base or nucleotide. In base excision repair, a glycosylase enzyme removes the damaged base from the DNA by cleaving the bond between the base and the deoxyribose. These enzymes remove a single base to create an apurinic or apyrimidinic site (AP site). Enzymes called AP endonucleases nick the damaged DNA backbone at the AP site. DNA polymerase then removes the damaged region using its 5’ to 3’ exonuclease activity and correctly synthesizes the new strand using the complementary strand as a template. The gap is then sealed by enzyme DNA ligase.
Nucleotide excision repair (NER): bulky, helix-distorting damage, such as pyrimidine dimerization caused by UV light is usually repaired by a three-step process. First the damage is recognized, then 12-24 nucleotide-long strands of DNA are removed both upstream and downstream of the damage site by endonucleases, and the removed DNA region is then resynthesized. NER is a highly evolutionarily conserved repair mechanism and is used in nearly all eukaryotic and prokaryotic cells. In prokaryotes, NER is mediated by Uvr proteins. In eukaryotes, many more proteins are involved, although the general strategy is the same.
Mismatch repair systems are present in essentially all cells to correct errors that are not corrected by proofreading. These systems consist of at least two proteins. One detects the mismatch, and the other recruits an endonuclease that cleaves the newly synthesized DNA strand close to the region of damage. In E. coli, the proteins involved are the Mut class proteins: MutS, MutL, and MutH. In most eukaryotes, the analog for MutS is MSH and the analog for MutL is MLH. MutH is only present in bacteria. This is followed by removal of the damaged region by an exonuclease, resynthesis by DNA polymerase, and nick sealing by DNA ligase.
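The common logic of these excision pathways, removing the damaged base and refilling the gap from the intact strand, can be illustrated with a toy model of base excision repair. This is a deliberately simplified sketch: strands are plain strings indexed so that position i of one pairs with position i of the other (strand polarity is ignored), only uracil is treated as damage, and the function name `base_excision_repair` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(damaged, template):
    """Toy BER: excise uracil, then refill each gap from the paired strand."""
    if len(damaged) != len(template):
        raise ValueError("strands must have the same length")
    # Glycosylase step: remove the damaged base, leaving an AP site ("_").
    ap_strand = ["_" if base == "U" else base for base in damaged]
    # Polymerase step: fill each AP site with the complement of the template base.
    repaired = [
        COMPLEMENT[template[i]] if base == "_" else base
        for i, base in enumerate(ap_strand)
    ]
    # Ligase step: join the pieces back into one continuous strand.
    return "".join(repaired)


original = "ATGCCGTA"
template = "".join(COMPLEMENT[b] for b in original)  # intact paired strand
damaged = "ATGUCGTA"  # cytosine at index 3 deaminated to uracil

print(base_excision_repair(damaged, template))
```

The model recovers the original sequence only because the paired strand is undamaged, which is the reason single-strand damage is comparatively easy for cells to repair accurately.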

Double-strand breaks

Double-strand break repair pathway models

Double-strand breaks, in which both strands in the double helix are severed, are particularly hazardous to the cell because they can lead to genome rearrangements. In fact, when a double-strand break is accompanied by a cross-linkage joining the two strands at the same point, neither strand can be used as a template for the repair mechanisms, so that the cell will not be able to complete mitosis when it next divides, and will either die or, in rare cases, undergo a mutation. Three mechanisms exist to repair double-strand breaks (DSBs): non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), and homologous recombination (HR). In an in vitro system, MMEJ occurred in mammalian cells at the levels of 10–20% of HR when both HR and NHEJ mechanisms were also available.

DNA ligase, shown above repairing chromosomal damage, is an enzyme that joins broken nucleotides together by catalyzing the formation of an internucleotide ester bond between the phosphate backbone and the deoxyribose nucleotides.

In NHEJ, DNA Ligase IV, a specialized DNA ligase that forms a complex with the cofactor XRCC4, directly joins the two ends. To guide accurate repair, NHEJ relies on short homologous sequences called microhomologies present on the single-stranded tails of the DNA ends to be joined. If these overhangs are compatible, repair is usually accurate. NHEJ can also introduce mutations during repair. Loss of damaged nucleotides at the break site can lead to deletions, and joining of nonmatching termini forms insertions or translocations. NHEJ is especially important before the cell has replicated its DNA, since there is no template available for repair by homologous recombination. There are "backup" NHEJ pathways in higher eukaryotes. Besides its role as a genome caretaker, NHEJ is required for joining hairpin-capped double-strand breaks induced during V(D)J recombination, the process that generates diversity in B-cell and T-cell receptors in the vertebrate immune system.

Homologous recombination requires the presence of an identical or nearly identical sequence to be used as a template for repair of the break. The enzymatic machinery responsible for this repair process is nearly identical to the machinery responsible for chromosomal crossover during meiosis. This pathway allows a damaged chromosome to be repaired using a sister chromatid (available in G2 after DNA replication) or a homologous chromosome as a template. DSBs caused by the replication machinery attempting to synthesize across a single-strand break or unrepaired lesion cause collapse of the replication fork and are typically repaired by recombination.

MMEJ starts with short-range end resection by MRE11 nuclease on either side of a double-strand break to reveal microhomology regions. In further steps, Poly (ADP-ribose) polymerase 1 (PARP1) is required and may be an early step in MMEJ. There is pairing of microhomology regions followed by recruitment of flap structure-specific endonuclease 1 (FEN1) to remove overhanging flaps. This is followed by recruitment of XRCC1–LIG3 to the site for ligating the DNA ends, leading to an intact DNA. MMEJ is always accompanied by a deletion, so that MMEJ is a mutagenic pathway for DNA repair.
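Why MMEJ always produces a deletion can be seen in a toy string model. In the sketch below, `left` is the sequence ending at the break and `right` is the sequence starting at it, both read on the same strand. The function `mmej_join` and its `min_len` and `max_len` parameters are hypothetical choices for illustration and do not model the enzymes involved.

```python
def mmej_join(left, right, min_len=2, max_len=8):
    """Join two break ends at a shared microhomology (toy model).

    Returns (joined_sequence, bases_lost), or None when the ends share
    no microhomology of at least `min_len` bases.
    """
    longest = min(max_len, len(left), len(right))
    for k in range(longest, min_len - 1, -1):       # prefer longer homology
        for i in range(len(left) - k, -1, -1):      # start nearest the break
            motif = left[i:i + k]
            j = right.find(motif)
            if j >= 0:
                # Keep one copy of the motif; the flaps on both sides are trimmed.
                joined = left[:i + k] + right[j + k:]
                return joined, len(left) + len(right) - len(joined)
    return None


result = mmej_join("GATTACAGGCTTA", "CCAGGCATCG")
print(result)  # ('GATTACAGGCATCG', 9)
```

In this example the ends pair at the shared motif CAGGC. The three bases after it on the left end, the one base before it on the right end, and one of the two motif copies are lost, so the repaired molecule is nine bases shorter than the two ends combined.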

The extremophile Deinococcus radiodurans has a remarkable ability to survive DNA damage from ionizing radiation and other sources. At least two copies of the genome, with random DNA breaks, can form DNA fragments through annealing. Partially overlapping fragments are then used for synthesis of homologous regions through a moving D-loop that can continue extension until they find complementary partner strands. In the final step there is crossover by means of RecA-dependent homologous recombination.

Topoisomerases introduce both single- and double-strand breaks in the course of changing the DNA's state of supercoiling, which is especially common in regions near an open replication fork. Such breaks are not considered DNA damage because they are a natural intermediate in the topoisomerase biochemical mechanism and are immediately repaired by the enzymes that created them.

Translesion synthesis

Translesion synthesis (TLS) is a DNA damage tolerance process that allows the DNA replication machinery to replicate past DNA lesions such as thymine dimers or AP sites. It involves switching out regular DNA polymerases for specialized translesion polymerases (e.g. DNA polymerase IV or V, from the Y polymerase family), often with larger active sites that can facilitate the insertion of bases opposite damaged nucleotides. The polymerase switching is thought to be mediated by, among other factors, the post-translational modification of the replication processivity factor PCNA. Translesion synthesis polymerases often have low fidelity (high propensity to insert wrong bases) on undamaged templates relative to regular polymerases. However, many are extremely efficient at inserting correct bases opposite specific types of damage. For example, Pol η mediates error-free bypass of lesions induced by UV irradiation, whereas Pol ι introduces mutations at these sites. Pol η is known to add the first adenine across the T^T photodimer using Watson-Crick base pairing, and the second adenine is added in its syn conformation using Hoogsteen base pairing. From a cellular perspective, risking the introduction of point mutations during translesion synthesis may be preferable to resorting to more drastic mechanisms of DNA repair, which may cause gross chromosomal aberrations or cell death.

In short, the process involves specialized polymerases either bypassing or repairing lesions at locations of stalled DNA replication. For example, human DNA polymerase eta can bypass complex DNA lesions like the guanine-thymine intra-strand crosslink, G[8,5-Me]T, although it can cause targeted and semi-targeted mutations. Paromita Raychaudhury and Ashis Basu studied the toxicity and mutagenesis of the same lesion in Escherichia coli by replicating a G[8,5-Me]T-modified plasmid in E. coli with specific DNA polymerase knockouts. Viability was very low in a strain lacking pol II, pol IV, and pol V, the three SOS-inducible DNA polymerases, indicating that translesion synthesis is conducted primarily by these specialized DNA polymerases.

A bypass platform is provided to these polymerases by proliferating cell nuclear antigen (PCNA). Under normal circumstances, PCNA bound to polymerases replicates the DNA. At a site of lesion, PCNA is ubiquitinated, or modified, by the RAD6/RAD18 proteins to provide a platform for the specialized polymerases to bypass the lesion and resume DNA replication. After translesion synthesis, extension is required. This extension can be carried out by a replicative polymerase if the TLS is error-free, as in the case of Pol η; if TLS results in a mismatch, however, a specialized polymerase, Pol ζ, is needed to extend it. Pol ζ is unique in that it can extend terminal mismatches, whereas more processive polymerases cannot. So when a lesion is encountered, the replication fork stalls, PCNA switches from a processive polymerase to a TLS polymerase such as Pol ι to bypass the lesion, then PCNA may switch to Pol ζ to extend the mismatch, and finally PCNA switches back to the processive polymerase to continue replication.

Global response to DNA damage

Cells exposed to ionizing radiation, ultraviolet light or chemicals are prone to acquire multiple sites of bulky DNA lesions and double-strand breaks. Moreover, DNA damaging agents can damage other biomolecules such as proteins, carbohydrates, lipids, and RNA. The accumulation of damage, specifically double-strand breaks or adducts stalling the replication forks, is among the known stimulation signals for a global response to DNA damage. The global response to damage is an act directed toward the cells' own preservation and triggers multiple pathways of macromolecular repair, lesion bypass, tolerance, or apoptosis. The common features of the global response are induction of multiple genes, cell cycle arrest, and inhibition of cell division.

Initial steps

The packaging of eukaryotic DNA into chromatin presents a barrier to all DNA-based processes that require recruitment of enzymes to their sites of action. To allow DNA repair, the chromatin must be remodeled. In eukaryotes, ATP dependent chromatin remodeling complexes and histone-modifying enzymes are two predominant factors employed to accomplish this remodeling process.

Chromatin relaxation occurs rapidly at the site of DNA damage. In one of the earliest steps, the stress-activated protein kinase, c-Jun N-terminal kinase (JNK), phosphorylates SIRT6 on serine 10 in response to double-strand breaks or other DNA damage. This post-translational modification facilitates the mobilization of SIRT6 to DNA damage sites, and is required for efficient recruitment of poly (ADP-ribose) polymerase 1 (PARP1) to DNA break sites and for efficient repair of double-strand breaks (DSBs). PARP1 protein starts to appear at DNA damage sites in less than a second, with half-maximum accumulation within 1.6 seconds after the damage occurs. PARP1 synthesizes polymeric adenosine diphosphate ribose (poly (ADP-ribose) or PAR) chains on itself. Next, the chromatin remodeler ALC1 quickly attaches to the product of PARP1 action, a poly-ADP ribose chain, and ALC1 completes its arrival at the DNA damage within 10 seconds of the occurrence of the damage. About half of the maximum chromatin relaxation, presumably due to the action of ALC1, occurs by 10 seconds. This then allows recruitment of the DNA repair enzyme MRE11, to initiate DNA repair, within 13 seconds.

γH2AX, the phosphorylated form of H2AX, is also involved in the early steps leading to chromatin decondensation after DNA double-strand breaks. The histone variant H2AX constitutes about 10% of the H2A histones in human chromatin. γH2AX (H2AX phosphorylated on serine 139) can be detected as soon as 20 seconds after irradiation of cells (with DNA double-strand break formation), and half-maximum accumulation of γH2AX occurs in one minute. The extent of chromatin with γH2AX is about two million base pairs at the site of a DNA double-strand break. γH2AX does not, itself, cause chromatin decondensation, but within 30 seconds of irradiation, RNF8 protein can be detected in association with γH2AX. RNF8 mediates extensive chromatin decondensation through its subsequent interaction with CHD4, a component of the nucleosome remodeling and deacetylase complex NuRD.

DDB2 occurs in a heterodimeric complex with DDB1. This complex further complexes with the ubiquitin ligase protein CUL4A and with PARP1. This larger complex rapidly associates with UV-induced damage within chromatin, with half-maximum association completed in 40 seconds. The PARP1 protein, attached to both DDB1 and DDB2, then PARylates (creates a poly-ADP ribose chain) on DDB2 that attracts the DNA remodeling protein ALC1. Action of ALC1 relaxes the chromatin at the site of UV damage to DNA. This relaxation allows other proteins in the nucleotide excision repair pathway to enter the chromatin and repair UV-induced cyclobutane pyrimidine dimer damages.

After rapid chromatin remodeling, cell cycle checkpoints are activated to allow DNA repair to occur before the cell cycle progresses. First, two kinases, ATM and ATR, are activated within 5 or 6 minutes after DNA is damaged. This is followed by phosphorylation of the cell cycle checkpoint protein Chk1, initiating its function, about 10 minutes after DNA is damaged.
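
To see the order of the early events at a glance, the approximate times quoted in this section can be collected and sorted. The sketch below is only a convenience for reading the text: the event labels are paraphrases, the times mix observations made after different kinds of damage (double-strand breaks and UV), and "5 or 6 minutes" is entered as its upper bound of 360 seconds by assumption.

```python
# (event, approximate seconds after damage), as quoted in the text above.
EVENTS = [
    ("Chk1 phosphorylated", 600),
    ("PARP1 half-maximum accumulation", 1.6),
    ("MRE11 recruited", 13),
    ("ALC1 arrival complete", 10),
    ("gamma-H2AX detectable", 20),
    ("RNF8 detectable with gamma-H2AX", 30),
    ("DDB1-DDB2 complex half-maximum association (UV damage)", 40),
    ("gamma-H2AX half-maximum accumulation", 60),
    ("ATM and ATR activated", 360),  # assumption: upper bound of "5 or 6 minutes"
]


def timeline(events):
    """Return the events ordered by time after damage."""
    return sorted(events, key=lambda event: event[1])


for name, seconds in timeline(EVENTS):
    print(f"{seconds:>6.1f} s  {name}")
```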

DNA damage checkpoints

After DNA damage, cell cycle checkpoints are activated. Checkpoint activation pauses the cell cycle and gives the cell time to repair the damage before continuing to divide. DNA damage checkpoints occur at the G1/S and G2/M boundaries. An intra-S checkpoint also exists. Checkpoint activation is controlled by two master kinases, ATM and ATR. ATM responds to DNA double-strand breaks and disruptions in chromatin structure, whereas ATR primarily responds to stalled replication forks. These kinases phosphorylate downstream targets in a signal transduction cascade, eventually leading to cell cycle arrest. A class of checkpoint mediator proteins including BRCA1, MDC1, and 53BP1 has also been identified. These proteins seem to be required for transmitting the checkpoint activation signal to downstream proteins.

The DNA damage checkpoint is a signal transduction pathway that blocks cell cycle progression in G1, G2 and metaphase and slows down the rate of S phase progression when DNA is damaged. It leads to a pause in the cell cycle, allowing the cell time to repair the damage before continuing to divide.

Checkpoint proteins can be separated into four groups: phosphatidylinositol 3-kinase (PI3K)-like protein kinases, the proliferating cell nuclear antigen (PCNA)-like group, two serine/threonine (S/T) kinases, and their adaptors. Central to all DNA damage-induced checkpoint responses is a pair of large protein kinases belonging to the first group of PI3K-like protein kinases: the ATM (ataxia telangiectasia mutated) and ATR (ataxia telangiectasia and Rad3-related) kinases, whose sequence and functions have been well conserved in evolution. Every DNA damage response requires either ATM or ATR because they have the ability to bind to the chromosomes at the site of DNA damage, together with accessory proteins that are platforms on which DNA damage response components and DNA repair complexes can be assembled.

An important downstream target of ATM and ATR is p53, as it is required for inducing apoptosis following DNA damage. The cyclin-dependent kinase inhibitor p21 is induced by both p53-dependent and p53-independent mechanisms and can arrest the cell cycle at the G1/S and G2/M checkpoints by deactivating cyclin/cyclin-dependent kinase complexes.

The prokaryotic SOS response

The SOS response is the set of changes in gene expression in Escherichia coli and other bacteria in response to extensive DNA damage. The prokaryotic SOS system is regulated by two key proteins: LexA and RecA. The LexA homodimer is a transcriptional repressor that binds to operator sequences commonly referred to as SOS boxes. In Escherichia coli it is known that LexA regulates transcription of approximately 48 genes including the lexA and recA genes. The SOS response is known to be widespread in the Bacteria domain, but it is mostly absent in some bacterial phyla, like the Spirochetes. The most common cellular signals activating the SOS response are regions of single-stranded DNA (ssDNA), arising from stalled replication forks or double-strand breaks, which are processed by DNA helicase to separate the two DNA strands. In the initiation step, RecA protein binds to ssDNA in an ATP hydrolysis-driven reaction, creating RecA–ssDNA filaments. RecA–ssDNA filaments activate LexA autoprotease activity, which ultimately leads to cleavage of the LexA dimer and subsequent LexA degradation. The loss of the LexA repressor induces transcription of the SOS genes and allows for further signal induction, inhibition of cell division and an increase in levels of proteins responsible for damage processing.

In Escherichia coli, SOS boxes are 20-nucleotide long sequences near promoters with palindromic structure and a high degree of sequence conservation. In other classes and phyla, the sequence of SOS boxes varies considerably, with different length and composition, but it is always highly conserved and one of the strongest short signals in the genome. The high information content of SOS boxes permits differential binding of LexA to different promoters and allows for timing of the SOS response. The lesion repair genes are induced at the beginning of the SOS response. The error-prone translesion polymerases, for example, UmuCD'2 (also called DNA polymerase V), are induced later on as a last resort. Once the DNA damage is repaired or bypassed using polymerases or through recombination, the amount of single-stranded DNA in cells decreases. The resulting drop in RecA filaments reduces cleavage of the LexA homodimer, which then binds to the SOS boxes near promoters and restores normal gene expression.
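
The timing effect of differential LexA binding can be illustrated with a toy simulation. Everything numerical here is an assumption made for illustration: the thresholds, the fixed fraction of LexA cleaved per step, and the names `DEREPRESSION_THRESHOLD`, `induced_genes` and `simulate` are hypothetical and are not measured values or an established model.

```python
# Hypothetical relative LexA levels below which each promoter is derepressed.
# A weakly bound SOS box (high threshold) switches on first as LexA falls.
DEREPRESSION_THRESHOLD = {
    "lesion repair genes": 0.8,  # assumed weak LexA binding: induced early
    "Pol V (error-prone)": 0.2,  # assumed tight LexA binding: induced late
}


def induced_genes(lexa_level: float) -> list:
    """Genes whose promoters are free of LexA at the given relative level."""
    return [gene for gene, threshold in DEREPRESSION_THRESHOLD.items()
            if lexa_level < threshold]


def simulate(steps: int, cleavage_fraction: float = 0.3) -> list:
    """LexA falls by a fixed fraction per step while RecA-ssDNA filaments persist."""
    level = 1.0
    history = []
    for _ in range(steps):
        level *= 1.0 - cleavage_fraction
        history.append(induced_genes(level))
    return history


for step, genes in enumerate(simulate(5), start=1):
    print(step, genes)
```

With these assumed numbers the lesion repair genes switch on in the first step and the error-prone polymerase only in the last, which mirrors the ordering described in the text.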

Eukaryotic transcriptional responses to DNA damage

Eukaryotic cells exposed to DNA damaging agents also activate important defensive pathways by inducing multiple proteins involved in DNA repair, cell cycle checkpoint control, protein trafficking and degradation. Such a genome-wide transcriptional response is very complex and tightly regulated, thus allowing a coordinated global response to damage. Exposure of the yeast Saccharomyces cerevisiae to DNA damaging agents results in overlapping but distinct transcriptional profiles. Similarities to the environmental shock response indicate that a general global stress response pathway exists at the level of transcriptional activation. In contrast, different human cell types respond to damage differently, indicating the absence of a common global response. The probable explanation for this difference between yeast and human cells may lie in the heterogeneity of mammalian cells. In an animal, different types of cells are distributed among different organs that have evolved different sensitivities to DNA damage.

In general, the global response to DNA damage involves expression of multiple genes responsible for postreplication repair, homologous recombination, nucleotide excision repair, DNA damage checkpoint, global transcriptional activation, genes controlling mRNA decay, and many others. A large amount of damage to a cell leaves it with an important decision: undergo apoptosis and die, or survive at the cost of living with a modified genome. An increase in tolerance to damage can lead to an increased rate of survival that will allow a greater accumulation of mutations. Yeast Rev1 and human polymerase η are members of the Y family of translesion DNA polymerases that are present during the global response to DNA damage and are responsible for enhanced mutagenesis during that response in eukaryotes.

Aging

Pathological effects of poor DNA repair

DNA repair rate is an important determinant of cell pathology

Experimental animals with genetic deficiencies in DNA repair often show decreased life span and increased cancer incidence. For example, mice deficient in the dominant NHEJ pathway and in telomere maintenance mechanisms get lymphoma and infections more often, and, as a consequence, have shorter lifespans than wild-type mice. In similar manner, mice deficient in a key repair and transcription protein that unwinds DNA helices have premature onset of aging-related diseases and consequent shortening of lifespan. However, not every DNA repair deficiency creates exactly the predicted effects; mice deficient in the NER pathway exhibited shortened life span without correspondingly higher rates of mutation.

If the rate of DNA damage exceeds the capacity of the cell to repair it, the accumulation of errors can overwhelm the cell and result in early senescence, apoptosis, or cancer. Inherited diseases associated with faulty DNA repair functioning result in premature aging, increased sensitivity to carcinogens, and correspondingly increased cancer risk (see below). On the other hand, organisms with enhanced DNA repair systems, such as Deinococcus radiodurans, the most radiation-resistant known organism, exhibit remarkable resistance to the double-strand break-inducing effects of radioactivity, likely due to enhanced efficiency of DNA repair and especially NHEJ.

Longevity and caloric restriction

Most life span influencing genes affect the rate of DNA damage

A number of individual genes have been identified as influencing variations in life span within a population of organisms. The effects of these genes are strongly dependent on the environment, in particular, on the organism's diet. Caloric restriction reproducibly results in extended lifespan in a variety of organisms, likely via nutrient sensing pathways and decreased metabolic rate. The molecular mechanisms by which such restriction results in lengthened lifespan are as yet unclear; however, the behavior of many genes known to be involved in DNA repair is altered under conditions of caloric restriction. Several agents reported to have anti-aging properties have been shown to attenuate the constitutive level of mTOR signaling, evidence of a reduction of metabolic activity, and concurrently to reduce the constitutive level of DNA damage induced by endogenously generated reactive oxygen species.

For example, increasing the gene dosage of the gene SIR-2, which regulates DNA packaging in the nematode worm Caenorhabditis elegans, can significantly extend lifespan. The mammalian homolog of SIR-2 is known to induce downstream DNA repair factors involved in NHEJ, an activity that is especially promoted under conditions of caloric restriction. Caloric restriction has been closely linked to the rate of base excision repair in the nuclear DNA of rodents, although similar effects have not been observed in mitochondrial DNA.

The C. elegans gene AGE-1, an upstream effector of DNA repair pathways, confers dramatically extended life span under free-feeding conditions but leads to a decrease in reproductive fitness under conditions of caloric restriction. This observation supports the pleiotropy theory of the biological origins of aging, which suggests that genes conferring a large survival advantage early in life will be selected for even if they carry a corresponding disadvantage late in life.

Medicine and DNA repair modulation

Hereditary DNA repair disorders

Defects in the NER mechanism are responsible for several genetic disorders, including:

Xeroderma pigmentosum: hypersensitivity to sunlight/UV, resulting in increased skin cancer incidence and premature aging
Cockayne syndrome: hypersensitivity to UV and chemical agents
Trichothiodystrophy: sensitive skin, brittle hair and nails

Mental retardation often accompanies the latter two disorders, suggesting increased vulnerability of developmental neurons.

Other DNA repair disorders include:

Werner's syndrome: premature aging and retarded growth
Bloom's syndrome: sunlight hypersensitivity, high incidence of malignancies (especially leukemias)
Ataxia telangiectasia: sensitivity to ionizing radiation and some chemical agents

All of the above diseases are often called "segmental progerias" ("accelerated aging diseases") because their victims appear elderly and suffer from aging-related diseases at an abnormally young age, while not manifesting all the symptoms of old age.

Other diseases associated with reduced DNA repair function include Fanconi anemia, hereditary breast cancer and hereditary colon cancer.

Cancer

Because of inherent limitations in the DNA repair mechanisms, if humans lived long enough, they would all eventually develop cancer. There are at least 34 inherited human DNA repair gene mutations that increase cancer risk. Many of these mutations cause DNA repair to be less effective than normal. In particular, hereditary nonpolyposis colorectal cancer (HNPCC) is strongly associated with specific mutations in the DNA mismatch repair pathway. BRCA1 and BRCA2, two important genes whose mutations confer a hugely increased risk of breast cancer on carriers, are both associated with a large number of DNA repair pathways, especially NHEJ and homologous recombination.

Cancer therapy procedures such as chemotherapy and radiotherapy work by overwhelming the capacity of the cell to repair DNA damage, resulting in cell death. Cells that are most rapidly dividing – most typically cancer cells – are preferentially affected. The side-effect is that other non-cancerous but rapidly dividing cells such as progenitor cells in the gut, skin, and hematopoietic system are also affected. Modern cancer treatments attempt to localize the DNA damage to cells and tissues only associated with cancer, either by physical means (concentrating the therapeutic agent in the region of the tumor) or by biochemical means (exploiting a feature unique to cancer cells in the body). In the context of therapies targeting DNA damage response genes, the latter approach has been termed 'synthetic lethality'.

Perhaps the best-known of these 'synthetic lethality' drugs is the poly(ADP-ribose) polymerase 1 (PARP1) inhibitor olaparib, which was approved by the Food and Drug Administration in December 2014 for the treatment in women of BRCA-defective ovarian cancer. Tumor cells with partial loss of DNA damage response (specifically, homologous recombination repair) are dependent on another mechanism, single-strand break repair, which consists in part of the PARP1 gene product. Olaparib is combined with chemotherapeutics to inhibit single-strand break repair induced by DNA damage caused by the co-administered chemotherapy. Tumor cells relying on this residual DNA repair mechanism are unable to repair the damage and hence are not able to survive and proliferate, whereas normal cells can repair the damage with the functioning homologous recombination mechanism.

Many other drugs for use against other residual DNA repair mechanisms commonly found in cancer are currently under investigation. However, synthetic lethality therapeutic approaches have been questioned due to emerging evidence of acquired resistance, achieved through rewiring of DNA damage response pathways and reversion of previously-inhibited defects.

DNA repair defects in cancer

It has become apparent over the past several years that the DNA damage response acts as a barrier to the malignant transformation of preneoplastic cells. Previous studies have shown an elevated DNA damage response in cell-culture models with oncogene activation and preneoplastic colon adenomas. DNA damage response mechanisms trigger cell-cycle arrest, and attempt to repair DNA lesions or promote cell death/senescence if repair is not possible. Replication stress is observed in preneoplastic cells due to increased proliferation signals from oncogenic mutations. Replication stress is characterized by: increased replication initiation/origin firing; increased transcription and collisions of transcription-replication complexes; nucleotide deficiency; increase in reactive oxygen species (ROS).

Replication stress, along with the selection for inactivating mutations in DNA damage response genes in the evolution of the tumor, leads to downregulation and/or loss of some DNA damage response mechanisms, and hence loss of DNA repair and/or senescence/programmed cell death. In experimental mouse models, loss of DNA damage response-mediated cell senescence was observed after using a short hairpin RNA (shRNA) to inhibit the double-strand break response kinase ataxia telangiectasia mutated (ATM), leading to increased tumor size and invasiveness. Humans born with inherited defects in DNA repair mechanisms (for example, Li-Fraumeni syndrome) have a higher cancer risk.

The prevalence of DNA damage response mutations differs across cancer types; for example, 30% of breast invasive carcinomas have mutations in genes involved in homologous recombination. In cancer, downregulation is observed across all DNA damage response mechanisms: base excision repair (BER), nucleotide excision repair (NER), DNA mismatch repair (MMR), homologous recombination repair (HR), non-homologous end joining (NHEJ) and translesion DNA synthesis (TLS). In addition to mutations in DNA damage repair genes, mutations also arise in the genes responsible for arresting the cell cycle to allow sufficient time for DNA repair to occur. Some genes are involved in both DNA damage repair and cell cycle checkpoint control, for example ATM and checkpoint kinase 2 (CHEK2), a tumor suppressor that is often absent or downregulated in non-small cell lung cancer.

	HR	NHEJ	SSA	FA	BER	NER	MMR
ATM	x	x	x
ATR	x	x	x
PAXIP	x	x
RPA	x		x			x
BRCA1	x			x
BRCA2	x			x
RAD51	x			x
RFC	x				x	x
XRCC1					x	x
PCNA					x	x	x
PARP1		x			x
ERCC1	x		x	x		x
MSH3	x		x				x

Table: Genes involved in DNA damage response pathways and frequently mutated in cancer (HR = homologous recombination; NHEJ = non-homologous end joining; SSA = single-strand annealing; FA = Fanconi anemia pathway; BER = base excision repair; NER = nucleotide excision repair; MMR = mismatch repair)

Epigenetic DNA repair defects in cancer

Classically, cancer has been viewed as a set of diseases that are driven by progressive genetic abnormalities that include mutations in tumour-suppressor genes and oncogenes, and chromosomal aberrations. However, it has become apparent that cancer is also driven by epigenetic alterations.

Epigenetic alterations refer to functionally relevant modifications to the genome that do not involve a change in the nucleotide sequence. Examples of such modifications are changes in DNA methylation (hypermethylation and hypomethylation) and histone modification, changes in chromosomal architecture (caused by inappropriate expression of proteins such as HMGA2 or HMGA1) and changes caused by microRNAs. Each of these epigenetic alterations serves to regulate gene expression without altering the underlying DNA sequence. These changes usually remain through cell divisions, last for multiple cell generations, and can be considered to be epimutations (equivalent to mutations).

While large numbers of epigenetic alterations are found in cancers, the epigenetic alterations in DNA repair genes, causing reduced expression of DNA repair proteins, appear to be particularly important. Such alterations are thought to occur early in progression to cancer and to be a likely cause of the genetic instability characteristic of cancers.

Reduced expression of DNA repair genes causes deficient DNA repair. When DNA repair is deficient DNA damages remain in cells at a higher than usual level and these excess damages cause increased frequencies of mutation or epimutation. Mutation rates increase substantially in cells defective in DNA mismatch repair or in homologous recombinational repair (HRR). Chromosomal rearrangements and aneuploidy also increase in HRR defective cells.

Higher levels of DNA damage not only cause increased mutation, but also cause increased epimutation. During repair of DNA double strand breaks, or repair of other DNA damages, incompletely cleared sites of repair can cause epigenetic gene silencing.

Deficient expression of DNA repair proteins due to an inherited mutation can cause increased risk of cancer. Individuals with an inherited impairment in any of 34 DNA repair genes have an increased risk of cancer, with some defects causing up to a 100% lifetime chance of cancer (e.g. p53 mutations). However, such germline mutations (which cause highly penetrant cancer syndromes) are the cause of only about 1 percent of cancers.

Frequencies of epimutations in DNA repair genes

A chart of common DNA damaging agents, examples of lesions they cause in DNA, and pathways used to repair these lesions. Also shown are many of the genes in these pathways, with an indication of which genes are epigenetically regulated to have reduced (or increased) expression in various cancers. It also shows genes in the error-prone microhomology-mediated end joining pathway with increased expression in various cancers.

Deficiencies in DNA repair enzymes are occasionally caused by a newly arising somatic mutation in a DNA repair gene, but are much more frequently caused by epigenetic alterations that reduce or silence expression of DNA repair genes. For example, when 113 colorectal cancers were examined in sequence, only four had a missense mutation in the DNA repair gene MGMT, while the majority had reduced MGMT expression due to methylation of the MGMT promoter region (an epigenetic alteration). Five different studies found that between 40% and 90% of colorectal cancers have reduced MGMT expression due to methylation of the MGMT promoter region.

Similarly, out of 119 cases of mismatch repair-deficient colorectal cancers that lacked DNA repair gene PMS2 expression, PMS2 was deficient in 6 due to mutations in the PMS2 gene, while in 103 cases PMS2 expression was deficient because its pairing partner MLH1 was repressed due to promoter methylation (PMS2 protein is unstable in the absence of MLH1). In the other 10 cases, loss of PMS2 expression was likely due to epigenetic overexpression of the microRNA, miR-155, which down-regulates MLH1.
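
The breakdown of the 119 PMS2-deficient cases above can be checked with a few lines of arithmetic. The counts are taken from the text; the dictionary keys are shorthand labels chosen here, and grouping MLH1 promoter methylation with the likely miR-155 cases as "epigenetic" follows the text's own description.

```python
# Counts of PMS2-deficient colorectal cancers by cause, as given in the text.
cases = {
    "PMS2 mutation": 6,
    "MLH1 promoter methylation": 103,
    "miR-155 overexpression (likely)": 10,
}

total = sum(cases.values())

# Share of each cause, in percent, rounded to one decimal place.
shares = {cause: round(100 * count / total, 1) for cause, count in cases.items()}

# Cases attributed to epigenetic causes rather than to a PMS2 mutation.
epigenetic = total - cases["PMS2 mutation"]
epigenetic_share = round(100 * epigenetic / total, 1)

print(total, shares, epigenetic_share)
```

On these numbers, roughly 95% of the cases trace to an epigenetic cause and about 5% to a mutation in PMS2 itself.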

In a further example, epigenetic defects were found in various cancers (e.g. breast, ovarian, colorectal and head and neck). Two or three deficiencies in the expression of ERCC1, XPF or PMS2 occur simultaneously in the majority of 49 colon cancers evaluated by Facista et al.

The chart in this section shows some frequent DNA damaging agents, examples of DNA lesions they cause, and the pathways that deal with these DNA damages. At least 169 enzymes are either directly employed in DNA repair or influence DNA repair processes. Of these, 83 are directly employed in repairing the 5 types of DNA damages illustrated in the chart.

Some of the more well studied genes central to these repair processes are shown in the chart. The gene designations shown in red, gray or cyan indicate genes frequently epigenetically altered in various types of cancers. Wikipedia articles on each of the genes highlighted by red, gray or cyan describe the epigenetic alteration(s) and the cancer(s) in which these epimutations are found. Review articles, and broad experimental survey articles also document most of these epigenetic DNA repair deficiencies in cancers.

Red-highlighted genes are frequently reduced or silenced by epigenetic mechanisms in various cancers. When these genes have low or absent expression, DNA damages can accumulate. Replication errors past these damages can lead to increased mutations and, ultimately, cancer. Epigenetic repression of DNA repair genes in accurate DNA repair pathways appears to be central to carcinogenesis.

The two gray-highlighted genes, RAD51 and BRCA2, are required for homologous recombinational repair. They are sometimes epigenetically over-expressed and sometimes under-expressed in certain cancers. As indicated in the Wikipedia articles on RAD51 and BRCA2, such cancers ordinarily have epigenetic deficiencies in other DNA repair genes. These repair deficiencies would likely cause increased unrepaired DNA damages. The over-expression of RAD51 and BRCA2 seen in these cancers may reflect selective pressures for compensatory RAD51 or BRCA2 over-expression and increased homologous recombinational repair to at least partially deal with such excess DNA damages. In those cases where RAD51 or BRCA2 are under-expressed, this would itself lead to increased unrepaired DNA damages. Replication errors past these damages could cause increased mutations and cancer, so that under-expression of RAD51 or BRCA2 would be carcinogenic in itself.

Cyan-highlighted genes are in the microhomology-mediated end joining (MMEJ) pathway and are up-regulated in cancer. MMEJ is an additional error-prone repair pathway for double-strand breaks. In MMEJ repair of a double-strand break, a homology of 5–25 complementary base pairs between both paired strands is sufficient to align the strands, but mismatched ends (flaps) are usually present. MMEJ removes the extra nucleotides (flaps) where strands are joined, and then ligates the strands to create an intact DNA double helix. MMEJ almost always involves at least a small deletion, so that it is a mutagenic pathway. FEN1, the flap endonuclease in MMEJ, is epigenetically increased by promoter hypomethylation and is over-expressed in the majority of cancers of the breast, prostate, stomach, neuroblastomas, pancreas, and lung. PARP1 is also over-expressed when its promoter region ETS site is epigenetically hypomethylated, and this contributes to progression to endometrial cancer and BRCA-mutated serous ovarian cancer. Other genes in the MMEJ pathway are also over-expressed in a number of cancers (see MMEJ for summary), and are also shown in cyan.
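
Why MMEJ nearly always produces a deletion can be seen in a toy string model. The sketch below treats the two break ends as plain top-strand strings, looks for a shared stretch of 5 to 25 bases, joins the ends on one copy of it and discards the flaps. The function name `mmej_join`, the search order (longest microhomology first, then the one nearest the break on the left end) and the example sequences are assumptions made for illustration; actual MMEJ involves end resection and annealing of complementary strands, which are not modeled.

```python
from typing import Optional, Tuple


def mmej_join(left: str, right: str,
              min_len: int = 5, max_len: int = 25) -> Optional[Tuple[str, int]]:
    """Join two break ends at a shared microhomology (toy model).

    Returns (joined_sequence, bases_lost), or None when no shared stretch of
    at least min_len bases exists. bases_lost counts the trimmed flaps plus
    one of the two copies of the microhomology.
    """
    longest = min(max_len, len(left), len(right))
    for k in range(longest, min_len - 1, -1):
        # Scan the left end starting from the side nearest the break.
        for i in range(len(left) - k, -1, -1):
            j = right.find(left[i:i + k])
            if j != -1:
                # Keep one copy of the microhomology; drop both flaps.
                joined = left[:i + k] + right[j + k:]
                lost = len(left) + len(right) - len(joined)
                return joined, lost
    return None


# Hypothetical ends sharing the 8-base microhomology "CAGGCTAG".
print(mmej_join("GATTACAGGCTAGCTT", "CCAGGCTAGAAT"))
```

In this example the joined product is 12 bases shorter than the two ends combined: 4 flap bases plus one copy of the 8-base microhomology.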

Genome-wide distribution of DNA repair in human somatic cells

Differential activity of DNA repair pathways across various regions of the human genome causes mutations to be very unevenly distributed within tumor genomes. In particular, the gene-rich, early-replicating regions of the human genome exhibit lower mutation frequencies than the gene-poor, late-replicating heterochromatin. One mechanism underlying this involves the histone modification H3K36me3, which can recruit mismatch repair proteins, thereby lowering mutation rates in H3K36me3-marked regions. Another important mechanism concerns nucleotide excision repair, which can be recruited by the transcription machinery, lowering somatic mutation rates in active genes and other open chromatin regions.

Evolution

The basic processes of DNA repair are highly conserved among both prokaryotes and eukaryotes and even among bacteriophages (viruses which infect bacteria); however, more complex organisms with more complex genomes have correspondingly more complex repair mechanisms. The ability of a large number of protein structural motifs to catalyze relevant chemical reactions has played a significant role in the elaboration of repair mechanisms during evolution.

The fossil record indicates that single-cell life began to proliferate on the planet at some point during the Precambrian period, although exactly when recognizably modern life first emerged is unclear. Nucleic acids became the sole and universal means of encoding genetic information, requiring DNA repair mechanisms that in their basic form have been inherited by all extant life forms from their common ancestor. The emergence of Earth's oxygen-rich atmosphere (known as the "oxygen catastrophe") due to photosynthetic organisms, as well as the presence of potentially damaging free radicals in the cell due to oxidative phosphorylation, necessitated the evolution of DNA repair mechanisms that act specifically to counter the types of damage induced by oxidative stress.

Rate of evolutionary change

On some occasions, DNA damage is not repaired, or is repaired by an error-prone mechanism that results in a change from the original sequence. When this occurs, mutations may propagate into the genomes of the cell's progeny. Should such an event occur in a germ line cell that will eventually produce a gamete, the mutation has the potential to be passed on to the organism's offspring. The rate of evolution in a particular species (or, in a particular gene) is a function of the rate of mutation. As a consequence, the rate and accuracy of DNA repair mechanisms have an influence over the process of evolutionary change. DNA damage protection and repair does not influence the rate of adaptation by gene regulation and by recombination and selection of alleles. On the other hand, DNA damage repair and protection does influence the rate of accumulation of irreparable, advantageous, code expanding, inheritable mutations, and slows down the evolutionary mechanism for expansion of the genome of organisms with new functionalities. The tension between evolvability and mutation repair and protection needs further investigation.

Technology

A technology based on clustered regularly interspaced short palindromic repeats (CRISPR), known as CRISPR-Cas9, was introduced in 2012. The technology allows anyone with molecular biology training to alter the genes of any species with precision, by inducing DNA damage at a specific point and then harnessing DNA repair mechanisms to insert new genes. It is cheaper, more efficient, and more precise than other technologies. With the help of CRISPR-Cas9, scientists can edit parts of a genome by removing, adding, or altering parts of a DNA sequence.
