Molecular phylogenetics (/məˈlɛkjʊlər ˌfaɪloʊdʒəˈnɛtɪks, mɒ-, moʊ-/) is the branch of phylogeny
that analyzes genetic, hereditary molecular differences, predominantly
in DNA sequences, to gain information on an organism's evolutionary
relationships. From these analyses, it is possible to determine the
processes by which diversity among species has been achieved. The result
of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.
Molecular phylogenetics and molecular evolution
are closely related. Molecular evolution is the process of selective changes
(mutations) at a molecular level (genes, proteins, etc.) throughout
various branches in the tree of life (evolution). Molecular
phylogenetics makes inferences of the evolutionary relationships that
arise due to molecular evolution and results in the construction of a
phylogenetic tree.
Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have been replaced in recent times largely by DNA sequencing, which produces the exact sequences of nucleotides or bases
in either DNA or RNA segments extracted using different techniques. In
general, these are considered superior for evolutionary studies, since
the actions of evolution are ultimately reflected in the genetic
sequences. At present, it is still a long and expensive process to
sequence the entire DNA of an organism (its genome). However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs.
At any location within such a sequence, the bases found in a given
position may vary between organisms. The particular sequence found in a
given organism is referred to as its haplotype. In principle, since there are four base types, with 1000 base pairs, we could have 4^1000
distinct haplotypes. However, for organisms within a particular species
or in a group of related species, it has been found empirically that
only a minority of sites show any variation at all, and most of the
variations that are found are correlated, so that the number of distinct
haplotypes that are found is relatively small.
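To give a sense of scale, the upper bound quoted above can be evaluated directly. The snippet below is only an illustration of the arithmetic; the variable names are chosen for this example.

```python
# Upper bound on distinct haplotypes for a 1000-base sequence:
# four possible bases at each of 1000 positions.
NUM_BASES = 4
SEQUENCE_LENGTH = 1000

max_haplotypes = NUM_BASES ** SEQUENCE_LENGTH  # exact integer arithmetic
num_digits = len(str(max_haplotypes))

print(f"4^{SEQUENCE_LENGTH} has {num_digits} decimal digits")
```

The result has 603 decimal digits, far more than could ever be observed, which is why the empirically small number of haplotypes is informative.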
In a molecular systematic analysis, the haplotypes are determined for a defined area of genetic material. A substantial sample of individuals of the target species or other taxon
is used where possible, although many current studies are based on single individuals.
Haplotypes of individuals of closely related, yet different, taxa are
also determined. Finally, haplotypes from a smaller number of
individuals from a definitely different taxon are determined: these are
referred to as an outgroup.
The base sequences for the haplotypes are then compared. In the
simplest case, the difference between two haplotypes is assessed by
counting the number of locations where they have different bases: this
is referred to as the number of substitutions (other kinds of differences between haplotypes can also occur, for example, the insertion of a section of nucleic acid in one haplotype that is not present in another). The difference between organisms is usually re-expressed as a percentage divergence,
by dividing the number of substitutions by the number of base pairs
analysed: the hope is that this measure will be independent of the
location and length of the section of DNA that is sequenced.
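The percentage divergence described above can be sketched in a few lines. This is a minimal illustration that assumes the two haplotypes are already aligned and of equal length, and it ignores insertions and deletions; the function name and the example sequences are hypothetical.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the positions where the two haplotypes carry different bases.
    substitutions = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * substitutions / len(seq_a)


# Hypothetical 10-base haplotypes that differ at two positions.
hap_1 = "ACGTACGTAC"
hap_2 = "ACGTTCGTAA"
print(percent_divergence(hap_1, hap_2))  # 2 of 10 sites differ -> 20.0
```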
An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA-DNA hybridization.
The advantage claimed for using hybridization rather than gene
sequencing was that it was based on the entire genotype, rather than on
particular sections of DNA. Modern sequence comparison techniques
overcome this objection by the use of multiple sequences.
Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram
is examined in order to see whether the samples cluster in the way that
would be expected from current ideas about the taxonomy of the group.
Any group of haplotypes that are all more similar to one another than
any of them is to any other haplotype may be said to constitute a clade, which may be represented visually, as the accompanying figure demonstrates. Statistical techniques such as bootstrapping and jackknifing help in providing reliability estimates for the positions of haplotypes within the evolutionary trees.
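As a rough sketch of the first step, the triangular matrix of pairwise divergences can be built as follows. The sample names and sequences are invented for illustration, and the divergence used is the simple fraction of differing sites, assuming pre-aligned sequences of equal length.

```python
from itertools import combinations


def substitution_fraction(seq_a, seq_b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def pairwise_divergences(haplotypes):
    """Map each unordered pair of sample names to its divergence.

    Only one entry per pair is stored, which corresponds to the
    triangular matrix described in the text.
    """
    return {
        (name_a, name_b): substitution_fraction(haplotypes[name_a], haplotypes[name_b])
        for name_a, name_b in combinations(sorted(haplotypes), 2)
    }


# Hypothetical aligned haplotypes, including an outgroup.
samples = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TCGAACGA",
    "outgroup": "TGCAAGCA",
}
matrix = pairwise_divergences(samples)
closest = min(matrix, key=matrix.get)
print(closest, matrix[closest])
```

In this toy data set A and B form the most similar pair, which is the kind of grouping a cluster analysis would join first.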
Techniques and applications
Every living organism contains deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. In general, closely related organisms have a high degree of similarity in the molecular structure
of these substances, while the molecules of organisms distantly related
often show a pattern of dissimilarity. Conserved sequences, such as
mitochondrial DNA, are expected to accumulate mutations over time, and
assuming a constant rate of mutation, provide a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. With the invention of Sanger sequencing in 1977, it became possible to isolate and identify these molecular structures. High-throughput sequencing may also be used to obtain the transcriptome of an organism, allowing inference of phylogenetic relationships using transcriptomic data.
The most common approach is the comparison of homologous sequences for genes using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is in DNA barcoding, wherein the species of an individual organism is identified using small sections of mitochondrial DNA or chloroplast DNA.
Another application of the techniques that make this possible can be
seen in the very limited field of human genetics, such as the
ever-more-popular use of genetic testing to determine a child's paternity, as well as the emergence of a new branch of criminal forensics focused on evidence known as genetic fingerprinting.
Molecular phylogenetic analysis
There are several methods available for performing a molecular phylogenetic
analysis. One comprehensive step-by-step protocol for
constructing a phylogenetic tree, covering DNA/amino acid contiguous
sequence assembly, multiple sequence alignment,
model testing (selecting the best-fitting substitution model), and phylogeny
reconstruction using maximum likelihood and Bayesian inference, is
available in Nature Protocols.
Another molecular phylogenetic analysis technique has been
described by Pevsner and shall be summarized in the sentences to follow
(Pevsner, 2015). A phylogenetic analysis typically consists of five
major steps. The first stage comprises sequence acquisition. The
following step consists of performing a multiple sequence alignment,
which is the fundamental basis of constructing a phylogenetic tree. The
third stage includes different models of DNA and amino acid
substitution. Several models of substitution exist. A few examples
include Hamming distance, the Jukes and Cantor one-parameter model, and the Kimura two-parameter model (see Models of DNA evolution).
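The two named substitution models can be sketched with their standard textbook distance formulas, which are not spelled out in this article: the Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p) for an observed proportion p of differing sites, and the Kimura two-parameter distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) for observed proportions P of transitions and Q of transversions. The function names below are illustrative.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def kimura_2p_distance(transitions, transversions):
    """Kimura two-parameter distance from proportions of transitions (P) and transversions (Q)."""
    a = 1.0 - 2.0 * transitions - transversions
    b = 1.0 - 2.0 * transversions
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences are too divergent for the correction to be defined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# With 10% of sites differing, the corrected distance is slightly larger
# than 0.10 because some sites are expected to have changed more than once.
print(jukes_cantor_distance(0.10))
```

When one third of the observed differences are transitions, as the one-parameter model expects, the two formulas give the same distance.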
The fourth stage consists of various methods of tree building,
including distance-based and character-based methods. The normalized
Hamming distance and the Jukes-Cantor correction formulas provide the
degree of divergence and the probability that a nucleotide changes to
another, respectively. Common tree-building methods include the unweighted
pair group method using arithmetic mean (UPGMA) and neighbor joining, which are distance-based methods; maximum parsimony, which is a character-based method; and maximum likelihood estimation and Bayesian inference,
which are character-based, model-based methods. UPGMA is a simple
method; however, it is less accurate than the neighbor-joining approach.
Finally, the last step comprises evaluating the trees. This assessment
of accuracy is composed of consistency, efficiency, and robustness.
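To make the distance-based approach concrete, here is a minimal UPGMA sketch. It is an illustration under stated assumptions rather than a production implementation: the input format (one dictionary entry per pair of samples), the nested-tuple output, and the example distances are all choices made for this example, and branch lengths are not computed.

```python
def upgma(names, distances):
    """Return a nested-tuple tree built by UPGMA.

    names:     list of sample labels
    distances: dict mapping (name_a, name_b) to a distance, one entry per pair
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    index = {name: i for i, name in enumerate(names)}
    # Distances keyed by an ordered pair of cluster ids (smaller id first).
    d = {}
    for (a, b), value in distances.items():
        i, j = sorted((index[a], index[b]))
        d[(i, j)] = value
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two active clusters separated by the smallest distance.
        i, j = min(d, key=d.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # The distance from the merged cluster to each remaining cluster is
        # the size-weighted mean of the two old distances.
        new_d = {}
        for k in clusters:
            d_ik = d[tuple(sorted((i, k)))]
            d_jk = d[tuple(sorted((j, k)))]
            new_d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves one of the merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical divergences between three ingroup samples and an outgroup.
example = {
    ("A", "B"): 0.125, ("A", "C"): 0.375, ("B", "C"): 0.25,
    ("A", "outgroup"): 0.875, ("B", "outgroup"): 0.75, ("C", "outgroup"): 0.5,
}
tree = upgma(["A", "B", "C", "outgroup"], example)
print(tree)
```

For these distances A and B are joined first, then C, and the outgroup joins last, giving `('outgroup', ('C', ('A', 'B')))`.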
MEGA (molecular evolutionary genetics analysis)
is an analysis software that is user-friendly and free to download and
use. This software is capable of analyzing both distance-based and
character-based tree methodologies. MEGA also contains several options
one may choose to utilize, such as heuristic approaches and
bootstrapping. Bootstrapping
is an approach commonly used to measure the robustness of the
topology of a phylogenetic tree; it reports the percentage of replicates in
which each clade is recovered. In general, a value
greater than 70% is considered significant. The flow chart displayed on
the right visually demonstrates the order of the five stages of
Pevsner's molecular phylogenetic analysis technique that have been
described.
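The idea behind bootstrapping can be sketched as follows. This is a deliberately simplified illustration, not what MEGA does internally: a real analysis rebuilds a full tree for every replicate, whereas this sketch only asks whether two samples remain each other's closest relatives by raw difference count. The alignment, function names, and default parameters are assumptions made for the example.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Build one bootstrap replicate by drawing alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def is_closest_pair(alignment, pair):
    """True if `pair` differs at strictly fewer sites than every other pair."""
    def differences(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    target = differences(*pair)
    others = (p for p in combinations(sorted(alignment), 2) if set(p) != set(pair))
    return all(differences(a, b) > target for a, b in others)


def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Percentage of replicates in which `pair` is recovered as the closest pair."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = sum(
        is_closest_pair(resample_columns(alignment, rng), pair)
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical aligned sequences.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TCGAACGA",
    "outgroup": "TGCAAGCA",
}
support = bootstrap_support(alignment, ("A", "B"))
print(f"A+B recovered in {support:.1f}% of replicates")
```

With an alignment this short the support value is not meaningful; the point is only the resampling of columns with replacement.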
Limitations
Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic.
This is a limitation when attempting to determine the optimal tree(s),
which often involves bisecting and reconnecting portions of the
phylogenetic tree(s).
The recent discovery of extensive horizontal gene transfer
among organisms provides a significant complication to molecular
systematics, indicating that different genes within the same organism
can have different phylogenies. HGTs can be detected and excluded using a
number of phylogenetic methods (see Inferring horizontal gene transfer § Explicit phylogenetic methods).
In addition, molecular phylogenies are sensitive to the
assumptions and models that go into making them. Firstly, sequences must
be aligned; then, issues such as long-branch attraction, saturation, and taxon
sampling problems must be addressed. This means that strikingly
different results can be obtained by applying different models to the
same dataset. The tree-building method also brings with it specific assumptions about
tree topology, evolution speeds, and sampling. The simplistic UPGMA
assumes a rooted tree and a uniform molecular clock, both of which can
be incorrect.
p53, also known as Tumor protein P53, cellular tumor antigen p53 (UniProt name), or transformation-related protein 53 (TRP53)
is a regulatory protein that is often mutated in human cancers. The p53
proteins (originally thought to be, and often spoken of as, a single
protein) are crucial in vertebrates, where they prevent cancer formation. As such, p53 has been described as "the guardian of the genome" because of its role in conserving stability by preventing genome mutation. Hence TP53 is classified as a tumor suppressor gene.
The TP53 gene is the most frequently mutated gene (>50%) in human cancer, indicating that the TP53 gene plays a crucial role in preventing cancer formation. The TP53 gene encodes proteins that bind to DNA and regulate gene expression to prevent mutations of the genome. In addition to the full-length protein, the human TP53 gene encodes at least 12 protein isoforms.
Gene
In humans, the TP53 gene is located on the short arm of chromosome 17 (17p13.1). The gene spans 20 kb, with a non-coding exon 1 and a very long first intron of 10 kb, overlapping the Hp53int1
gene. The coding sequence contains five regions showing a high degree
of conservation in vertebrates, predominantly in exons 2, 5, 6, 7 and 8,
but the sequences found in invertebrates show only distant resemblance
to mammalian TP53. TP53 orthologs have been identified in most mammals for which complete genome data are available.
Human TP53 gene
In humans, a common polymorphism involves the substitution of an arginine for a proline at codon
position 72 of exon 4. Many studies have investigated a genetic link
between this variation and cancer susceptibility; however, the results
have been controversial. For instance, a meta-analysis from 2009 failed
to show a link for cervical cancer. A 2011 study found that the TP53 proline mutation did have a profound effect on pancreatic cancer risk among males. A study of Arab women found that proline homozygosity at TP53 codon 72 is associated with a decreased risk for breast cancer. One study suggested that TP53 codon 72 polymorphisms, MDM2 SNP309, and A2164G may collectively be associated with non-oropharyngeal cancer susceptibility and that MDM2 SNP309 in combination with TP53 codon 72 may accelerate the development of non-oropharyngeal cancer in women. A 2011 study found that TP53 codon 72 polymorphism was associated with an increased risk of lung cancer.
Meta-analyses from 2011 found no significant associations between TP53 codon 72 polymorphisms and both colorectal cancer risk and endometrial cancer risk. A 2011 study of a Brazilian birth cohort found an association between the non-mutant arginine TP53 and individuals without a family history of cancer.
Another 2011 study found that the p53 homozygous (Pro/Pro) genotype was
associated with a significantly increased risk for renal cell
carcinoma.
Function
DNA damage and repair
p53 plays a role in regulation or progression through the cell cycle, apoptosis, and genomic stability by means of several mechanisms:
It can activate DNA repair proteins when DNA has sustained damage. Thus, it may be an important factor in aging.
It can arrest growth by holding the cell cycle at the G1/S regulation point
on DNA damage recognition—if it holds the cell here for long enough,
the DNA repair proteins will have time to fix the damage and the cell
will be allowed to continue the cell cycle.
It can initiate apoptosis (i.e., programmed cell death) if DNA damage proves to be irreparable.
p53 activates transcription of WAF1/CIP1, which encodes p21, and of hundreds of other downstream genes. p21 (WAF1) binds to the G1-S/CDK (CDK4/CDK6, CDK2, and CDK1) complexes (molecules important for the G1/S transition in the cell cycle), inhibiting their activity.
When p21(WAF1) is complexed with CDK2, the cell cannot continue
to the next stage of cell division. A mutant p53 will no longer bind DNA
in an effective way, and, as a consequence, the p21 protein will not be
available to act as the "stop signal" for cell division.
Studies of human embryonic stem cells (hESCs) commonly describe the
nonfunctional p53-p21 axis of the G1/S checkpoint pathway with
subsequent relevance for cell cycle regulation and the DNA damage
response (DDR). Importantly, p21 mRNA is clearly present and upregulated
after the DDR in hESCs, but p21 protein is not detectable. In this cell
type, p53 activates numerous microRNAs (such as miR-302a, miR-302b, miR-302c, and miR-302d) that directly inhibit p21 expression.
The p21 protein binds directly to cyclin-CDK complexes that drive
forward the cell cycle and inhibits their kinase activity, thereby
causing cell cycle arrest to allow repair to take place. p21 can also
mediate growth arrest associated with differentiation and a more
permanent growth arrest associated with cellular senescence. The p21
gene contains several p53 response elements that mediate direct binding
of the p53 protein, resulting in transcriptional activation of the gene
encoding the p21 protein.
The p53 and RB1 pathways are linked via p14ARF, raising the possibility that the pathways may regulate each other.
p53 expression can be stimulated by UV light, which also causes DNA damage. In this case, p53 can initiate events leading to tanning.
Stem cells
Levels of p53 play an important role in the maintenance of stem cells throughout development and the rest of human life.
In human embryonic stem cells (hESCs), p53 is maintained at low inactive levels. This is because activation of p53 leads to rapid differentiation of hESCs.
Studies have shown that knocking out p53 delays differentiation and
that adding p53 causes spontaneous differentiation, showing how p53
promotes differentiation of hESCs and plays a key role in cell cycle as a
differentiation regulator. When p53 becomes stabilized and activated in
hESCs, it increases p21 to establish a longer G1. This typically leads
to abolition of S-phase entry, which stops the cell cycle in G1, leading
to differentiation. Work in mouse embryonic stem cells has recently
shown, however, that the expression of p53 does not necessarily lead to
differentiation. p53 also activates miR-34a and miR-145, which then repress the hESC pluripotency factors, further instigating differentiation.
In adult stem cells, p53 regulation is important for maintenance of stemness in adult stem cell niches. Mechanical signals such as hypoxia affect levels of p53 in these niche cells through the hypoxia inducible factors, HIF-1α and HIF-2α. While HIF-1α stabilizes p53, HIF-2α suppresses it.
Suppression of p53 plays important roles in cancer stem cell phenotype,
induced pluripotent stem cells and other stem cell roles and behaviors,
such as blastema formation. Cells with decreased levels of p53 have
been shown to reprogram into stem cells with a much greater efficiency
than normal cells.
Papers suggest that the lack of cell cycle arrest and apoptosis gives
more cells the chance to be reprogrammed. Decreased levels of p53 were
also shown to be a crucial aspect of blastema formation in the legs of salamanders.
p53 regulation is very important in acting as a barrier between stem
cells and a differentiated stem cell state, as well as a barrier between
stem cells being functional and being cancerous.
Other
Apart from the cellular and molecular effects above, p53 has a tissue-level anticancer effect that works by inhibiting angiogenesis.
As tumors grow they need to recruit new blood vessels to supply them,
and p53 inhibits that by (i) interfering with regulators of tumor hypoxia
that also affect angiogenesis, such as HIF1 and HIF2, (ii) inhibiting
the production of angiogenic promoting factors, and (iii) directly
increasing the production of angiogenesis inhibitors, such as arresten.
The immune response to infection also involves p53 and NF-κB. Checkpoint control of the cell cycle and of apoptosis by p53 is inhibited by some infections such as Mycoplasma bacteria, raising the specter of oncogenic infection.
Regulation
p53 acts as a cellular stress sensor. It is normally kept at low levels by being constantly marked for degradation by the E3 ubiquitin ligase protein MDM2. p53 is activated in response to myriad stressors – including DNA damage (induced by either UV, IR, or chemical agents such as hydrogen peroxide), oxidative stress, osmotic shock, ribonucleotide depletion, viral lung infections
and deregulated oncogene expression. This activation is marked by two
major events. First, the half-life of the p53 protein is increased
drastically, leading to a quick accumulation of p53 in stressed cells.
Second, a conformational change forces p53 to be activated as a transcription regulator in these cells. The critical event leading to the activation of p53 is the phosphorylation of its N-terminal
domain. The N-terminal transcriptional activation domain contains a
large number of phosphorylation sites and can be considered as the
primary target for protein kinases transducing stress signals.
The protein kinases
that are known to target this transcriptional activation domain of p53
can be roughly divided into two groups. A first group of protein kinases
belongs to the MAPK
family (JNK1-3, ERK1-2, p38 MAPK), which is known to respond to several
types of stress, such as membrane damage, oxidative stress, osmotic
shock, heat shock, etc. A second group of protein kinases (ATR, ATM, CHK1 and CHK2, DNA-PK, CAK, TP53RK)
is implicated in the genome integrity checkpoint, a molecular cascade
that detects and responds to several forms of DNA damage caused by
genotoxic stress. Oncogenes also stimulate p53 activation, mediated by the protein p14ARF.
In unstressed cells, p53 levels are kept low through a continuous degradation of p53. A protein called Mdm2 (also called HDM2 in humans) binds to p53, preventing its action, and transports it from the nucleus to the cytosol. Mdm2 also acts as a ubiquitin ligase and covalently attaches ubiquitin to p53, thus marking p53 for degradation by the proteasome. However, ubiquitylation of p53 is reversible. On activation of p53, Mdm2 is also activated, setting up a feedback loop. p53 levels can show oscillations
(or repeated pulses) in response to certain stresses, and these pulses
can be important in determining whether the cells survive the stress, or
die.
MI-63 binds to MDM2, reactivating p53 in situations where p53's function has become inhibited.
A ubiquitin specific protease, USP7 (or HAUSP), can cleave ubiquitin off p53, thereby protecting it from proteasome-dependent degradation via the ubiquitin ligase pathway. This is one means by which p53 is stabilized in response to oncogenic insults. USP42 has also been shown to deubiquitinate p53 and may be required for the ability of p53 to respond to stress.
Recent research has shown that HAUSP is mainly localized in the
nucleus, though a fraction of it can be found in the cytoplasm and
mitochondria. Overexpression of HAUSP results in p53 stabilization.
However, depletion of HAUSP does not result in a decrease in p53 levels
but rather increases p53 levels due to the fact that HAUSP binds and
deubiquitinates Mdm2. It has been shown that HAUSP is a better binding
partner to Mdm2 than p53 in unstressed cells.
USP10,
however, has been shown to be located in the cytoplasm in unstressed
cells and deubiquitinates cytoplasmic p53, reversing Mdm2
ubiquitination. Following DNA damage, USP10 translocates to the nucleus
and contributes to p53 stability. Unlike HAUSP, USP10 does not interact with
Mdm2.
Phosphorylation of the N-terminal end of p53 by the
above-mentioned protein kinases disrupts Mdm2-binding. Other proteins,
such as Pin1, are then recruited to p53 and induce a conformational
change in p53, which prevents Mdm2-binding even more. Phosphorylation
also allows for binding of transcriptional coactivators, like p300 and PCAF, which then acetylate the C-terminal
end of p53, exposing the DNA binding domain of p53, allowing it to
activate or repress specific genes. Deacetylase enzymes, such as Sirt1 and Sirt7, can deacetylate p53, leading to an inhibition of apoptosis. Some oncogenes can also stimulate the transcription of proteins that bind to MDM2 and inhibit its activity.
Epigenetic marks like histone methylation can also regulate p53;
for example, p53 interacts directly with a repressive Trim24 cofactor
that binds histones in regions of the genome that are epigenetically
repressed.
Trim24 prevents p53 from activating its targets, but only in these
regions, effectively giving p53 the ability to 'read out' the histone
profile at key target genes and act in a gene-specific manner.
Role in disease
If the TP53 gene is damaged, tumor suppression is severely compromised. People who inherit only one functional copy of the TP53 gene will most likely develop tumors in early adulthood, a disorder known as Li–Fraumeni syndrome.
The TP53 gene can also be modified by mutagens (chemicals, radiation, or viruses), increasing the likelihood for uncontrolled cell division. More than 50 percent of human tumors contain a mutation or deletion of the TP53 gene. Loss of p53 creates genomic instability that most often results in an aneuploidy phenotype.
Increasing the amount of p53 may seem a solution for treatment of
tumors or prevention of their spreading. This, however, is not a usable
method of treatment, since it can cause premature aging. Restoring endogenous
normal p53 function holds some promise. Research has shown that this
restoration can lead to regression of certain cancer cells without
damaging other cells in the process. The way in which tumor regression
occurs depends mainly on the tumor type. For example, restoration of
endogenous p53 function in lymphomas may induce apoptosis,
while cell growth may be reduced to normal levels. Thus,
pharmacological reactivation of p53 presents itself as a viable cancer
treatment option. The first commercial gene therapy, Gendicine, was approved in China in 2003 for the treatment of head and neck squamous cell carcinoma. It delivers a functional copy of the p53 gene using an engineered adenovirus.
Certain pathogens can also affect the p53 protein that the TP53 gene expresses. One such example, human papillomavirus
(HPV), encodes a protein, E6, which binds to the p53 protein and
inactivates it. This mechanism, in synergy with the inactivation of the
cell cycle regulator pRb by the HPV protein E7, allows for repeated cell division manifested clinically as warts. Certain HPV types, in particular types 16 and 18, can also lead to progression from a benign wart to low or high-grade cervical dysplasia, which are reversible forms of precancerous lesions. Persistent infection of the cervix over the years can cause irreversible changes leading to carcinoma in situ
and eventually invasive cervical cancer. This results from the effects
of HPV genes, particularly those encoding E6 and E7, which are the two
viral oncoproteins that are preferentially retained and expressed in
cervical cancers by integration of the viral DNA into the host genome.
The p53 protein is continually produced and degraded in cells of
healthy people, resulting in damped oscillation. The degradation of the p53 protein is associated with binding of MDM2. In a negative feedback
loop, MDM2 itself is induced by the p53 protein. Mutant p53 proteins
often fail to induce MDM2, causing p53 to accumulate at very high
levels. Moreover, the mutant p53 protein itself can inhibit normal p53
protein levels. In some cases, single missense mutations in p53 have
been shown to disrupt p53 stability and function.
This image shows different patterns of p53 expression in endometrial cancers on chromogenic immunohistochemistry,
whereof all except wild-type are variably termed
abnormal/aberrant/mutation-type and are strongly predictive of an
underlying TP53 mutation:
Wild-type, upper left: Endometrial endometrioid carcinoma
showing normal wild-type pattern of p53 expression with variable
proportion of tumor cell nuclei staining with variable intensity. Note,
this wild-type pattern should not be reported as "positive," because
this is ambiguous reporting language.
Overexpression, upper right: Endometrial endometrioid
carcinoma, grade 3, with overexpression, showing strong staining in
virtually all tumor cell nuclei, much stronger compared with the
internal control of fibroblasts in the center. Note, there is some
cytoplasmic background indicating that this staining is quite strong but
this should not be interpreted as abnormal cytoplasmic pattern.
Complete absence, lower left: Endometrial serous carcinoma
showing complete absence of p53 expression with internal control showing
moderate to strong but variable staining. Note, wild-type pattern in
normal atrophic glands at 12 and 6 o'clock.
Both cytoplasmic and nuclear, lower right: Endometrial
endometrioid carcinoma showing cytoplasmic p53 expression with internal
control (stroma and normal endometrial glands) showing nuclear wild-type
pattern. The cytoplasmic pattern is accompanied by nuclear staining of
similar intensity.
Suppression of p53 in human breast cancer cells is shown to lead to increased CXCR5 chemokine receptor gene expression and activated cell migration in response to chemokine CXCL13.
One study found that p53 and Myc proteins were key to the survival of Chronic Myeloid Leukaemia (CML) cells. Targeting p53 and Myc proteins with drugs gave positive results on mice with CML.
Experimental analysis of p53 mutations
Most p53 mutations are detected by DNA sequencing. However, it is known that
single missense mutations can have a wide range of functional effects, from
rather mild to very severe.
The large spectrum of cancer phenotypes due to mutations in the TP53 gene is also supported by the fact that different isoforms of p53 proteins have different cellular mechanisms for prevention against cancer. Mutations in TP53
can give rise to different isoforms, preventing their overall
functionality in different cellular mechanisms and thereby extending the
cancer phenotype from mild to severe. Recent studies show that p53
isoforms are differentially expressed in different human tissues, and
the loss-of-function or gain-of-function mutations within the isoforms can cause tissue-specific cancer or provide cancer stem cell potential in different tissues. TP53 mutation also affects energy metabolism and increases glycolysis in breast cancer cells.
The dynamics of p53 proteins, along with its antagonist Mdm2, indicate that the levels of p53, in units of concentration, oscillate as a function of time. This "damped" oscillation is both clinically documented and mathematically modelled. Mathematical models also indicate that the p53 concentration oscillates much faster once teratogens, such as double-stranded breaks (DSB) or UV radiation, are introduced to the system. This supports and models the current understanding of p53 dynamics, where DNA damage induces p53 activation (see p53 regulation
for more information). Current models can also be useful for modelling
the mutations in p53 isoforms and their effects on p53 oscillation,
thereby promoting de novo tissue-specific pharmacological drug discovery.
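The qualitative behaviour of a damped oscillation can be illustrated with a toy negative-feedback loop. This is not one of the published p53-Mdm2 models referred to above: it is a linear two-variable sketch in which p and m stand for the deviations of p53 and Mdm2 from their steady-state levels, and every rate constant is a hypothetical value chosen only to make the oscillation visible.

```python
import math


def simulate(k_inhibit=1.0, k_induce=1.0, decay=0.1, dt=0.001, t_end=20.0):
    """Euler-integrate deviations (p, m) of p53 and Mdm2 from steady state.

    All parameters are illustrative and in arbitrary units.
    """
    p, m = 1.0, 0.0  # start with p53 displaced above its steady state
    steps = int(round(t_end / dt))
    trajectory = [(0.0, p, m)]
    for step in range(1, steps + 1):
        dp = -k_inhibit * m - decay * p  # Mdm2 above baseline pushes p53 down
        dm = k_induce * p - decay * m    # p53 above baseline pushes Mdm2 up
        p, m = p + dt * dp, m + dt * dm
        trajectory.append((step * dt, p, m))
    return trajectory


traj = simulate()
p_values = [p for _, p, _ in traj]
# Each sign change marks p53 crossing its steady-state level.
sign_changes = sum(1 for a, b in zip(p_values, p_values[1:]) if a * b < 0)
half = len(p_values) // 2
early_peak = max(abs(v) for v in p_values[:half])
late_peak = max(abs(v) for v in p_values[half:])
print(sign_changes, early_peak, late_peak)
```

With these default values the exact solution is p(t) = exp(-0.1 t) cos(t), so p53 repeatedly overshoots and undershoots its steady state while the amplitude shrinks, which is the signature of a damped oscillation.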
Warren Maltzman, of the Waksman Institute of Rutgers University,
first demonstrated that TP53 was responsive to DNA damage in the form of
ultraviolet radiation. In a series of publications in 1991–92, Michael Kastan of Johns Hopkins University reported that TP53 was a critical part of a signal transduction pathway that helped cells respond to DNA damage.
In 1993, p53 was voted molecule of the year by Science magazine.
an acidic N-terminus transcription-activation domain (TAD), also known as activation domain 1 (AD1), which activates transcription factors.
The N-terminus contains two complementary transcriptional activation
domains, with a major one at residues 1–42 and a minor one at residues
55–75, specifically involved in the regulation of several pro-apoptotic
genes.
activation domain 2 (AD2) important for apoptotic activity: residues 43–63.
proline rich domain important for the apoptotic activity of p53 by nuclear exportation via MAPK: residues 64–92.
central DNA-binding core domain (DBD). Contains one zinc atom and several arginine amino acids: residues 102–292. This region is responsible for binding the p53 co-repressor LMO3.
homo-oligomerisation domain (OD): residues 307–355. Tetramerization is essential for the activity of p53 in vivo.
C-terminal involved in downregulation of DNA binding of the central domain: residues 356–393.
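The residue ranges listed above lend themselves to a small lookup table. The sketch below simply encodes those ranges; the function name and the shortened domain labels are choices made for this example, and residues that fall between the listed ranges are reported as unassigned.

```python
# (first residue, last residue, domain) as given in the list above.
P53_DOMAINS = [
    (1, 42, "activation domain 1 (AD1)"),
    (43, 63, "activation domain 2 (AD2)"),
    (64, 92, "proline-rich domain"),
    (102, 292, "DNA-binding core domain (DBD)"),
    (307, 355, "homo-oligomerisation domain (OD)"),
    (356, 393, "C-terminal domain"),
]
P53_LENGTH = 393  # last residue of the C-terminal range


def domain_of(residue):
    """Return the listed domain containing a residue, or None if it lies in a gap."""
    if not 1 <= residue <= P53_LENGTH:
        raise ValueError(f"residue must be between 1 and {P53_LENGTH}")
    for start, end, name in P53_DOMAINS:
        if start <= residue <= end:
            return name
    return None  # for example residues 93-101, which the list does not assign


print(domain_of(72))   # codon 72 falls in the proline-rich domain
print(domain_of(175))  # a residue inside the DNA-binding core domain
```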
Mutations that deactivate p53 in cancer usually occur in the DBD.
Most of these mutations destroy the ability of the protein to bind to
its target DNA sequences, and thus prevent transcriptional activation
of these genes. As such, mutations in the DBD are recessive loss-of-function mutations. Molecules of p53 with mutations in the OD dimerise with wild-type
p53, and prevent them from activating transcription. Therefore, OD
mutations have a dominant negative effect on the function of p53.
SDS-PAGE analysis indicates that p53 is a 53-kilodalton (kDa) protein. However, the actual mass of the full-length p53 protein (p53α) based on the sum of masses of the amino acid residues is only 43.7 kDa. This difference is due to the high number of proline residues in the protein, which slow its migration on SDS-PAGE, thus making it appear heavier than it actually is.
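The size of the discrepancy can be checked with simple arithmetic. The sketch below assumes the common rule of thumb of roughly 110 Da per amino acid residue, which is an approximation and not a figure from this article, together with the 393-residue length implied by the domain ranges.

```python
RESIDUES = 393                   # length of full-length p53
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption

# Rough mass estimate from residue count alone, in kDa.
estimated_kda = RESIDUES * AVERAGE_RESIDUE_MASS_DA / 1000

apparent_kda = 53.0    # apparent mass on SDS-PAGE
calculated_kda = 43.7  # mass from the summed residue masses

# How much SDS-PAGE overestimates the mass, relative to the calculated value.
overestimate_percent = 100 * (apparent_kda - calculated_kda) / calculated_kda

print(f"rule-of-thumb estimate: {estimated_kda:.1f} kDa")
print(f"SDS-PAGE overestimate: {overestimate_percent:.0f}%")
```

The rule-of-thumb estimate of about 43.2 kDa agrees with the calculated 43.7 kDa, so the apparent 53 kDa is an overestimate of roughly 21%.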
Isoforms
As with 95% of human genes, TP53 encodes more than one protein. All these p53 proteins are called the p53 isoforms. These proteins range in size from 3.5 to 43.7 kDa. Several isoforms
were discovered in 2005, and so far 12 human p53 isoforms have been
identified (p53α, p53β, p53γ, ∆40p53α, ∆40p53β, ∆40p53γ, ∆133p53α,
∆133p53β, ∆133p53γ, ∆160p53α, ∆160p53β, ∆160p53γ). Furthermore, p53
isoforms are expressed in a tissue-dependent manner and p53α is never
expressed alone.
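The twelve names listed above follow a combinatorial pattern: four N-terminal forms (full length, ∆40, ∆133, ∆160) combined with three C-terminal variants (α, β, γ). A minimal sketch that enumerates them, with variable names chosen only for illustration:

```python
from itertools import product

# "" stands for the full-length N-terminus; the other prefixes are the
# truncated forms named in the isoform list above.
N_TERMINAL_FORMS = ["", "∆40", "∆133", "∆160"]
C_TERMINAL_FORMS = ["α", "β", "γ"]

isoforms = [f"{n}p53{c}" for n, c in product(N_TERMINAL_FORMS, C_TERMINAL_FORMS)]

print(len(isoforms))
print(", ".join(isoforms))
```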
The full length p53 isoform proteins can be subdivided into different protein domains. Starting from the N-terminus,
there are first the amino-terminal transcription-activation domains
(TAD 1, TAD 2), which are needed to induce a subset of p53 target genes.
This domain is followed by the proline-rich domain (PXXP), in which the
motif PXXP is repeated (P is a proline and X can be any amino acid). It
is required, among other things, for p53-mediated apoptosis.
Some isoforms lack the proline-rich domain, such as Δ133p53β,γ and
Δ160p53α,β,γ; hence some isoforms of p53 do not mediate apoptosis,
emphasizing the diversifying roles of the TP53 gene. Next comes the DNA-binding domain (DBD), which enables sequence-specific DNA binding. The C-terminal domain completes the protein. It includes the nuclear localization signal (NLS), the nuclear export signal
(NES) and the oligomerisation domain (OD). The NLS and NES are
responsible for the subcellular regulation of p53. Through the OD, p53
can form a tetramer and then bind to DNA. Among the isoforms, some
domains can be missing, but all of them share most of the highly
conserved DNA-binding domain.
The isoforms are formed by different mechanisms. The beta and the
gamma isoforms are generated by alternative splicing of intron 9, which
leads to a different C-terminus. Furthermore, the usage of an internal
promoter in intron 4 produces the ∆133 and ∆160 isoforms, which lack the
TAD domain and a part of the DBD. Moreover, alternative initiation of
translation at codon 40 or 160 gives rise to the ∆40p53 and ∆160p53 isoforms.
Because p53 exists as multiple isoforms, several lines of evidence indicate that mutations within the TP53
gene that give rise to mutated isoforms are causative agents of various
cancer phenotypes, from mild to severe, even when only a single mutation is present in the TP53 gene (refer to section Experimental analysis of p53 mutations for more details).
The nuclear envelope consists of two lipid bilayer membranes: an inner nuclear membrane and an outer nuclear membrane. The space between the membranes is called the perinuclear space. It is usually about 10–50 nm wide. The outer nuclear membrane is continuous with the endoplasmic reticulum membrane. The nuclear envelope has many nuclear pores that allow materials to move between the cytosol and the nucleus. Intermediate filament proteins called lamins form a structure called the nuclear lamina on the inner aspect of the inner nuclear membrane and give structural support to the nucleus.
Structure
The
nuclear envelope is made up of two lipid bilayer membranes, an inner
nuclear membrane and an outer nuclear membrane. These membranes are
connected to each other by nuclear pores. Two sets of intermediate
filaments provide support for the nuclear envelope. An internal network
forms the nuclear lamina on the inner nuclear membrane. A looser network forms outside to give external support. The actual shape of the nuclear envelope is irregular. It has invaginations and protrusions and can be observed with an electron microscope.
Outer membrane
The outer nuclear membrane also shares a common border with the endoplasmic reticulum.
While it is physically linked, the outer nuclear membrane contains
proteins found in far higher concentrations than the endoplasmic
reticulum. All four nesprin proteins (nuclear envelope spectrin repeat proteins) present in mammals are expressed in the outer nuclear membrane. Nesprin proteins connect cytoskeletal filaments to the nucleoskeleton. Nesprin-mediated connections to the cytoskeleton contribute to nuclear positioning and to the cell’s mechanosensory function. KASH domain proteins of Nesprin-1 and -2 are part of a LINC complex (linker of nucleoskeleton and cytoskeleton) and can bind directly to cytoskeletal components, such as actin filaments, or can bind to proteins in the perinuclear space. Nesprin-3 and -4 may play a role in unloading enormous cargo; Nesprin-3 proteins bind plectin and link the nuclear envelope to cytoplasmic intermediate filaments. Nesprin-4 proteins bind the plus end directed motor kinesin-1. The outer nuclear membrane is also involved in development, as it fuses with the inner nuclear membrane to form nuclear pores.
The inner nuclear membrane encloses the nucleoplasm, and is covered by the nuclear lamina, a mesh of intermediate filaments which stabilizes the nuclear membrane as well as being involved in chromatin function. It is connected to the outer membrane by nuclear pores
which penetrate the membranes. While the two membranes and the
endoplasmic reticulum are linked, proteins embedded in the membranes
tend to stay put rather than dispersing across the continuum. The inner membrane is lined with a fiber network called the nuclear lamina, which is 10–40 nm thick and provides strength.
The nuclear envelope is punctured by around a thousand nuclear pore complexes, about 100 nm across, with an inner channel about 40 nm wide. The complexes contain a number of nucleoporins, proteins that link the inner and outer nuclear membranes.
Cell division
During the G2 phase of interphase, the nuclear membrane increases its surface area and doubles its number of nuclear pore complexes.
In eukaryotes such as yeast which undergo closed mitosis, the nuclear membrane stays intact during cell division. The spindle fibers either form within the membrane, or penetrate it without tearing it apart.
In other eukaryotes (animals as well as plants), the nuclear membrane must break down during the prometaphase stage of mitosis to allow the mitotic spindle fibers to access the chromosomes inside. The breakdown and reformation processes are not well understood.
Breakdown
In mammals, the nuclear membrane can break down within minutes, following a set of steps during the early stages of mitosis.
First, M-Cdks phosphorylate nucleoporin polypeptides,
which are then selectively removed from the nuclear pore complexes. After
that, the rest of the nuclear pore complexes break apart
simultaneously. Biochemical evidence suggests that the nuclear pore
complexes disassemble into stable pieces rather than disintegrating into
small polypeptide fragments.
M-Cdks also phosphorylate elements of the nuclear lamina (the
framework that supports the envelope) leading to the disassembly of the
lamina and hence the envelope membranes into small vesicles.
Electron and fluorescence microscopy
have given strong evidence that the nuclear membrane is absorbed by the
endoplasmic reticulum—nuclear proteins not normally found in the
endoplasmic reticulum show up during mitosis.
In addition to the breakdown of the nuclear membrane during the prometaphase stage of mitosis, the nuclear membrane also ruptures in migrating mammalian cells during the interphase stage of the cell cycle.[20]
This transient rupture is likely caused by nuclear deformation. The
rupture is rapidly repaired by a process dependent on "endosomal sorting
complexes required for transport" (ESCRT) made up of cytosolic protein complexes.[20]
During nuclear membrane rupture events, DNA double-strand breaks occur.
Thus the survival of cells migrating through confined environments
appears to depend on efficient nuclear envelope and DNA repair machineries.
Aberrant nuclear envelope breakdown has also been observed in
laminopathies and in cancer cells leading to mislocalization of cellular
proteins, the formation of micronuclei and genomic instability.
Reformation
Exactly how the nuclear membrane reforms during telophase of mitosis is debated. Two theories exist:
Vesicle fusion: vesicles of nuclear membrane fuse together to rebuild the nuclear membrane.
Reshaping of the endoplasmic reticulum: the parts of the
endoplasmic reticulum containing the absorbed nuclear membrane envelop
the nuclear space, reforming a closed membrane.
Origin of the nuclear membrane
A study of the comparative genomics, evolution and origins of the nuclear membrane led to the proposal that the nucleus emerged in the primitive eukaryotic ancestor (the “prekaryote”), and was triggered by the archaeo-bacterial symbiosis. Several ideas have been proposed for the evolutionary origin of the nuclear membrane.
These ideas include the invagination of the plasma membrane in a
prokaryote ancestor, or the formation of a genuine new membrane system
following the establishment of proto-mitochondria
in the archaeal host. The adaptive function of the nuclear membrane
may have been to serve as a barrier to protect the genome from reactive oxygen species (ROS) produced by the cells' pre-mitochondria.
In genetics, paternal mtDNA transmission and paternal mtDNA inheritance refer to the incidence of mitochondrial DNA (mtDNA) being passed from a father
to his offspring. Paternal mtDNA inheritance is observed in a small
proportion of species; in general, mtDNA is passed unchanged from a
mother to her offspring, making it an example of non-Mendelian inheritance. In contrast, mtDNA transmission from both parents occurs regularly in certain bivalves.
In animals
Paternal mtDNA inheritance in animals varies. For example, in Mytilidae mussels, paternal mtDNA "is transmitted through the sperm and establishes itself only in the male gonad." In testing 172 sheep, "The Mitochondrial DNA from three lambs in two half-sib families were found to show paternal inheritance." An instance of paternal leakage was reported in a study on chickens. There is evidence that paternal leakage is an integral part of mitochondrial inheritance in Drosophila simulans.
In humans
In human mitochondrial genetics,
there is debate over whether or not paternal mtDNA transmission is
possible. Many studies hold that paternal mtDNA is never transmitted to
offspring. This thought is central to mtDNA genealogical DNA testing and to the theory of mitochondrial Eve. The fact that mitochondrial DNA is maternally inherited enables researchers to trace maternal lineage far back in time. Y chromosomal DNA, paternally inherited, is used in an analogous way to trace the agnate lineage.
Since the father's mtDNA is located
in the sperm midpiece (the mitochondrial sheath), which is lost at
fertilization, all children of the same mother are hemizygous for
maternal mtDNA and are thus identical to each other and to their mother.
Because of its cytoplasmic location in eukaryotes, mtDNA does not undergo meiosis
and there is normally no crossing-over, hence there is no opportunity
for introgression of the father's mtDNA. All mtDNA is thus inherited
maternally; mtDNA has been used to infer the pedigree of the well-known
"mitochondrial Eve."
In sexual reproduction, paternal mitochondria found in the sperm
are actively decomposed, thus preventing "paternal leakage".
Mitochondria in mammalian sperm are usually destroyed by the egg cell
after fertilization. In 1999 it was reported that paternal sperm
mitochondria (containing mtDNA) are marked with ubiquitin to select them for later destruction inside the embryo. Some in vitro fertilization (IVF) techniques, particularly intracytoplasmic sperm injection (ICSI) of a sperm into an oocyte, may interfere with this.
It is now understood that the tail of the sperm, which contains additional mtDNA, may also enter the egg. This has led to increased controversy about the fate of paternal mtDNA.
Over the last 5 years, there has
been considerable debate as to whether there is recombination in human
mitochondrial DNA (mtDNA) (for references, see Piganeau and Eyre-Walker,
2004). That debate appears to have finally come to an end with the
publication of some direct evidence of recombination. Schwartz and
Vissing (2002) presented the case of a 28-year-old man who had both
maternal and paternally derived mtDNA in his muscle tissue – in all his
other tissues he had only maternally derived mtDNA. It was the first
time that paternal leakage and, consequently, heteroplasmy was observed
in human mtDNA. In a recent paper, Kraytsberg et al (2004) take this
observation one step further, and claim to show that there has been
recombination between the maternal and paternal mtDNA in this
individual.
Some sources state that so little paternal mtDNA is transmitted as to
be negligible ("At most, one presumes it must be less than 1 in 1000,
since there are 100 000 mitochondria in the human egg and only 100 in
the sperm (Satoh and Kuroiwa, 1991).")
or that paternal mtDNA is so rarely transmitted as to be negligible
("Nevertheless, studies have established that paternal mtDNA is so
rarely transmitted to offspring that mtDNA analyses remain valid..."). A few studies indicate that, very rarely, a small portion of a person's mitochondria can be inherited from the father.
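The "less than 1 in 1000" figure quoted above follows from simple arithmetic on the stated counts. The sketch below assumes, purely for illustration, that every mitochondrion in the zygote contributes equally and that none are selectively destroyed; the function name is hypothetical.

```python
def paternal_fraction(egg_mitochondria, sperm_mitochondria):
    """Share of mitochondria in the zygote that come from the sperm,
    assuming equal contribution and no selective elimination."""
    return sperm_mitochondria / (egg_mitochondria + sperm_mitochondria)

# Counts quoted in the text: 100 000 in the egg, 100 in the sperm.
fraction = paternal_fraction(100_000, 100)
print(f"{fraction:.6f}")    # 0.000999
print(fraction < 1 / 1000)  # True
```

Because paternal mitochondria are also actively degraded after fertilization, this naive ratio is best read as an upper bound under the stated assumptions.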
The controversy about human paternal leakage was summed up in the 1996 study Misconceptions about mitochondria and mammalian fertilization: Implications for theories on human evolution, which was peer-reviewed and printed in Proceedings of the National Academy of Sciences. According to the study's abstract:
In vertebrates, inheritance of
mitochondria is thought to be predominantly maternal, and mitochondrial
DNA analysis has become a standard taxonomic tool. In accordance with
the prevailing view of strict maternal inheritance, many sources assert
that during fertilization, the sperm tail, with its mitochondria, gets
excluded from the embryo. This is incorrect. In the majority of
mammals—including humans—the midpiece mitochondria can be identified in
the embryo even though their ultimate fate is unknown. The "missing
mitochondria" story seems to have survived—and proliferated—unchallenged
in a time of contention between hypotheses of human origins, because it
supports the "African Eve" model of recent radiation of Homo sapiens
out of Africa.
The mixing of maternal and paternal mtDNA was thought to have been found in chimpanzees in 1999 and in humans in 1999
and 2018. This last finding is significant, as biparental mtDNA was
observed in subsequent generations in three different families leading
to the conclusion that, although the maternal transmission dogma remains
strong, there is evidence that paternal transmission does exist and
there is probably a mechanism which, if elucidated, could be a new tool
in the reproductive field (e.g. avoiding mitochondrial replacement
therapy, and just using this mechanism so that the offspring inherit the
paternal mitochondria).
However, there has been only a single documented case among humans in
which as much as 90% of a single tissue type's mitochondria was
inherited through paternal transmission.
According to the 2005 study More evidence for non-maternal inheritance of mitochondrial DNA?, heteroplasmy
is a "newly discovered form of inheritance for mtDNA. Heteroplasmy
introduces slight statistical uncertainty in normal inheritance
patterns."
Heteroplasmy may result from a mutation during development which is
propagated to only a subset of the adult cells, or may occur when two
slightly different mitochondrial sequences are inherited from the mother
as a result of several hundred mitochondria being present in the ovum.
However, the 2005 study states:
Multiple types (or recombinant
types) of quite dissimilar mitochondrial DNA from different parts of the
known mtDNA phylogeny are often reported in single individuals. From
re-analyses and corrigenda of forensic mtDNA data, it is apparent that
the phenomenon of mosaic or mixed mtDNA can be ascribed solely to contamination and sample mix up.
A study published in PNAS in 2018, titled Biparental Inheritance of Mitochondrial DNA in Humans,
found paternal mtDNA in 17 individuals from three unrelated
multigeneration families with a high level of mtDNA heteroplasmy
(ranging from 24 to 76%).
A comprehensive exploration of
mtDNA segregation in these families shows biparental mtDNA transmission
with an autosomal dominantlike inheritance mode. Our results suggest
that, although the central dogma of maternal inheritance of mtDNA
remains valid, there are some exceptional cases where paternal mtDNA
could be passed to the offspring.
In protozoa
Some organisms, such as Cryptosporidium, have mitochondria with no DNA whatsoever.
In plants, it has also been reported that mitochondria can occasionally be inherited from the father, e.g. in bananas. Some conifers also show paternal inheritance of mitochondria, such as the coast redwood, Sequoia sempervirens.