A Medley of Potpourri

Sunday, May 6, 2018

Transcription factor

From Wikipedia, the free encyclopedia

Transcription factor glossary
gene expression – the process by which information from a gene is used in the synthesis of a functional gene product such as a protein
transcription – the process of making messenger RNA (mRNA) from a DNA template by RNA polymerase
transcription factor – a protein that binds to DNA and regulates gene expression by promoting or suppressing transcription
transcriptional regulation – controlling the rate of gene transcription, for example by helping or hindering RNA polymerase binding to DNA
upregulation, activation, or promotion – increase the rate of gene transcription
downregulation, repression, or suppression – decrease the rate of gene transcription
coactivator – a protein that works with transcription factors to increase the rate of gene transcription
corepressor – a protein that works with transcription factors to decrease the rate of gene transcription
response element – a specific sequence of DNA that a transcription factor binds to

Illustration of an activator

In molecular biology, a transcription factor (TF) (or sequence-specific DNA-binding factor) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence.^[1]^[2] The function of TFs is to regulate - turn on and off - genes in order to make sure that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization (body plan) during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are up to 2600 TFs in the human genome.

TFs work alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.^[3]^[4]^[5]

A defining feature of TFs is that they contain at least one DNA-binding domain (DBD), which attaches to a specific sequence of DNA adjacent to the genes that they regulate.^[6]^[7] TFs are grouped into classes based on their DBDs.^[8]^[9] Other proteins such as coactivators, chromatin remodelers, histone acetyltransferases, histone deacetylases, kinases, and methylases are also essential to gene regulation, but lack DNA-binding domains, and therefore are not TFs.^[10]

TFs are of interest in medicine because TF mutations can cause specific diseases, and medications can be potentially targeted toward them.

Number

Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene.^[11]

There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors,^[12] though other studies indicate it to be a smaller number.^[13] Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see, for example, hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.^[10]
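The combinatorial argument can be made concrete with a rough count. The sketch below takes the pool of approximately 2000 transcription factors from the text and counts the unordered subsets of a given size. The subset sizes and the figure of roughly 20,000 protein-coding genes are illustrative assumptions, not values from the cited source.

```python
from math import comb

N_TFS = 2000       # approximate pool size quoted in the text
N_GENES = 20_000   # assumed rough count of human protein-coding genes


def tf_combinations(pool_size: int, subset_size: int) -> int:
    """Number of distinct unordered subsets of transcription factors."""
    return comb(pool_size, subset_size)


for k in (1, 2, 3, 4):
    n = tf_combinations(N_TFS, k)
    verdict = "enough" if n >= N_GENES else "not enough"
    print(f"{k} factor(s) per gene: {n:,} combinations ({verdict} for {N_GENES:,} genes)")
```

Under these assumptions, single factors cannot give every gene a unique regulator, but pairs already can, and the count grows steeply with each additional factor.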

Mechanism

Transcription factors bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up- or down-regulated. Transcription factors use a variety of mechanisms for the regulation of gene expression.^[14] These mechanisms include:

stabilize or block the binding of RNA polymerase to DNA
catalyze the acetylation or deacetylation of histone proteins. The transcription factor can either do this directly or recruit other proteins with this catalytic activity. Many transcription factors use one or the other of two opposing mechanisms to regulate transcription:^[15]
- histone acetyltransferase (HAT) activity – acetylates histone proteins, which weakens the association of DNA with histones, which makes the DNA more accessible to transcription, thereby up-regulating transcription
- histone deacetylase (HDAC) activity – deacetylates histone proteins, which strengthens the association of DNA with histones, which makes the DNA less accessible to transcription, thereby down-regulating transcription
recruit coactivator or corepressor proteins to the transcription factor DNA complex^[16]

Function

Transcription factors are one of the groups of proteins that read and interpret the genetic "blueprint" in the DNA. They bind to the DNA and help initiate a program of increased or decreased gene transcription. As such, they are vital for many important cellular processes. Below are some of the important functions and biological roles transcription factors are involved in:

Basal transcription regulation

In eukaryotes, an important class of transcription factors called general transcription factors (GTFs) is necessary for transcription to occur.^[17]^[18]^[19] Many of these GTFs do not actually bind DNA, but rather are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID (see also TATA binding protein), TFIIE, TFIIF, and TFIIH.^[20] The preinitiation complex binds to promoter regions of DNA upstream of the genes that it regulates.

Differential enhancement of transcription

Other transcription factors differentially regulate the expression of various genes by binding to enhancer regions of DNA adjacent to regulated genes. These transcription factors are critical to making sure that genes are expressed in the right cell at the right time and in the right amount, depending on the changing requirements of the organism.

Development

Many transcription factors in multicellular organisms are involved in development.^[21] Responding to stimuli, these transcription factors turn on/off the transcription of the appropriate genes, which, in turn, allows for changes in cell morphology or activities needed for cell fate determination and cellular differentiation. The Hox transcription factor family, for example, is important for proper body pattern formation in organisms ranging from fruit flies to humans.^[22]^[23] Another example is the transcription factor encoded by the Sex-determining Region Y (SRY) gene, which plays a major role in determining sex in humans.^[24]

Response to intercellular signals

Cells can communicate with each other by releasing molecules that produce signaling cascades within another receptive cell. If the signal requires upregulation or downregulation of genes in the recipient cell, often transcription factors will be downstream in the signaling cascade.^[25] Estrogen signaling is an example of a fairly short signaling cascade that involves the estrogen receptor transcription factor: Estrogen is secreted by tissues such as the ovaries and placenta, crosses the cell membrane of the recipient cell, and is bound by the estrogen receptor in the cell's cytoplasm. The estrogen receptor then goes to the cell's nucleus and binds to its DNA-binding sites, changing the transcriptional regulation of the associated genes.^[26]

Response to environment

Not only do transcription factors act downstream of signaling cascades related to biological stimuli but they can also be downstream of signaling cascades involved in environmental stimuli. Examples include heat shock factor (HSF), which upregulates genes necessary for survival at higher temperatures,^[27] hypoxia inducible factor (HIF), which upregulates genes necessary for cell survival in low-oxygen environments,^[28] and sterol regulatory element binding protein (SREBP), which helps maintain proper lipid levels in the cell.^[29]

Cell cycle control

Many transcription factors, especially some that are proto-oncogenes or tumor suppressors, help regulate the cell cycle and as such determine how large a cell will get and when it can divide into two daughter cells.^[30]^[31] One example is the Myc oncogene, which has important roles in cell growth and apoptosis.^[32]

Pathogenesis

Transcription factors can also be used to alter gene expression in a host cell to promote pathogenesis. A well-studied example is the transcription-activator like effectors (TAL effectors) secreted by Xanthomonas bacteria. When injected into plants, these proteins can enter the nucleus of the plant cell, bind plant promoter sequences, and activate transcription of plant genes that aid in bacterial infection.^[33] TAL effectors contain a central repeat region in which there is a simple relationship between the identity of two critical residues in sequential repeats and sequential DNA bases in the TAL effector’s target site.^[34]^[35] This property likely makes it easier for these proteins to evolve in order to better compete with the defense mechanisms of the host cell.^[36]

Regulation

It is common in biology for important processes to have multiple layers of regulation and control. This is also true with transcription factors: Not only do transcription factors control the rates of transcription to regulate the amounts of gene products (RNA and protein) available to the cell but transcription factors themselves are regulated (often by other transcription factors). Below is a brief synopsis of some of the ways that the activity of transcription factors can be regulated:

Synthesis

Transcription factors (like all proteins) are transcribed from a gene on a chromosome into RNA, and then the RNA is translated into protein. Any of these steps can be regulated to affect the production (and thus activity) of a transcription factor. An implication of this is that transcription factors can regulate themselves. For example, in a negative feedback loop, the transcription factor acts as its own repressor: If the transcription factor protein binds the DNA of its own gene, it down-regulates the production of more of itself. This is one mechanism to maintain low levels of a transcription factor in a cell.^[37]
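The negative feedback loop described above can be illustrated with a toy model. The sketch below is not taken from the cited source: it assumes a single transcription factor whose production follows a Hill-type repression term, integrates it with a simple Euler scheme, and uses arbitrary parameter values (`beta`, `alpha`, `K`, `n`) chosen only for illustration.

```python
def simulate(beta=10.0, alpha=1.0, K=1.0, n=2, autorepress=True,
             dt=0.01, steps=5000):
    """Euler integration of dx/dt = production - alpha * x.

    With autorepression, production = beta / (1 + (x / K) ** n),
    a Hill-type repression term; without it, production = beta.
    All parameter values are arbitrary illustrative choices.
    """
    x = 0.0  # start with no transcription factor present
    for _ in range(steps):
        production = beta / (1.0 + (x / K) ** n) if autorepress else beta
        x += dt * (production - alpha * x)
    return x


unregulated = simulate(autorepress=False)
autoregulated = simulate(autorepress=True)
print(f"steady state without feedback:      {unregulated:.2f}")
print(f"steady state with negative feedback: {autoregulated:.2f}")
```

With these hypothetical parameters the self-repressing factor settles at a much lower level than the unregulated one, which is the behavior the paragraph describes.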

Nuclear localization

In eukaryotes, transcription factors (like most proteins) are transcribed in the nucleus but are then translated in the cell's cytoplasm. Many proteins that are active in the nucleus contain nuclear localization signals that direct them to the nucleus. But, for many transcription factors, this is a key point in their regulation.^[38] Important classes of transcription factors such as some nuclear receptors must first bind a ligand while in the cytoplasm before they can relocate to the nucleus.^[38]

Activation

Transcription factors may be activated (or deactivated) through their signal-sensing domain by a number of mechanisms including:

ligand binding – Not only is ligand binding able to influence where a transcription factor is located within a cell but ligand binding can also affect whether the transcription factor is in an active state and capable of binding DNA or other cofactors (see, for example, nuclear receptors).
phosphorylation^[39]^[40] – Many transcription factors such as STAT proteins must be phosphorylated before they can bind DNA.
interaction with other transcription factors (e.g., homo- or hetero-dimerization) or coregulatory proteins

Accessibility of DNA-binding site

In eukaryotes, DNA is organized with the help of histones into compact particles called nucleosomes, where sequences of about 147 DNA base pairs make ~1.65 turns around histone protein octamers. DNA within nucleosomes is inaccessible to many transcription factors. Some transcription factors, so-called pioneer factors, are still able to bind their DNA binding sites on the nucleosomal DNA. For most other transcription factors, the nucleosome must be actively unwound by molecular motors such as chromatin remodelers.^[41] Alternatively, the nucleosome can be partially unwrapped by thermal fluctuations, allowing temporary access to the transcription factor binding site. In many cases, a transcription factor needs to compete for binding to its DNA binding site with other transcription factors and histones or non-histone chromatin proteins.^[42] Pairs of transcription factors and other proteins can play antagonistic roles (activator versus repressor) in the regulation of the same gene.

Availability of other cofactors/transcription factors

Most transcription factors do not work alone. Many large TF families form complex homotypic or heterotypic interactions through dimerization.^[43] For gene transcription to occur, a number of transcription factors must bind to DNA regulatory sequences. This collection of transcription factors, in turn, recruits intermediary proteins such as cofactors that allow efficient recruitment of the preinitiation complex and RNA polymerase. Thus, for a single transcription factor to initiate transcription, all of these other proteins must also be present, and the transcription factor must be in a state where it can bind to them if necessary. Cofactors are proteins that modulate the effects of transcription factors. Cofactors are interchangeable between specific gene promoters; the protein complex that occupies the promoter DNA and the amino acid sequence of the cofactor determine its spatial conformation. For example, certain steroid receptors can exchange cofactors with NF-κB, which is a switch between inflammation and cellular differentiation; thereby steroids can affect the inflammatory response and function of certain tissues.^[44]

Structure

Schematic diagram of the amino acid sequence (amino terminus to the left and carboxylic acid terminus to the right) of a prototypical transcription factor that contains (1) a DNA-binding domain (DBD), (2) a signal-sensing domain (SSD), and (3) a transactivation domain (TAD). The order of placement and the number of domains may differ in various types of transcription factors. In addition, the transactivation and signal-sensing functions are frequently contained within the same domain.

Transcription factors are modular in structure and contain the following domains:^[1]

DNA-binding domain (DBD), which attaches to specific sequences of DNA (enhancer or promoter sequences) adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.
Trans-activating domain (TAD), which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs).^[45]
An optional signal-sensing domain (SSD) (e.g., a ligand binding domain), which senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. Also, the DBD and signal-sensing domains may reside on separate proteins that associate within the transcription complex to regulate gene expression.

Trans-activating domain

The TAD is the domain of the transcription factor that binds other proteins such as transcription coregulators. Proteins containing TADs include Gal4, Gcn4, Oaf1, Leu3, Rtg3, Pho4, and Gln3 in yeast and p53, NFAT, NF-κB, and VP16 in mammals.^[46] Many TADs are as short as 9 amino acids (present in, e.g., p53, VP16, MLL, E2A, HSF1, NF-IL6, NFAT1, and NF-κB; Gal4, Pdr1, Oaf1, Gcn4, VP16, Pho4, Msn2, Ino2, and P201).

DNA-binding domain

Domain architecture example: Lactose Repressor (LacI). The N-terminal DNA binding domain (labeled) of the lac repressor binds its target DNA sequence (gold) in the major groove using a helix-turn-helix motif. Effector molecule binding (green) occurs in the core domain (labeled), a signal sensing domain. This triggers an allosteric response mediated by the linker region (labeled).

The portion (domain) of the transcription factor that binds DNA is called its DNA-binding domain. Below is a partial list of some of the major families of DNA-binding domains/transcription factors:

Family	InterPro	Pfam	SCOP
basic helix-loop-helix^[47]	InterPro: IPR001092	Pfam PF00010	SCOP 47460
basic-leucine zipper (bZIP)^[48]	InterPro: IPR004827	Pfam PF00170	SCOP 57959
C-terminal effector domain of the bipartite response regulators	InterPro: IPR001789	Pfam PF00072	SCOP 46894
AP2/ERF/GCC box	InterPro: IPR001471	Pfam PF00847	SCOP 54176
helix-turn-helix^[49]
homeodomain proteins, which are encoded by homeobox genes, are transcription factors. Homeodomain proteins play critical roles in the regulation of development.^[50]^[51]	InterPro: IPR009057	Pfam PF00046	SCOP 46689
lambda repressor-like	InterPro: IPR010982		SCOP 47413
srf-like (serum response factor)	InterPro: IPR002100	Pfam PF00319	SCOP 55455
paired box^[52]
winged helix	InterPro: IPR013196	Pfam PF08279	SCOP 46785
zinc fingers^[53]
* multi-domain Cys₂His₂ zinc fingers^[54]	InterPro: IPR007087	Pfam PF00096	SCOP 57667
* Zn₂/Cys₆			SCOP 57701
* Zn₂/Cys₈ nuclear receptor zinc finger	InterPro: IPR001628	Pfam PF00105	SCOP 57716

Response elements

The DNA sequence that a transcription factor binds to is called a transcription factor-binding site or response element.^[55]

Transcription factors interact with their binding sites using a combination of electrostatic (of which hydrogen bonds are a special case) and van der Waals forces. Due to the nature of these chemical interactions, most transcription factors bind DNA in a sequence-specific manner. However, not all bases in the transcription factor-binding site may actually interact with the transcription factor. In addition, some of these interactions may be weaker than others. Thus, transcription factors do not bind just one sequence but are capable of binding a subset of closely related sequences, each with a different strength of interaction.

For example, although the consensus binding site for the TATA-binding protein (TBP) is TATAAAA, the TBP transcription factor can also bind similar sequences such as TATATAT or TATATAA.
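The relationship between a consensus site and its variants can be sketched in code. The example below counts mismatches against the TBP consensus quoted above and scans a sequence for windows within a chosen mismatch limit. Mismatch counting is a deliberately crude stand-in for binding strength (position weight matrices are the more usual model), and the function names and the `max_mismatches` threshold are hypothetical choices made for this illustration.

```python
def mismatches(site: str, consensus: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(site) != len(consensus):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(site, consensus))


def find_sites(sequence: str, consensus: str, max_mismatches: int = 2):
    """Return (position, site, mismatch count) for every window that lies
    within the allowed number of mismatches of the consensus."""
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        d = mismatches(window, consensus)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


CONSENSUS = "TATAAAA"
for variant in ("TATAAAA", "TATATAA", "TATATAT"):
    print(variant, "differs from the consensus at", mismatches(variant, CONSENSUS), "position(s)")
```

The two variants named in the text differ from the consensus at one and two positions respectively, so a tolerance of two mismatches would accept both.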

Because transcription factors can bind a set of related sequences and these sequences tend to be short, potential transcription factor binding sites can occur by chance if the DNA sequence is long enough. It is unlikely, however, that a transcription factor will bind all compatible sequences in the genome of the cell. Other constraints, such as DNA accessibility in the cell or availability of cofactors may also help dictate where a transcription factor will actually bind. Thus, given the genome sequence it is still difficult to predict where a transcription factor will actually bind in a living cell.
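How often a short site occurs by chance can be estimated with a back-of-the-envelope calculation. The sketch below assumes a random sequence with equal base frequencies, exact matching on a single strand, and a genome of about 3.1 billion base pairs. All three are simplifying assumptions introduced here, not figures from the text.

```python
def expected_chance_sites(site_length: int, sequence_length: int) -> float:
    """Expected number of exact matches to one fixed site on one strand of a
    random sequence with equal base frequencies (an idealized assumption)."""
    positions = max(sequence_length - site_length + 1, 0)
    return positions / 4 ** site_length


GENOME_BP = 3_100_000_000  # assumed approximate haploid human genome size

for k in (6, 7, 10, 14):
    print(f"{k}-bp site: about {expected_chance_sites(k, GENOME_BP):,.0f} chance matches")
```

Under these assumptions a 7-bp site is expected well over a hundred thousand times, while a 14-bp site is expected only about a dozen times, which illustrates why short sites alone are poor predictors of binding.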

Additional recognition specificity, however, may be obtained through the use of more than one DNA-binding domain (for example tandem DBDs in the same transcription factor or through dimerization of two transcription factors) that bind to two or more adjacent sequences of DNA.

Clinical significance

Transcription factors are of clinical significance for at least two reasons: (1) mutations can be associated with specific diseases, and (2) they can be targets of medications.

Disorders

Due to their important roles in development, intercellular signaling, and cell cycle, some human diseases have been associated with mutations in transcription factors.^[56]

Many transcription factors are either tumor suppressors or oncogenes, and, thus, mutations in them or their aberrant regulation are associated with cancer. Three groups of transcription factors are known to be important in human cancer: (1) the NF-kappaB and AP-1 families, (2) the STAT family and (3) the steroid receptors.^[57]

Below are a few of the more well-studied examples:

Condition	Description	Locus
Rett syndrome	Mutations in the MECP2 transcription factor are associated with Rett syndrome, a neurodevelopmental disorder.^[58]^[59]	Xq28
Diabetes	A rare form of diabetes called MODY (Maturity onset diabetes of the young) can be caused by mutations in hepatocyte nuclear factors (HNFs)^[60] or insulin promoter factor-1 (IPF1/Pdx1).^[61]	multiple
Developmental verbal dyspraxia	Mutations in the FOXP2 transcription factor are associated with developmental verbal dyspraxia, a disease in which individuals are unable to produce the finely coordinated movements required for speech.^[62]	7q31
Autoimmune diseases	Mutations in the FOXP3 transcription factor cause a rare form of autoimmune disease called IPEX.^[63]	Xp11.23-q13.3
Li-Fraumeni syndrome	Caused by mutations in the tumor suppressor p53.^[64]	17p13.1
Breast cancer	The STAT family is relevant to breast cancer.^[65]	multiple
Multiple cancers	The HOX family are involved in a variety of cancers.^[66]	multiple

Potential drug targets

Approximately 10% of currently prescribed drugs directly target the nuclear receptor class of transcription factors.^[67] Examples include tamoxifen and bicalutamide for the treatment of breast and prostate cancer, respectively, and various types of anti-inflammatory and anabolic steroids.^[68] In addition, transcription factors are often indirectly modulated by drugs through signaling cascades. It might be possible to directly target other less-explored transcription factors such as NF-κB with drugs.^[69]^[70]^[71]^[72] Transcription factors outside the nuclear receptor family are thought to be more difficult to target with small molecule therapeutics since it is not clear that they are "druggable", but progress has been made on Pax2^[73]^[74] and the notch pathway.^[75]

Role in evolution

Gene duplications have played a crucial role in the evolution of species. This applies particularly to transcription factors. Once a transcription factor gene exists as duplicates, mutations can accumulate in one copy without negatively affecting the regulation of downstream targets. However, changes of the DNA binding specificities of the single-copy LEAFY transcription factor, which occurs in most land plants, have recently been elucidated. In that respect, a single-copy transcription factor can undergo a change of specificity through a promiscuous intermediate without losing function. Similar mechanisms have been proposed in the context of all alternative phylogenetic hypotheses, and the role of transcription factors in the evolution of all species.^[76]^[77]

Analysis

There are different technologies available to analyze transcription factors. On the genomic level, DNA-sequencing^[78] and database research are commonly used.^[79] The transcription factor protein can be detected with specific antibodies, for example on a western blot. By using electrophoretic mobility shift assay (EMSA),^[80] the activation profile of transcription factors can be detected. A multiplex approach for activation profiling is a TF chip system where several different transcription factors can be detected in parallel. This technology is based on DNA microarrays, providing the specific DNA-binding sequence for the transcription factor protein on the array surface.^[81]

Classes

As described in more detail below, transcription factors may be classified by their (1) mechanism of action, (2) regulatory function, or (3) sequence homology (and hence structural similarity) in their DNA-binding domains.

Mechanistic

There are two mechanistic classes of transcription factors:

General transcription factors are involved in the formation of a preinitiation complex. The most common are abbreviated as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. They are ubiquitous and interact with the core promoter region surrounding the transcription start site(s) of all class II genes.^[82]
Upstream transcription factors are proteins that bind somewhere upstream of the initiation site to stimulate or repress transcription. These are roughly synonymous with specific transcription factors, because they vary considerably depending on what recognition sequences are present in the proximity of the gene.^[83]

Examples of specific transcription factors^[83]
Factor	Structural type	Recognition sequence	Binds as
SP1	Zinc finger	5'-GGGCGG-3'	Monomer
AP-1	Basic zipper	5'-TGA(G/C)TCA-3'	Dimer
C/EBP	Basic zipper	5'-ATTGCGCAAT-3'	Dimer
Heat shock factor	Basic zipper	5'-XGAAX-3'	Trimer
ATF/CREB	Basic zipper	5'-TGACGTCA-3'	Dimer
c-Myc	Basic helix-loop-helix	5'-CACGTG-3'	Dimer
Oct-1	Helix-turn-helix	5'-ATGCAAAT-3'	Monomer
NF-1	Novel	5'-TTGGCXXXXXGCCAA-3'	Dimer
(G/C) = G or C; X = A, T, G or C
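The degenerate notation used in the table maps naturally onto regular expressions. The sketch below translates a recognition sequence written in that notation into a pattern and reports where it matches on one strand of a test sequence. The helper names and the test sequence are made up for this illustration, and reverse-complement matches are deliberately ignored.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate the table's notation into a regular expression:
    (G/C) becomes the character class [GC]; X becomes [ACGT]."""
    core = motif.replace("5'-", "").replace("-3'", "")
    # Turn each parenthesized alternative such as (G/C) into a class such as [GC].
    core = re.sub(r"\(([ACGT](?:/[ACGT])+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]", core)
    core = core.replace("X", "[ACGT]")
    return re.compile(core)


def find_motif(sequence: str, motif: str):
    """Return start positions of all matches on the given strand,
    including overlapping ones."""
    pattern = motif_to_regex(motif)
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [m.start() for m in lookahead.finditer(sequence)]


AP1 = "5'-TGA(G/C)TCA-3'"
print(motif_to_regex(AP1).pattern)
print(find_motif("AATGACTCAGGTGAGTCATT", AP1))
```

The made-up test sequence contains both variants of the AP-1 site, one with C and one with G at the degenerate position, and both are found.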

Functional

Transcription factors have been classified according to their regulatory function:^[10]

I. constitutively active – present in all cells at all times – general transcription factors, Sp1, NF1, CCAAT
II. conditionally active – requires activation
- II.A developmental (cell specific) – expression is tightly controlled, but, once expressed, requires no additional activation – GATA, HNF, PIT-1, MyoD, Myf5, Hox, Winged Helix
- II.B signal-dependent – requires external signal for activation
  - II.B.1 extracellular ligand (endocrine or paracrine)-dependent – nuclear receptors
  - II.B.2 intracellular ligand (autocrine)-dependent - activated by small intracellular molecules – SREBP, p53, orphan nuclear receptors
  - II.B.3 cell membrane receptor-dependent – second messenger signaling cascades resulting in the phosphorylation of the transcription factor
    - II.B.3.a resident nuclear factors – reside in the nucleus regardless of activation state – CREB, AP-1, Mef2
    - II.B.3.b latent cytoplasmic factors – inactive form resides in the cytoplasm, but, when activated, is translocated into the nucleus – STAT, R-SMAD, NF-κB, Notch, TUBBY, NFAT

Structural

Transcription factors are often classified based on the sequence similarity and hence the tertiary structure of their DNA-binding domains:^[84]^[9]^[85]^[8]

1 Superclass: Basic Domains
- 1.1 Class: Leucine zipper factors (bZIP)
  - 1.1.1 Family: AP-1(-like) components; includes (c-Fos/c-Jun)
  - 1.1.2 Family: CREB
  - 1.1.3 Family: C/EBP-like factors
  - 1.1.4 Family: bZIP / PAR
  - 1.1.5 Family: Plant G-box binding factors
  - 1.1.6 Family: ZIP only
- 1.2 Class: Helix-loop-helix factors (bHLH)
  - 1.2.1 Family: Ubiquitous (class A) factors
  - 1.2.2 Family: Myogenic transcription factors (MyoD)
  - 1.2.3 Family: Achaete-Scute
  - 1.2.4 Family: Tal/Twist/Atonal/Hen
- 1.3 Class: Helix-loop-helix / leucine zipper factors (bHLH-ZIP)
  - 1.3.1 Family: Ubiquitous bHLH-ZIP factors; includes USF (USF1, USF2); SREBP (SREBP)
  - 1.3.2 Family: Cell-cycle controlling factors; includes c-Myc
- 1.4 Class: NF-1
  - 1.4.1 Family: NF-1 (A, B, C, X)
- 1.5 Class: RF-X
  - 1.5.1 Family: RF-X (1, 2, 3, 4, 5, ANK)
- 1.6 Class: bHSH
2 Superclass: Zinc-coordinating DNA-binding domains
- 2.1 Class: Cys4 zinc finger of nuclear receptor type
  - 2.1.1 Family: Steroid hormone receptors
  - 2.1.2 Family: Thyroid hormone receptor-like factors
- 2.2 Class: diverse Cys4 zinc fingers
  - 2.2.1 Family: GATA-Factors
- 2.3 Class: Cys2His2 zinc finger domain
  - 2.3.1 Family: Ubiquitous factors, includes TFIIIA, Sp1
  - 2.3.2 Family: Developmental / cell cycle regulators; includes Krüppel
  - 2.3.4 Family: Large factors with NF-6B-like binding properties
- 2.4 Class: Cys6 cysteine-zinc cluster
- 2.5 Class: Zinc fingers of alternating composition
3 Superclass: Helix-turn-helix
- 3.1 Class: Homeo domain
  - 3.1.1 Family: Homeo domain only; includes Ubx
  - 3.1.2 Family: POU domain factors; includes Oct
  - 3.1.3 Family: Homeo domain with LIM region
  - 3.1.4 Family: homeo domain plus zinc finger motifs
- 3.2 Class: Paired box
  - 3.2.1 Family: Paired plus homeo domain
  - 3.2.2 Family: Paired domain only
- 3.3 Class: Fork head / winged helix
  - 3.3.1 Family: Developmental regulators; includes forkhead
  - 3.3.2 Family: Tissue-specific regulators
  - 3.3.3 Family: Cell-cycle controlling factors
  - 3.3.0 Family: Other regulators
- 3.4 Class: Heat Shock Factors
  - 3.4.1 Family: HSF
- 3.5 Class: Tryptophan clusters
  - 3.5.1 Family: Myb
  - 3.5.2 Family: Ets-type
  - 3.5.3 Family: Interferon regulatory factors
- 3.6 Class: TEA (transcriptional enhancer factor) domain
  - 3.6.1 Family: TEA (TEAD1, TEAD2, TEAD3, TEAD4)
4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
- 4.1 Class: RHR (Rel homology region)
  - 4.1.1 Family: Rel/ankyrin; NF-kappaB
  - 4.1.2 Family: ankyrin only
  - 4.1.3 Family: NFAT (Nuclear Factor of Activated T-cells) (NFATC1, NFATC2, NFATC3)
- 4.2 Class: STAT
  - 4.2.1 Family: STAT
- 4.3 Class: p53
  - 4.3.1 Family: p53
- 4.4 Class: MADS box
  - 4.4.1 Family: Regulators of differentiation; includes (Mef2)
  - 4.4.2 Family: Responders to external signals, SRF (serum response factor) (SRF)
  - 4.4.3 Family: Metabolic regulators (ARG80)
- 4.5 Class: beta-Barrel alpha-helix transcription factors
- 4.6 Class: TATA binding proteins
  - 4.6.1 Family: TBP
- 4.7 Class: HMG-box
  - 4.7.1 Family: SOX genes, SRY
  - 4.7.2 Family: TCF-1 (TCF1)
  - 4.7.3 Family: HMG2-related, SSRP1
  - 4.7.4 Family: UBF
  - 4.7.5 Family: MATA
- 4.8 Class: Heteromeric CCAAT factors
  - 4.8.1 Family: Heteromeric CCAAT factors
- 4.9 Class: Grainyhead
  - 4.9.1 Family: Grainyhead
- 4.10 Class: Cold-shock domain factors
  - 4.10.1 Family: csd
- 4.11 Class: Runt
  - 4.11.1 Family: Runt
0 Superclass: Other Transcription Factors
- 0.1 Class: Copper fist proteins
- 0.2 Class: HMGI(Y) (HMGA1)
  - 0.2.1 Family: HMGI(Y)
- 0.3 Class: Pocket domain
- 0.4 Class: E1A-like factors
- 0.5 Class: AP2/EREBP-related factors
  - 0.5.1 Family: AP2
  - 0.5.2 Family: EREBP
  - 0.5.3 Superfamily: AP2/B3
    - 0.5.3.1 Family: ARF
    - 0.5.3.2 Family: ABI
    - 0.5.3.3 Family: RAV

In 1.3 Million Years, Our Solar System Will Contain Two Stars

March 25, 2018

Original link: http://www.thescinewsreporter.com/2018/03/in-13-million-years-our-solar-system.html

The Sun is used to having plenty of personal space, given that its nearest stellar neighbor, the Alpha Centauri system, is located about four light years away. While that's not very distant in cosmic terms, it's wide enough for our solar system to not be influenced by these alien stars.

But in about 1.3 million years, a star named Gliese 710, which is about 60 percent as massive as the Sun, is projected to interrupt the Sun's hermitude by crashing right on through the far-flung reaches of the solar system. While astronomers have been aware of this stellar meetup for years, new observations from the European Space Agency's Gaia satellite, released on Thursday, have constrained the trajectory of Gliese 710's impending visit, and charted out nearly 100 other upcoming close encounters with wandering stars.

According to the Gaia team, Gliese 710 will swoop through the Oort cloud, a vast shell of icy debris at the outer limits of the solar system, at a distance of roughly 90 light days, or 1.4 trillion miles. To put that into perspective, the star will be about 16,000 times farther from the Sun than Earth.
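The distances quoted above can be cross-checked with a short unit conversion. The sketch below uses standard defined values for the speed of light, the astronomical unit, and the international mile. The variable and function names are illustrative only.

```python
C_KM_PER_S = 299_792.458    # speed of light in km/s (exact by definition)
AU_KM = 149_597_870.7       # astronomical unit in km (exact by definition)
KM_PER_MILE = 1.609344      # international mile in km
SECONDS_PER_DAY = 86_400


def light_days_to_km(days: float) -> float:
    """Distance light travels in the given number of days, in kilometres."""
    return days * SECONDS_PER_DAY * C_KM_PER_S


distance_km = light_days_to_km(90)
distance_miles = distance_km / KM_PER_MILE
distance_au = distance_km / AU_KM
print(f"{distance_miles:.2e} miles, about {distance_au:,.0f} times the Earth-Sun distance")
```

Both results are consistent with the article's rounded figures of 1.4 trillion miles and roughly 16,000 times the Earth-Sun distance.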

That may sound like a good stretch of space, but it is well within the boundaries of the Sun's domain. During the encounter, Gliese 710 will shine nearly three times brighter in Earth's skies than Mars. It could also spitball comets and ice worlds from the distant reaches of the solar system toward Earth, increasing the likelihood of deadly impacts.

Of course, we have over one million years to prepare for this disruptive passerby, but it's worth noting that it is far from the only star Gaia has identified as a potential trouble-maker.

Gaia, launched in 2013, has calculated the positions, magnitudes, parallaxes, and proper motions of millions of stars during its quest to create the most precise catalogue of the Milky Way's stellar population. Using this enormous dataset, scientists have plotted out the trajectories of 300,000 stars over the next five million years, and discovered that 97 of them will breach a radius of 93 trillion miles around the Sun.

Of those stars, 16 will travel within 37 trillion miles around the Sun, which is the rough distance at which stars begin to impact the solar system (though the extent to which they cause a ruckus depends on their mass and velocity).

It won't be the first time the Sun has had its personal space invaded by a stellar tourist. Only 70,000 years ago, around the time early humans were suffering from major volcano-induced endangerment, a dwarf star checked out the scene in the Oort cloud. Some scientists have even suggested that repeated encounters with nearby "death stars" are responsible for the cycle of mass extinctions on Earth, though the theory is controversial.

It goes to show that even the Sun has to deal with uninvited guests dropping by and causing mayhem. But now, thanks to Gaia, at least we can get an early heads-up to prepare for these otherworldly encounters.

DNA methylation

From Wikipedia, the free encyclopedia

Representation of a DNA molecule that is methylated. The two white spheres represent methyl groups. They are bound to two cytosine nucleotide molecules that make up the DNA sequence.

DNA methylation is a process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. When located in a gene promoter, DNA methylation typically acts to repress gene transcription. DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging and carcinogenesis.

Two of DNA's four bases, cytosine and adenine, can be methylated. Cytosine methylation is widespread in both eukaryotes and prokaryotes, even though the rate of cytosine DNA methylation can differ greatly between species: 14% of cytosines are methylated in Arabidopsis thaliana, 8% in Physarum,^[1] 4% in Mus musculus, 2.3% in Escherichia coli, 0.03% in Drosophila, 0.006% in Dictyostelium^[2] and virtually none (< 0.0002%) in Caenorhabditis^[3] or yeast species such as S. cerevisiae and S. pombe (but not N. crassa).^[4]^[5] Adenine methylation has been observed in bacterial, plant and recently in mammalian DNA,^[6]^[7] but has received considerably less attention.

Methylation of cytosine to form 5-methylcytosine occurs at the same 5 position on the pyrimidine ring where the DNA base thymine's methyl group is located; the same position distinguishes thymine from the analogous RNA base uracil, which has no methyl group. Spontaneous deamination of 5-methylcytosine converts it to thymine. This results in a T:G mismatch. Repair mechanisms then correct it back to the original C:G pair; alternatively, they may substitute A for G, turning the original C:G pair into a T:A pair, effectively changing a base and introducing a mutation. This misincorporated base will not be corrected during DNA replication, as thymine is a DNA base. If the mismatch is not repaired and the cell enters the cell cycle, the strand carrying the T will be complemented by an A in one of the daughter cells, such that the mutation becomes permanent. The near-universal replacement of uracil by thymine in DNA, but not RNA, may have evolved as an error-control mechanism, to facilitate the removal of uracils generated by the spontaneous deamination of cytosine.^[8] DNA methylation, as well as many of its contemporary DNA methyltransferases, is thought to have evolved from primitive RNA methylation activity in the early world, a view supported by several lines of evidence.^[9]
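The logic of this paragraph can be summarised as a small lookup. The following is a minimal sketch with hypothetical function names; it only encodes the outcomes stated above and does not model any real repair pathway.

```python
def deamination_product(methylated: bool) -> str:
    """Base produced when a cytosine spontaneously loses its amino group.

    5-methylcytosine gives thymine; unmethylated cytosine gives uracil.
    """
    return "T" if methylated else "U"


def looks_foreign_in_dna(base: str) -> bool:
    """Uracil is not a normal DNA base, so it can be singled out for removal."""
    return base == "U"


def resolve_tg_mismatch(strand_corrected: str) -> str:
    """Outcome of repairing the T:G mismatch left by 5-methylcytosine deamination.

    strand_corrected == "T": the T is replaced by C, restoring the C:G pair.
    strand_corrected == "G": the G is replaced by A, fixing a T:A mutation.
    """
    if strand_corrected == "T":
        return "C:G"
    if strand_corrected == "G":
        return "T:A"
    raise ValueError("strand_corrected must be 'T' or 'G'")


for methylated in (False, True):
    product = deamination_product(methylated)
    print(methylated, product, looks_foreign_in_dna(product))
```

The asymmetry is the point: the uracil left by an unmethylated cytosine is recognisable as damage, whereas the thymine left by a methylated cytosine is not.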

In plants and other organisms, DNA methylation is found in three different sequence contexts: CG (or CpG), CHG or CHH (where H corresponds to A, T or C). In mammals, however, DNA methylation is almost exclusively found in CpG dinucleotides, with the cytosines on both strands usually being methylated. Non-CpG methylation can, however, be observed in embryonic stem cells,^[10]^[11]^[12] and has also been indicated in neural development.^[13] Furthermore, non-CpG methylation has also been observed in hematopoietic progenitor cells, where it occurred mainly in a CpApC sequence context.^[14]
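The three contexts differ only in the two bases that follow the cytosine, so they can be told apart mechanically. This is a minimal sketch, assuming a plain 5'-to-3' string containing only A, C, G and T; the function name is hypothetical.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    H stands for A, T or C. Returns None when the sequence ends before
    the context can be decided. Assumes seq contains only A, C, G, T.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    following = seq[i + 1 : i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2:
        return None
    return "CHG" if following[1] == "G" else "CHH"


for s in ("CGA", "CAG", "CTA"):
    print(s, cytosine_context(s, 0))
```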

Conserved function of DNA methylation

Typical DNA methylation landscape in mammals

The DNA methylation landscape of vertebrates is very particular compared to other organisms. In vertebrates, around 60–80% of CpGs are methylated in somatic cells^[15] and DNA methylation appears as a default state that has to be specifically excluded from defined locations.^[16]^[17] By contrast, the genomes of most plants, invertebrates, fungi and protists show “mosaic” methylation patterns, where only specific genomic elements are targeted, and they are characterized by the alternation of methylated and unmethylated domains.^[4]^[18]

High CpG methylation in mammalian genomes has an evolutionary cost because it increases the frequency of spontaneous mutations. Loss of amino-groups occurs with a high frequency for cytosines, with different consequences depending on their methylation. Methylated C residues spontaneously deaminate to form T residues over time; hence CpG dinucleotides steadily deaminate to TpG dinucleotides, which is evidenced by the under-representation of CpG dinucleotides in the human genome (they occur at only 21% of the expected frequency).^[19] (On the other hand, spontaneous deamination of unmethylated C residues gives rise to U residues, a change that is quickly recognized and repaired by the cell.)

CpG islands

In mammals, the only exception to this global CpG depletion resides in a specific category of GC- and CpG-rich sequences termed CpG islands, which are generally unmethylated and have therefore retained the expected CpG content.^[20] CpG islands are usually defined as regions with 1) a length greater than 200 bp, 2) a G+C content greater than 50%, and 3) a ratio of observed to expected CpG greater than 0.6, although other definitions are sometimes used.^[21] Excluding repeated sequences, there are around 25,000 CpG islands in the human genome, 75% of which are less than 850 bp long.^[22] They are major regulatory units: around 50% of CpG islands are located in gene promoter regions, while another 25% lie in gene bodies, often serving as alternative promoters. Reciprocally, around 60–70% of human genes have a CpG island in their promoter region.^[23]^[24] The majority of CpG islands are constitutively unmethylated and enriched for permissive chromatin modifications such as H3K4 methylation. In somatic tissues, only 10% of CpG islands are methylated, the majority of them being located in intergenic and intragenic regions.
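The three criteria can be checked directly on a sequence. The sketch below is illustrative only: the function name is hypothetical, and the observed/expected ratio is computed with a commonly used formula, (number of CpG × length) / (number of C × number of G), which is an assumption here because the text does not specify how the ratio is calculated.

```python
def cpg_island_stats(seq: str) -> dict:
    """Evaluate a sequence against the three CpG island criteria above.

    Observed/expected CpG is computed as (CpG count * length) / (C count * G count),
    an assumed formula that the text itself does not spell out.
    """
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")

    gc_content = (c_count + g_count) / length if length else 0.0
    if c_count and g_count:
        obs_exp = cpg_count * length / (c_count * g_count)
    else:
        obs_exp = 0.0

    return {
        "length": length,
        "gc_content": gc_content,
        "obs_exp_cpg": obs_exp,
        "is_cpg_island": length > 200 and gc_content > 0.5 and obs_exp > 0.6,
    }


print(cpg_island_stats("CG" * 150))   # artificial CpG-rich sequence
print(cpg_island_stats("AT" * 150))   # artificial sequence with no C or G
```

Real island finders scan a genome with a sliding window rather than scoring one fixed sequence; this sketch only shows how the thresholds combine.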

Repression of CpG-dense promoters

DNA methylation was probably present to some extent in very early eukaryote ancestors. In virtually every organism analyzed, methylation in promoter regions correlates negatively with gene expression.^[4]^[25] CpG-dense promoters of actively transcribed genes are never methylated, but, conversely, transcriptionally silent genes do not necessarily carry a methylated promoter. In mouse and human, around 60–70% of genes have a CpG island in their promoter region, and most of these CpG islands remain unmethylated independently of the transcriptional activity of the gene, in both differentiated and undifferentiated cell types.^[26]^[27] Of note, whereas DNA methylation of CpG islands is unambiguously linked with transcriptional repression, the function of DNA methylation in CG-poor promoters remains unclear, and there is little evidence that it could be functionally relevant.^[28]

DNA methylation may affect the transcription of genes in two ways. First, the methylation of DNA itself may physically impede the binding of transcriptional proteins to the gene,^[29] and second, and likely more important, methylated DNA may be bound by proteins known as methyl-CpG-binding domain proteins (MBDs). MBD proteins then recruit additional proteins to the locus, such as histone deacetylases and other chromatin remodeling proteins that can modify histones, thereby forming compact, inactive chromatin, termed heterochromatin. This link between DNA methylation and chromatin structure is very important. In particular, loss of methyl-CpG-binding protein 2 (MeCP2) has been implicated in Rett syndrome; and methyl-CpG-binding domain protein 2 (MBD2) mediates the transcriptional silencing of hypermethylated genes in cancer.

Repression of transposable elements

DNA methylation is a powerful transcriptional repressor, at least in CpG-dense contexts. Transcriptional repression of protein-coding genes appears essentially limited to very specific classes of genes that need to be silent permanently and in almost all tissues. While DNA methylation does not have the flexibility required for the fine-tuning of gene regulation, its stability is well suited to ensuring the permanent silencing of transposable elements. Transposon control is one of the most ancient functions of DNA methylation and is shared by animals, plants and multiple protists.^[30] It has even been suggested that DNA methylation evolved precisely for this purpose.^[31]

Methylation of the gene body of highly transcribed genes

A function that appears even more conserved than transposon silencing is positively correlated with gene expression. In almost all species where DNA methylation is present, DNA methylation is especially enriched in the body of highly transcribed genes.^[4]^[25] The function of gene body methylation is not well understood. A body of evidence suggests that it could regulate splicing^[32] and suppress the activity of intragenic transcriptional units (cryptic promoters or transposable elements).^[33] Gene-body methylation appears closely tied to H3K36 methylation. In yeast and mammals, H3K36 methylation is highly enriched in the body of highly transcribed genes. In yeast at least, H3K36me3 recruits enzymes such as histone deacetylases to condense chromatin and prevent the activation of cryptic start sites.^[34] In mammals, the PWWP domains of DNMT3a and DNMT3b bind to H3K36me3, and the two enzymes are recruited to the body of actively transcribed genes.

In mammals

Dynamic of DNA methylation during mouse embryonic development. E3.5-E6, etc., refer to days after fertilization. PGC: primordial germ cells

During embryonic development

DNA methylation patterns are largely erased and then re-established between generations in mammals. Almost all of the methylations from the parents are erased, first during gametogenesis, and again in early embryogenesis, with demethylation and remethylation occurring each time. Demethylation in early embryogenesis occurs in the preimplantation period in two stages – initially in the zygote, then during the first few embryonic replication cycles of morula and blastula. A wave of methylation then takes place during the implantation stage of the embryo, with CpG islands protected from methylation. This results in global repression and allows housekeeping genes to be expressed in all cells. In the post-implantation stage, methylation patterns are stage- and tissue-specific, with changes that would define each individual cell type lasting stably over a long period.^[35]

Whereas DNA methylation is not necessary per se for transcriptional silencing, it is thought nonetheless to represent a “locked” state that definitively inactivates transcription. In particular, DNA methylation appears critical for the maintenance of mono-allelic silencing in the context of genomic imprinting and X chromosome inactivation.^[36]^[37] In these cases, expressed and silent alleles differ by their methylation status, and loss of DNA methylation results in loss of imprinting and re-expression of Xist in somatic cells. During embryonic development, few genes change their methylation status, with the important exception of many genes specifically expressed in the germline.^[38] DNA methylation appears absolutely required in differentiated cells, as knockout of any of the three competent DNA methyltransferases results in embryonic or post-partum lethality. By contrast, DNA methylation is dispensable in undifferentiated cell types, such as the inner cell mass of the blastocyst, primordial germ cells or embryonic stem cells. Since DNA methylation appears to directly regulate only a limited number of genes, how precisely the absence of DNA methylation causes the death of differentiated cells remains an open question.

Due to the phenomenon of genomic imprinting, maternal and paternal genomes are differentially marked and must be properly reprogrammed every time they pass through the germline. Therefore, during gametogenesis, primordial germ cells must have their original biparental DNA methylation patterns erased and re-established based on the sex of the transmitting parent. After fertilization the paternal and maternal genomes are once again demethylated and remethylated (except for differentially methylated regions associated with imprinted genes). This reprogramming is likely required for totipotency of the newly formed embryo and erasure of acquired epigenetic changes.^[39]

In cancer

In many disease processes, such as cancer, gene promoter CpG islands acquire abnormal hypermethylation, which results in transcriptional silencing that can be inherited by daughter cells following cell division. Alterations of DNA methylation have been recognized as an important component of cancer development. Hypomethylation, in general, arises earlier and is linked to chromosomal instability and loss of imprinting, whereas hypermethylation is associated with promoters and can arise secondary to gene (oncogene suppressor) silencing, but might be a target for epigenetic therapy.^[40]

Global hypomethylation has also been implicated in the development and progression of cancer through different mechanisms.^[41] Typically, there is hypermethylation of tumor suppressor genes and hypomethylation of oncogenes.^[42]

Generally, in progression to cancer, hundreds of genes are silenced or activated. Although silencing of some genes in cancers occurs by mutation, a large proportion of carcinogenic gene silencing is a result of altered DNA methylation. DNA methylation causing silencing in cancer typically occurs at multiple CpG sites in the CpG islands that are present in the promoters of protein coding genes.

Altered expressions of microRNAs also silence or activate many genes in progression to cancer (see microRNAs in cancer). Altered microRNA expression occurs through hyper/hypo-methylation of CpG sites in CpG islands in promoters controlling transcription of the microRNAs.

Silencing of DNA repair genes through methylation of CpG islands in their promoters appears to be especially important in progression to cancer.

In atherosclerosis

Epigenetic modifications such as DNA methylation have been implicated in cardiovascular disease, including atherosclerosis. In animal models of atherosclerosis, vascular tissue as well as blood cells such as mononuclear blood cells exhibit global hypomethylation with gene-specific areas of hypermethylation. DNA methylation polymorphisms may be used as an early biomarker of atherosclerosis since they are present before lesions are observed, which may provide an early tool for detection and risk prevention.^[43]

Two of the cell types targeted for DNA methylation polymorphisms are monocytes and lymphocytes, which experience an overall hypomethylation. One proposed mechanism behind this global hypomethylation is elevated homocysteine levels causing hyperhomocysteinemia, a known risk factor for cardiovascular disease. High plasma levels of homocysteine inhibit DNA methyltransferases, which causes hypomethylation. Hypomethylation of DNA affects genes that alter smooth muscle cell proliferation, cause endothelial cell dysfunction, and increase inflammatory mediators, all of which are critical in forming atherosclerotic lesions.^[44] High levels of homocysteine also result in hypermethylation of CpG islands in the promoter region of the estrogen receptor alpha (ERα) gene, causing its downregulation.^[45] ERα protects against atherosclerosis due to its action as a growth suppressor, causing the smooth muscle cells to remain in a quiescent state.^[46] Hypermethylation of the ERα promoter thus allows intimal smooth muscle cells to proliferate excessively and contribute to the development of the atherosclerotic lesion.^[47]

Another gene that experiences a change in methylation status in atherosclerosis is the monocarboxylate transporter (MCT3), which produces a protein responsible for the transport of lactate and ketone bodies out of many cell types, including vascular smooth muscle cells. In atherosclerosis patients, there is an increase in methylation of the CpG islands in exon 2, which decreases MCT3 protein expression. The downregulation of MCT3 impairs lactate transport and significantly increases smooth muscle cell proliferation, which further contributes to the atherosclerotic lesion. In an ex vivo experiment, the demethylating agent decitabine (5-aza-2'-deoxycytidine) was shown to induce MCT3 expression in a dose-dependent manner, as all hypermethylated sites in the exon 2 CpG island became demethylated after treatment. Decitabine may therefore serve as a novel therapeutic agent to treat atherosclerosis, although no human studies have been conducted thus far.^[48]

In aging

In humans and other mammals, DNA methylation levels can be used to accurately estimate the age of tissues and cell types, forming an accurate epigenetic clock.^[49]

A longitudinal study of twin children showed that, between the ages of 5 and 10, there was divergence of methylation patterns due to environmental rather than genetic influences.^[50] There is a global loss of DNA methylation during aging.^[42]

In a study that analyzed the complete DNA methylomes of CD4⁺ T cells in a newborn, a 26-year-old individual and a 103-year-old individual, it was observed that the loss of methylation is proportional to age. Hypomethylated CpGs observed in the centenarian DNAs compared with the neonates covered all genomic compartments (promoters, intergenic, intronic and exonic regions).^[51] However, some genes become hypermethylated with age, including genes for the estrogen receptor, p16, and insulin-like growth factor 2.^[42]

In exercise

High-intensity exercise has been shown to result in reduced DNA methylation in skeletal muscle.^[52] Promoter methylation of PGC-1α and PDK4 was immediately reduced after high-intensity exercise, whereas PPAR-γ methylation was not reduced until three hours after exercise.^[52] By contrast, six months of exercise in previously sedentary middle-aged men resulted in increased methylation in adipose tissue.^[53] One study showed a possible increase in global genomic DNA methylation of white blood cells with more physical activity in non-Hispanics.^[54]

In B-cell differentiation

A study that investigated the methylome of B cells along their differentiation cycle, using whole-genome bisulfite sequencing (WGBS), showed that there is a hypomethylation from the earliest stages to the most differentiated stages. The largest methylation difference is between the stages of germinal center B cells and memory B cells. Furthermore, this study showed that there is a similarity between B cell tumors and long-lived B cells in their DNA methylation signatures.^[14]

In the brain

Research has suggested that long-term memory storage in humans may be regulated by DNA methylation.^[55]^[56]

DNA methyltransferases (in mammals)

Possible pathways of cytosine methylation and demethylation. Abbreviations: S-Adenosyl-L-homocysteine (SAH), S-adenosyl-L-methionine (SAM), DNA methyltransferase (DNA MTase), Uracil-DNA glycosylase (UNG)

In mammalian cells, DNA methylation occurs mainly at the C5 position of CpG dinucleotides and is carried out by two general classes of enzymatic activities – maintenance methylation and de novo methylation.^[57]

Maintenance methylation activity is necessary to preserve DNA methylation after every cellular DNA replication cycle. Without the DNA methyltransferase (DNMT), the replication machinery itself would produce daughter strands that are unmethylated and, over time, would lead to passive demethylation. DNMT1 is the proposed maintenance methyltransferase that is responsible for copying DNA methylation patterns to the daughter strands during DNA replication. Mouse models with both copies of DNMT1 deleted are embryonic lethal at approximately day 9, due to the requirement of DNMT1 activity for development in mammalian cells.
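Passive demethylation can be illustrated with a toy model of a single CpG site followed over several rounds of replication. This is a minimal sketch with hypothetical names; it assumes that maintenance methylation, when active, restores every hemimethylated site completely and that nothing else adds or removes methyl groups.

```python
from typing import List, Tuple

# One duplex at a single CpG site: (top strand methylated, bottom strand methylated).
Duplex = Tuple[bool, bool]


def replicate(duplexes: List[Duplex], maintenance: bool) -> List[Duplex]:
    """Replicate every duplex once.

    Each parental strand pairs with a newly made strand. The new strand is
    methylated only if maintenance activity is present and the parental
    strand it pairs with carries the mark.
    """
    daughters: List[Duplex] = []
    for top, bottom in duplexes:
        daughters.append((top, top and maintenance))
        daughters.append((bottom and maintenance, bottom))
    return daughters


def methylated_fraction(duplexes: List[Duplex]) -> float:
    """Fraction of all strands that carry the methyl mark."""
    strands = [mark for duplex in duplexes for mark in duplex]
    return sum(strands) / len(strands)


for maintenance in (True, False):
    population: List[Duplex] = [(True, True)]   # start fully methylated
    history = [methylated_fraction(population)]
    for _ in range(3):
        population = replicate(population, maintenance)
        history.append(methylated_fraction(population))
    print("maintenance" if maintenance else "no maintenance", history)
```

In this model the methylated fraction halves with every division when maintenance is absent, which is the dilution the paragraph describes.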

It is thought that DNMT3a and DNMT3b are the de novo methyltransferases that set up DNA methylation patterns early in development. DNMT3L is a protein that is homologous to the other DNMT3s but has no catalytic activity. Instead, DNMT3L assists the de novo methyltransferases by increasing their ability to bind to DNA and stimulating their activity. Finally, DNMT2 (TRDMT1) has been identified as a DNA methyltransferase homolog, containing all 10 sequence motifs common to all DNA methyltransferases; however, DNMT2 (TRDMT1) does not methylate DNA but instead methylates cytosine-38 in the anticodon loop of aspartic acid transfer RNA.^[58]

Since many tumor suppressor genes are silenced by DNA methylation during carcinogenesis, there have been attempts to re-express these genes by inhibiting the DNMTs. 5-Aza-2'-deoxycytidine (decitabine) is a nucleoside analog that inhibits DNMTs by trapping them in a covalent complex on DNA by preventing the β-elimination step of catalysis, thus resulting in the enzymes' degradation. However, for decitabine to be active, it must be incorporated into the genome of the cell, which can cause mutations in the daughter cells if the cell does not die. In addition, decitabine is toxic to the bone marrow, which limits the size of its therapeutic window. These pitfalls have led to the development of antisense RNA therapies that target the DNMTs by degrading their mRNAs and preventing their translation. However, it is currently unclear whether targeting DNMT1 alone is sufficient to reactivate tumor suppressor genes silenced by DNA methylation.

In plants

Significant progress has been made in understanding DNA methylation in the model plant Arabidopsis thaliana. DNA methylation in plants differs from that of mammals: while DNA methylation in mammals mainly occurs on the cytosine nucleotide in a CpG site, in plants the cytosine can be methylated at CpG, CpHpG, and CpHpH sites, where H represents any nucleotide other than guanine. Overall, Arabidopsis DNA is highly methylated; mass spectrometry analysis estimated 14% of cytosines to be modified.^[5]

The principal Arabidopsis DNA methyltransferase enzymes, which transfer and covalently attach methyl groups onto DNA, are DRM2, MET1, and CMT3. The DRM2 and MET1 proteins share significant homology with the mammalian methyltransferases DNMT3 and DNMT1, respectively, whereas the CMT3 protein is unique to the plant kingdom. There are currently two classes of DNA methyltransferases: 1) the de novo class, or enzymes that create new methylation marks on the DNA; and 2) a maintenance class that recognizes the methylation marks on the parental strand of DNA and transfers new methylation to the daughter strands after DNA replication. DRM2 is the only enzyme that has been implicated as a de novo DNA methyltransferase. DRM2 has also been shown, along with MET1 and CMT3, to be involved in maintaining methylation marks through DNA replication.^[59] Other DNA methyltransferases are expressed in plants but have no known function.

It is not clear how the cell determines the locations of de novo DNA methylation, but evidence suggests that, for many (though not all) locations, RNA-directed DNA methylation (RdDM) is involved. In RdDM, specific RNA transcripts are produced from a genomic DNA template, and this RNA forms secondary structures called double-stranded RNA molecules.^[60] The double-stranded RNAs, through either the small interfering RNA (siRNA) or microRNA (miRNA) pathways, direct de novo DNA methylation of the original genomic location that produced the RNA.^[60] This sort of mechanism is thought to be important in cellular defense against RNA viruses and/or transposons, both of which often form a double-stranded RNA that can be mutagenic to the host genome. By methylating their genomic locations, through an as yet poorly understood mechanism, they are shut off and are no longer active in the cell, protecting the genome from their mutagenic effect. Recently, it was described that methylation of the DNA is the main determinant of the formation of embryogenic cultures from explants in woody plants, and it is regarded as the main mechanism explaining the poor response of mature explants to somatic embryogenesis in these plants (Isah 2016).

In insects

Functional DNA methylation has been discovered in honey bees.^[61]^[62] DNA methylation marks are mainly on the gene body, and current opinion holds that the function of DNA methylation is gene regulation via alternative splicing.^[63]

DNA methylation levels in Drosophila melanogaster are nearly undetectable.^[64] Sensitive methods applied to Drosophila DNA suggest levels in the range of 0.1–0.3% of total cytosine.^[65] This low level of methylation^[66] appears to reside in genomic sequence patterns that are very different from patterns seen in humans, or in other animal or plant species to date. Genomic methylation in D. melanogaster was found at specific short motifs (concentrated in specific 5-base sequence motifs that are CA- and CT-rich but depleted of guanine) and is independent of DNMT2 activity. Further, highly sensitive mass spectrometry approaches^[67] have now demonstrated the presence of low (0.07%) but significant levels of adenine methylation during the earliest stages of Drosophila embryogenesis.

In fungi

Many fungi have low levels (0.1 to 0.5%) of cytosine methylation, whereas other fungi have as much as 5% of the genome methylated.^[68] This value seems to vary both among species and among isolates of the same species.^[69] There is also evidence that DNA methylation may be involved in state-specific control of gene expression in fungi. However, at a detection limit of 250 attomoles, ultra-high-sensitivity mass spectrometry did not confirm DNA methylation in single-celled yeast species such as Saccharomyces cerevisiae or Schizosaccharomyces pombe, indicating that yeasts do not possess this DNA modification.^[5]

Although brewers' yeast (Saccharomyces), fission yeast (Schizosaccharomyces), and Aspergillus flavus^[70] have no detectable DNA methylation, the model filamentous fungus Neurospora crassa has a well-characterized methylation system.^[71] Several genes control methylation in Neurospora and mutation of the DNA methyl transferase, dim-2, eliminates all DNA methylation but does not affect growth or sexual reproduction. While the Neurospora genome has very little repeated DNA, half of the methylation occurs in repeated DNA including transposon relics and centromeric DNA. The ability to evaluate other important phenomena in a DNA methylase-deficient genetic background makes Neurospora an important system in which to study DNA methylation.

In lower eukaryotes

DNA methylation is largely absent from Dictyostelium discoideum,^[72] where it appears to occur at about 0.006% of cytosines.^[2] In contrast, DNA methylation is widely distributed in Physarum polycephalum,^[73] where 5-methylcytosine makes up as much as 8% of total cytosine.^[1]

In bacteria

Adenine or cytosine methylation is part of the restriction modification system of many bacteria, in which specific DNA sequences are methylated periodically throughout the genome. A methylase is the enzyme that recognizes a specific sequence and methylates one of the bases in or near that sequence. Foreign DNAs (which are not methylated in this manner) that are introduced into the cell are cleaved by sequence-specific restriction enzymes and degraded. Bacterial genomic DNA is not recognized by these restriction enzymes. The methylation of native DNA acts as a sort of primitive immune system, allowing the bacteria to protect themselves from infection by bacteriophage.

E. coli DNA adenine methyltransferase (Dam) is an enzyme of ~32 kDa that does not belong to a restriction/modification system. The target recognition sequence for E. coli Dam is GATC, as the methylation occurs at the N6 position of the adenine in this sequence (GmeATC). The three base pairs flanking each side of this site also influence DNA–Dam binding. Dam plays several key roles in bacterial processes, including mismatch repair, the timing of DNA replication, and gene expression. As a result of DNA replication, the status of GATC sites in the E. coli genome changes from fully methylated to hemimethylated. This is because adenine introduced into the new DNA strand is unmethylated. Re-methylation occurs within two to four seconds, during which time replication errors in the new strand are repaired. Methylation, or its absence, is the marker that allows the repair apparatus of the cell to differentiate between the template and nascent strands. It has been shown that altering Dam activity in bacteria results in an increased spontaneous mutation rate. Bacterial viability is compromised in dam mutants that also lack certain other DNA repair enzymes, providing further evidence for the role of Dam in DNA repair.

One region of the DNA that keeps its hemimethylated status for longer is the origin of replication, which has an abundance of GATC sites. This is central to the bacterial mechanism for timing DNA replication. SeqA binds to the origin of replication, sequestering it and thus preventing methylation. Because hemimethylated origins of replication are inactive, this mechanism limits DNA replication to once per cell cycle.

Expression of certain genes, for example those coding for pilus expression in E. coli, is regulated by the methylation of GATC sites in the promoter region of the gene operon. The cells' environmental conditions just after DNA replication determine whether Dam is blocked from methylating a region proximal to or distal from the promoter region. Once the pattern of methylation has been created, the pilus gene transcription is locked in the on or off position until the DNA is again replicated. In E. coli, these pilus operons have important roles in virulence in urinary tract infections. It has been proposed that inhibitors of Dam may function as antibiotics.

On the other hand, DNA cytosine methylase (Dcm) targets CCAGG and CCTGG sites to methylate cytosine at the C5 position (CmeC(A/T)GG). The other methylase enzyme, EcoKI, causes methylation of adenines in the sequences AAC(N₆)GTGC and GCAC(N₆)GTT.
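The Dam and Dcm recognition sequences described in this section (GATC, and CCAGG or CCTGG) are short enough to locate with a regular expression. The following is a minimal sketch using hypothetical names; it scans one strand only and reports the start index of each site.

```python
import re

# Lookahead patterns so that every site is reported by its start position.
DAM_SITE = re.compile(r"(?=GATC)")        # adenine is methylated at N6
DCM_SITE = re.compile(r"(?=CC[AT]GG)")    # the second cytosine is methylated at C5


def find_sites(seq: str, pattern: "re.Pattern[str]") -> list:
    """Return the 0-based start index of every match on the given strand."""
    return [match.start() for match in pattern.finditer(seq.upper())]


example = "AAGATCTTCCAGGAACCTGG"   # made-up sequence for illustration
print("Dam sites:", find_sites(example, DAM_SITE))
print("Dcm sites:", find_sites(example, DCM_SITE))
```

In both motifs the methylated base sits one position after the reported start index, going by the notation used above.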

Molecular cloning

Most strains used by molecular biologists are derivatives of E. coli K-12 and possess both Dam and Dcm, but there are commercially available strains that are dam-/dcm- (lacking the activity of both methylases). In fact, it is possible to unmethylate the DNA extracted from dam+/dcm+ strains by transforming it into dam-/dcm- strains. This allows digestion of sites that methylation-sensitive restriction enzymes would otherwise fail to cut.^[74]^[75]

The restriction enzyme DpnI can recognize 5'-GmeATC-3' sites and digest the methylated DNA. Being such a short motif, it occurs frequently in sequences by chance, and as such its primary use for researchers is to degrade template DNA following PCRs (PCR products lack methylation, as no methylases are present in the reaction). Similarly, some commercially available restriction enzymes are sensitive to methylation at their cognate restriction sites, and must as mentioned previously be used on DNA passed through a dam-/dcm- strain to allow cutting.
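The use of DpnI to remove template DNA after PCR can be sketched as a toy digest. This is an illustration under stated assumptions, not a model of the enzyme: the function name is hypothetical, a single flag stands in for the methylation state of every GATC site in the molecule, and the cut is placed between the A and the T of the site, which is an assumption not stated in the text.

```python
def dpni_digest(seq: str, dam_methylated: bool) -> list:
    """Return the fragments left after a simulated DpnI digest.

    Assumes every GATC site shares the same methylation state and that the
    enzyme cuts between the A and the T of a methylated site.
    """
    seq = seq.upper()
    if not dam_methylated:
        return [seq]                 # unmethylated DNA is left intact

    fragments = []
    start = 0
    pos = seq.find("GATC")
    while pos != -1:
        cut = pos + 2                # assumed cut position: GA | TC
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find("GATC", pos + 1)
    fragments.append(seq[start:])
    return fragments


template = "AAGATCTTGGATCCAA"        # made-up sequence for illustration
print(dpni_digest(template, dam_methylated=True))    # plasmid from a dam+ strain
print(dpni_digest(template, dam_methylated=False))   # PCR product
```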

Detection

DNA methylation can be detected by the following assays currently used in scientific research:^[76]

Mass spectrometry (MS) is a very sensitive and reliable analytical method to detect DNA methylation. In general, however, MS is not informative about the sequence context of the methylation and is thus of limited use in studying the function of this DNA modification.
Methylation-Specific PCR (MSP), which is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines of CpG dinucleotides to uracil or UpG, followed by traditional PCR.^[77] However, methylated cytosines will not be converted in this process, and primers are designed to overlap the CpG site of interest, which allows one to determine methylation status as methylated or unmethylated.
Whole genome bisulfite sequencing, also known as BS-Seq, which is a high-throughput genome-wide analysis of DNA methylation. It is based on the aforementioned sodium bisulfite conversion of genomic DNA, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then re-aligned to the reference genome to determine methylation states of CpG dinucleotides based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
Reduced representation bisulfite sequencing, also known as RRBS, for which several working protocols exist. The first protocol, itself called RRBS, targets around 10% of the methylome and requires a reference genome. Later protocols sequence a smaller portion of the genome and allow higher sample multiplexing. EpiGBS was the first protocol in which 96 samples could be multiplexed in one lane of Illumina sequencing and for which a reference genome was no longer needed. De novo reference construction from the Watson and Crick reads makes it possible to screen populations for SNPs and SMPs simultaneously.
The HELP assay, which is based on restriction enzymes' differential ability to recognize and cleave methylated and unmethylated CpG DNA sites.
GLAD-PCR assay, which is based on a new type of enzyme, site-specific methyl-directed DNA endonucleases, which hydrolyze only methylated DNA.
ChIP-on-chip assays, which are based on the ability of commercially prepared antibodies to bind to DNA methylation-associated proteins like MeCP2.
Restriction landmark genomic scanning, a complicated and now rarely used assay based upon restriction enzymes' differential recognition of methylated and unmethylated CpG sites; the assay is similar in concept to the HELP assay.
Methylated DNA immunoprecipitation (MeDIP), which is analogous to chromatin immunoprecipitation: immunoprecipitation is used to isolate methylated DNA fragments for input into DNA detection methods such as DNA microarrays (MeDIP-chip) or DNA sequencing (MeDIP-seq).
Pyrosequencing of bisulfite-treated DNA. The gene of choice is amplified by PCR with a normal forward primer and a biotinylated reverse primer, and the resulting amplicon is sequenced. The pyrosequencer analyses the sample by denaturing the DNA and adding one nucleotide at a time to the mix according to a sequence given by the user. If there is a mismatch, it is recorded, and the percentage of DNA for which the mismatch is present is noted. This gives the user a percentage methylation per CpG island.
Molecular break light assay for DNA adenine methyltransferase activity – an assay that relies on the specificity of the restriction enzyme DpnI for fully methylated (adenine methylation) GATC sites in an oligonucleotide labeled with a fluorophore and quencher. The adenine methyltransferase methylates the oligonucleotide making it a substrate for DpnI. Cutting of the oligonucleotide by DpnI gives rise to a fluorescence increase.^[78]^[79]
Methyl Sensitive Southern Blotting is similar to the HELP assay, although it uses Southern blotting techniques to probe gene-specific differences in methylation using restriction digests. This technique is used to evaluate local methylation near the binding site for the probe.
MethylCpG Binding Proteins (MBPs) and fusion proteins containing just the Methyl Binding Domain (MBD) are used to separate native DNA into methylated and unmethylated fractions. The percentage methylation of individual CpG islands can be determined by quantifying the amount of the target in each fraction.^[80] Extremely sensitive detection can be achieved in FFPE tissues with abscription-based detection.
High Resolution Melt Analysis (HRM or HRMA) is a post-PCR analytical technique. The target DNA is treated with sodium bisulfite, which chemically converts unmethylated cytosines into uracils, while methylated cytosines are preserved. PCR amplification is then carried out with primers designed to amplify both methylated and unmethylated templates. After this amplification, highly methylated DNA sequences retain a higher number of CpG sites than unmethylated templates, which results in a different melting temperature that can be used in quantitative methylation detection.^[81]^[82]
Ancient DNA methylation reconstruction, a method to reconstruct high-resolution DNA methylation from ancient DNA samples. The method is based on the natural degradation processes that occur in ancient DNA: with time, methylated cytosines are degraded into thymines, whereas unmethylated cytosines are degraded into uracils. This asymmetry in degradation signals was used to reconstruct the full methylation maps of the Neanderthal and the Denisovan.^[83]
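
Several of the assays above rely on bisulfite conversion followed by comparison with a reference sequence. The principle can be sketched in a few lines of Python. This is a toy illustration, not a real aligner or methylation caller: the function names, the example sequence, and the chosen methylated positions are hypothetical, the read is assumed to be already aligned to an uppercase reference of the same length, and only one strand is considered. After PCR, the uracil produced from an unmethylated cytosine is read as T.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C reads as T after amplification (C -> U -> T);
    methylated C is protected and still reads as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )

def call_cpg_methylation(reference, read):
    """Compare an aligned converted read to the reference at each CpG.

    Returns {position of the C in each CpG: True if methylated,
    False if converted}. Other bases at that position are ignored.
    """
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
    return calls

reference = "ACGTTCGACCGA"  # hypothetical sequence with CpGs at 1, 5 and 9
read = bisulfite_convert(reference, methylated_positions={1, 9})
calls = call_cpg_methylation(reference, read)
percent_methylated = 100 * sum(calls.values()) / len(calls)
print(read, calls, percent_methylated)
```

Averaging such calls over many reads covering the same CpG gives the percentage methylation at that site.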

Differentially methylated regions (DMRs)

Differentially methylated regions are genomic regions with different methylation statuses among multiple samples (tissues, cells, individuals or others) and are regarded as possible functional regions involved in gene transcriptional regulation. The identification of DMRs among multiple tissues (T-DMRs) provides a comprehensive survey of epigenetic differences among human tissues.^[84] For example, methylated regions that are unique to a particular tissue make it possible to differentiate between tissue types, such as semen and vaginal fluid. By examining T-DMRs, Lee et al. showed that DACT1 and USP49 positively identified semen.^[85] DMRs between cancer and normal samples (C-DMRs) demonstrate the aberrant methylation in cancers.^[86] It is well known that DNA methylation is associated with cell differentiation and proliferation.^[87] Many DMRs have been found across developmental stages (D-DMRs)^[88] and during reprogramming (R-DMRs).^[89] In addition, there are intra-individual DMRs (Intra-DMRs) with longitudinal changes in global DNA methylation as a given individual ages.^[90] There are also inter-individual DMRs (Inter-DMRs) with different methylation patterns among multiple individuals.^[91]

QDMR (Quantitative Differentially Methylated Regions) is an approach that quantifies methylation differences and identifies DMRs from genome-wide methylation profiles by adapting Shannon entropy (http://bioinfo.hrbmu.edu.cn/qdmr). Because QDMR is platform-free and species-free, it is potentially applicable to various methylation data. It provides an effective tool for the high-throughput identification of functional regions involved in epigenetic regulation and for the quantification of methylation differences across multiple samples.^[92]
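
The idea of using Shannon entropy to score how evenly a region is methylated across samples can be illustrated as follows. This Python sketch uses plain Shannon entropy on normalised methylation levels and is not the published QDMR formula, which adapts the measure further. The function name and the example values are hypothetical.

```python
import math

def methylation_entropy(levels):
    """Shannon entropy (in bits) of one region's methylation levels across samples.

    High entropy: methylation is spread evenly over the samples.
    Low entropy: methylation is concentrated in a few samples.
    """
    total = sum(levels)
    if total == 0:
        # Assumption: a region unmethylated everywhere counts as uniform.
        return math.log2(len(levels))
    probs = [x / total for x in levels]
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform_region = [0.8, 0.8, 0.8, 0.8]   # same level in all four samples
specific_region = [0.9, 0.0, 0.0, 0.0]  # methylated in one sample only

h_uniform = methylation_entropy(uniform_region)
h_specific = methylation_entropy(specific_region)
print(h_uniform, h_specific)
```

With four samples the maximum is log2(4) = 2 bits, reached by the uniformly methylated region, while the region methylated in a single sample scores 0 bits. Plain entropy of this kind is far less sensitive to a region that is hypomethylated in only one sample, which is one reason an adapted measure is needed in practice.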

Gene-set analysis (a.k.a. pathway analysis; usually performed using tools such as DAVID, GoSeq or GSEA) has been shown to be severely biased when applied to high-throughput methylation data (e.g. MeDIP-seq, MeDIP-ChIP, HELP-seq etc.), and a wide range of studies have thus mistakenly reported hyper-methylation of genes related to development and differentiation; it has been suggested that this can be corrected using sample label permutations or using a statistical model to control for differences in the numbers of CpG probes / CpG sites that target each gene.^[93]
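
One way to see why the number of CpG probes per gene matters is a simple probability calculation. The Python sketch below is not the correction method of the cited study. It assumes, purely for illustration, that a gene is called differentially methylated when at least one of its probes passes a per-probe significance threshold `alpha` and that probes behave independently, in which case the chance of a false call is 1 − (1 − alpha)^n for a gene with n probes.

```python
def chance_of_false_call(n_probes, alpha=0.05):
    """Probability that at least one of n independent null probes passes alpha."""
    return 1.0 - (1.0 - alpha) ** n_probes

# Hypothetical genes covered by increasing numbers of CpG probes.
for n in (1, 5, 20, 50):
    print(n, round(chance_of_false_call(n), 3))
```

Under these assumptions, genes covered by many probes are far more likely to be flagged by chance alone, so any gene set rich in such genes will tend to appear enriched unless probe number is controlled for.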

DNA methylation marks

DNA methylation marks are genomic regions with a specific methylation pattern in a specific biological state (such as tissue, cell type, or individual) and are regarded as possible functional regions involved in gene transcriptional regulation. Although various human cell types may have the same genome, these cells have different methylomes. The systematic identification and characterization of methylation marks across cell types are crucial to understanding the complex regulatory network for cell fate determination. Hongbo Liu et al. proposed an entropy-based framework termed SMART to integrate the whole genome bisulfite sequencing methylomes across 42 human tissues/cells and identified 757,887 genome segments.^[94] Nearly 75% of the segments showed uniform methylation across all cell types. From the remaining 25% of the segments, they used a statistical approach to identify cell type-specific marks that were hypomethylated or hypermethylated in only a minority of cell types, and presented an atlas of the human methylation marks. Further analysis revealed that the cell type-specific hypomethylation marks were enriched in H3K27ac and transcription factor binding sites in a cell type-specific manner. In particular, they observed that the cell type-specific hypomethylation marks are associated with the cell type-specific super-enhancers that drive the expression of cell identity genes. This framework provides a complementary, functional annotation of the human genome and helps to elucidate the critical features and functions of cell type-specific hypomethylation.

The entropy-based Specific Methylation Analysis and Report Tool, termed "SMART", focuses on integrating a large number of DNA methylomes for the de novo identification of cell type-specific methylation marks. The latest version of SMART has three main functions: de novo identification of differentially methylated regions (DMRs) by genome segmentation, identification of DMRs from predefined regions of interest, and identification of differentially methylated CpG sites. SMART is available at http://fame.edbc.org/smart/.

Computational prediction

DNA methylation can also be predicted by computational models. Such models can facilitate the global profiling of DNA methylation across chromosomes and are often faster and cheaper to run than biological assays. Examples include the models of Bhasin et al.,^[95] Bock et al.,^[96] and Zheng et al.^[97]^[98] Together with biological assays, these methods greatly facilitate DNA methylation analysis.
