A Medley of Potpourri: May 6, 2018

Sunday, May 6, 2018

Restriction enzyme

From Wikipedia, the free encyclopedia

A restriction enzyme or restriction endonuclease is an enzyme that cleaves DNA into fragments at or near specific recognition sites within the molecule known as restriction sites.^[1]^[2]^[3] Restriction enzymes are commonly classified into five types, which differ in their structure and in whether they cut their DNA substrate at the recognition site or at a separate cleavage site. To cut DNA, all restriction enzymes make two incisions, one through each sugar-phosphate backbone (i.e. each strand) of the DNA double helix.

These enzymes are found in bacteria and archaea and provide a defense mechanism against invading viruses.^[4]^[5] Inside a prokaryote, the restriction enzymes selectively cut up foreign DNA in a process called restriction; meanwhile, host DNA is protected by a modification enzyme (a methyltransferase) that modifies the prokaryotic DNA and blocks cleavage. Together, these two processes form the restriction modification system.^[6]

Over 3000 restriction enzymes have been studied in detail, and more than 600 of these are available commercially.^[7] These enzymes are routinely used for DNA modification in laboratories, and they are a vital tool in molecular cloning.^[8]^[9]^[10]

History

The term restriction enzyme originated from the studies of phage λ, a virus that infects bacteria, and the phenomenon of host-controlled restriction and modification of such bacterial phage or bacteriophage.^[11] The phenomenon was first identified in work done in the laboratories of Salvador Luria and Giuseppe Bertani in the early 1950s.^[12]^[13] It was found that, for a bacteriophage λ that can grow well in one strain of Escherichia coli, for example E. coli C, when grown in another strain, for example E. coli K, its yields can drop significantly, by as much as 3-5 orders of magnitude. The host cell, in this example E. coli K, is known as the restricting host and appears to have the ability to reduce the biological activity of the phage λ. If a phage becomes established in one strain, the ability of that phage to grow also becomes restricted in other strains. In the 1960s, it was shown in work done in the laboratories of Werner Arber and Matthew Meselson that the restriction is caused by an enzymatic cleavage of the phage DNA, and the enzyme involved was therefore termed a restriction enzyme.^[4]^[14]^[15]^[16]

The restriction enzymes studied by Arber and Meselson were type I restriction enzymes, which cleave DNA randomly away from the recognition site.^[17] In 1970, Hamilton O. Smith, Thomas Kelly and Kent Wilcox isolated and characterized the first type II restriction enzyme, HindII, from the bacterium Haemophilus influenzae.^[18]^[19] Restriction enzymes of this type are more useful for laboratory work as they cleave DNA at the site of their recognition sequence. Later, Daniel Nathans and Kathleen Danna showed that cleavage of simian virus 40 (SV40) DNA by restriction enzymes yields specific fragments that can be separated using polyacrylamide gel electrophoresis, thus showing that restriction enzymes can also be used for mapping DNA.^[20] For their work in the discovery and characterization of restriction enzymes, the 1978 Nobel Prize for Physiology or Medicine was awarded to Werner Arber, Daniel Nathans, and Hamilton O. Smith.^[21] The discovery of restriction enzymes allows DNA to be manipulated, leading to the development of recombinant DNA technology that has many applications, for example, allowing the large scale production of proteins such as human insulin used by diabetics.^[12]^[22]

Origins

Restriction enzymes likely evolved from a common ancestor and became widespread via horizontal gene transfer.^[23]^[24] In addition, there is mounting evidence that restriction endonucleases evolved as a selfish genetic element.^[25]

Recognition site

A palindromic recognition site reads the same on the reverse strand as it does on the forward strand when both are read in the same orientation

Restriction enzymes recognize a specific sequence of nucleotides^[2] and produce a double-stranded cut in the DNA. Recognition sequences can also be classified by their length, usually between 4 and 8 bases; the length determines how often the site will appear by chance in any given genome. For example, a 4-base-pair sequence would theoretically occur once every 4^4 or 256 bp, a 6-base-pair sequence once every 4^6 or 4,096 bp, and an 8-base-pair sequence once every 4^8 or 65,536 bp.^[26] Many recognition sequences are palindromic, meaning the base sequence reads the same backwards and forwards.^[27] In theory, two types of palindromic sequence are possible in DNA. The mirror-like palindrome is similar to those found in ordinary text, in which a sequence reads the same forward and backward on a single strand of DNA, as in GTAATG. The inverted repeat palindrome is also a sequence that reads the same forward and backward, but the forward and backward sequences are found in complementary DNA strands (i.e., of double-stranded DNA), as in GTATAC (GTATAC being complementary to CATATG).^[28] Inverted repeat palindromes are more common and have greater biological importance than mirror-like palindromes.
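The arithmetic and the two palindrome definitions above can be checked with a short Python sketch. The function names are illustrative only, and the spacing estimate assumes that the four bases are equally frequent and independent, which real genomes only approximate.

```python
# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_spacing(site_length: int) -> int:
    """Average number of base pairs between chance occurrences of a site.

    Assumes each base is equally likely (probability 1/4) at every position.
    """
    return 4 ** site_length


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the same 5'-to-3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(seq: str) -> bool:
    """True if the sequence equals its own reverse complement (e.g. GTATAC)."""
    return seq.upper() == reverse_complement(seq)


def is_mirror_palindrome(seq: str) -> bool:
    """True if the sequence reads the same both ways on one strand (e.g. GTAATG)."""
    s = seq.upper()
    return s == s[::-1]
```

Under these definitions GTATAC is an inverted repeat but not a mirror-like palindrome, while GTAATG is the reverse.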

EcoRI digestion produces "sticky" ends, whereas SmaI restriction enzyme cleavage produces "blunt" ends.

Recognition sequences in DNA differ for each restriction enzyme, producing differences in the length, sequence and strand orientation (5' end or 3' end) of a sticky-end "overhang" of an enzyme restriction.^[29]

Different restriction enzymes that recognize the same sequence but cleave it at different positions are known as neoschizomers. Different enzymes that recognize the same sequence and cleave it at the same position are known as isoschizomers.

Types

Naturally occurring restriction endonucleases are categorized into four groups (Types I, II, III, and IV) based on their composition and enzyme cofactor requirements, the nature of their target sequence, and the position of their DNA cleavage site relative to the target sequence.^[30]^[31]^[32] DNA sequence analyses of restriction enzymes, however, show great variations, indicating that there are more than four types.^[33] All types of enzymes recognize specific short DNA sequences and carry out the endonucleolytic cleavage of DNA to give specific fragments with terminal 5'-phosphates. They differ in their recognition sequence, subunit composition, cleavage position, and cofactor requirements,^[34]^[35] as summarised below:

Type I enzymes (EC 3.1.21.3) cleave at sites remote from a recognition site; require both ATP and S-adenosyl-L-methionine to function; multifunctional protein with both restriction and methylase (EC 2.1.1.72) activities.
Type II enzymes (EC 3.1.21.4) cleave within or at short specific distances from a recognition site; most require magnesium; single function (restriction) enzymes independent of methylase.
Type III enzymes (EC 3.1.21.5) cleave at sites a short distance from a recognition site; require ATP (but do not hydrolyse it); S-adenosyl-L-methionine stimulates the reaction but is not required; exist as part of a complex with a modification methylase (EC 2.1.1.72).
Type IV enzymes target modified DNA, e.g. methylated, hydroxymethylated and glucosyl-hydroxymethylated DNA

Type I

Type I restriction enzymes were the first to be identified, in two different strains (K-12 and B) of E. coli.^[36] These enzymes cut at a site that differs from, and is a random distance (at least 1000 bp) away from, their recognition site. Cleavage at these random sites follows a process of DNA translocation, which shows that these enzymes are also molecular motors. The recognition site is asymmetrical and is composed of two specific portions (one containing 3–4 nucleotides, and another containing 4–5 nucleotides) separated by a non-specific spacer of about 6–8 nucleotides. These enzymes are multifunctional and are capable of both restriction and modification activities, depending upon the methylation status of the target DNA. The cofactors S-Adenosyl methionine (AdoMet), hydrolyzed adenosine triphosphate (ATP), and magnesium (Mg²⁺) ions are required for their full activity. Type I restriction enzymes possess three subunits called HsdR, HsdM, and HsdS; HsdR is required for restriction; HsdM is necessary for adding methyl groups to host DNA (methyltransferase activity), and HsdS is important for specificity of the recognition (DNA-binding) site in addition to both restriction (DNA cleavage) and modification (DNA methyltransferase) activity.^[30]^[36]

Type II

Type II site-specific deoxyribonuclease

Structure of the homodimeric restriction enzyme EcoRI (cyan and green cartoon diagram) bound to double stranded DNA (brown tubes).^[37] Two catalytic magnesium ions (one from each monomer) are shown as magenta spheres and are adjacent to the cleaved sites in the DNA made by the enzyme (depicted as gaps in the DNA backbone).

Typical type II restriction enzymes differ from type I restriction enzymes in several ways. They form homodimers, with recognition sites that are usually undivided and palindromic and 4–8 nucleotides in length. They recognize and cleave DNA at the same site, and they do not use ATP or AdoMet for their activity; they usually require only Mg²⁺ as a cofactor.^[27] These enzymes cleave the phosphodiester bonds of double-helix DNA. They can cleave either at the center of both strands, yielding blunt ends, or at staggered positions, leaving overhangs called sticky ends.^[38] These are the most commonly available and used restriction enzymes. In the 1990s and early 2000s, new enzymes from this family were discovered that did not follow all the classical criteria of this enzyme class, and new subfamily nomenclature was developed to divide this large family into subcategories based on deviations from typical characteristics of type II enzymes.^[27] These subgroups are defined using a letter suffix.

Type IIB restriction enzymes (e.g., BcgI and BplI) are multimers, containing more than one subunit.^[27] They cleave DNA on both sides of their recognition site, cutting the site out. They require both AdoMet and Mg²⁺ cofactors. Type IIE restriction endonucleases (e.g., NaeI) cleave DNA following interaction with two copies of their recognition sequence.^[27] One recognition site acts as the target for cleavage, while the other acts as an allosteric effector that speeds up or improves the efficiency of enzyme cleavage. Similar to type IIE enzymes, type IIF restriction endonucleases (e.g., NgoMIV) interact with two copies of their recognition sequence but cleave both sequences at the same time.^[27] Type IIG restriction endonucleases (e.g., Eco57I) have a single subunit, like classical Type II restriction enzymes, but require the cofactor AdoMet to be active.^[27] Type IIM restriction endonucleases, such as DpnI, are able to recognize and cut methylated DNA.^[27] Type IIS restriction endonucleases (e.g., FokI) cleave DNA at a defined distance from their non-palindromic asymmetric recognition sites;^[27] this characteristic is widely used in in-vitro cloning techniques such as Golden Gate cloning. These enzymes may function as dimers. Similarly, Type IIT restriction enzymes (e.g., Bpu10I and BslI) are composed of two different subunits. Some recognize palindromic sequences while others have asymmetric recognition sites.^[27]

Type III

Type III restriction enzymes (e.g., EcoP15) recognize two separate non-palindromic sequences that are inversely oriented. They cut DNA about 20–30 base pairs after the recognition site.^[39] These enzymes contain more than one subunit and require AdoMet and ATP cofactors for their roles in DNA methylation and restriction, respectively.^[40] They are components of prokaryotic DNA restriction-modification mechanisms that protect the organism against invading foreign DNA. Type III enzymes are hetero-oligomeric, multifunctional proteins composed of two subunits, Res and Mod. The Mod subunit recognises the DNA sequence specific for the system and is a modification methyltransferase; as such, it is functionally equivalent to the M and S subunits of type I restriction endonuclease. Res is required for restriction, although it has no enzymatic activity on its own. Type III enzymes recognise short 5–6 bp-long asymmetric DNA sequences and cleave 25–27 bp downstream to leave short, single-stranded 5' protrusions. They require the presence of two inversely oriented unmethylated recognition sites for restriction to occur. These enzymes methylate only one strand of the DNA, at the N-6 position of adenosyl residues, so newly replicated DNA will have only one strand methylated, which is sufficient to protect against restriction. Type III enzymes belong to the beta-subfamily of N6 adenine methyltransferases, containing the nine motifs that characterise this family, including motif I, the AdoMet binding pocket (FXGXG), and motif IV, the catalytic region (S/D/N (PP) Y/F).^[34]^[41]

Type IV

Type IV enzymes recognize modified, typically methylated DNA and are exemplified by the McrBC and Mrr systems of E. coli.^[33]

Type V

Type V restriction enzymes (e.g., the Cas9-gRNA complex from CRISPRs^[42]) utilize guide RNAs to target specific non-palindromic sequences found in invading organisms. They can cut DNA of variable length, provided that a suitable guide RNA is provided. The flexibility and ease of use of these enzymes make them promising for future genetic engineering applications.^[42]^[43]

Artificial restriction enzymes

Artificial restriction enzymes can be generated by fusing a natural or engineered DNA binding domain to a nuclease domain (often the cleavage domain of the type IIS restriction enzyme FokI).^[44] Such artificial restriction enzymes can target large DNA sites (up to 36 bp) and can be engineered to bind to desired DNA sequences.^[45] Zinc finger nucleases are the most commonly used artificial restriction enzymes and are generally used in genetic engineering applications,^[46]^[47]^[48]^[49] but can also be used for more standard gene cloning applications.^[50] Other artificial restriction enzymes are based on the DNA binding domain of TAL effectors.^[51]^[52]

In 2013, a new technology, CRISPR-Cas9, based on a prokaryotic viral defense system, was engineered for editing the genome, and it was quickly adopted in laboratories.^[53] For more detail, see CRISPR (clustered regularly interspaced short palindromic repeats).

In 2017 a group in Illinois announced using an Argonaute protein taken from Pyrococcus furiosus (PfAgo) along with guide DNA to edit DNA as artificial restriction enzymes.^[54]

Artificial ribonucleases that act as restriction enzymes for RNA are also being developed. A PNA-based system, called PNAzymes, has a Cu(II)-2,9-dimethylphenanthroline group that mimics ribonucleases for specific RNA sequence and cleaves at a non-base-paired region (RNA bulge) of the targeted RNA formed when the enzyme binds the RNA. This enzyme shows selectivity by cleaving only at one site that either does not have a mismatch or is kinetically preferred out of two possible cleavage sites.^[55]

Nomenclature

Derivation of the EcoRI name
Abbreviation	Meaning	Description
E	Escherichia	genus
co	coli	specific species
R	RY13	strain
I	First identified	order of identification in the bacterium

Since their discovery in the 1970s, many restriction enzymes have been identified; for example, more than 3500 different Type II restriction enzymes have been characterized.^[56] Each enzyme is named after the bacterium from which it was isolated, using a naming system based on bacterial genus, species and strain.^[57]^[58] For example, the name of the EcoRI restriction enzyme was derived as shown in the box.
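As a rough illustration of the naming scheme in the box, the following Python sketch splits an enzyme name into genus, species, strain and order of identification. The function name and the regular expression are assumptions made for this example; the pattern is a heuristic that covers names such as EcoRI and HindIII, not a complete parser for every enzyme name.

```python
import re

# Heuristic pattern: one capital letter (genus), two lowercase letters
# (species), an optional strain designation, then a Roman numeral giving
# the order of identification. Strain names that themselves end in I, V
# or X would be ambiguous under this simple rule.
NAME_PATTERN = re.compile(r"([A-Z])([a-z]{2})([A-Za-z0-9]*?)([IVX]+)")


def parse_enzyme_name(name: str) -> dict:
    """Split a restriction enzyme name into its four naming components."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized enzyme name: {name!r}")
    genus, species, strain, order = match.groups()
    return {"genus": genus, "species": species, "strain": strain, "order": order}
```

For EcoRI this yields genus E, species co, strain R and order I, matching the rows of the box.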

Applications

Isolated restriction enzymes are used to manipulate DNA for different scientific applications.

They are used to assist insertion of genes into plasmid vectors during gene cloning and protein production experiments. For optimal use, plasmids that are commonly used for gene cloning are modified to include a short polylinker sequence (called the multiple cloning site, or MCS) rich in restriction enzyme recognition sequences. This allows flexibility when inserting gene fragments into the plasmid vector; restriction sites contained naturally within genes influence the choice of endonuclease for digesting the DNA, since it is necessary to avoid restriction of wanted DNA while intentionally cutting the ends of the DNA. To clone a gene fragment into a vector, both plasmid DNA and gene insert are typically cut with the same restriction enzymes, and then glued together with the assistance of an enzyme known as a DNA ligase.^[59]^[60]

Restriction enzymes can also be used to distinguish gene alleles by specifically recognizing single base changes in DNA known as single nucleotide polymorphisms (SNPs).^[61]^[62] This is, however, only possible if a SNP alters the restriction site present in the allele. In this method, the restriction enzyme can be used to genotype a DNA sample without the need for expensive gene sequencing. The sample is first digested with the restriction enzyme to generate DNA fragments, and the different-sized fragments are then separated by gel electrophoresis. In general, alleles with correct restriction sites will generate two visible bands of DNA on the gel, and those with altered restriction sites will not be cut and will generate only a single band. A DNA map can also be generated by restriction digest, giving the relative positions of the genes.^[63] The different lengths of DNA generated by restriction digest also produce a specific pattern of bands after gel electrophoresis, and can be used for DNA fingerprinting.
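The two-band versus one-band outcome can be simulated with a minimal Python sketch. The two allele sequences below are invented for illustration, the DNA is assumed to be linear, and only the top-strand cut position is tracked; EcoRI is modeled as cutting one base into its GAATTC site, as in the examples table.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Return the fragment lengths produced by cutting linear DNA.

    `site` is the recognition sequence and `cut_offset` is the number of
    bases from the start of the site to the top-strand cut position.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: the second carries a single base change (A to G)
# that destroys the EcoRI site.
allele_with_site = "ATGCCGAATTCGGTACA"
allele_without_site = "ATGCCGAGTTCGGTACA"

bands_cut = digest(allele_with_site, "GAATTC", 1)       # two fragments
bands_uncut = digest(allele_without_site, "GAATTC", 1)  # one fragment
```

Each list entry corresponds to one band on the gel: the intact site gives two fragments, and the altered site leaves the DNA as a single full-length fragment.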

In a similar manner, restriction enzymes are used to digest genomic DNA for gene analysis by Southern blot. This technique allows researchers to identify how many copies (or paralogues) of a gene are present in the genome of one individual, or how many gene mutations (polymorphisms) have occurred within a population. The latter example is called restriction fragment length polymorphism (RFLP).^[64]

Artificial restriction enzymes created by linking the FokI DNA cleavage domain with an array of DNA binding proteins or zinc finger arrays, denoted zinc finger nucleases (ZFN), are a powerful tool for host genome editing due to their enhanced sequence specificity. ZFN work in pairs, their dimerization being mediated in-situ through the FokI domain. Each zinc finger array (ZFA) is capable of recognizing 9–12 base pairs, making for 18–24 for the pair. A 5–7 bp spacer between the cleavage sites further enhances the specificity of ZFN, making them a safe and more precise tool that can be applied in humans. A recent Phase I clinical trial of ZFN for the targeted abolition of the CCR5 co-receptor for HIV-1 has been undertaken.^[65]

Others have proposed using the bacterial R-M system as a model for devising human anti-viral gene or genomic vaccines and therapies, since the R-M system serves an innate defense role in bacteria by restricting tropism by bacteriophages.^[66] There is research on REases and ZFN that can cleave the DNA of various human viruses, including HSV-2, high-risk HPVs and HIV-1, with the ultimate goal of inducing target mutagenesis and aberrations of human-infecting viruses.^[67]^[68]^[69] Interestingly, the human genome already contains remnants of retroviral genomes that have been inactivated and harnessed for self-gain. Indeed, the mechanisms for silencing active L1 genomic retroelements by the three prime repair exonuclease 1 (TREX1) and excision repair cross complementing 1 (ERCC) appear to mimic the action of R-M systems in bacteria, and the non-homologous end-joining (NHEJ) that follows the use of ZFN without a repair template.^[70]^[71]

Examples

Examples of restriction enzymes include:^[72]

Enzyme	Source	Recognition Sequence	Cut
EcoRI	Escherichia coli	5'GAATTC 3'CTTAAG	5'---G AATTC---3' 3'---CTTAA G---5'
EcoRII	Escherichia coli	5'CCWGG 3'GGWCC	5'--- CCWGG---3' 3'---GGWCC ---5'
BamHI	Bacillus amyloliquefaciens	5'GGATCC 3'CCTAGG	5'---G GATCC---3' 3'---CCTAG G---5'
HindIII	Haemophilus influenzae	5'AAGCTT 3'TTCGAA	5'---A AGCTT---3' 3'---TTCGA A---5'
TaqI	Thermus aquaticus	5'TCGA 3'AGCT	5'---T CGA---3' 3'---AGC T---5'
NotI	Nocardia otitidis	5'GCGGCCGC 3'CGCCGGCG	5'---GC GGCCGC---3' 3'---CGCCGG CG---5'
HinFI	Haemophilus influenzae	5'GANTC 3'CTNAG	5'---G ANTC---3' 3'---CTNA G---5'
Sau3AI	Staphylococcus aureus	5'GATC 3'CTAG	5'--- GATC---3' 3'---CTAG ---5'
PvuII*	Proteus vulgaris	5'CAGCTG 3'GTCGAC	5'---CAG CTG---3' 3'---GTC GAC---5'
SmaI*	Serratia marcescens	5'CCCGGG 3'GGGCCC	5'---CCC GGG---3' 3'---GGG CCC---5'
HaeIII*	Haemophilus aegyptius	5'GGCC 3'CCGG	5'---GG CC---3' 3'---CC GG---5'
HgaI^[73]	Haemophilus gallinarum	5'GACGC 3'CTGCG	5'---NN NN---3' 3'---NN NN---5'
AluI*	Arthrobacter luteus	5'AGCT 3'TCGA	5'---AG CT---3' 3'---TC GA---5'
EcoRV*	Escherichia coli	5'GATATC 3'CTATAG	5'---GAT ATC---3' 3'---CTA TAG---5'
EcoP15I	Escherichia coli	5'CAGCAGN₂₅NN 3'GTCGTCN₂₅NN	5'---CAGCAGN₂₅ NN---3' 3'---GTCGTCN₂₅NN ---5'
KpnI^[74]	Klebsiella pneumoniae	5'GGTACC 3'CCATGG	5'---GGTAC C---3' 3'---C CATGG---5'
PstI^[74]	Providencia stuartii	5'CTGCAG 3'GACGTC	5'---CTGCA G---3' 3'---G ACGTC---5'
SacI^[74]	Streptomyces achromogenes	5'GAGCTC 3'CTCGAG	5'---GAGCT C---3' 3'---C TCGAG---5'
SalI^[74]	Streptomyces albus	5'GTCGAC 3'CAGCTG	5'---G TCGAC---3' 3'---CAGCT G---5'
ScaI*^[74]	Streptomyces caespitosus	5'AGTACT 3'TCATGA	5'---AGT ACT---3' 3'---TCA TGA---5'
SpeI	Sphaerotilus natans	5'ACTAGT 3'TGATCA	5'---A CTAGT---3' 3'---TGATC A---5'
SphI^[74]	Streptomyces phaeochromogenes	5'GCATGC 3'CGTACG	5'---GCATG C---3' 3'---C GTACG---5'
StuI*^[75]^[76]	Streptomyces tubercidicus	5'AGGCCT 3'TCCGGA	5'---AGG CCT---3' 3'---TCC GGA---5'
XbaI^[74]	Xanthomonas badrii	5'TCTAGA 3'AGATCT	5'---T CTAGA---3' 3'---AGATC T---5'

Key:
* = blunt ends
N = C or G or T or A
W = A or T
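The degenerate codes in the key can be expanded into a search pattern. The sketch below is a hedged Python illustration that handles only the letters listed in the key (N and W) plus the four plain bases; the function names are assumptions for this example, and only the top strand is searched.

```python
import re

# Expansion of the codes used in the table key into regex character classes.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}


def site_pattern(recognition: str):
    """Compile a recognition sequence such as CCWGG into a regex.

    The pattern is wrapped in a lookahead so overlapping sites are found.
    """
    body = "".join(CODES[base] for base in recognition.upper())
    return re.compile("(?=" + body + ")")


def find_sites(seq: str, recognition: str) -> list:
    """Return the 0-based start position of every recognition site in seq."""
    return [m.start() for m in site_pattern(recognition).finditer(seq.upper())]
```

For example, the EcoRII sequence CCWGG matches both CCAGG and CCTGG, and the HinFI sequence GANTC matches GA followed by any base and then TC.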

How the New Science of Computational History Is Changing the Study of the Past

Applying network theory to medieval records suggests that historical events are governed by “laws of history,” just as nature is bound by the laws of physics.

One of the curious features of network science is that the same networks underlie entirely different phenomena. As a result, these phenomena have deep similarities that are far from obvious at first glance. Good examples include the spread of disease, the size of forest fires, and even the distribution of earthquake magnitude, which all follow a similar pattern. This is a direct result of their sharing the same network structure.

So it’s usually no surprise that the same “laws” emerge when physicists find the same networks underlying other phenomena. Exactly this has happened repeatedly in the social sciences. Network science now allows social scientists to model societies, to study the way ideas, gossip, fashions, and so on flow through society—and even to study how this influences opinion.

To do this they’ve used the tools developed to study other disciplines. That’s why the new field of computational social science has become so powerful so quickly.

But there’s another field of endeavor that also stands to benefit: the study of history. Throughout history, humans have formed networks that have played a profound role in the way events have unfolded. Historians have recently begun to reconstruct these networks using historical sources such as correspondence and contemporary records.

Today, Johannes Preiser-Kapeller at the Austrian Academy of Science in Vienna explains how this approach is casting a new light on various historical events. Indeed, the work has uncovered previously unknown patterns in the way history unfolds. In the same way that patterns in nature reveal the laws of physics, these discoveries are revealing the first laws of history.

Preiser-Kapeller has focused on medieval conflicts and particularly those relating to the Byzantine Empire in the 14th century, which was concentrated around Constantinople, a link between European and Asian trade networks. This was a period of significant conflict because of changing political forces, the plague, and climate change caused by a small ice age during the Middle Ages.

Preiser-Kapeller has reconstructed the political networks that existed at the time using surviving correspondence and other historical records. In these networks, each influential individual is a node, and links are drawn between those who share significant relationships. To be registered on the network, these links have to be recorded in correspondence with phrases such as “My noble aunt” or “My imperial cousin.” He also records how these change over time.

Using standard algorithms to study various measures of network structure, Preiser-Kapeller found clusters within the network, identified the most important actors in a network, and examined how individuals clustered around others who were similar in some way.
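The article does not say which algorithms were used. As a stand-in, the Python sketch below builds a network from recorded relationships, ranks individuals by their number of links (degree), and counts connected components as a crude measure of fragmentation. All names and links in the example are hypothetical.

```python
from collections import defaultdict


def build_network(links):
    """Build an undirected network: each person is a node, each link an edge."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """Rank nodes from most to fewest links, breaking ties alphabetically."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))


def components(graph):
    """Return the groups of nodes that are connected to one another."""
    seen, groups = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical relationships of the kind recorded in correspondence.
example_links = [
    ("Emperor", "Aunt"),
    ("Emperor", "Cousin"),
    ("Cousin", "General"),
    ("Rival", "Ally"),
]
example_graph = build_network(example_links)
```

In this toy network the Rival and Ally form a group with no links to the Emperor's circle, so the network has two components. A rise in the number of components over time would be one simple sign of the fragmentation the article describes.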

How these measures change over time turns out to have an important link to the major events that unfolded later. For example, Preiser-Kapeller says, the fragmentation of the political network created the conditions for a civil war that permanently weakened the Byzantine Empire. It ultimately collapsed in 1453.

These changes also followed some interesting patterns. “The distribution of frequencies of the number of conflict ties activated in a year tends to follow a power law,” says Preiser-Kapeller. Exactly the same power-law patterns emerge when complexity scientists study the size distribution of wars, epidemics, and religions.

An interesting question is whether the same patterns turn up elsewhere in history. To find out, he compared the Byzantium network with those from five other periods of medieval conflict in Europe, Africa, and Asia.

And the results make for interesting reading. “On average across all five polities, a change of ruler in one year increased the probability for another change in the following year threefold,” says Preiser-Kapeller. So the closer you are to an upheaval, the more likely there is to be another one soon. Or in other words, upheavals tend to cluster together.
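One way to read the "threefold" claim is as a ratio of two conditional probabilities: the chance of a change of ruler in a year that follows a change, divided by the chance in a year that follows no change. The Python sketch below computes that ratio for a yearly record. Both this interpretation and the toy record are assumptions for illustration, not the method or data of the study.

```python
def change_ratio(changes):
    """Ratio of P(change next year | change) to P(change next year | no change).

    `changes` is a yearly record: 1 if the ruler changed that year, else 0.
    The record must contain both kinds of year, and at least one change must
    follow a quiet year, or the ratio is undefined.
    """
    pairs = list(zip(changes, changes[1:]))
    after_change = [nxt for cur, nxt in pairs if cur]
    after_calm = [nxt for cur, nxt in pairs if not cur]
    p_after_change = sum(after_change) / len(after_change)
    p_after_calm = sum(after_calm) / len(after_calm)
    return p_after_change / p_after_calm


# Toy record (invented): four quiet years, four turbulent years, one quiet year.
toy_record = [0, 0, 0, 0, 1, 1, 1, 1, 0]
toy_ratio = change_ratio(toy_record)
```

In the toy record a change follows a change in three of four cases (0.75) and follows a quiet year in one of four cases (0.25), so the ratio is 3.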

That’s a rule that should sound familiar to geophysicists. A similar phenomenon exists in earthquake records: the more recent a big earthquake, the greater the likelihood of another big one soon. This is known as Omori’s law—that earthquakes tend to cluster together.

It’s no surprise that similar effects arise in these systems, since they are both governed by the same network science. Historians would be well within their rights to adopt this and other patterns as “laws of history.”

These laws are ripe for further study. While the complexity that arises from network theory in many areas of science has been studied for decades, there has been almost no such research in the field of history. That suggests there is low-hanging fruit to be had by the first generation of computational historians, like Preiser-Kapeller. Expect to hear more about it in the near future.

Ref: arxiv.org/abs/1606.03433 : Calculating the Middle Ages? The Project “Complexities and Networks in the Medieval Mediterranean and the Near East”

Transcription factor

From Wikipedia, the free encyclopedia

Transcription factor glossary
gene expression – the process by which information from a gene is used in the synthesis of a functional gene product such as a protein
transcription – the process of making messenger RNA (mRNA) from a DNA template by RNA polymerase
transcription factor – a protein that binds to DNA and regulates gene expression by promoting or suppressing transcription
transcriptional regulation – controlling the rate of gene transcription, for example by helping or hindering RNA polymerase binding to DNA
upregulation, activation, or promotion – increase the rate of gene transcription
downregulation, repression, or suppression – decrease the rate of gene transcription
coactivator – a protein that works with transcription factors to increase the rate of gene transcription
corepressor – a protein that works with transcription factors to decrease the rate of gene transcription
response element – a specific sequence of DNA that a transcription factor binds to

Illustration of an activator

In molecular biology, a transcription factor (TF) (or sequence-specific DNA-binding factor) is a protein that controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence.^[1]^[2] The function of TFs is to regulate - turn on and off - genes in order to make sure that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism. Groups of TFs function in a coordinated fashion to direct cell division, cell growth, and cell death throughout life; cell migration and organization (body plan) during embryonic development; and intermittently in response to signals from outside the cell, such as a hormone. There are up to 2600 TFs in the human genome.

TFs work alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.^[3]^[4]^[5]

A defining feature of TFs is that they contain at least one DNA-binding domain (DBD), which attaches to a specific sequence of DNA adjacent to the genes that they regulate.^[6]^[7] TFs are grouped into classes based on their DBDs.^[8]^[9] Other proteins such as coactivators, chromatin remodelers, histone acetyltransferases, histone deacetylases, kinases, and methylases are also essential to gene regulation, but lack DNA-binding domains, and therefore are not TFs.^[10]

TFs are of interest in medicine because TF mutations can cause specific diseases, and medications can be potentially targeted toward them.

Number

Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene.^[11]

There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors,^[12] though other studies indicate it to be a smaller number.^[13] Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see, for example, hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.^[10]

Mechanism

Transcription factors bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up- or down-regulated. Transcription factors use a variety of mechanisms for the regulation of gene expression.^[14] These mechanisms include:

stabilize or block the binding of RNA polymerase to DNA
catalyze the acetylation or deacetylation of histone proteins. The transcription factor can either do this directly or recruit other proteins with this catalytic activity. Many transcription factors use one or the other of two opposing mechanisms to regulate transcription:^[15]
- histone acetyltransferase (HAT) activity – acetylates histone proteins, which weakens the association of DNA with histones, which makes the DNA more accessible to transcription, thereby up-regulating transcription
- histone deacetylase (HDAC) activity – deacetylates histone proteins, which strengthens the association of DNA with histones, which makes the DNA less accessible to transcription, thereby down-regulating transcription
recruit coactivator or corepressor proteins to the transcription factor DNA complex^[16]

Function

Transcription factors are one of the groups of proteins that read and interpret the genetic "blueprint" in the DNA. They bind to the DNA and help initiate a program of increased or decreased gene transcription. As such, they are vital for many important cellular processes. Below are some of the important functions and biological roles transcription factors are involved in:

Basal transcription regulation

In eukaryotes, an important class of transcription factors called general transcription factors (GTFs) is necessary for transcription to occur.^[17]^[18]^[19] Many of these GTFs do not actually bind DNA, but rather are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID (see also TATA binding protein), TFIIE, TFIIF, and TFIIH.^[20] The preinitiation complex binds to promoter regions of DNA upstream of the gene that it regulates.

Differential enhancement of transcription

Other transcription factors differentially regulate the expression of various genes by binding to enhancer regions of DNA adjacent to regulated genes. These transcription factors are critical to making sure that genes are expressed in the right cell at the right time and in the right amount, depending on the changing requirements of the organism.

Development

Many transcription factors in multicellular organisms are involved in development.^[21] Responding to stimuli, these transcription factors turn on/off the transcription of the appropriate genes, which, in turn, allows for changes in cell morphology or activities needed for cell fate determination and cellular differentiation. The Hox transcription factor family, for example, is important for proper body pattern formation in organisms as diverse as fruit flies and humans.^[22]^[23] Another example is the transcription factor encoded by the Sex-determining Region Y (SRY) gene, which plays a major role in determining sex in humans.^[24]

Response to intercellular signals

Cells can communicate with each other by releasing molecules that produce signaling cascades within another receptive cell. If the signal requires upregulation or downregulation of genes in the recipient cell, often transcription factors will be downstream in the signaling cascade.^[25] Estrogen signaling is an example of a fairly short signaling cascade that involves the estrogen receptor transcription factor: Estrogen is secreted by tissues such as the ovaries and placenta, crosses the cell membrane of the recipient cell, and is bound by the estrogen receptor in the cell's cytoplasm. The estrogen receptor then goes to the cell's nucleus and binds to its DNA-binding sites, changing the transcriptional regulation of the associated genes.^[26]

Response to environment

Not only do transcription factors act downstream of signaling cascades related to biological stimuli but they can also be downstream of signaling cascades involved in environmental stimuli. Examples include heat shock factor (HSF), which upregulates genes necessary for survival at higher temperatures,^[27] hypoxia inducible factor (HIF), which upregulates genes necessary for cell survival in low-oxygen environments,^[28] and sterol regulatory element binding protein (SREBP), which helps maintain proper lipid levels in the cell.^[29]

Cell cycle control

Many transcription factors, especially some that are proto-oncogenes or tumor suppressors, help regulate the cell cycle and as such determine how large a cell will get and when it can divide into two daughter cells.^[30]^[31] One example is the Myc oncogene, which has important roles in cell growth and apoptosis.^[32]

Pathogenesis

Transcription factors can also be used to alter gene expression in a host cell to promote pathogenesis. A well-studied example is the transcription-activator like effectors (TAL effectors) secreted by Xanthomonas bacteria. When injected into plants, these proteins can enter the nucleus of the plant cell, bind plant promoter sequences, and activate transcription of plant genes that aid in bacterial infection.^[33] TAL effectors contain a central repeat region in which there is a simple relationship between the identity of two critical residues in sequential repeats and sequential DNA bases in the TAL effector’s target site.^[34]^[35] This property likely makes it easier for these proteins to evolve in order to better compete with the defense mechanisms of the host cell.^[36]

Regulation

It is common in biology for important processes to have multiple layers of regulation and control. This is also true with transcription factors: Not only do transcription factors control the rates of transcription to regulate the amounts of gene products (RNA and protein) available to the cell but transcription factors themselves are regulated (often by other transcription factors). Below is a brief synopsis of some of the ways that the activity of transcription factors can be regulated:

Synthesis

Transcription factors (like all proteins) are transcribed from a gene on a chromosome into RNA, and then the RNA is translated into protein. Any of these steps can be regulated to affect the production (and thus activity) of a transcription factor. An implication of this is that transcription factors can regulate themselves. For example, in a negative feedback loop, the transcription factor acts as its own repressor: If the transcription factor protein binds the DNA of its own gene, it down-regulates the production of more of itself. This is one mechanism to maintain low levels of a transcription factor in a cell.^[37]
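The negative feedback loop described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the rate law (production divided by 1 + x/K), the parameter names `beta`, `alpha` and `K`, and all numeric values are hypothetical choices made for illustration, not measured quantities or a model taken from the cited source.

```python
def simulate(beta=10.0, alpha=1.0, K=1.0, autoregulated=True, dt=0.01, steps=5000):
    """Euler integration of dx/dt = production(x) - alpha * x.

    x     : level of the transcription factor (arbitrary units)
    beta  : maximal production rate (assumed value)
    alpha : degradation/dilution rate (assumed value)
    K     : level of x at which production is halved (assumed value)
    """
    x = 0.0
    for _ in range(steps):
        # With autoregulation, the factor represses its own gene, so
        # production falls as its own level x rises.
        production = beta / (1.0 + x / K) if autoregulated else beta
        x += dt * (production - alpha * x)
    return x


unregulated = simulate(autoregulated=False)
repressed = simulate(autoregulated=True)
print(f"level without feedback:       {unregulated:.2f}")
print(f"level with negative feedback: {repressed:.2f}")
```

In this toy model the self-repressing factor settles at a markedly lower level than the unregulated one, which is the qualitative behaviour the paragraph describes.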

Nuclear localization

In eukaryotes, transcription factors (like most proteins) are transcribed in the nucleus but are then translated in the cell's cytoplasm. Many proteins that are active in the nucleus contain nuclear localization signals that direct them to the nucleus. But, for many transcription factors, this is a key point in their regulation.^[38] Important classes of transcription factors such as some nuclear receptors must first bind a ligand while in the cytoplasm before they can relocate to the nucleus.^[38]

Activation

Transcription factors may be activated (or deactivated) through their signal-sensing domain by a number of mechanisms including:

ligand binding – Not only is ligand binding able to influence where a transcription factor is located within a cell but ligand binding can also affect whether the transcription factor is in an active state and capable of binding DNA or other cofactors (see, for example, nuclear receptors).
phosphorylation^[39]^[40] – Many transcription factors such as STAT proteins must be phosphorylated before they can bind DNA.
interaction with other transcription factors (e.g., homo- or hetero-dimerization) or coregulatory proteins

Accessibility of DNA-binding site

In eukaryotes, DNA is organized with the help of histones into compact particles called nucleosomes, where sequences of about 147 DNA base pairs make ~1.65 turns around histone protein octamers. DNA within nucleosomes is inaccessible to many transcription factors. Some transcription factors, so-called pioneering factors, are still able to bind their DNA binding sites on the nucleosomal DNA. For most other transcription factors, the nucleosome must be actively unwound by molecular motors such as chromatin remodelers.^[41] Alternatively, the nucleosome can be partially unwrapped by thermal fluctuations, allowing temporary access to the transcription factor binding site. In many cases, a transcription factor needs to compete for binding to its DNA binding site with other transcription factors and histones or non-histone chromatin proteins.^[42] Pairs of transcription factors and other proteins can play antagonistic roles (activator versus repressor) in the regulation of the same gene.

Availability of other cofactors/transcription factors

Most transcription factors do not work alone. Many large TF families form complex homotypic or heterotypic interactions through dimerization.^[43] For gene transcription to occur, a number of transcription factors must bind to DNA regulatory sequences. This collection of transcription factors, in turn, recruits intermediary proteins such as cofactors that allow efficient recruitment of the preinitiation complex and RNA polymerase. Thus, for a single transcription factor to initiate transcription, all of these other proteins must also be present, and the transcription factor must be in a state where it can bind to them if necessary. Cofactors are proteins that modulate the effects of transcription factors. Cofactors are interchangeable between specific gene promoters; the protein complex that occupies the promoter DNA and the amino acid sequence of the cofactor determine its spatial conformation. For example, certain steroid receptors can exchange cofactors with NF-κB, which is a switch between inflammation and cellular differentiation; thereby steroids can affect the inflammatory response and function of certain tissues.^[44]

Structure

Schematic diagram of the amino acid sequence (amino terminus to the left and carboxylic acid terminus to the right) of a prototypical transcription factor that contains (1) a DNA-binding domain (DBD), (2) a signal-sensing domain (SSD), and (3) a transactivation domain (TAD). The order of placement and the number of domains may differ in various types of transcription factors. In addition, the transactivation and signal-sensing functions are frequently contained within the same domain.

Transcription factors are modular in structure and contain the following domains:^[1]

DNA-binding domain (DBD), which attaches to specific sequences of DNA (enhancer or promoter sequences) adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.
Trans-activating domain (TAD), which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs).^[45]
An optional signal-sensing domain (SSD) (e.g., a ligand binding domain), which senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. Also, the DBD and signal-sensing domains may reside on separate proteins that associate within the transcription complex to regulate gene expression.

Trans-activating domain

The TAD is the domain of the transcription factor that binds other proteins such as transcription coregulators. Proteins containing TADs include Gal4, Gcn4, Oaf1, Leu3, Rtg3, Pho4, and Gln3 in yeast and p53, NFAT, NF-κB, and VP16 in mammals.^[46] Many TADs are as short as 9 amino acids (present in, e.g., p53, VP16, MLL, E2A, HSF1, NF-IL6, NFAT1, NF-κB, Gal4, Pdr1, Oaf1, Gcn4, Pho4, Msn2, Ino2, and P201).

DNA-binding domain

Domain architecture example: Lactose Repressor (LacI). The N-terminal DNA binding domain (labeled) of the lac repressor binds its target DNA sequence (gold) in the major groove using a helix-turn-helix motif. Effector molecule binding (green) occurs in the core domain (labeled), a signal sensing domain. This triggers an allosteric response mediated by the linker region (labeled).

The portion (domain) of the transcription factor that binds DNA is called its DNA-binding domain. Below is a partial list of some of the major families of DNA-binding domains/transcription factors:

Family	InterPro	Pfam	SCOP
basic helix-loop-helix^[47]	InterPro: IPR001092	Pfam PF00010	SCOP 47460
basic-leucine zipper (bZIP)^[48]	InterPro: IPR004827	Pfam PF00170	SCOP 57959
C-terminal effector domain of the bipartite response regulators	InterPro: IPR001789	Pfam PF00072	SCOP 46894
AP2/ERF/GCC box	InterPro: IPR001471	Pfam PF00847	SCOP 54176
helix-turn-helix^[49]
homeodomain proteins, which are encoded by homeobox genes, are transcription factors. Homeodomain proteins play critical roles in the regulation of development.^[50]^[51]	InterPro: IPR009057	Pfam PF00046	SCOP 46689
lambda repressor-like	InterPro: IPR010982		SCOP 47413
srf-like (serum response factor)	InterPro: IPR002100	Pfam PF00319	SCOP 55455
paired box^[52]
winged helix	InterPro: IPR013196	Pfam PF08279	SCOP 46785
zinc fingers^[53]
* multi-domain Cys₂His₂ zinc fingers^[54]	InterPro: IPR007087	Pfam PF00096	SCOP 57667
* Zn₂/Cys₆			SCOP 57701
* Zn₂/Cys₈ nuclear receptor zinc finger	InterPro: IPR001628	Pfam PF00105	SCOP 57716

Response elements

The DNA sequence that a transcription factor binds to is called a transcription factor-binding site or response element.^[55]

Transcription factors interact with their binding sites using a combination of electrostatic (of which hydrogen bonds are a special case) and van der Waals forces. Due to the nature of these chemical interactions, most transcription factors bind DNA in a sequence-specific manner. However, not all bases in the transcription factor-binding site may actually interact with the transcription factor. In addition, some of these interactions may be weaker than others. Thus, transcription factors do not bind just one sequence but are capable of binding a subset of closely related sequences, each with a different strength of interaction.

For example, although the consensus binding site for the TATA-binding protein (TBP) is TATAAAA, the TBP transcription factor can also bind similar sequences such as TATATAT or TATATAA.
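As a rough illustration of how the variants above relate to the consensus, the sketch below counts the positions at which each quoted site differs from TATAAAA. The function name `mismatches` is hypothetical, and a mismatch count is only a crude stand-in for "strength of interaction": it treats every position as equally important, which the preceding paragraph says is not the case.

```python
CONSENSUS = "TATAAAA"  # consensus TBP site quoted in the text


def mismatches(site, consensus=CONSENSUS):
    """Count positions where a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(1 for a, b in zip(site, consensus) if a != b)


# The three sequences mentioned in the text.
for site in ["TATAAAA", "TATATAT", "TATATAA"]:
    print(site, mismatches(site))
```

Both alternative sites differ from the consensus at only one or two of the seven positions, which is consistent with the idea of a family of closely related binding sequences.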

Because transcription factors can bind a set of related sequences and these sequences tend to be short, potential transcription factor binding sites can occur by chance if the DNA sequence is long enough. It is unlikely, however, that a transcription factor will bind all compatible sequences in the genome of the cell. Other constraints, such as DNA accessibility in the cell or availability of cofactors may also help dictate where a transcription factor will actually bind. Thus, given the genome sequence it is still difficult to predict where a transcription factor will actually bind in a living cell.

Additional recognition specificity, however, may be obtained through the use of more than one DNA-binding domain (for example tandem DBDs in the same transcription factor or through dimerization of two transcription factors) that bind to two or more adjacent sequences of DNA.
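The two preceding points (short sites occur by chance in long sequences, and recognizing a longer combined site adds specificity) can be made concrete with a back-of-the-envelope calculation. The sketch assumes a random sequence in which the four bases are equally likely and independent, which real genomes are not, and it uses a round, assumed genome length of 3 billion base pairs; the function name is hypothetical.

```python
def expected_chance_sites(genome_length, site_length, strands=2):
    """Expected number of exact matches to one fixed site in random DNA.

    Assumes independent, equally frequent bases (probability 1/4 each),
    so any given position matches with probability (1/4) ** site_length.
    Counting both strands double-counts palindromic sites.
    """
    positions = genome_length - site_length + 1
    return strands * positions / 4 ** site_length


GENOME = 3_000_000_000  # assumed round figure, for illustration only

for k in (6, 8, 12, 16):
    print(f"{k:2d} bp site: ~{expected_chance_sites(GENOME, k):,.1f} chance matches")
```

Under these assumptions a 6 to 8 base pair site is expected to occur by chance a very large number of times, whereas a site twice as long, such as the combined site of a dimer, is expected far less often because the match probability is squared.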

Clinical significance

Transcription factors are of clinical significance for at least two reasons: (1) mutations can be associated with specific diseases, and (2) they can be targets of medications.

Disorders

Due to their important roles in development, intercellular signaling, and cell cycle, some human diseases have been associated with mutations in transcription factors.^[56]

Many transcription factors are either tumor suppressors or oncogenes, and, thus, mutations or aberrant regulation of them is associated with cancer. Three groups of transcription factors are known to be important in human cancer: (1) the NF-kappaB and AP-1 families, (2) the STAT family and (3) the steroid receptors.^[57]

Below are a few of the more well-studied examples:

Condition	Description	Locus
Rett syndrome	Mutations in the MECP2 transcription factor are associated with Rett syndrome, a neurodevelopmental disorder.^[58]^[59]	Xq28
Diabetes	A rare form of diabetes called MODY (Maturity onset diabetes of the young) can be caused by mutations in hepatocyte nuclear factors (HNFs)^[60] or insulin promoter factor-1 (IPF1/Pdx1).^[61]	multiple
Developmental verbal dyspraxia	Mutations in the FOXP2 transcription factor are associated with developmental verbal dyspraxia, a disease in which individuals are unable to produce the finely coordinated movements required for speech.^[62]	7q31
Autoimmune diseases	Mutations in the FOXP3 transcription factor cause a rare form of autoimmune disease called IPEX.^[63]	Xp11.23-q13.3
Li-Fraumeni syndrome	Caused by mutations in the tumor suppressor p53.^[64]	17p13.1
Breast cancer	The STAT family is relevant to breast cancer.^[65]	multiple
Multiple cancers	The HOX family are involved in a variety of cancers.^[66]	multiple

Potential drug targets

Approximately 10% of currently prescribed drugs directly target the nuclear receptor class of transcription factors.^[67] Examples include tamoxifen and bicalutamide for the treatment of breast and prostate cancer, respectively, and various types of anti-inflammatory and anabolic steroids.^[68] In addition, transcription factors are often indirectly modulated by drugs through signaling cascades. It might be possible to directly target other less-explored transcription factors such as NF-κB with drugs.^[69]^[70]^[71]^[72] Transcription factors outside the nuclear receptor family are thought to be more difficult to target with small molecule therapeutics since it is not clear that they are "druggable", but progress has been made on Pax2^[73]^[74] and the notch pathway.^[75]

Role in evolution

Gene duplications have played a crucial role in the evolution of species. This applies particularly to transcription factors. Once a transcription factor gene occurs as duplicates, mutations can accumulate in one copy without negatively affecting the regulation of downstream targets. However, changes of the DNA binding specificities of the single-copy LEAFY transcription factor, which occurs in most land plants, have recently been elucidated. In that respect, a single-copy transcription factor can undergo a change of specificity through a promiscuous intermediate without losing function. Similar mechanisms have been proposed in the context of all alternative phylogenetic hypotheses, and the role of transcription factors in the evolution of all species.^[76]^[77]

Analysis

There are different technologies available to analyze transcription factors. On the genomic level, DNA-sequencing^[78] and database research are commonly used.^[79] The protein version of the transcription factor is detectable by using specific antibodies. The sample is detected on a western blot. By using electrophoretic mobility shift assay (EMSA),^[80] the activation profile of transcription factors can be detected. A multiplex approach for activation profiling is a TF chip system where several different transcription factors can be detected in parallel. This technology is based on DNA microarrays, providing the specific DNA-binding sequence for the transcription factor protein on the array surface.^[81]

Classes

As described in more detail below, transcription factors may be classified by their (1) mechanism of action, (2) regulatory function, or (3) sequence homology (and hence structural similarity) in their DNA-binding domains.

Mechanistic

There are two mechanistic classes of transcription factors:

General transcription factors are involved in the formation of a preinitiation complex. The most common are abbreviated as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. They are ubiquitous and interact with the core promoter region surrounding the transcription start site(s) of all class II genes.^[82]
Upstream transcription factors are proteins that bind somewhere upstream of the initiation site to stimulate or repress transcription. These are roughly synonymous with specific transcription factors, because they vary considerably depending on what recognition sequences are present in the proximity of the gene.^[83]

Examples of specific transcription factors^[83]
Factor	Structural type	Recognition sequence	Binds as
SP1	Zinc finger	5'-GGGCGG-3'	Monomer
AP-1	Basic zipper	5'-TGA(G/C)TCA-3'	Dimer
C/EBP	Basic zipper	5'-ATTGCGCAAT-3'	Dimer
Heat shock factor	Basic zipper	5'-XGAAX-3'	Trimer
ATF/CREB	Basic zipper	5'-TGACGTCA-3'	Dimer
c-Myc	Basic helix-loop-helix	5'-CACGTG-3'	Dimer
Oct-1	Helix-turn-helix	5'-ATGCAAAT-3'	Monomer
NF-1	Novel	5'-TTGGCXXXXXGCCAA-3'	Dimer
(G/C) = G or C X = A, T, G or C
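The notation in the table maps naturally onto regular expressions: (G/C) becomes a two-letter character class and X becomes "any base". The following is a minimal sketch of that idea, not a real binding-site predictor. The helper names `to_regex` and `find_sites` and the example sequence are made up for illustration, only a subset of the table is included, and only the given strand is scanned (the reverse complement is ignored).

```python
import re

# Recognition sequences copied from the table above (subset, 5'->3').
SITES = {
    "SP1": "GGGCGG",
    "AP-1": "TGA(G/C)TCA",
    "ATF/CREB": "TGACGTCA",
    "c-Myc": "CACGTG",
    "NF-1": "TTGGCXXXXXGCCAA",
}


def to_regex(site):
    """Translate the table's notation into a compiled regular expression."""
    # (G/C) -> [GC]
    pattern = re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", site)
    # X -> any of the four bases
    return re.compile(pattern.replace("X", "[ACGT]"))


def find_sites(dna):
    """Return (factor, start index, matched text) for each match found."""
    hits = []
    for factor, site in SITES.items():
        for m in to_regex(site).finditer(dna.upper()):
            hits.append((factor, m.start(), m.group()))
    return hits


example = "AATGAGTCATTCACGTGGG"  # made-up sequence for illustration
print(find_sites(example))
```

A sequence match of this kind only flags a candidate site; as noted in the section on response elements, accessibility and cofactors also influence whether a factor actually binds there.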

Functional

Transcription factors have been classified according to their regulatory function:^[10]

I. constitutively active – present in all cells at all times – general transcription factors, Sp1, NF1, CCAAT
II. conditionally active – requires activation
- II.A developmental (cell specific) – expression is tightly controlled, but, once expressed, requires no additional activation – GATA, HNF, PIT-1, MyoD, Myf5, Hox, Winged Helix
- II.B signal-dependent – requires external signal for activation
  - II.B.1 extracellular ligand (endocrine or paracrine)-dependent – nuclear receptors
  - II.B.2 intracellular ligand (autocrine)-dependent - activated by small intracellular molecules – SREBP, p53, orphan nuclear receptors
  - II.B.3 cell membrane receptor-dependent – second messenger signaling cascades resulting in the phosphorylation of the transcription factor
    - II.B.3.a resident nuclear factors – reside in the nucleus regardless of activation state – CREB, AP-1, Mef2
    - II.B.3.b latent cytoplasmic factors – inactive form resides in the cytoplasm, but, when activated, is translocated into the nucleus – STAT, R-SMAD, NF-κB, Notch, TUBBY, NFAT

Structural

Transcription factors are often classified based on the sequence similarity and hence the tertiary structure of their DNA-binding domains:^[84]^[9]^[85]^[8]

1 Superclass: Basic Domains
- 1.1 Class: Leucine zipper factors (bZIP)
  - 1.1.1 Family: AP-1(-like) components; includes (c-Fos/c-Jun)
  - 1.1.2 Family: CREB
  - 1.1.3 Family: C/EBP-like factors
  - 1.1.4 Family: bZIP / PAR
  - 1.1.5 Family: Plant G-box binding factors
  - 1.1.6 Family: ZIP only
- 1.2 Class: Helix-loop-helix factors (bHLH)
  - 1.2.1 Family: Ubiquitous (class A) factors
  - 1.2.2 Family: Myogenic transcription factors (MyoD)
  - 1.2.3 Family: Achaete-Scute
  - 1.2.4 Family: Tal/Twist/Atonal/Hen
- 1.3 Class: Helix-loop-helix / leucine zipper factors (bHLH-ZIP)
  - 1.3.1 Family: Ubiquitous bHLH-ZIP factors; includes USF (USF1, USF2); SREBP (SREBP)
  - 1.3.2 Family: Cell-cycle controlling factors; includes c-Myc
- 1.4 Class: NF-1
  - 1.4.1 Family: NF-1 (A, B, C, X)
- 1.5 Class: RF-X
  - 1.5.1 Family: RF-X (1, 2, 3, 4, 5, ANK)
- 1.6 Class: bHSH
2 Superclass: Zinc-coordinating DNA-binding domains
- 2.1 Class: Cys4 zinc finger of nuclear receptor type
  - 2.1.1 Family: Steroid hormone receptors
  - 2.1.2 Family: Thyroid hormone receptor-like factors
- 2.2 Class: diverse Cys4 zinc fingers
  - 2.2.1 Family: GATA-Factors
- 2.3 Class: Cys2His2 zinc finger domain
  - 2.3.1 Family: Ubiquitous factors, includes TFIIIA, Sp1
  - 2.3.2 Family: Developmental / cell cycle regulators; includes Krüppel
  - 2.3.4 Family: Large factors with NF-6B-like binding properties
- 2.4 Class: Cys6 cysteine-zinc cluster
- 2.5 Class: Zinc fingers of alternating composition
3 Superclass: Helix-turn-helix
- 3.1 Class: Homeo domain
  - 3.1.1 Family: Homeo domain only; includes Ubx
  - 3.1.2 Family: POU domain factors; includes Oct
  - 3.1.3 Family: Homeo domain with LIM region
  - 3.1.4 Family: homeo domain plus zinc finger motifs
- 3.2 Class: Paired box
  - 3.2.1 Family: Paired plus homeo domain
  - 3.2.2 Family: Paired domain only
- 3.3 Class: Fork head / winged helix
  - 3.3.1 Family: Developmental regulators; includes forkhead
  - 3.3.2 Family: Tissue-specific regulators
  - 3.3.3 Family: Cell-cycle controlling factors
  - 3.3.0 Family: Other regulators
- 3.4 Class: Heat Shock Factors
  - 3.4.1 Family: HSF
- 3.5 Class: Tryptophan clusters
  - 3.5.1 Family: Myb
  - 3.5.2 Family: Ets-type
  - 3.5.3 Family: Interferon regulatory factors
- 3.6 Class: TEA (transcriptional enhancer factor) domain
  - 3.6.1 Family: TEA (TEAD1, TEAD2, TEAD3, TEAD4)
4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
- 4.1 Class: RHR (Rel homology region)
  - 4.1.1 Family: Rel/ankyrin; NF-kappaB
  - 4.1.2 Family: ankyrin only
  - 4.1.3 Family: NFAT (Nuclear Factor of Activated T-cells) (NFATC1, NFATC2, NFATC3)
- 4.2 Class: STAT
  - 4.2.1 Family: STAT
- 4.3 Class: p53
  - 4.3.1 Family: p53
- 4.4 Class: MADS box
  - 4.4.1 Family: Regulators of differentiation; includes (Mef2)
  - 4.4.2 Family: Responders to external signals, SRF (serum response factor) (SRF)
  - 4.4.3 Family: Metabolic regulators (ARG80)
- 4.5 Class: beta-Barrel alpha-helix transcription factors
- 4.6 Class: TATA binding proteins
  - 4.6.1 Family: TBP
- 4.7 Class: HMG-box
  - 4.7.1 Family: SOX genes, SRY
  - 4.7.2 Family: TCF-1 (TCF1)
  - 4.7.3 Family: HMG2-related, SSRP1
  - 4.7.4 Family: UBF
  - 4.7.5 Family: MATA
- 4.8 Class: Heteromeric CCAAT factors
  - 4.8.1 Family: Heteromeric CCAAT factors
- 4.9 Class: Grainyhead
  - 4.9.1 Family: Grainyhead
- 4.10 Class: Cold-shock domain factors
  - 4.10.1 Family: csd
- 4.11 Class: Runt
  - 4.11.1 Family: Runt
0 Superclass: Other Transcription Factors
- 0.1 Class: Copper fist proteins
- 0.2 Class: HMGI(Y) (HMGA1)
  - 0.2.1 Family: HMGI(Y)
- 0.3 Class: Pocket domain
- 0.4 Class: E1A-like factors
- 0.5 Class: AP2/EREBP-related factors
  - 0.5.1 Family: AP2
  - 0.5.2 Family: EREBP
  - 0.5.3 Superfamily: AP2/B3
    - 0.5.3.1 Family: ARF
    - 0.5.3.2 Family: ABI
    - 0.5.3.3 Family: RAV
