Tuesday, December 18, 2018

Transposable element

From Wikipedia, the free encyclopedia
A bacterial DNA transposon

A transposable element (TE or transposon) is a DNA sequence that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. Barbara McClintock's discovery of these jumping genes earned her a Nobel Prize in 1983.

Transposable elements make up a large fraction of the genome and are responsible for much of the mass of DNA in a eukaryotic cell. It has been shown that TEs are important in genome function and evolution. In Oxytricha, which has a unique genetic system, these elements play a critical role in development. Transposons are also very useful to researchers as a means to alter DNA inside a living organism. 

There are at least two classes of TEs: Class I TEs or retrotransposons generally function via reverse transcription, while Class II TEs or DNA transposons encode the protein transposase, which they require for insertion and excision, and some of these TEs also encode other proteins.

Discovery

Barbara McClintock discovered the first TEs in maize (Zea mays) at the Cold Spring Harbor Laboratory in New York. McClintock was experimenting with maize plants that had broken chromosomes.

In the winter of 1944–1945, McClintock planted corn kernels that were self-pollinated, meaning that the silk (style) of the flower received pollen from its own anther. These kernels came from a long line of plants that had been self-pollinated, causing broken arms on the end of their ninth chromosomes. As the maize plants began to grow, McClintock noted unusual color patterns on the leaves. For example, one leaf had two albino patches of almost identical size, located side by side on the leaf. McClintock hypothesized that during cell division certain cells lost genetic material, while others gained what they had lost. However, when comparing the chromosomes of the current generation of plants with the parent generation, she found certain parts of the chromosome had switched position. This refuted the popular genetic theory of the time that genes were fixed in their position on a chromosome. McClintock found that genes could not only move, but they could also be turned on or off due to certain environmental conditions or during different stages of cell development.

McClintock also showed that gene mutations could be reversed. She presented her report on her findings in 1951, and published an article on her discoveries in Genetics in November 1953 entitled "Induction of Instability at Selected Loci in Maize".

Her work was largely dismissed and ignored until the late 1960s–1970s when, after TEs were found in bacteria, it was rediscovered. She was awarded a Nobel Prize in Physiology or Medicine in 1983 for her discovery of TEs, more than thirty years after her initial research.

Approximately 85% of the maize genome is made up of TEs, as is 44% of the human genome.

Classification

Transposable elements represent one of several types of mobile genetic elements. TEs are assigned to one of two classes according to their mechanism of transposition, which can be described as either copy and paste (Class I TEs) or cut and paste (Class II TEs).

Class I (retrotransposons)

Class I TEs are copied in two stages: first, they are transcribed from DNA to RNA, and the RNA produced is then reverse transcribed to DNA. This copied DNA is then inserted back into the genome at a new position. The reverse transcription step is catalyzed by a reverse transcriptase, which is often encoded by the TE itself. The characteristics of retrotransposons are similar to those of retroviruses, such as HIV.
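The two-stage copy described above can be sketched in a few lines of Python. This is a toy model on plain strings, not a biochemical simulation: the function names, the example sequence and the insertion coordinate are all illustrative assumptions, and strand complementarity and target-site choice are ignored.

```python
def transcribe(dna: str) -> str:
    """DNA -> RNA copy of the element (T is written as U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA copy, the step catalyzed by reverse transcriptase."""
    return rna.replace("U", "T")


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Insert a new DNA copy of genome[start:end] at position `target`.

    The donor copy stays where it was, so the number of copies of the
    element grows by one with every transposition event.
    """
    element = genome[start:end]
    cdna = reverse_transcribe(transcribe(element))
    return genome[:target] + cdna + genome[target:]


# Hypothetical 7-base "element" embedded in a short made-up genome.
genome = "AAAA" + "GATTACA" + "CCCCGGGG"
new_genome = copy_and_paste(genome, start=4, end=11, target=15)
print(new_genome.count("GATTACA"))  # 2: the donor copy plus the new copy
```

The point of the sketch is only that the donor is never removed, which is why copy-and-paste elements tend to increase genome size.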

Retrotransposons are commonly grouped into three main orders:
  1. Retrotransposons, with long terminal repeats (LTRs), which encode reverse transcriptase, similar to retroviruses
  2. Retroposons, Long interspersed nuclear elements (LINEs, LINE-1s, or L1s), which encode reverse transcriptase but lack LTRs, and are transcribed by RNA polymerase II
  3. Short interspersed nuclear elements (SINEs) do not encode reverse transcriptase and are transcribed by RNA polymerase III
(Retroviruses can also be considered TEs. For example, after conversion of retroviral RNA into DNA inside a host cell, the newly produced retroviral DNA is integrated into the genome of the host cell. These integrated DNAs are termed proviruses. The provirus is a specialized form of eukaryotic retrotransposon, which can produce RNA intermediates that may leave the host cell and infect other cells. The transposition cycle of retroviruses has similarities to that of prokaryotic TEs, suggesting a distant relationship between the two.) 

A. Structure of DNA transposons (Mariner type). Two terminal inverted repeats (TIR) flank the transposase gene. Two short target site duplications (TSD) are present on both sides of the insert. B. Mechanism of transposition: Two transposases recognize and bind to TIR sequences, join together and promote DNA double-strand cleavage. The DNA-transposase complex then inserts its DNA cargo at specific DNA motifs elsewhere in the genome, creating short TSDs upon integration.

Class II (DNA transposons)

The cut-and-paste transposition mechanism of class II TEs does not involve an RNA intermediate. The transpositions are catalyzed by several transposase enzymes. Some transposases bind non-specifically to any target site in DNA, whereas others bind to specific target sequences. The transposase makes a staggered cut at the target site, producing sticky ends, cuts out the DNA transposon, and ligates it into the target site. A DNA polymerase fills in the resulting gaps from the sticky ends, and DNA ligase closes the sugar-phosphate backbone. This results in target site duplication. The insertion sites of DNA transposons may therefore be identified by short direct repeats (created when DNA polymerase fills in the staggered cut in the target DNA) followed by inverted repeats (which are important for excision of the TE by transposase).
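The way a staggered cut followed by gap filling produces a target site duplication can be illustrated with a small string model. The sketch below is an assumption-laden simplification: the function name, the coordinates and the 2-base duplication length are made up for the example, only one strand is represented, and imperfect repair of the donor site is not modeled.

```python
def cut_and_paste(genome: str, te_start: int, te_end: int,
                  target: int, tsd_len: int) -> str:
    """Move genome[te_start:te_end] to `target`, duplicating the target site.

    `target` is a coordinate in the genome *after* excision. The `tsd_len`
    bases starting there stand in for the single-stranded overhangs left
    by the staggered cut.
    """
    element = genome[te_start:te_end]
    donor = genome[:te_start] + genome[te_end:]   # excision from the donor site
    site = donor[target:target + tsd_len]         # bases spanned by the staggered cut
    if len(site) < tsd_len:
        raise ValueError("target site runs past the end of the sequence")
    # Gap filling copies the overhang, so the site appears on both sides
    # of the inserted element as a short direct repeat.
    return donor[:target] + site + element + site + donor[target + tsd_len:]


# Hypothetical element with inverted repeats at its ends (CAGG ... CCTG).
element = "CAGGTTTTTTCCTG"
genome = "ACGT" + element + "ACGTTAGCATGC"
moved = cut_and_paste(genome, te_start=4, te_end=18, target=9, tsd_len=2)
print(moved)
```

In the result the element is flanked by the same two bases on each side, and the sequence is longer than the original by exactly the length of the duplicated site.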

Cut-and-paste TEs may be duplicated if their transposition takes place during S phase of the cell cycle, when a donor site has already been replicated but a target site has not yet been replicated. Such duplications at the target site can result in gene duplication, which plays an important role in genomic evolution.

Not all DNA transposons transpose through the cut-and-paste mechanism. In some cases, a replicative transposition is observed in which a transposon replicates itself to a new target site (e.g. helitron). 

Class II TEs comprise less than 2% of the human genome; the rest of the TE content is Class I.

Autonomous and non-autonomous

Transposition can be classified as either "autonomous" or "non-autonomous" in both Class I and Class II TEs. Autonomous TEs can move by themselves, whereas non-autonomous TEs require the presence of another TE to move. This is often because dependent TEs lack transposase (for Class II) or reverse transcriptase (for Class I). 

The Activator element (Ac) is an example of an autonomous TE, and the Dissociation element (Ds) is an example of a non-autonomous TE. Without Ac, Ds is not able to transpose.

Examples

  • The first TEs were discovered in maize (Zea mays) by Barbara McClintock in 1948, for which she was later awarded a Nobel Prize. She noticed chromosomal insertions, deletions, and translocations caused by these elements. These changes in the genome could, for example, lead to a change in the color of corn kernels. About 85% of the maize genome consists of TEs. The elements of the Ac/Ds system described by McClintock are Class II TEs. Transposition of Ac in tobacco has been demonstrated by B. Baker (Plant Transposable Elements, pp 161–174, 1988, Plenum Publishing Corp., ed. Nelson).
  • One family of TEs in the fruit fly Drosophila melanogaster is called P elements. They seem to have first appeared in the species only in the middle of the twentieth century; within the last 50 years, they spread through every population of the species. Gerald M. Rubin and Allan C. Spradling pioneered technology to use artificial P elements to insert genes into Drosophila by injecting the embryo.
  • Transposons in bacteria usually carry an additional gene for functions other than transposition, often for antibiotic resistance. In bacteria, transposons can jump from chromosomal DNA to plasmid DNA and back, allowing for the transfer and permanent addition of genes such as those encoding antibiotic resistance (multi-antibiotic resistant bacterial strains can be generated in this way). Bacterial transposons of this type belong to the Tn family. When the transposable elements lack additional genes, they are known as insertion sequences.
  • The most common transposable element in humans is the Alu sequence. It is approximately 300 bases long and can be found between 300,000 and one million times in the human genome. Alu alone is estimated to make up 15–17% of the human genome.
  • Mariner-like elements are another prominent class of transposons found in multiple species, including humans. The Mariner transposon was first discovered by Jacobson and Hartl in Drosophila. This Class II transposable element is known for its uncanny ability to be transmitted horizontally in many species. There are an estimated 14,000 copies of Mariner in the human genome comprising 2.6 million base pairs. The first mariner-element transposons outside of animals were found in Trichomonas vaginalis. These characteristics of the Mariner transposon inspired the science fiction novel The Mariner Project by Bob Marr.
  • Mu phage transposition is the best-known example of replicative transposition.
  • Yeast (Saccharomyces cerevisiae) genomes contain five distinct retrotransposon families: Ty1, Ty2, Ty3, Ty4 and Ty5.
  • A helitron is a TE found in eukaryotes that is thought to replicate by a rolling-circle mechanism.
  • In human embryos, two types of transposons combined to form noncoding RNA that catalyzes the development of stem cells. During the early stages of a fetus's growth, the embryo's inner cell mass expands as these stem cells multiply. The increase in this type of cell is crucial, since stem cells later change form and give rise to all the cells in the body.
  • In peppered moths, a transposon in a gene called cortex caused the moths' wings to turn completely black. This change in coloration helped moths to blend in with ash and soot-covered areas during the Industrial Revolution.

In disease

TEs are mutagens and their movements are often the causes of genetic disease. They can damage the genome of their host cell in different ways:
  • a transposon or a retrotransposon that inserts itself into a functional gene will most likely disable that gene;
  • after a DNA transposon leaves a gene, the resulting gap will probably not be repaired correctly;
  • multiple copies of the same sequence, such as Alu sequences, can hinder precise chromosomal pairing during mitosis and meiosis, resulting in unequal crossovers, one of the main reasons for chromosome duplication.
Diseases often caused by TEs include hemophilia A and B, severe combined immunodeficiency, porphyria, predisposition to cancer, and Duchenne muscular dystrophy. LINE1 (L1) TEs that insert into the human Factor VIII gene have been shown to cause hemophilia, and insertion of L1 into the APC gene causes colon cancer, confirming that TEs play an important role in disease development. Transposable element dysregulation can cause neuronal death in Alzheimer's disease and similar tauopathies.

Additionally, many TEs contain promoters which drive transcription of their own transposase. These promoters can cause aberrant expression of linked genes, causing disease or mutant phenotypes.

Rate of transposition, induction and defense

One study estimated the rate of transposition of a particular retrotransposon, the Ty1 element in Saccharomyces cerevisiae. Using several assumptions, the rate of successful transposition per single Ty1 element was estimated at about once every few months to once every few years. Some TEs contain heat-shock-like promoters, and their rate of transposition increases if the cell is subjected to stress, thus increasing the mutation rate under these conditions, which might be beneficial to the cell.
Cells defend against the proliferation of TEs in a number of ways. These include piRNAs and siRNAs, which silence TEs after they have been transcribed. 

If organisms are mostly composed of TEs, one might assume that disease caused by misplaced TEs is very common. In most cases, however, TEs are silenced through epigenetic mechanisms such as DNA methylation, chromatin remodeling and piRNA, so that they rarely move or produce phenotypic effects, as is the case for some wild-type plant TEs. Certain mutant plants have been found to have defects in methylation-related enzymes (methyltransferases), which allow TEs to be transcribed and thereby affect the phenotype.

One hypothesis suggests that only approximately 100 LINE1-related sequences are active, despite their sequences making up 17% of the human genome. In human cells, silencing of LINE1 sequences is triggered by an RNA interference (RNAi) mechanism. Surprisingly, the RNAi sequences are derived from the 5' untranslated region (UTR) of the LINE1 itself. It has been proposed that the 5' LINE1 UTR that codes for the sense promoter for LINE1 transcription also encodes the antisense promoter for the miRNA that becomes the substrate for siRNA production. Inhibition of the RNAi silencing mechanism in this region showed an increase in LINE1 transcription.

Evolution

TEs are found in almost all life forms, and the scientific community is still exploring their evolution and their effect on genome evolution. It is unclear whether TEs originated in the last universal common ancestor, arose independently multiple times, or arose once and then spread to other kingdoms by horizontal gene transfer. While some TEs confer benefits on their hosts, most are regarded as selfish DNA parasites. In this way, they are similar to viruses. Various viruses and TEs also share features in their genome structures and biochemical abilities, leading to speculation that they share a common ancestor.

Because excessive TE activity can damage exons, many organisms have acquired mechanisms to inhibit their activity. Bacteria may undergo high rates of gene deletion as part of a mechanism to remove TEs and viruses from their genomes, while eukaryotic organisms typically use RNA interference to inhibit TE activity. Nevertheless, some TEs generate large families often associated with speciation events. Evolution often deactivates DNA transposons, leaving them as introns (inactive gene sequences). In vertebrate animal cells, nearly all 100,000+ DNA transposons per genome have genes that encode inactive transposase polypeptides. In humans, all Tc1-like transposons are inactive. The first synthetic transposon designed for use in vertebrate cells, the Sleeping Beauty transposon system, is a Tc1/mariner-like transposon. It exists in the human genome as an intron and was activated through reconstruction.

Large quantities of TEs within genomes may still present evolutionary advantages, however. Interspersed repeats within genomes are created by transposition events accumulating over evolutionary time. Because interspersed repeats block gene conversion, they protect novel gene sequences from being overwritten by similar gene sequences and thereby facilitate the development of new genes. TEs may also have been co-opted by the vertebrate immune system as a means of producing antibody diversity. The V(D)J recombination system operates by a mechanism similar to that of some TEs. 

TEs can contain many types of genes, including those conferring antibiotic resistance and ability to transpose to conjugative plasmids. Some TEs also contain integrons, genetic elements that can capture and express genes from other sources. These contain integrase, which can integrate gene cassettes. There are over 40 antibiotic resistance genes identified on cassettes, as well as virulence genes. 

Transposons do not always excise their elements precisely, sometimes removing the adjacent base pairs; this phenomenon is called exon shuffling. Shuffling two unrelated exons can create a novel gene product or, more likely, an intron.

Applications

The first TE was discovered in maize (Zea mays) and is named Dissociation (Ds). Likewise, the first TE to be molecularly isolated was from a plant (snapdragon). Appropriately, TEs have been an especially useful tool in plant molecular biology. Researchers use them as a means of mutagenesis. In this context, a TE jumps into a gene and produces a mutation. Compared with chemical mutagenesis methods, the presence of such a TE provides a straightforward means of identifying the mutant allele.

Sometimes the insertion of a TE into a gene can disrupt that gene's function in a reversible manner, in a process called insertional mutagenesis; transposase-mediated excision of the DNA transposon restores gene function. This produces plants in which neighboring cells have different genotypes. This feature allows researchers to distinguish between genes that must be present inside of a cell in order to function (cell-autonomous) and genes that produce observable effects in cells other than those where the gene is expressed. 

TEs are also a widely used tool for mutagenesis of most experimentally tractable organisms. The Sleeping Beauty transposon system has been used extensively as an insertional tag for identifying cancer genes.

The Sleeping Beauty transposon system, a member of the Tc1/mariner class of TEs that was named Molecule of the Year in 2009, is active in mammalian cells and is being investigated for use in human gene therapy.

TEs are used for the reconstruction of phylogenies by the means of presence/absence analyses.

De novo repeat identification

De novo repeat identification is an initial scan of sequence data that seeks to find the repetitive regions of the genome, and to classify these repeats. Many computer programs exist to perform de novo repeat identification, all operating under the same general principles. As short tandem repeats are generally 1–6 base pairs in length and are often consecutive, their identification is relatively simple. Dispersed repetitive elements, on the other hand, are more challenging to identify, due to the fact that they are longer and have often acquired mutations. However, it is important to identify these repeats as they are often found to be transposable elements (TEs).

De novo identification of transposons involves three steps: 1) find all repeats within the genome, 2) build a consensus of each family of sequences, and 3) classify these repeats. There are three groups of algorithms for the first step.

One group is referred to as the k-mer approach, where a k-mer is a sequence of length k. In this approach, the genome is scanned for overrepresented k-mers; that is, k-mers that occur more often than would be expected by chance alone. The length k is determined by the type of transposon being searched for. The k-mer approach also allows mismatches, the number of which is determined by the analyst. Some k-mer approach programs use the k-mer as a base, and extend both ends of each repeated k-mer until there is no more similarity between them, indicating the ends of the repeats.

Another group of algorithms employs a method called sequence self-comparison. Sequence self-comparison programs use alignment tools such as AB-BLAST to conduct an initial sequence alignment. As these programs find groups of elements that partially overlap, they are useful for finding highly diverged transposons, or transposons with only a small region copied into other parts of the genome.

A third group of algorithms follows the periodicity approach. These algorithms perform a Fourier transformation on the sequence data to identify periodicities (regions that are repeated periodically), and use peaks in the resultant spectrum to find candidate repetitive elements. This method works best for tandem repeats, but can be used for dispersed repeats as well. However, it is a slow process, making it an unlikely choice for genome-scale analysis.
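As a rough illustration of the k-mer approach, the following Python sketch counts every k-mer in a sequence and reports those that occur more often than a naive background model predicts. It is a minimal sketch under stated assumptions, not the method of any particular repeat-finding program: the function name and the `min_ratio` threshold are hypothetical, the background assumes all 4^k k-mers are equally likely, and no mismatches or end extension are handled.

```python
from collections import Counter


def overrepresented_kmers(seq: str, k: int, min_ratio: float = 3.0) -> dict:
    """Return k-mers seen at least `min_ratio` times more often than expected.

    The expected count assumes that all 4**k k-mers are equally likely,
    which is a deliberate simplification. K-mers seen only once are never
    reported, whatever the expectation.
    """
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    expected = n_windows / 4 ** k
    return {kmer: c for kmer, c in counts.items()
            if c > 1 and c >= min_ratio * expected}


# Made-up sequence: three dispersed copies of an 8-base repeat separated
# by unrelated spacer sequence.
repeat = "GATTACAG"
seq = "ACGTCC" + repeat + "TTGCAA" + repeat + "CCATGG" + repeat + "AGTCTA"
hits = overrepresented_kmers(seq, k=8)
print(hits.get(repeat))
```

A real tool would go on to extend each hit in both directions until similarity between the copies is lost, as described above, in order to recover the full boundaries of the repeat.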

The second step of de novo repeat identification involves building a consensus of each family of sequences. A consensus sequence is a sequence that is created based on the repeats that comprise a TE family. Each base in a consensus is the one that occurred most often at that position in the sequences being compared. For example, in a family of 50 repeats where 42 have a T at the same position, the consensus sequence would have a T at this position as well, as that base is representative of the family as a whole at that particular position, and is most likely the base found in the family's ancestor at that position. Once a consensus sequence has been made for each family, it is then possible to move on to further analysis, such as TE classification and genome masking in order to quantify the overall TE content of the genome.
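The majority-vote rule for building a consensus can be written directly. The sketch below assumes the family members have already been aligned to the same length, which in practice is the hard part; the function name and the example family are illustrative only.

```python
from collections import Counter
from typing import Sequence


def consensus(aligned: Sequence[str]) -> str:
    """Majority-vote consensus of equal-length, already aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # For each alignment column keep the most frequent base; on a tie the
    # base encountered first in that column wins.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# A family of 50 repeats in which 42 members carry a T at the fourth position.
family = ["GATTACA"] * 42 + ["GATCACA"] * 8
print(consensus(family))  # GATTACA
```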

Adaptive TEs

Transposable elements have been recognized as good candidates for stimulating gene adaptation, through their ability to regulate the expression levels of nearby genes. Combined with their "mobility", transposable elements can be relocated adjacent to their targeted genes, and control the expression levels of the gene, dependent upon the circumstances. 

A 2008 study, "High Rate of Recent Transposable Element–Induced Adaptation in Drosophila melanogaster", used D. melanogaster populations that had recently migrated from Africa to other parts of the world as a basis for studying adaptations caused by transposable elements. Although most of the TEs were located in introns, the study found significant differences in gene expression between the African populations and those from other parts of the world. The four TEs that caused the selective sweep were more prevalent in D. melanogaster from temperate climates, leading the researchers to conclude that the selective pressures of the climate prompted genetic adaptation. The study suggests that adaptive TEs are prevalent in nature and enable organisms to adapt gene expression in response to new selective pressures.

However, not all effects of adaptive TEs are beneficial to the population. In a 2009 study, "A Recent Adaptive Transposable Element Insertion Near Highly Conserved Developmental Loci in Drosophila melanogaster", a TE inserted between Jheh 2 and Jheh 3 reduced the expression level of both genes. Downregulation of these genes caused Drosophila to exhibit extended developmental time and reduced egg-to-adult viability. Although this adaptation was observed at high frequency in all non-African populations, it was not fixed in any of them. This is plausible, since selection for higher egg-to-adult viability would tend to purge the trait caused by this specific TE insertion.

At the same time, there have been several reports showing the advantageous adaptation caused by TEs. In the research done with silkworms, "An Adaptive Transposable Element insertion in the Regulatory Region of the EO Gene in the Domesticated Silkworm", a TE insertion was observed in the cis-regulatory region of the EO gene, which regulates molting hormone 20E, and enhanced expression was recorded. While populations without the TE insert are often unable to effectively regulate hormone 20E under starvation conditions, those with the insert had a more stable development, which resulted in higher developmental uniformity.

These three experiments all demonstrated different ways in which TE insertions can be advantageous or disadvantageous, through means of regulating the expression level of adjacent genes. The field of adaptive TE research is still under development and more findings can be expected in the future.

Pseudogene

From Wikipedia, the free encyclopedia

Mechanism of classical and processed pseudogene formation
 
Pseudogenes, sometimes referred to as zombie genes in the media, are segments of DNA that are related to real genes. Pseudogenes have lost at least some functionality, relative to the complete gene, in cellular gene expression or protein-coding ability. Pseudogenes often result from the accumulation of multiple mutations within a gene whose product is not required for the survival of the organism, but can also be caused by genomic copy number variation (CNV) where segments of 1+ kb are duplicated or deleted. Although they do not retain the full function of the parent gene, pseudogenes may still be functional, similar to other kinds of noncoding DNA, which can perform regulatory functions. The "pseudo" in "pseudogene" implies a variation in sequence relative to the parent coding gene, but does not necessarily indicate pseudo-function. Despite being non-coding, many pseudogenes have important roles in normal physiology and abnormal pathology.

Although some pseudogenes do not have introns or promoters (such pseudogenes are copied from messenger RNA and incorporated into the chromosome, and are called "processed pseudogenes"), others have some gene-like features such as promoters, CpG islands, and splice sites. They are different from normal genes due to either a lack of protein-coding ability resulting from a variety of disabling mutations (e.g. premature stop codons or frameshifts), a lack of transcription, or their inability to encode RNA (such as with ribosomal RNA pseudogenes). The term "pseudogene" was coined in 1977 by Jacq et al. 

Because pseudogenes were initially thought of as the last stop for genomic material that could be removed from the genome, they were often labeled as junk DNA. Nonetheless, pseudogenes contain biological and evolutionary histories within their sequences. This is due to a pseudogene's shared ancestry with a functional gene: in the same way that Darwin thought of two species as possibly having a shared common ancestry followed by millions of years of evolutionary divergence, a pseudogene and its associated functional gene also share a common ancestor and have diverged as separate genetic entities over millions of years.

Properties

Pseudogenes are usually characterized by a combination of homology to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, they are usually unable to produce functional final protein products. Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of homology and loss of functionality are usually implied through sequence alignments rather than biologically proven.
  1. Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently.
  2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to a fully functional protein: Transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames". 
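The two sequence-based criteria above, identity to a parent gene and the presence of disabling mutations, can be sketched as simple checks on a pairwise alignment. Everything here is a hypothetical illustration rather than the procedure of any real annotation pipeline: the function names and example sequences are made up, the sequences are assumed to be pre-aligned with "-" marking gaps, and only the standard stop codons are considered.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns in which both sequences share a base.

    Columns that contain a gap count as mismatches.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


def first_premature_stop(cds: str):
    """0-based codon index of the first in-frame stop before the last codon,
    or None if the reading frame is open up to the final codon."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for idx, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return idx
    return None


def has_frameshift(aligned_parent: str, aligned_pseudo: str) -> bool:
    """True if any run of gaps has a length that is not a multiple of three."""
    runs = re.findall(r"-+", aligned_parent) + re.findall(r"-+", aligned_pseudo)
    return any(len(run) % 3 != 0 for run in runs)


parent = "ATGGCCAAGTGGCTGTAA"   # ATG GCC AAG TGG CTG TAA
pseudo = "ATGGCCAAGTGACTGTAA"   # TGG -> TGA introduces an early stop
print(round(percent_identity(parent, pseudo), 2))
print(first_premature_stop(pseudo))
print(has_frameshift("ATGGCCAAG", "ATGG-CAAG"))
```

As the text notes, checks of this kind only imply homology and loss of function from sequence; they do not prove either biologically.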

Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
Processed pseudogenes often pose a problem for gene prediction programs, often being misidentified as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods.

Recently 140 human pseudogenes have been shown to be translated. However, the function, if any, of the protein products is unknown.

Types and origin

There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:

Processed

Processed pseudogene production

Processed (or retrotransposed) pseudogenes. In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, between 30% and 44% of the human genome consists of repetitive elements such as SINEs and LINEs. In the process of retrotransposition, a portion of the mRNA or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too. Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event. However, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts. A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes. Processed pseudogenes are continually being created in primates. Human populations, for example, have distinct sets of processed pseudogenes across their individuals.
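The hallmark features listed above (no promoter, no introns, a poly-A tail, and a truncated 5' end) follow mechanically from copying a spliced transcript, which a short string model makes concrete. The gene layout, coordinates and function name below are invented for illustration, and the transcript is kept in DNA letters throughout for simplicity.

```python
def processed_pseudogene(gene: str, exons, truncate_5p: int = 0,
                         polya: int = 10) -> str:
    """Build the DNA copy that a retrotransposed mRNA would leave behind.

    `exons` is a list of (start, end) half-open intervals within `gene`.
    Everything outside those intervals (promoter, introns) is absent from
    the mature transcript and therefore from the copy.
    """
    mrna = "".join(gene[start:end] for start, end in exons)  # splicing
    mrna += "A" * polya                                      # poly-A tail
    return mrna[truncate_5p:]                                # 5' truncation


# Hypothetical gene: promoter, exon 1, intron, exon 2.
promoter, exon1, intron, exon2 = "TATAAA", "ATGGCC", "GTAAGTCAG", "TGGTAA"
gene = promoter + exon1 + intron + exon2
copy = processed_pseudogene(gene, exons=[(6, 12), (21, 27)],
                            truncate_5p=2, polya=5)
print(copy)
```

The resulting copy contains neither the promoter nor the intron, which is the sense in which a processed pseudogene is "dead on arrival".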

Non-processed

One way a pseudogene may arise
 
Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by homologous recombination at, for example, repetitive SINE sequences on misaligned chromosomes, and subsequently acquire mutations that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates. If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure. Gene duplication generates functional redundancy and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling and also by genome analysis. Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they will no longer be identifiable. Relatively young pseudogenes can be recognized due to their sequence similarity.

Unitary pseudogenes

Two ways a pseudogene may be produced

Various mutations (such as indels and nonsense mutations) can prevent a gene from being normally transcribed or translated, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or, in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates. Another more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans.

It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.
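
The hallmarks described in this section can be summarised as a small set of decision rules. The following is a minimal sketch, not an established annotation method: the `PseudogeneFeatures` fields and the `classify_pseudogene` function are hypothetical names introduced here for illustration, and real pipelines weigh many more signals (sequence similarity, 5' truncation, promoter content).

```python
from dataclasses import dataclass


@dataclass
class PseudogeneFeatures:
    """Hypothetical feature summary for one candidate pseudogene."""
    has_functional_parent_copy: bool  # an intact copy exists elsewhere in the genome
    has_introns: bool                 # exon-intron structure is retained
    has_poly_a_tail: bool             # poly-A tract at the 3' end


def classify_pseudogene(features: PseudogeneFeatures) -> str:
    """Apply the hallmarks from the text as simple rules (illustrative only)."""
    if not features.has_functional_parent_copy:
        # The gene was disabled without a prior duplication.
        return "unitary"
    if features.has_poly_a_tail and not features.has_introns:
        # cDNA-like hallmarks point to retrotransposition.
        return "processed"
    if features.has_introns:
        # An intact exon-intron structure points to gene duplication.
        return "non-processed"
    return "unclassified"


if __name__ == "__main__":
    candidate = PseudogeneFeatures(
        has_functional_parent_copy=True, has_introns=False, has_poly_a_tail=True
    )
    print(classify_pseudogene(candidate))
```

The rule order is a design choice: the unitary case is checked first because it is defined by the absence of a functional copy rather than by sequence structure.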

Pseudo-pseudogenes

The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals, as reviewed in the "Translational readthrough" section of the stop codon article. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of Drosophila species, as described next. 

Drosophila melanogaster

In 2016 it was reported that 4 predicted pseudogenes in multiple Drosophila species actually encode proteins with biologically important functions, "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, the functional protein (an olfactory receptor) is found only in neurons. This finding of tissue-specific biologically-functional genes that could have been dismissed as pseudogenes by in silico analysis complicates the analysis of sequence data. As of 2012, it appeared that there are approximately 12,000–14,000 pseudogenes in the human genome, almost comparable to the oft-cited approximate value of 20,000 genes in our genome. The current work may also help to explain why we are able to live with 20 to 100 putative homozygous loss of function mutations in our genomes.

Through reanalysis of over 50 million peptides generated from the human proteome and separated by mass spectrometry, it now (2016) appears that there are at least 19,262 human proteins produced from 16,271 genes or clusters of genes. From this analysis, 8 new protein coding genes that were previously considered pseudogenes were identified.
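
The premature-stop-codon criterion mentioned in this section can be sketched in a few lines. This is a simplified illustration that assumes the standard genetic code and a sequence already in the correct reading frame; the function name `first_premature_stop` is hypothetical, and the sketch deliberately ignores translational readthrough, which is exactly why such a check can mislabel a functional gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon
    located before the final codon, or None if there is none."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is expected to be the normal stop, so it is skipped.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


if __name__ == "__main__":
    print(first_premature_stop("ATGAAATGA"))     # no premature stop
    print(first_premature_stop("ATGTAAAAATGA"))  # stop at the second codon
```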

Examples of pseudogene function

Drosophila glutamate receptor. The term "pseudo-pseudogene" was coined for the gene encoding the chemosensory ionotropic glutamate receptor Ir75a of Drosophila sechellia, which bears a premature termination codon (PTC) and was thus classified as a pseudogene. However, in vivo the D. sechellia Ir75a locus produces a functional receptor, owing to translational read-through of the PTC. Read-through is detected only in neurons and depends on the nucleotide sequence downstream of the PTC.

siRNAs. Some endogenous siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts. One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma. This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with therapeutic agents, or of using them as therapeutic agents.

piRNAs. Some piRNAs are derived from pseudogenes located in piRNA clusters. Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting transposable element damage to the genome.

BRAF pseudogene acts as a ceRNA

microRNAs. There are many reports of pseudogene transcripts acting as microRNA decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of BRAF. The BRAF gene is a proto-oncogene that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, the RNAs from BRAF and the pseudogene BRAFP1 compete for miRNA, but the balance of the two RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term competing endogenous RNA (ceRNA).

PTEN. The PTEN gene is a known tumor suppressor gene. The PTEN pseudogene, PTENP1, is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation which eliminates the codon for the initiating methionine and thus prevents translation of the normal PTEN protein. In spite of that, PTENP1 appears to play a role in oncogenesis. Owing to its similarity to the PTEN gene, the 3' UTR of PTENP1 mRNA acts as a decoy for the microRNAs that target PTEN mRNA, and overexpression of the 3' UTR resulted in an increase in PTEN protein level. That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.

Potogenes. Pseudogenes can, over evolutionary time scales, participate in gene conversion and other mutational events that may give rise to new or newly-functional genes. This has led to the concept that pseudogenes could be viewed as potogenes: potential genes for evolutionary diversification.
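
The decoy mechanism described for BRAF/BRAFP1 and PTEN/PTENP1 can be illustrated with a toy calculation. The sketch below is not a published model: it assumes, purely for illustration, that a fixed miRNA pool is shared between the gene mRNA and the pseudogene RNA in proportion to their abundance, and the function name and all numbers are hypothetical.

```python
def mirna_on_gene(mirna_pool: float, gene_rna: float, pseudogene_rna: float) -> float:
    """Amount of miRNA left to repress the gene mRNA, assuming the pool is
    split in proportion to transcript abundance (toy assumption)."""
    total = gene_rna + pseudogene_rna
    if total == 0:
        return 0.0
    return mirna_pool * gene_rna / total


if __name__ == "__main__":
    pool, gene = 100.0, 50.0  # arbitrary units
    for pseudogene in (0.0, 50.0, 150.0):
        share = mirna_on_gene(pool, gene, pseudogene)
        print(f"pseudogene RNA={pseudogene:5.1f} -> miRNA on gene mRNA={share:5.1f}")
```

Under this assumption, raising pseudogene RNA levels always lowers the miRNA available to repress the gene. Whether the outcome promotes or suppresses tumours then depends on whether the derepressed gene is a proto-oncogene (BRAF) or a tumor suppressor (PTEN).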

Misidentified pseudogenes

Sometimes genes are thought to be pseudogenes, usually based on bioinformatic analysis, but then turn out to be functional genes. One example is the Drosophila jingwei gene, which encodes a functional alcohol dehydrogenase enzyme in vivo.

Another example is the human gene encoding phosphoglycerate mutase which was thought to be a pseudogene but which turned out to be a functional gene, now named PGAM4. Mutations in it actually cause infertility.

Bacterial pseudogenes

Pseudogenes can be found in bacteria. Most are in bacteria that are not free-living; that is, they are either symbionts or obligate intracellular parasites and thus do not require many genes that are needed by bacteria living in changeable environments. An extreme example is the genome of Mycobacterium leprae, the causative agent of leprosy. It has been reported to have 1,133 pseudogenes which give rise to approximately 50% of its transcriptome.

Enzyme promiscuity

Enzyme promiscuity is the ability of an enzyme to catalyse a fortuitous side reaction in addition to its main reaction. Although enzymes are remarkably specific catalysts, they can often perform side reactions in addition to their main, native catalytic activity. These promiscuous activities are usually slow relative to the main activity and are under neutral selection. Despite ordinarily being physiologically irrelevant, under new selective pressures these activities may confer a fitness benefit, thereby prompting the formerly promiscuous activity to evolve into the new main activity. An example of this is the atrazine chlorohydrolase (encoded by atzA) from Pseudomonas sp. ADP, which evolved from melamine deaminase (encoded by triA), an enzyme with very small promiscuous activity towards atrazine, a man-made chemical.

Introduction

Enzymes are evolved to catalyse a particular reaction on a particular substrate with a high catalytic efficiency (kcat/KM, cf. Michaelis–Menten kinetics). However, in addition to this main activity, they possess other activities that are generally several orders of magnitude lower, and that are not a result of evolutionary selection and therefore do not partake in the physiology of the organism. This phenomenon allows new functions to be gained as the promiscuous activity could confer a fitness benefit under a new selective pressure leading to its duplication and selection as a new main activity.
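
To make the "several orders of magnitude" gap concrete, catalytic efficiency (kcat/KM) can be compared for a main and a promiscuous activity. The sketch below uses invented kinetic constants chosen only for illustration; they do not describe any particular enzyme, and the function names are hypothetical.

```python
import math


def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Catalytic efficiency kcat/KM in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / km_molar


def orders_of_magnitude_gap(main_eff: float, promiscuous_eff: float) -> float:
    """log10 of the ratio between main and promiscuous efficiency."""
    return math.log10(main_eff / promiscuous_eff)


if __name__ == "__main__":
    # Hypothetical values: a fast, tight-binding main activity and a slow,
    # weakly binding side activity of the same enzyme.
    main = catalytic_efficiency(kcat_per_s=100.0, km_molar=1e-4)
    side = catalytic_efficiency(kcat_per_s=0.1, km_molar=1e-2)
    print(f"main: {main:.3g} M^-1 s^-1, promiscuous: {side:.3g} M^-1 s^-1")
    print(f"gap: about {orders_of_magnitude_gap(main, side):.1f} orders of magnitude")
```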

Enzyme evolution

Duplication and divergence

Several theoretical models exist to predict the order of duplication and specialisation events, but the actual process is more intertwined and fuzzy (§ Reconstructed enzymes below). On one hand, gene amplification results in an increase in enzyme concentration, and potentially freedom from a restrictive regulation, therefore increasing the reaction rate (v) of the promiscuous activity of the enzyme and making its effects more pronounced physiologically ("gene dosage effect"). On the other, enzymes may evolve an increased secondary activity with little loss to the primary activity ("robustness") and thus with little adaptive conflict.
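
The gene dosage effect follows directly from the Michaelis–Menten rate law, v = kcat·[E]·[S] / (KM + [S]), in which the rate is proportional to enzyme concentration. The sketch below assumes, as a simplification, that enzyme concentration scales linearly with gene copy number; the function names and parameter values are hypothetical.

```python
def reaction_rate(kcat: float, km: float, enzyme_conc: float, substrate_conc: float) -> float:
    """Michaelis-Menten rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def rate_after_amplification(copies: int, per_copy_enzyme_conc: float,
                             kcat: float, km: float, substrate_conc: float) -> float:
    """Rate when enzyme concentration is assumed proportional to copy number."""
    return reaction_rate(kcat, km, copies * per_copy_enzyme_conc, substrate_conc)


if __name__ == "__main__":
    # Hypothetical promiscuous activity: slow turnover, weak substrate binding.
    params = dict(per_copy_enzyme_conc=1e-6, kcat=0.1, km=1e-2, substrate_conc=1e-3)
    base = rate_after_amplification(1, **params)
    for copies in (1, 2, 4, 8):
        v = rate_after_amplification(copies, **params)
        print(f"{copies} copies -> v is {v / base:.1f}x the single-copy rate")
```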

Robustness and plasticity

A study of four distinct hydrolases (human serum paraoxonase (PON1), pseudomonad phosphotriesterase (PTE), protein tyrosine phosphatase (PTP) and human carbonic anhydrase II (CAII)) has shown that the main activity is "robust" towards change, whereas the promiscuous activities are weak and more "plastic". Specifically, selecting for an activity that is not the main activity (via directed evolution) does not initially diminish the main activity (hence its robustness), but greatly affects the non-selected activities (hence their plasticity).

The phosphotriesterase (PTE) from Pseudomonas diminuta was evolved to become an arylesterase (P–O to C–O hydrolase) in eighteen rounds, gaining a 10^9 shift in specificity (ratio of KM). However, most of the change occurred in the initial rounds, where the unselected vestigial PTE activity was retained and the evolved arylesterase activity grew, while in the latter rounds there was a small trade-off in which the vestigial PTE activity was lost in favour of the arylesterase activity.

This means firstly that a specialist enzyme (monofunctional) when evolved goes through a generalist stage (multifunctional), before becoming a specialist again—presumably after gene duplication according to the IAD model—and secondly that promiscuous activities are more plastic than the main activity.

Reconstructed enzymes

The most recent and most clear-cut example of enzyme evolution is the rise of bioremediating enzymes in the past 60 years. Due to the very low number of amino acid changes, these provide an excellent model to investigate enzyme evolution in nature. However, using extant enzymes to determine how the family of enzymes evolved has the drawback that the newly evolved enzyme is compared to paralogues without knowing the true identity of the ancestor before the two genes diverged. This issue can be resolved thanks to ancestral reconstruction. First proposed in 1963 by Linus Pauling and Emile Zuckerkandl, ancestral reconstruction is the inference and synthesis of a gene from the ancestral form of a group of genes. It has had a recent revival thanks to improved inference techniques and low-cost artificial gene synthesis, allowing several ancestral enzymes (dubbed "stemzymes" by some) to be studied.

Evidence gained from reconstructed enzymes suggests that the order of the events in which the novel activity is improved and the gene is duplicated is not clear-cut, unlike what the theoretical models of gene evolution suggest.

One study showed that the ancestral gene of the immune defence protease family in mammals had a broader specificity and a higher catalytic efficiency than the contemporary family of paralogues, whereas another study showed that the ancestral steroid receptor of vertebrates was an oestrogen receptor with slight substrate ambiguity for other hormones—indicating that these probably were not synthesised at the time.

This variability in ancestral specificity has been observed not only between different genes, but also within the same gene family. In light of the large number of paralogous fungal α-glucosidase genes with a number of specific maltose-like (maltose, turanose, maltotriose, maltulose and sucrose) and isomaltose-like (isomaltose and palatinose) substrates, a study reconstructed all key ancestors and found that the last common ancestor of the paralogues was mainly active on maltose-like substrates with only trace activity for isomaltose-like sugars, despite leading to a lineage of isomaltose glucosidases and a lineage that further split into maltose glucosidases and isomaltose glucosidases. Antithetically, the ancestor before the latter split had a more pronounced isomaltose-like glucosidase activity.

Primordial metabolism

Roy Jensen in 1976 theorised that primordial enzymes had to be highly promiscuous in order for metabolic networks to assemble in a patchwork fashion (hence its name, the patchwork model). This primordial catalytic versatility was later lost in favour of highly catalytic specialised orthologous enzymes. As a consequence, many central-metabolic enzymes have structural homologues that diverged before the last universal common ancestor.

Distribution

Promiscuity is, however, not only a primordial trait; in fact, it is a very widespread property in modern genomes. A series of experiments have been conducted to assess the distribution of promiscuous enzyme activities in E. coli. In E. coli, 21 out of 104 single-gene knockouts tested (from the Keio collection) could be rescued by overexpressing a noncognate E. coli protein (using a pooled set of plasmids of the ASKA collection). The mechanisms by which the noncognate ORF could rescue the knockout can be grouped into eight categories: isozyme overexpression (homologues), substrate ambiguity, transport ambiguity (scavenging), catalytic promiscuity, metabolic flux maintenance (including overexpression of the large component of a synthase in the absence of the amine transferase subunit), pathway bypass, regulatory effects and unknown mechanisms. Similarly, overexpressing the ORF collection allowed E. coli to gain over an order of magnitude in resistance in 86 out of 237 toxic environments.

Homology

Homologues are sometimes known to display promiscuity towards each other's main reactions. This crosswise promiscuity has been most studied with members of the alkaline phosphatase superfamily, which catalyse hydrolytic reactions on the sulfate, phosphonate, monophosphate, diphosphate or triphosphate ester bond of several compounds. Despite the divergence, the homologues have a varying degree of reciprocal promiscuity: the differences in promiscuity are due to the mechanisms involved, particularly the intermediate required.

Degree of promiscuity

Enzymes are generally in a state that is a compromise not only between stability and catalytic efficiency, but also between specificity and evolvability, the latter two dictating whether an enzyme is a generalist (highly evolvable due to large promiscuity, but low main activity) or a specialist (high main activity, poorly evolvable due to low promiscuity). Examples of these are enzymes for primary and secondary metabolism in plants. Other factors can come into play: for example, the glycerophosphodiesterase (gpdQ) from Enterobacter aerogenes shows different values for its promiscuous activities depending on the two metal ions it binds, which is dictated by ion availability. In some cases promiscuity can be increased by relaxing the specificity of the active site by enlarging it with a single mutation, as was the case for a D297G mutant of the E. coli L-Ala-D/L-Glu epimerase (ycjG) and an E323G mutant of a pseudomonad muconate lactonizing enzyme II, allowing them to promiscuously catalyse the activity of O-succinylbenzoate synthase (menC). Conversely, promiscuity can be decreased by several mutations, as was the case for γ-humulene synthase (a sesquiterpene synthase) from Abies grandis, which is known to produce 52 different sesquiterpenes from farnesyl diphosphate.

Studies on enzymes with broad-specificity—not promiscuous, but conceptually close—such as mammalian trypsin and chymotrypsin, and the bifunctional isopropylmalate isomerase/homoaconitase from Pyrococcus horikoshii have revealed that active site loop mobility contributes substantially to the catalytic elasticity of the enzyme.

Toxicity

A promiscuous activity is a non-native activity that the enzyme did not evolve to perform, but that arises due to an accommodating conformation of the active site. However, the main activity of the enzyme is a result not only of selection towards a high catalytic rate towards a particular substrate to produce a particular product, but also of selection to avoid the production of toxic or unnecessary products. For example, if a tRNA synthetase loaded an incorrect amino acid onto a tRNA, the resulting peptide would have unexpectedly altered properties; consequently, several additional domains are present to enhance fidelity. Similar in reaction to tRNA synthetases, the first subunit of tyrocidine synthetase (tyrA) from Bacillus brevis adenylates a molecule of phenylalanine in order to use the adenyl moiety as a handle to produce tyrocidine, a cyclic non-ribosomal peptide. When the specificity of the enzyme was probed, it was found to be highly selective against natural amino acids other than phenylalanine, but much more tolerant towards unnatural amino acids. Specifically, most amino acids were not catalysed; the next most catalysed native amino acid was the structurally similar tyrosine, but at only a thousandth as much as phenylalanine, whereas several unnatural amino acids were catalysed better than tyrosine, namely D-phenylalanine, β-cyclohexyl-L-alanine, 4-amino-L-phenylalanine and L-norleucine.

One peculiar case of selected secondary activity is that of polymerases and restriction endonucleases, where incorrect activity is actually the result of a compromise between fidelity and evolvability. For example, for restriction endonucleases incorrect activity (star activity) is often lethal for the organism, but a small amount allows new functions to evolve against new pathogens.

Plant secondary metabolism

Anthocyanins (delphinidin pictured) give plants, particularly their flowers, a variety of colors to attract pollinators and are a typical example of plant secondary metabolites.

Plants produce a large number of secondary metabolites thanks to enzymes that, unlike those involved in primary metabolism, are less catalytically efficient but have a larger mechanistic elasticity (reaction types) and broader specificities. The liberal drift threshold (caused by the low selective pressure due to the small population size) allows the fitness gain endowed by one of the products to maintain the other activities even though they may be physiologically useless.

Biocatalysis

In biocatalysis, many reactions are sought that are absent in nature. To do this, enzymes with a small promiscuous activity towards the required reaction are identified and evolved via directed evolution or rational design.

An example of a commonly evolved enzyme is ω-transaminase, which can replace a ketone with a chiral amine; consequently, libraries of different homologues are commercially available for rapid biomining (e.g. Codexis).

Another example is the possibility of using the promiscuous activities of cysteine synthase (cysM) towards nucleophiles to produce non-proteinogenic amino acids.

Reaction similarity

Similarity between enzymatic reactions (EC) can be calculated by using bond changes, reaction centres or substructure metrics (EC-BLAST).
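
As an illustration of a set-based comparison of this kind, the sketch below scores two reactions by the Tanimoto (Jaccard) coefficient of their bond-change sets. This is an assumption made for illustration: the text does not state how EC-BLAST computes its scores, and the bond-change labels and reaction names are invented.

```python
def tanimoto(features_a: set, features_b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / all features."""
    if not features_a and not features_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(features_a & features_b) / len(features_a | features_b)


if __name__ == "__main__":
    # Hypothetical bond-change descriptions of two hydrolytic reactions.
    phosphotriester_hydrolysis = {"P-O cleaved", "O-H formed"}
    ester_hydrolysis = {"C-O cleaved", "O-H formed"}
    score = tanimoto(phosphotriester_hydrolysis, ester_hydrolysis)
    print(f"bond-change similarity: {score:.2f}")
```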

Drugs and promiscuity

Whereas promiscuity is mainly studied in terms of standard enzyme kinetics, drug binding and subsequent reaction is a promiscuous activity, as the enzyme catalyses an inactivating reaction towards a novel substrate it did not evolve to catalyse. This may be explained by the finding that there are only a small number of distinct ligand binding pockets in proteins.

Mammalian xenobiotic metabolism, on the other hand, evolved to have a broad specificity to oxidise, bind and eliminate foreign lipophilic compounds which may be toxic, such as plant alkaloids, so its ability to detoxify anthropogenic xenobiotics is an extension of this.

Computer-aided software engineering
