
Tuesday, December 18, 2018

Pseudogene

From Wikipedia, the free encyclopedia

Mechanism of classical and processed pseudogene formation
 
Pseudogenes, sometimes referred to as zombie genes in the media, are segments of DNA that are related to real genes. Pseudogenes have lost at least some functionality, relative to the complete gene, in cellular gene expression or protein-coding ability. Pseudogenes often result from the accumulation of multiple mutations within a gene whose product is not required for the survival of the organism, but can also be caused by genomic copy number variation (CNV), in which segments of 1 kb or more are duplicated or deleted. Although they are not fully functional as genes, pseudogenes may still have functions, as do other kinds of noncoding DNA that perform regulatory roles. The "pseudo" in "pseudogene" implies a variation in sequence relative to the parent coding gene, but does not necessarily indicate pseudo-function. Despite being non-coding, many pseudogenes have important roles in normal physiology and abnormal pathology.

Although some pseudogenes do not have introns or promoters (such pseudogenes are copied from messenger RNA and incorporated into the chromosome, and are called "processed pseudogenes"), others have some gene-like features such as promoters, CpG islands, and splice sites. They are different from normal genes due to either a lack of protein-coding ability resulting from a variety of disabling mutations (e.g. premature stop codons or frameshifts), a lack of transcription, or their inability to encode RNA (such as with ribosomal RNA pseudogenes). The term "pseudogene" was coined in 1977 by Jacq et al. 

Because pseudogenes were initially thought of as the last stop for genomic material that could be removed from the genome, they were often labeled as junk DNA. Nonetheless, pseudogenes contain biological and evolutionary histories within their sequences. This is due to a pseudogene's shared ancestry with a functional gene: in the same way that Darwin thought of two species as possibly having a shared common ancestry followed by millions of years of evolutionary divergence, a pseudogene and its associated functional gene also share a common ancestor and have diverged as separate genetic entities over millions of years.

Properties

Pseudogenes are usually characterized by a combination of homology to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, they are usually unable to produce functional final protein products. Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of homology and loss of functionality are usually inferred from sequence alignments rather than demonstrated biologically.
  1. Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently.
  2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to produce a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
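The identity calculation in step 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the two sequences have already been aligned (gaps written as "-") and that gapped columns are simply skipped; alignment tools differ in how they count gaps, so this denominator is one convention among several. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical bases among aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments of a parent gene and its pseudogene
parent = "ATGGCC-TTAGCA"
pseudo = "ATGGACGTTAGTA"
print(f"{percent_identity(parent, pseudo):.1f}% identity")  # 10 of 12 ungapped columns match
```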
Pseudogenes of RNA genes are usually more difficult to discover because RNA genes are not translated and thus do not have "reading frames".
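For protein-coding pseudogenes, by contrast, the two disablements most often flagged in high-throughput screens (premature stop codons and frameshifts) can be detected with a simple reading-frame scan. The sketch below is illustrative only: it assumes the input is a coding sequence beginning at its start codon, uses the stop codons of the standard genetic code, and treats a length that is not a multiple of three as a hint of a frameshift. Real pipelines work from an alignment to the parent gene; the function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_premature_stop(cds: str) -> Optional[int]:
    """0-based index of the first in-frame stop codon before the final codon, or None."""
    cds = cds.upper()
    complete = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, complete, 3)]
    for index, codon in enumerate(codons[:-1]):  # the last codon is the expected stop
        if codon in STOP_CODONS:
            return index
    return None


def suggests_frameshift(cds: str) -> bool:
    """A coding length that is not a multiple of 3 hints at a frameshifting indel."""
    return len(cds) % 3 != 0


print(first_premature_stop("ATGAAACCCGGGTAA"))  # None: intact reading frame
print(first_premature_stop("ATGAAATAGGGGTAA"))  # 2: TAG interrupts the frame
print(suggests_frameshift("ATGAAACCGGGTAA"))    # True: 14 nt after a 1-nt deletion
```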

Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
Processed pseudogenes pose a problem for gene prediction programs, which often misidentify them as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods.

Recently 140 human pseudogenes have been shown to be translated. However, the function, if any, of the protein products is unknown.

Types and origin

There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:

Processed

Processed pseudogene production

Processed (or retrotransposed) pseudogenes. In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30–44% of the human genome consists of repetitive elements such as SINEs and LINEs. In the process of retrotransposition, a portion of the mRNA or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too. Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event. Nevertheless, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts. A further characteristic of processed pseudogenes is frequent truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes. Processed pseudogenes are continually being created in primates. Individuals within human populations, for example, carry distinct sets of processed pseudogenes.

Non-processed

One way a pseudogene may arise
 
Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by homologous recombination at, for example, repetitive SINE sequences on misaligned chromosomes, and may subsequently acquire mutations that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates. If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure. Gene duplication generates functional redundancy, and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling and also by genome analysis. Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they will no longer be identifiable. Relatively young pseudogenes can be recognized due to their sequence similarity.

Unitary pseudogenes

Two ways a pseudogene may be produced

Various mutations (such as indels and nonsense mutations) can prevent a gene from being normally transcribed or translated, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or, in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates. Another more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans.
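Why drift and bottlenecks matter can be illustrated with the classic Wright–Fisher model. For a neutral allele, population genetics predicts a fixation probability equal to the allele's starting frequency, which is 1/(2N) for a single new mutation among N diploid individuals, so smaller populations fix such alleles more often. The sketch below assumes a constant-size, randomly mating population and a disabling mutation with no fitness effect; it is a generic model, not a reconstruction of GULO or caspase 12 history, and the function name is hypothetical.

```python
import random


def fixation_fraction(n_diploid: int, replicates: int, seed: int = 1) -> float:
    """Fraction of simulated populations in which one new neutral allele becomes fixed."""
    rng = random.Random(seed)
    copies = 2 * n_diploid  # gene copies in a diploid population
    fixed = 0
    for _ in range(replicates):
        count = 1  # a single new loss-of-function allele
        while 0 < count < copies:
            p = count / copies
            # Wright-Fisher step: each copy in the next generation is drawn from the current pool
            count = sum(1 for _ in range(copies) if rng.random() < p)
        if count == copies:
            fixed += 1
    return fixed / replicates


# Neutral expectations: 1/(2N) = 0.05 for N = 10 and 0.25 for N = 2 (a severe bottleneck)
print(fixation_fraction(10, 2000))
print(fixation_fraction(2, 2000))
```

The simulated fractions should land near these expectations, with the bottlenecked population fixing the disabling allele roughly five times as often, without any selective advantage.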

It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.

Pseudo-pseudogenes

The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals, as reviewed in the "Translational readthrough" section of the stop codon article. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of Drosophila species, as described next. 

Drosophila melanogaster

In 2016 it was reported that 4 predicted pseudogenes in multiple Drosophila species actually encode proteins with biologically important functions, "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, one such functional protein (an olfactory receptor) is found only in neurons. This finding of tissue-specific, biologically functional genes that could have been dismissed as pseudogenes by in silico analysis complicates the analysis of sequence data. As of 2012, there appeared to be approximately 12,000–14,000 pseudogenes in the human genome, a number almost comparable to the oft-cited approximate value of 20,000 human genes. This work may also help to explain why humans are able to live with 20 to 100 putative homozygous loss-of-function mutations in their genomes.

Through reanalysis of over 50 million peptides generated from the human proteome and separated by mass spectrometry, it now (2016) appears that there are at least 19,262 human proteins produced from 16,271 genes or clusters of genes. From this analysis, 8 new protein coding genes that were previously considered pseudogenes were identified.

Examples of pseudogene function

Drosophila glutamate receptor. The term "pseudo-pseudogene" was coined for the gene encoding the chemosensory ionotropic glutamate receptor Ir75a of Drosophila sechellia, which bears a premature termination codon (PTC) and was thus classified as a pseudogene. However, in vivo the D. sechellia Ir75a locus produces a functional receptor, owing to translational read-through of the PTC. Read-through is detected only in neurons and depends on the nucleotide sequence downstream of the PTC.

siRNAs. Some endogenous siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts. One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma. This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with, or using them as, therapeutic agents.

piRNAs. Some piRNAs are derived from pseudogenes located in piRNA clusters. Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting transposable element damage to the genome.

BRAF pseudogene acts as a ceRNA

microRNAs. There are many reports of pseudogene transcripts acting as microRNA decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of BRAF. The BRAF gene is a proto-oncogene that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, RNAs from BRAF and its pseudogene BRAFP1 compete for miRNA, but the balance of the two RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term ceRNA (competing endogenous RNA).
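The decoy effect can be caricatured with a toy mass-balance model. This is a hypothetical sketch, not a published BRAF/BRAFP1 model: it assumes a fixed pool of miRNA shared between a target transcript and a decoy transcript in proportion to their abundance (equal binding affinity), with each bound miRNA silencing one transcript. Units, numbers and the function name are arbitrary.

```python
def free_target_rna(target_rna: float, decoy_rna: float, mirna: float) -> float:
    """Amount of target RNA left unsilenced when a decoy RNA competes for the same miRNA."""
    total_sites = target_rna + decoy_rna
    if total_sites == 0:
        return 0.0
    # miRNA is split between target and decoy in proportion to their abundance
    silenced_fraction = min(1.0, mirna / total_sites)
    return target_rna * (1.0 - silenced_fraction)


# Raising decoy (pseudogene) expression leaves more target RNA free for translation
for decoy in (0, 50, 100, 200):
    print(decoy, round(free_target_rna(target_rna=100, decoy_rna=decoy, mirna=80), 1))
```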

PTEN. The PTEN gene is a known tumor suppressor gene. The PTEN pseudogene, PTENP1, is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation which eliminates the codon for the initiating methionine and thus prevents translation of the normal PTEN protein. In spite of that, PTENP1 appears to play a role in oncogenesis. The 3' UTR of PTENP1 mRNA functions as a decoy of PTEN mRNA by targeting microRNAs due to its similarity to the PTEN gene, and overexpression of the 3' UTR resulted in an increase of PTEN protein level. That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.

Potogenes. Pseudogenes can, over evolutionary time scales, participate in gene conversion and other mutational events that may give rise to new or newly-functional genes. This has led to the concept that pseudogenes could be viewed as potogenes: potential genes for evolutionary diversification.

Misidentified pseudogenes

Sometimes genes are thought to be pseudogenes, usually based on bioinformatic analysis, but then turn out to be functional genes. Examples include the Drosophila jingwei gene which encodes a functional alcohol dehydrogenase enzyme in vivo.

Another example is the human gene encoding phosphoglycerate mutase which was thought to be a pseudogene but which turned out to be a functional gene, now named PGAM4. Mutations in it actually cause infertility.

Bacterial pseudogenes

Pseudogenes can be found in bacteria. Most are in bacteria that are not free-living; that is, they are either symbionts or obligate intracellular parasites and thus do not require many genes that are needed by bacteria living in changeable environments. An extreme example is the genome of Mycobacterium leprae, the causative agent of leprosy. It has been reported to have 1,133 pseudogenes which give rise to approximately 50% of its transcriptome.

Enzyme promiscuity

 
Enzyme promiscuity is the ability of an enzyme to catalyse a fortuitous side reaction in addition to its main reaction. Although enzymes are remarkably specific catalysts, they can often perform side reactions in addition to their main, native catalytic activity. These promiscuous activities are usually slow relative to the main activity and are under neutral selection. Despite ordinarily being physiologically irrelevant, under new selective pressures these activities may confer a fitness benefit, thereby prompting the evolution of the formerly promiscuous activity into the new main activity. An example of this is the atrazine chlorohydrolase (encoded by atzA) from Pseudomonas sp. ADP, which evolved from melamine deaminase (encoded by triA), an enzyme with very small promiscuous activity towards atrazine, a man-made chemical.

Introduction

Enzymes have evolved to catalyse a particular reaction on a particular substrate with a high catalytic efficiency (kcat/KM, cf. Michaelis–Menten kinetics). However, in addition to this main activity, they possess other activities that are generally several orders of magnitude lower, that are not a result of evolutionary selection, and that therefore do not partake in the physiology of the organism. This phenomenon allows new functions to be gained, as a promiscuous activity could confer a fitness benefit under a new selective pressure, leading to its duplication and selection as a new main activity.
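The "several orders of magnitude" gap can be made concrete with the specificity constant kcat/KM. The sketch below uses hypothetical kinetic parameters chosen only for illustration, not measured values for any particular enzyme, and compares the two activities on a log scale.

```python
import math


def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Specificity constant kcat/KM, in M^-1 s^-1."""
    return kcat_per_s / km_molar


# Hypothetical parameters: a fast, tight-binding native reaction and a slow, weak side reaction
native = catalytic_efficiency(kcat_per_s=100.0, km_molar=1e-4)     # about 1e6 M^-1 s^-1
promiscuous = catalytic_efficiency(kcat_per_s=0.1, km_molar=1e-2)  # about 1e1 M^-1 s^-1
gap = math.log10(native / promiscuous)
print(f"main activity exceeds the promiscuous one by ~{gap:.0f} orders of magnitude")
```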

Enzyme evolution

Duplication and divergence

Several theoretical models exist to predict the order of duplication and specialisation events, but the actual process is more intertwined and fuzzy (see Reconstructed enzymes below). On the one hand, gene amplification results in an increase in enzyme concentration, and potentially freedom from restrictive regulation, therefore increasing the reaction rate (v) of the promiscuous activity of the enzyme and making its effects more pronounced physiologically ("gene dosage effect"). On the other hand, enzymes may evolve increased secondary activity with little loss of the primary activity ("robustness"), and hence with little adaptive conflict.
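The gene dosage effect follows from the Michaelis–Menten rate law, v = kcat[E][S]/(KM + [S]): if gene amplification raises the enzyme concentration [E] without changing kcat or KM, the rate of every activity, including a weak promiscuous one, rises in proportion. A minimal sketch, assuming enzyme concentration scales linearly with gene copy number and ignoring regulation; the parameter values are hypothetical.

```python
def michaelis_menten_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Initial rate v = kcat [E] [S] / (KM + [S]); concentrations in M, rate in M/s."""
    return kcat * enzyme * substrate / (km + substrate)


# Hypothetical promiscuous activity: slow turnover and poor substrate binding
KCAT, KM, SUBSTRATE, ENZYME_PER_COPY = 0.1, 1e-2, 1e-3, 1e-6

single_copy = michaelis_menten_rate(KCAT, KM, ENZYME_PER_COPY, SUBSTRATE)
for copies in (1, 2, 4, 8):
    v = michaelis_menten_rate(KCAT, KM, copies * ENZYME_PER_COPY, SUBSTRATE)
    print(f"{copies} gene copies -> {v / single_copy:.1f}x the single-copy rate")
```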

Robustness and plasticity

A study of four distinct hydrolases (human serum paraoxonase (PON1), pseudomonad phosphotriesterase (PTE), protein tyrosine phosphatase (PTP) and human carbonic anhydrase II (CAII)) showed that the main activity is "robust" towards change, whereas the promiscuous activities are weak and more "plastic". Specifically, selecting for an activity that is not the main activity (via directed evolution) does not initially diminish the main activity (hence its robustness), but greatly affects the non-selected activities (hence their plasticity).

The phosphotriesterase (PTE) from Pseudomonas diminuta was evolved to become an arylesterase (P–O to C–O hydrolase) in eighteen rounds, gaining a 10⁹-fold shift in specificity (ratio of KM). However, most of the change occurred in the initial rounds, in which the unselected vestigial PTE activity was retained while the evolved arylesterase activity grew; only in the later rounds was some of the vestigial PTE activity traded off in favour of the arylesterase activity.

This means, firstly, that a specialist (monofunctional) enzyme, when evolved, goes through a generalist (multifunctional) stage before becoming a specialist again, presumably after gene duplication as proposed by the innovation–amplification–divergence (IAD) model; and secondly, that promiscuous activities are more plastic than the main activity.

Reconstructed enzymes

The most recent and most clear-cut example of enzyme evolution is the rise of bioremediating enzymes in the past 60 years. Because they differ from their predecessors by very few amino acid changes, these enzymes provide an excellent model for investigating enzyme evolution in nature. However, using extant enzymes to determine how a family of enzymes evolved has the drawback that the newly evolved enzyme is compared to paralogues without knowing the true identity of the ancestor before the two genes diverged. This issue can be resolved thanks to ancestral reconstruction. First proposed in 1963 by Linus Pauling and Emile Zuckerkandl, ancestral reconstruction is the inference and synthesis of a gene from the ancestral form of a group of genes. It has had a recent revival thanks to improved inference techniques and low-cost artificial gene synthesis, allowing several ancestral enzymes, dubbed "stemzymes" by some, to be studied.

Evidence from reconstructed enzymes suggests that the order of events in which the novel activity is improved and the gene is duplicated is not clear-cut, unlike what the theoretical models of gene evolution suggest.

One study showed that the ancestral gene of the immune defence protease family in mammals had a broader specificity and a higher catalytic efficiency than the contemporary family of paralogues, whereas another study showed that the ancestral steroid receptor of vertebrates was an oestrogen receptor with slight substrate ambiguity for other hormones, indicating that those hormones were probably not synthesised at the time.

This variability in ancestral specificity has been observed not only between different genes but also within the same gene family. In light of the large number of paralogous fungal α-glucosidase genes with a number of specific maltose-like (maltose, turanose, maltotriose, maltulose and sucrose) and isomaltose-like (isomaltose and palatinose) substrates, a study reconstructed all key ancestors and found that the last common ancestor of the paralogues was mainly active on maltose-like substrates with only trace activity for isomaltose-like sugars, despite leading to a lineage of isomaltose glucosidases and a lineage that further split into maltose glucosidases and isomaltose glucosidases. Conversely, the ancestor before the latter split had a more pronounced isomaltose-like glucosidase activity.

Primordial metabolism

Roy Jensen in 1976 theorised that primordial enzymes had to be highly promiscuous in order for metabolic networks to assemble in a patchwork fashion (hence the name, the patchwork model). This primordial catalytic versatility was later lost in favour of highly efficient, specialised orthologous enzymes. As a consequence, many central-metabolic enzymes have structural homologues that diverged before the last universal common ancestor.

Distribution

Promiscuity is not only a primordial trait, however; it is a very widespread property in modern genomes. A series of experiments has been conducted to assess the distribution of promiscuous enzyme activities in E. coli. In E. coli, 21 out of 104 single-gene knockouts tested (from the Keio collection) could be rescued by overexpressing a noncognate E. coli protein (using a pooled set of plasmids of the ASKA collection). The mechanisms by which the noncognate ORF could rescue the knockout can be grouped into eight categories: isozyme overexpression (homologues), substrate ambiguity, transport ambiguity (scavenging), catalytic promiscuity, metabolic flux maintenance (including overexpression of the large component of a synthase in the absence of the amine transferase subunit), pathway bypass, regulatory effects and unknown mechanisms. Similarly, overexpressing the ORF collection allowed E. coli to gain over an order of magnitude in resistance in 86 out of 237 toxic environments.

Homology

Homologues sometimes display promiscuity towards each other's main reactions. This crosswise promiscuity has been studied most extensively in members of the alkaline phosphatase superfamily, which catalyse hydrolytic reactions on the sulfate, phosphonate, monophosphate, diphosphate or triphosphate ester bonds of several compounds. Despite their divergence, the homologues show varying degrees of reciprocal promiscuity: the differences in promiscuity are due to the mechanisms involved, particularly the intermediate required.

Degree of promiscuity

Enzymes are generally in a state that is a compromise not only between stability and catalytic efficiency but also between specificity and evolvability, the latter two dictating whether an enzyme is a generalist (highly evolvable due to large promiscuity, but low main activity) or a specialist (high main activity, poorly evolvable due to low promiscuity). Examples of these are enzymes for primary and secondary metabolism in plants. Other factors can come into play; for example, the glycerophosphodiesterase (gpdQ) from Enterobacter aerogenes shows different values for its promiscuous activities depending on which two metal ions it binds, which is dictated by ion availability. In some cases promiscuity can be increased by relaxing the specificity of the active site, enlarging it with a single mutation, as was the case for a D297G mutant of the E. coli L-Ala-D/L-Glu epimerase (ycjG) and an E323G mutant of a pseudomonad muconate lactonizing enzyme II, allowing them to promiscuously catalyse the activity of O-succinylbenzoate synthase (menC). Conversely, promiscuity can be decreased by several mutations, as was the case for γ-humulene synthase (a sesquiterpene synthase) from Abies grandis, which is known to produce 52 different sesquiterpenes from farnesyl diphosphate.

Studies on enzymes with broad-specificity—not promiscuous, but conceptually close—such as mammalian trypsin and chymotrypsin, and the bifunctional isopropylmalate isomerase/homoaconitase from Pyrococcus horikoshii have revealed that active site loop mobility contributes substantially to the catalytic elasticity of the enzyme.

Toxicity

A promiscuous activity is a non-native activity that the enzyme did not evolve to perform, but that arises due to an accommodating conformation of the active site. However, the main activity of the enzyme is a result not only of selection for a high catalytic rate towards a particular substrate to produce a particular product, but also of selection to avoid the production of toxic or unnecessary products. For example, if a tRNA synthetase loaded an incorrect amino acid onto a tRNA, the resulting peptide would have unexpectedly altered properties; consequently, several additional domains are present to enhance fidelity. Similar in reaction to tRNA synthetases, the first subunit of tyrocidine synthetase (tyrA) from Bacillus brevis adenylates a molecule of phenylalanine in order to use the adenyl moiety as a handle to produce tyrocidine, a cyclic non-ribosomal peptide. When the specificity of the enzyme was probed, it was found to be highly selective against natural amino acids other than phenylalanine, but much more tolerant towards unnatural amino acids. Specifically, most natural amino acids were not adenylated at all; the next most readily adenylated natural amino acid was the structurally similar tyrosine, but at a thousandth of the level seen with phenylalanine, whereas several unnatural amino acids were adenylated better than tyrosine, namely D-phenylalanine, β-cyclohexyl-L-alanine, 4-amino-L-phenylalanine and L-norleucine.

One peculiar case of selected secondary activity is that of polymerases and restriction endonucleases, where incorrect activity is actually a result of a compromise between fidelity and evolvability. For example, incorrect activity of restriction endonucleases (star activity) is often lethal for the organism, but a small amount allows new functions to evolve against new pathogens.

Plant secondary metabolism

Anthocyanins (delphinidin pictured) confer a variety of colors on plants, particularly their flowers, to attract pollinators, and are a typical example of plant secondary metabolites.

Plants produce a large number of secondary metabolites thanks to enzymes that, unlike those involved in primary metabolism, are less catalytically efficient but have a larger mechanistic elasticity (reaction types) and broader specificities. The liberal drift threshold (caused by the low selective pressure due to the small population size) allows the fitness gain endowed by one of the products to maintain the other activities even though they may be physiologically useless.

Biocatalysis

In biocatalysis, many reactions are sought that are absent in nature. To do this, enzymes with a small promiscuous activity towards the required reaction are identified and evolved via directed evolution or rational design.

An example of a commonly evolved enzyme is ω-transaminase, which can replace a ketone with a chiral amine; consequently, libraries of different homologues are commercially available for rapid biomining (e.g. from Codexis).

Another example is the possibility of using the promiscuous activities of cysteine synthase (cysM) towards nucleophiles to produce non-proteinogenic amino acids.

Reaction similarity

Similarity between enzymatic reactions (EC) can be calculated by using bond changes, reaction centres or substructure metrics (EC-BLAST).
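One simple way to compare reactions by their bond changes is to encode each reaction as a set of bonds broken and formed and to score the overlap with the Tanimoto (Jaccard) coefficient. The sketch below is a toy illustration of that idea, not EC-BLAST's actual algorithm; the encoding is hypothetical, and it compares the P–O and C–O ester hydrolyses from the phosphotriesterase-to-arylesterase example.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Shared features divided by all features present in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical bond-change fingerprints: (event, bond) pairs for each hydrolysis
phosphotriester_hydrolysis = frozenset({("broken", "P-O"), ("formed", "P-O"),
                                        ("broken", "O-H"), ("formed", "O-H")})
carboxyl_ester_hydrolysis = frozenset({("broken", "C-O"), ("formed", "C-O"),
                                       ("broken", "O-H"), ("formed", "O-H")})

# The two share only the water O-H changes: 2 of 6 distinct bond changes
print(round(tanimoto(phosphotriester_hydrolysis, carboxyl_ester_hydrolysis), 3))
```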

Drugs and promiscuity

Whereas promiscuity is mainly studied in terms of standard enzyme kinetics, drug binding and subsequent reaction are also a promiscuous activity, as the enzyme catalyses an inactivating reaction towards a novel substrate it did not evolve to catalyse. This may be explained by the finding that proteins contain only a small number of distinct ligand-binding pockets.

Mammalian xenobiotic metabolism, on the other hand, evolved to have a broad specificity in order to oxidise, bind and eliminate foreign lipophilic compounds that may be toxic, such as plant alkaloids, so its ability to detoxify anthropogenic xenobiotics is an extension of this.

Protein moonlighting


Crystallographic structure of cytochrome P450 from the bacterium S. coelicolor (rainbow colored cartoon, N-terminus = blue, C-terminus = red) complexed with heme cofactor (magenta spheres) and two molecules of its endogenous substrate epi-isozizaene as orange and cyan spheres respectively. The orange-colored substrate resides in the monooxygenase site while the cyan-colored substrate occupies the substrate entrance site. An unoccupied moonlighting terpene synthase site is designated by the orange arrow.

Protein moonlighting (or gene sharing) is a phenomenon by which a protein can perform more than one function. Ancestral moonlighting proteins originally possessed a single function but, through evolution, acquired additional functions. Many proteins that moonlight are enzymes; others are receptors, ion channels or chaperones. The most common primary function of moonlighting proteins is enzymatic catalysis, but these enzymes have acquired secondary non-enzymatic roles. Some examples of functions of moonlighting proteins secondary to catalysis include signal transduction, transcriptional regulation, apoptosis, motility, and structural roles.

Protein moonlighting may occur widely in nature. Protein moonlighting through gene sharing differs from the use of a single gene to generate different proteins by alternative RNA splicing, DNA rearrangement, or post-translational processing. It is also different from multifunctionality of the protein, in which the protein has multiple domains, each serving a different function. Protein moonlighting by gene sharing means that a gene may acquire and maintain a second function without gene duplication and without loss of the primary function. Such genes are under two or more entirely different selective constraints.

Various techniques have been used to reveal moonlighting functions in proteins. The detection of a protein in unexpected locations within cells, cell types, or tissues may suggest that a protein has a moonlighting function. Furthermore, sequence or structure homology of a protein may be used to infer both primary function as well as secondary moonlighting functions of a protein. 

The most well-studied examples of gene sharing are crystallins. These proteins, when expressed at low levels in many tissues, function as enzymes, but when expressed at high levels in eye tissue, become densely packed and thus form lenses. While the recognition of gene sharing is relatively recent (the term was coined in 1988, after crystallins in chickens and ducks were found to be identical to separately identified enzymes), recent studies have found many examples throughout the living world. Joram Piatigorsky has suggested that many or all proteins exhibit gene sharing to some extent, and that gene sharing is a key aspect of molecular evolution. The genes encoding crystallins must maintain sequences for both catalytic function and the maintenance of lens transparency.

Inappropriate moonlighting is a contributing factor in some genetic diseases, and moonlighting provides a possible mechanism by which bacteria may become resistant to antibiotics.

Discovery

The first observation of a moonlighting protein was made in the late 1980s by Joram Piatigorsky and Graeme Wistow during their research on crystallin enzymes. Piatigorsky determined that the conservation and variation of lens crystallins are due to moonlighting functions outside the lens. Piatigorsky originally called these "gene sharing" proteins, but the colloquial term "moonlighting" was applied to proteins by Constance Jeffery in 1999, drawing an analogy between multitasking proteins and people who work two jobs. Because "gene sharing" is ambiguous (it is also used to describe horizontal gene transfer), "protein moonlighting" has become the preferred term for proteins with more than one function.

Evolution

Moonlighting proteins are believed to have evolved from uni-functional proteins that gained the ability to perform multiple functions. With alterations, much of a protein's unused space can provide new functions. Many moonlighting proteins are the result of gene fusion of two single-function genes. Alternatively, a single gene can acquire a second function because the active site of the encoded protein is typically small compared to the overall size of the protein, leaving considerable room to accommodate a second functional site. In a third alternative, the same active site can acquire a second function through mutations.

The development of moonlighting proteins may be evolutionarily favorable to the organism, since a single protein can do the job of several, conserving the amino acids and energy required to synthesize them. However, there is no universally agreed-upon theory that explains why proteins with multiple roles evolved. Although using one protein for multiple roles might seem advantageous because it keeps the genome small, this is probably not the reason for moonlighting, given the large amount of noncoding DNA in many genomes.

Functions

Many proteins catalyze a chemical reaction. Other proteins fulfill structural, transport, or signaling roles. Furthermore, numerous proteins can aggregate into supramolecular assemblies. For example, a ribosome is made up of 90 proteins and RNA.

A number of the currently known moonlighting proteins are evolutionarily derived from highly conserved enzymes, also called ancient enzymes. These enzymes are frequently speculated to have evolved moonlighting functions. Because highly conserved proteins are present in many different organisms, they have had more opportunities to develop secondary moonlighting functions. A high fraction of the enzymes of glycolysis, an ancient, universal metabolic pathway, exhibit moonlighting behavior; it has been suggested that as many as 7 out of 10 glycolytic proteins and 7 out of 8 enzymes of the tricarboxylic acid cycle moonlight.

An example of a moonlighting enzyme is pyruvate carboxylase. This enzyme catalyzes the carboxylation of pyruvate into oxaloacetate, thereby replenishing the tricarboxylic acid cycle. Surprisingly, in yeast species such as H. polymorpha and P. pastoris, pyruvate carboxylase is also essential for the proper targeting and assembly of the peroxisomal protein alcohol oxidase (AO). AO, the first enzyme of methanol metabolism, is a homo-octameric flavoenzyme. In wild-type cells, this enzyme is present as enzymatically active AO octamers in the peroxisomal matrix. However, in cells lacking pyruvate carboxylase, AO monomers accumulate in the cytosol, indicating that pyruvate carboxylase has a second, fully unrelated function in assembly and import. This function in AO import/assembly is fully independent of the enzyme activity of pyruvate carboxylase, because amino acid substitutions can be introduced that fully inactivate the enzyme activity of pyruvate carboxylase without affecting its function in AO assembly and import. Conversely, mutations are known that block the function of this enzyme in the import and assembly of AO but have no effect on its enzymatic activity.

The E. coli anti-oxidant protein thioredoxin is another example of a moonlighting protein. Upon infection with the bacteriophage T7, E. coli thioredoxin forms a complex with T7 DNA polymerase, which enhances T7 DNA replication, a crucial step for successful T7 infection. Thioredoxin binds to a loop in T7 DNA polymerase, enabling the polymerase to bind more strongly to the DNA. The anti-oxidant function of thioredoxin is fully autonomous and independent of T7 DNA replication, in which the protein most likely acts as a processivity factor.

ADT2 and ADT5 are further examples of moonlighting proteins, found in plants. Like all other ADTs, both proteins have roles in phenylalanine biosynthesis. However, ADT2, together with FtsZ, is necessary for chloroplast division, and ADT5 is transported by stromules into the nucleus.

Examples of moonlighting proteins
Protein | Organism | Primary function | Moonlighting function

Animal
Aconitase | H. sapiens | TCA cycle enzyme | Iron homeostasis
ATF2 | H. sapiens | Transcription factor | DNA damage response
Crystallins | Various | Lens structural protein | Various enzymes
Cytochrome c | Various | Energy metabolism | Apoptosis
DLD | H. sapiens | Energy metabolism | Protease
ERK2 | H. sapiens | MAP kinase | Transcriptional repressor
ESCRT-II complex | D. melanogaster | Endosomal protein sorting | Bicoid mRNA localization
STAT3 | M. musculus | Transcription factor | Electron transport chain

Plant
Hexokinase | A. thaliana | Glucose metabolism | Glucose signaling / cell death control
Presenilin | P. patens | γ-secretase | Cytoskeletal function

Fungus
Aconitase | S. cerevisiae | TCA cycle enzyme | mtDNA stability
Aldolase | S. cerevisiae | Glycolytic enzyme | V-ATPase assembly
Arg5,6 | S. cerevisiae | Arginine biosynthesis | Transcriptional control
Enolase | S. cerevisiae | Glycolytic enzyme | Homotypic vacuole fusion; mitochondrial tRNA import
Galactokinase | K. lactis | Galactose catabolism enzyme | Induction of galactose genes
Hal3 | S. cerevisiae | Halotolerance determinant | Coenzyme A biosynthesis
HSP60 | S. cerevisiae | Mitochondrial chaperone | Stabilization of active DNA origins
Phosphofructokinase | P. pastoris | Glycolytic enzyme | Autophagy of peroxisomes
Pyruvate carboxylase | H. polymorpha | Anaplerotic enzyme | Assembly of alcohol oxidase
Vhs3 | S. cerevisiae | Halotolerance determinant | Coenzyme A biosynthesis

Prokaryotes
Aconitase | M. tuberculosis | TCA cycle enzyme | Iron-responsive protein
CYP170A1 | S. coelicolor | Albaflavenone synthase | Terpene synthase
Enolase | S. pneumoniae | Glycolytic enzyme | Plasminogen binding
GroEL | E. aerogenes | Chaperone | Insect toxin
Glutamate racemase (MurI) | E. coli | Cell wall biosynthesis | Gyrase inhibition
Thioredoxin | E. coli | Anti-oxidant | T7 DNA polymerase subunit

Protist
Aldolase | P. vivax | Glycolytic enzyme | Host-cell invasion

Mechanisms

Crystallographic structure of aconitase

In many cases, the function of a protein depends not only on its structure but also on its location. For example, a single protein may have one function in the cytoplasm of a cell, a different function when interacting with a membrane, and yet a third function if excreted from the cell. This property of moonlighting proteins is known as "differential localization". Function can also depend on environmental conditions: at higher temperatures DegP (HtrA) functions as a protease by the directed degradation of proteins, while at lower temperatures it acts as a chaperone, assisting the non-covalent folding or unfolding and the assembly or disassembly of other macromolecular structures. Furthermore, a moonlighting protein may behave differently depending not only on its location within a cell but also on the type of cell in which it is expressed. Multifunctionality can also result from differential post-translational modifications (PTMs). In the case of the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH), alterations in PTMs have been shown to be associated with higher-order multifunctionality.

Proteins may also moonlight by changing their oligomeric state, through changes in the concentration of their ligand or substrate, by using alternative binding sites, or through phosphorylation. An example of a protein with different functions in different oligomeric states is pyruvate kinase, which exhibits metabolic activity as a tetramer and thyroid hormone-binding activity as a monomer. Changes in the concentrations of ligands or substrates may switch a protein's function. For example, at high iron concentrations aconitase functions as an enzyme, while at low iron concentrations it functions as an iron-responsive element-binding protein (IREBP). Proteins may also perform separate functions through alternative binding sites that carry out different tasks. An example is ceruloplasmin, which functions as an oxidase in copper metabolism and moonlights as a copper-independent glutathione peroxidase. Lastly, phosphorylation may switch the function of a moonlighting protein. For example, phosphorylation of phosphoglucose isomerase (PGI) at Ser-185 by protein kinase CK2 causes it to stop functioning as an enzyme while it retains its function as an autocrine motility factor. Hence, when a mutation inactivates one function of a moonlighting protein, the other function(s) are not necessarily affected.

The crystal structures of several moonlighting proteins, such as the I-AniI homing endonuclease/maturase and the PutA proline dehydrogenase/transcription factor, have been determined. Analysis of these structures has shown that moonlighting proteins can either perform both functions at the same time or, through conformational changes, alternate between two states, each of which performs a separate function. For example, DegP plays a role in proteolysis at higher temperatures and is involved in refolding at lower temperatures. These structures have also shown that in some moonlighting proteins the second function may negatively affect the first. In η-crystallin, for example, the second function can alter the structure and decrease its flexibility, which in turn can somewhat impair enzymatic activity.

Identification methods

Moonlighting proteins have usually been identified by chance because there is no clear procedure to identify secondary moonlighting functions. Despite such difficulties, the number of moonlighting proteins that have been discovered is rapidly increasing. Furthermore, moonlighting proteins appear to be abundant in all kingdoms of life.

Various methods have been used to determine a protein's function, including secondary moonlighting functions. For example, the tissue, cellular, or subcellular distribution of a protein may provide hints about its function. Real-time PCR can be used to quantify mRNA and hence to infer the presence or absence of the encoded protein in different cell types. Alternatively, immunohistochemistry or mass spectrometry can be used to detect proteins directly and to determine in which subcellular locations, cell types, and tissues a particular protein is expressed.
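One common way to turn real-time PCR measurements into relative mRNA levels is the comparative Ct ("2^-ΔΔCt") method, sketched below in Python. It assumes roughly 100% amplification efficiency (the product doubles every cycle, so one cycle of difference in threshold cycle, Ct, means a two-fold difference) and a reference gene whose expression is stable across the tissues compared. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target mRNA in a sample tissue versus a control tissue,
    using the comparative Ct (2^-ddCt) method.

    Assumes ~100% PCR efficiency and a stably expressed reference gene.
    """
    # Normalize the target to the reference gene within each tissue.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two tissues.
    dd_ct = d_ct_sample - d_ct_control
    # A lower Ct means more starting template, hence the negative exponent.
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values for one mRNA in a tissue of interest vs. a reference tissue.
fold = relative_expression(ct_target_sample=22.0, ct_ref_sample=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(fold)  # 8.0
```

Here the target crosses the threshold three cycles earlier (after normalization) in the tissue of interest, which under the doubling assumption corresponds to about eight times more mRNA.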

Mass spectrometry may be used to detect proteins based on their mass-to-charge ratio. Because of alternative splicing and post-translational modification, identifying proteins from the mass of the parent ion alone is very difficult. However, tandem mass spectrometry, in which each parent peak is in turn fragmented, can be used to identify proteins unambiguously. Tandem mass spectrometry is therefore one of the tools used in proteomics to identify the proteins present in different cell types or subcellular locations. While the presence of a moonlighting protein in an unexpected location may complicate routine analyses, the detection of a protein in unexpected multiprotein complexes or locations also suggests that the protein may have a moonlighting function. Furthermore, mass spectrometry may be used to determine whether a protein has high expression levels that do not correlate with the enzyme's measured metabolic activity. Such expression levels may signify that the protein is performing a function other than the one previously known.
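Why fragmentation helps can be shown with a little arithmetic. The Python sketch below assumes a bottom-up workflow in which proteins are first digested into peptides, uses standard monoisotopic residue masses, ignores modifications and isotopes, and considers only singly charged b (N-terminal) and y (C-terminal) fragment ions. Two hypothetical peptides with the same amino acid composition have the same parent mass, yet their fragments differ. Real proteomics software does far more (modifications, isotope patterns, scoring against sequence databases).

```python
# Monoisotopic residue masses (daltons) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(seq: str, charge: int) -> float:
    """m/z of the intact (parent) ion carrying `charge` extra protons."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def fragment_ions(seq: str) -> tuple[list[float], list[float]]:
    """Singly charged b and y ions from cleaving each peptide bond once."""
    b_ions, y_ions = [], []
    for i in range(1, len(seq)):
        prefix, suffix = seq[:i], seq[i:]
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in prefix) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in suffix) + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical peptides with identical composition but different order.
print(abs(peptide_mass("PEPTIDE") - peptide_mass("EPPTIDE")) < 1e-6)  # True
b_first, _ = fragment_ions("PEPTIDE")
b_second, _ = fragment_ions("EPPTIDE")
print(round(b_first[0], 5), round(b_second[0], 5))  # 98.06004 130.04987
```

The parent masses cannot tell the two sequences apart, but the very first b ion already does. Leucine and isoleucine have identical masses, which is one reason mass alone can remain ambiguous even after fragmentation.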

The structure of a protein can also help determine its functions. Protein structure in turn may be elucidated with techniques including X-ray crystallography and NMR. Dual polarization interferometry may be used to measure changes in protein structure, which may also give hints about the protein's function. Finally, systems biology approaches such as interactomics give clues to a protein's function based on what it interacts with.
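A simple way to read function from interaction data is "guilt by association": rank candidate functions by how many of a protein's direct partners carry each annotation. The Python sketch below assumes an unweighted, undirected list of interactions and a set of annotation labels per protein; the function name, protein names, and annotations are all hypothetical. More careful analyses also account for how common each annotation is across the whole network.

```python
from __future__ import annotations

from collections import Counter


def predict_functions(protein: str,
                      interactions: list[tuple[str, str]],
                      annotations: dict[str, set[str]],
                      top_n: int = 3) -> list[tuple[str, int]]:
    """Rank candidate functions for `protein` by counting how many of its
    direct interaction partners carry each functional annotation."""
    # Collect direct neighbours from the undirected edge list.
    partners = {b if a == protein else a
                for a, b in interactions if protein in (a, b)}
    partners.discard(protein)  # ignore self-interactions
    votes = Counter()
    for partner in sorted(partners):  # sorted for a reproducible tie order
        votes.update(annotations.get(partner, set()))
    return votes.most_common(top_n)


# Hypothetical toy network: a metabolic enzyme "enzX" with unexpected partners.
INTERACTIONS = [("enzX", "glyA"), ("enzX", "tfB"), ("tfC", "enzX"), ("tfB", "tfC")]
ANNOTATIONS = {
    "glyA": {"glycolysis"},
    "tfB": {"transcriptional regulation"},
    "tfC": {"transcriptional regulation", "DNA binding"},
}
print(predict_functions("enzX", INTERACTIONS, ANNOTATIONS))
# 'transcriptional regulation' ranks first, with 2 supporting partners
```

In this toy case an enzyme whose partners are mostly transcription-related proteins would be flagged as a candidate for a secondary regulatory role, which is the kind of hint the text describes rather than proof of a moonlighting function.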

Higher order multifunctionality

In the case of the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH), in addition to the large number of alternate functions it has also been observed that it can be involved in the same function by multiple means (multifunctionality within multifunctionality). For example, in its role in maintenance of cellular iron homeostasis GAPDH can function to import or extrude iron from cells. Moreover, in case of its iron import activities it can traffic into cells holo-transferrin as well as the related molecule lactoferrin by multiple pathways.

Example

Crystallins

A crystallin from ducks that exhibits argininosuccinate lyase activity and is a key structural component in eye lenses, an example of gene sharing

In the case of crystallins, the genes must maintain sequences for both catalytic function and the maintenance of transparency. The abundant lens crystallins have generally been viewed as static proteins serving a strictly structural role in transparency and cataract. However, recent studies have shown that the lens crystallins are much more diverse than previously recognized and that many are related or identical to metabolic enzymes and stress proteins found in numerous tissues. Unlike other proteins performing highly specialized tasks, such as globin or rhodopsin, the crystallins are very diverse and show numerous species differences. Essentially all vertebrate lenses contain representatives of the α- and β/γ-crystallins, the "ubiquitous crystallins", which are themselves heterogeneous, and only a few species or selected taxonomic groups use entirely different proteins as lens crystallins. This paradox, in which crystallins are highly conserved in sequence yet extremely diverse in number and distribution, suggests that many crystallins have vital functions outside the lens and cornea, and this multifunctionality of the crystallins is achieved by gene sharing.

Gene regulation

Crystallin recruitment may occur through changes in gene regulation that lead to high lens expression. One such example is glutathione S-transferase/S11-crystallin, which was specialized for lens expression through a change in gene regulation and gene duplication. The fact that similar transcription factors, such as Pax-6 and retinoic acid receptors, regulate different crystallin genes suggests that lens-specific expression has played a crucial role in recruiting multifunctional proteins as crystallins. Crystallin recruitment has occurred both with and without gene duplication, and tandem gene duplication has taken place among some of the crystallins, with one of the duplicates specializing for lens expression. The ubiquitous α-crystallins and the bird δ-crystallins are two examples.

Alpha crystallins

The α-crystallins, which contributed to the discovery of crystallins as borrowed proteins, have continually supported the theory of gene sharing and have helped delineate the mechanisms involved. There are two α-crystallin genes (αA and αB), whose products are about 55% identical in amino acid sequence. Expression studies in non-lens cells showed that αB-crystallin, besides being a functional lens protein, is a functional small heat shock protein. αB-crystallin is induced by heat and other physiological stresses, and it can protect cells from elevated temperatures and hypertonic stress. αB-crystallin is also over-expressed in many pathologies, including neurodegenerative diseases, premature senescence in fibroblasts of patients with Werner's disease, and growth abnormalities. In addition to being over-expressed under abnormal conditions, αB-crystallin is constitutively expressed in heart, skeletal muscle, kidney, lung and many other tissues. In contrast, αA-crystallin is highly specialized for expression in the lens, apart from low-level expression in the thymus, spleen and retina, and is not stress-inducible. However, like αB-crystallin, it can also function as a molecular chaperone and protect against thermal stress.

Beta/gamma-crystallins

Unlike the α-crystallins, the β/γ-crystallins form a large multigene family. Other proteins, such as a bacterial spore coat protein, a slime mold cyst protein, and an epidermis differentiation-specific protein, contain the same Greek key motifs and are placed in the β/γ-crystallin superfamily. This relationship supports the idea that the β/γ-crystallins were recruited by a gene-sharing mechanism. However, apart from a few reports, a non-refractive function of the β/γ-crystallins has yet to be found.

Corneal crystallins

Like the lens, the cornea is a transparent, avascular tissue derived from the ectoderm that is responsible for focusing light onto the retina. However, unlike the lens, the cornea depends on the air-cell interface and its curvature for refraction. Early immunology studies showed that BCP 54 makes up 20–40% of the total soluble protein in the bovine cornea. Subsequent studies indicated that BCP 54 is ALDH3, a tumor- and xenobiotic-inducible cytosolic enzyme found in humans, rats, and other mammals.

Non refractive roles of crystallins in lens and cornea

While it is evident that gene sharing resulted in many lens crystallins being multifunctional proteins, it is still uncertain to what extent the crystallins use their non-refractive properties in the lens, or on what basis they were selected. The α-crystallins provide a convincing case for a lens crystallin using its non-refractive ability within the lens to prevent protein aggregation under a variety of environmental stresses and to protect against enzyme inactivation by post-translational modifications such as glycation. The α-crystallins may also play a functional role in the stability and remodeling of the cytoskeleton during fiber cell differentiation in the lens. In the cornea, ALDH3 has also been suggested to be responsible for absorbing UV-B light.

Co-evolution of lens and cornea through gene sharing

Based on the similarities between lens and cornea, such as abundant water-soluble enzymes and derivation from the ectoderm, the lens and cornea are thought to have co-evolved as a "refraction unit". Gene sharing would maximize the transmission and refraction of light to the retina by this refraction unit. Studies have shown that many water-soluble enzymes/proteins expressed by the cornea are identical to taxon-specific lens crystallins, such as ALDH1A1/η-crystallin, α-enolase/τ-crystallin, and lactic dehydrogenase/ε-crystallin. Also, the anuran corneal epithelium, which can transdifferentiate to regenerate the lens, abundantly expresses the ubiquitous lens crystallins α, β and γ, in addition to the taxon-specific crystallin α-enolase/τ-crystallin. Overall, the similar expression of these proteins in the cornea and lens, in both abundance and taxon specificity, supports the idea that the lens and cornea co-evolved through gene sharing.

Relationship to similar concepts

Gene sharing is related to, but distinct from, several concepts in genetics, evolution, and molecular biology. Gene sharing entails multiple effects from the same gene, but unlike pleiotropy, it necessarily involves separate functions at the molecular level. A gene exhibits pleiotropy when a single enzyme function affects multiple phenotypic traits, whereas mutations of a shared gene could potentially affect only a single trait. Gene duplication followed by differential mutation is another phenomenon thought to be a key element in the evolution of protein function, but in gene sharing there is no divergence of gene sequence when proteins take on new functions; the single polypeptide takes on new roles while retaining old ones. Alternative splicing can result in the production of multiple polypeptides (with multiple functions) from a single gene, but by definition, gene sharing involves multiple functions of a single polypeptide.

Clinical significance

The multiple roles of moonlighting proteins complicate the determination of phenotype from genotype, hampering the study of inherited metabolic disorders.

The complex phenotypes of several disorders are suspected to be caused by the involvement of moonlighting proteins. The protein GAPDH has at least 11 documented functions, one of which is a role in apoptosis. Excessive apoptosis is involved in many neurodegenerative diseases, such as Huntington's, Alzheimer's, and Parkinson's, as well as in brain ischemia. In one case, GAPDH was found in the degenerated neurons of individuals who had Alzheimer's disease.

Although there is insufficient evidence for definite conclusions, there are well-documented examples of moonlighting proteins that play a role in disease. One such disease is tuberculosis: a moonlighting protein of the bacterium M. tuberculosis has a function that counteracts the effects of antibiotics. Specifically, M. tuberculosis gains resistance to ciprofloxacin from overexpression of glutamate racemase in vivo. In addition, GAPDH localized to the surface of pathogenic mycobacteria has been shown to capture the mammalian iron carrier protein transferrin and traffic it into cells, resulting in iron acquisition by the pathogen.

United States labor law

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Uni...