A Medley of Potpourri

Wednesday, December 19, 2018

Convergent evolution (updated)

From Wikipedia, the free encyclopedia

Two succulent plant genera, Euphorbia and Astrophytum, are only distantly related, but the species within each have converged on a similar body form.

Convergent evolution is the independent evolution of similar features in species of different lineages. Convergent evolution creates analogous structures that have similar form or function but were not present in the last common ancestor of those groups. The cladistic term for the same phenomenon is homoplasy. The recurrent evolution of flight is a classic example, as flying insects, birds, pterosaurs, and bats have independently evolved the useful capacity of flight. Functionally similar features that have arisen through convergent evolution are analogous, whereas homologous structures or traits have a common origin but can have dissimilar functions. Bird, bat, and pterosaur wings are analogous structures, but their forelimbs are homologous, sharing an ancestral state despite serving different functions.

The opposite of convergence is divergent evolution, where related species evolve different traits. Convergent evolution is similar to parallel evolution, which occurs when two independent species evolve in the same direction and thus independently acquire similar characteristics; for instance, gliding frogs have evolved in parallel from multiple types of tree frog.

Many instances of convergent evolution are known in plants, including the repeated development of C₄ photosynthesis, seed dispersal by fleshy fruits adapted to be eaten by animals, and carnivory.

Overview

Homology and analogy in mammals and insects: on the horizontal axis, the structures are homologous in morphology, but different in function due to differences in habitat. On the vertical axis, the structures are analogous in function due to similar lifestyles but anatomically different with different phylogeny.

In morphology, analogous traits arise when different species live in similar ways and/or a similar environment, and so face the same environmental factors. When occupying similar ecological niches (that is, a distinctive way of life) similar problems can lead to similar solutions. The British anatomist Richard Owen was the first to identify the fundamental difference between analogies and homologies.

In biochemistry, physical and chemical constraints on mechanisms have caused some active site arrangements such as the catalytic triad to evolve independently in separate enzyme superfamilies.

In his 1989 book Wonderful Life, Stephen Jay Gould argued that if one could "rewind the tape of life [and] the same conditions were encountered again, evolution could take a very different course". Simon Conway Morris disputes this conclusion, arguing that convergence is a dominant force in evolution, and given that the same environmental and physical constraints are at work, life will inevitably evolve toward an "optimum" body plan, and at some point, evolution is bound to stumble upon intelligence, a trait presently identified with at least primates, corvids, and cetaceans.

Distinctions

Cladistics

In cladistics, a homoplasy is a trait shared by two or more taxa for any reason other than that they share a common ancestry. Taxa which do share ancestry are part of the same clade; cladistics seeks to arrange them according to their degree of relatedness to describe their phylogeny. Homoplastic traits caused by convergence are therefore, from the point of view of cladistics, confounding factors which could lead to an incorrect analysis.

Atavism

In some cases, it is difficult to tell whether a trait has been lost and then re-evolved convergently, or whether a gene has simply been switched off and then re-enabled later. Such a re-emerged trait is called an atavism. From a mathematical standpoint, an unused gene (selectively neutral) has a steadily decreasing probability of retaining potential functionality over time. The time scale of this process varies greatly in different phylogenies; in mammals and birds, a silenced gene has a reasonable probability of remaining in the genome in a potentially functional state for around 6 million years.
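The "steadily decreasing probability" can be illustrated with a minimal sketch. It assumes (the text does not say this) that a silenced gene is disabled by mutation at a constant rate, which gives simple exponential decay. The function name and the half-life value are hypothetical and chosen only for illustration.

```python
import math

def prob_still_functional(t_myr, half_life_myr):
    """Probability that a silenced gene is still potentially functional
    after t_myr million years, assuming a constant inactivation rate
    (an illustrative assumption that gives exponential decay)."""
    if half_life_myr <= 0:
        raise ValueError("half_life_myr must be positive")
    rate = math.log(2) / half_life_myr
    return math.exp(-rate * t_myr)

# Hypothetical half-life of 5 million years, not a value from the text.
for t in (0, 5, 10, 20):
    print(t, prob_still_functional(t, 5.0))
```

Under this assumption the probability halves with every half-life, so it falls steadily but never reaches exactly zero.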

Parallel vs. convergent evolution

Evolution at an amino acid position. In each case, the left-hand species changes from having alanine (A) at a specific position in a protein in a hypothetical ancestor, and now has serine (S) there. The right-hand species may undergo divergent, parallel, or convergent evolution at this amino acid position relative to the first species.

When two species are similar in a particular character, evolution is defined as parallel if the ancestors were also similar, and convergent if they were not. Some scientists have argued that there is a continuum between parallel and convergent evolution, while others maintain that, despite some overlap, there are still important distinctions between the two.

When the ancestral forms are unspecified or unknown, or the range of traits considered is not clearly specified, the distinction between parallel and convergent evolution becomes more subjective. For instance, the striking example of similar placental and marsupial forms is described by Richard Dawkins in The Blind Watchmaker as a case of convergent evolution, because mammals on each continent had a long evolutionary history prior to the extinction of the dinosaurs under which to accumulate relevant differences.
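The definition above, applied to a single amino acid position, can be sketched as a small classifier. The function name is hypothetical, and threonine (T) is used only as an example of "some other residue".

```python
def classify_site(anc1, desc1, anc2, desc2):
    """Classify evolution at one amino acid position in lineage 2
    relative to lineage 1, using single-letter residue codes.

    parallel:   same ancestral state, same derived state
    convergent: different ancestral states, same derived state
    divergent:  the derived states differ
    """
    if anc1 == desc1 or anc2 == desc2:
        return "no change in at least one lineage"
    if desc1 != desc2:
        return "divergent"
    if anc1 == anc2:
        return "parallel"
    return "convergent"

# Lineage 1 changes from alanine (A) to serine (S) in every case.
print(classify_site("A", "S", "A", "T"))  # divergent
print(classify_site("A", "S", "A", "S"))  # parallel
print(classify_site("A", "S", "T", "S"))  # convergent
```

The sketch needs both ancestral states as input, which is why the distinction becomes subjective when the ancestral forms are unknown.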

At molecular level

Evolutionary convergence of serine and cysteine proteases towards the same catalytic triad organisation of acid-base-nucleophile in different protease superfamilies. Shown are the triads of subtilisin, prolyl oligopeptidase, TEV protease, and papain.

Protease active sites

The enzymology of proteases provides some of the clearest examples of convergent evolution. These examples reflect the intrinsic chemical constraints on enzymes, leading evolution to converge on equivalent solutions independently and repeatedly.

Serine and cysteine proteases use different amino acid functional groups (alcohol or thiol) as a nucleophile. In order to activate that nucleophile, they orient an acidic and a basic residue in a catalytic triad. The chemical and physical constraints on enzyme catalysis have caused identical triad arrangements to evolve independently more than 20 times in different enzyme superfamilies.

Threonine proteases use the amino acid threonine as their catalytic nucleophile. Unlike cysteine and serine, threonine is a secondary alcohol (i.e. has a methyl group). The methyl group of threonine greatly restricts the possible orientations of triad and substrate, as the methyl clashes with either the enzyme backbone or the histidine base. Consequently, most threonine proteases use an N-terminal threonine in order to avoid such steric clashes. Several evolutionarily independent enzyme superfamilies with different protein folds use the N-terminal residue as a nucleophile. This commonality of active site but difference of protein fold indicates that the active site evolved convergently in those families.

Nucleic acids

Convergence occurs at the level of DNA and the amino acid sequences produced by translating structural genes into proteins. Studies have found convergence in amino acid sequences in echolocating bats and the dolphin; among marine mammals; between giant and red pandas; and between the thylacine and canids. Convergence has also been detected in a type of non-coding DNA, cis-regulatory elements, such as in their rates of evolution; this could indicate either positive selection or relaxed purifying selection.

In animal morphology

Dolphins and ichthyosaurs converged on many adaptations for fast swimming.

Bodyplans

Swimming animals including fish such as herrings, marine mammals such as dolphins, and ichthyosaurs (of the Mesozoic) all converged on the same streamlined shape. The fusiform body shape (a tube tapered at both ends) adopted by many aquatic animals is an adaptation to enable them to travel at high speed in a high drag environment. Similar body shapes are found in the earless seals and the eared seals: they still have four legs, but these are strongly modified for swimming.

The marsupial fauna of Australia and the placental mammals of the Old World have several strikingly similar forms, developed in two clades, isolated from each other. The body and especially the skull shape of the thylacine (Tasmanian wolf) converged with those of Canidae such as the red fox, Vulpes vulpes.

Convergence of marsupial and placental mammals:

Red fox skeleton
Skulls of thylacine (left), timber wolf (right)
Thylacine skeleton

Echolocation

As a sensory adaptation, echolocation has evolved separately in cetaceans (dolphins and whales) and bats, but from the same genetic mutations.

Eyes

The camera eyes of vertebrates (left) and cephalopods (right) developed independently and are wired differently; for instance, optic nerve fibres reach the vertebrate retina from the front, creating a blind spot.

One of the best-known examples of convergent evolution is the camera eye of cephalopods (such as squid and octopus), vertebrates (including mammals) and cnidaria (such as jellyfish). Their last common ancestor had at most a simple photoreceptive spot, but a range of processes led to the progressive refinement of camera eyes — with one sharp difference: the cephalopod eye is "wired" in the opposite direction, with blood and nerve vessels entering from the back of the retina, rather than the front as in vertebrates. As a result, cephalopods lack a blind spot.

Flight

Vertebrate wings are partly homologous (from forelimbs), but analogous as organs of flight in (1) pterosaurs, (2) bats, (3) birds, evolved separately.

Birds and bats have homologous limbs because they are both ultimately derived from terrestrial tetrapods, but their flight mechanisms are only analogous, so their wings are examples of functional convergence. The two groups have powered flight, evolved independently. Their wings differ substantially in construction. The bat wing is a membrane stretched across four extremely elongated fingers and the legs. The airfoil of the bird wing is made of feathers, strongly attached to the forearm (the ulna) and the highly fused bones of the wrist and hand (the carpometacarpus), with only tiny remnants of two fingers remaining, each anchoring a single feather. So, while the wings of bats and birds are functionally convergent, they are not anatomically convergent. Birds and bats also share a high concentration of cerebrosides in the skin of their wings. This improves skin flexibility, a trait useful for flying animals; other mammals have a far lower concentration. The extinct pterosaurs independently evolved wings from their fore- and hindlimbs, while insects have wings that evolved separately from different organs.

Flying squirrels and sugar gliders are much alike in their body plans, with gliding wings stretched between their limbs, but flying squirrels are placental mammals while sugar gliders are marsupials, widely separated within the mammal lineage.

Insect mouthparts

Insect mouthparts show many examples of convergent evolution. The mouthparts of different insect groups consist of a set of homologous organs, specialised for the dietary intake of that insect group. Convergent evolution of many groups of insects led from original biting-chewing mouthparts to different, more specialised, derived function types. These include, for example, the proboscis of flower-visiting insects such as bees and flower beetles, or the biting-sucking mouthparts of blood-sucking insects such as fleas and mosquitos.

Opposable thumbs

Opposable thumbs allowing the grasping of objects are most often associated with primates, like humans, monkeys, apes, and lemurs. Opposable thumbs also evolved in giant pandas, but these are completely different in structure, having six fingers including the thumb, which develops from a wrist bone entirely separately from other fingers.

Primates


Despite the similar lightening of skin colour after moving out of Africa, different genes were involved in European (left) and East-Asian (right) lineages.

Convergent evolution in humans includes blue eye colour and light skin colour. When humans migrated out of Africa, they moved to more northern latitudes with less intense sunlight. It was beneficial to them to reduce their skin pigmentation. It appears certain that there was some lightening of skin colour before European and East Asian lineages diverged, as there are some skin-lightening genetic differences that are common to both groups. However, after the lineages diverged and became genetically isolated, the skin of both groups lightened more, and that additional lightening was due to different genetic changes.

Despite the similarity of appearance, the genetic basis of blue eyes is different in humans and lemurs.

Lemurs and humans are both primates. Ancestral primates had brown eyes, as most primates do today. The genetic basis of blue eyes in humans has been studied in detail. Eye colour is not controlled by a single gene locus with brown simply dominant to blue; however, a single locus is responsible for about 80% of the variation. In lemurs, the differences between blue and brown eyes are not completely known, but the same gene locus is not involved.

In plants

In myrmecochory, seeds such as those of Chelidonium majus have a hard coating and an attached oil body, an elaiosome, for dispersal by ants.

Carbon fixation

While convergent evolution is often illustrated with animal examples, it has often occurred in plant evolution. For instance, C₄ photosynthesis, one of the three major carbon-fixing biochemical processes, has arisen independently up to 40 times. About 7,600 plant species of angiosperms use C₄ carbon fixation, with many monocots including 46% of grasses such as maize and sugar cane, and dicots including several species in the Chenopodiaceae and the Amaranthaceae.

Fruits

A good example of convergence in plants is the evolution of edible fruits such as apples. These pomes incorporate (five) carpels and their accessory tissues forming the apple's core, surrounded by structures from outside the botanical fruit, the receptacle or hypanthium. Other edible fruits include other plant tissues; for example, the fleshy part of a tomato is the walls of the pericarp. This implies convergent evolution under selective pressure, in this case the competition for seed dispersal by animals through consumption of fleshy fruits.

Seed dispersal by ants (myrmecochory) has evolved independently more than 100 times, and is present in more than 11,000 plant species. It is one of the most dramatic examples of convergent evolution in biology.

Carnivory

Carnivory has evolved multiple times independently in plants in widely separated groups. In three species studied, Cephalotus follicularis, Nepenthes alata and Sarracenia purpurea, there has been convergence at the molecular level. Carnivorous plants secrete enzymes into the digestive fluid they produce. By studying phosphatase, glycoside hydrolase, glucanase, RNAse and chitinase enzymes as well as a pathogenesis-related protein and a thaumatin-related protein, the authors found many convergent amino acid substitutions. These changes were not at the enzymes' catalytic sites, but rather on the exposed surfaces of the proteins, where they might interact with other components of the cell or the digestive fluid. The authors also found that homologous genes in the non-carnivorous plant Arabidopsis thaliana tend to have their expression increased when the plant is stressed, leading the authors to suggest that stress-responsive proteins have often been co-opted in the repeated evolution of carnivory.

Methods of inference

Angiosperm phylogeny of orders based on classification by the Angiosperm Phylogeny Group. The figure shows the number of inferred independent origins of C₃-C₄ photosynthesis and C₄ photosynthesis in parentheses.

Phylogenetic reconstruction and ancestral state reconstruction proceed by assuming that evolution has occurred without convergence. Convergent patterns may, however, appear at higher levels in a phylogenetic reconstruction, and are sometimes explicitly sought by investigators. The methods applied to infer convergent evolution depend on whether pattern-based or process-based convergence is expected. Pattern-based convergence is the broader term, for when two or more lineages independently evolve patterns of similar traits. Process-based convergence is when the convergence is due to similar forces of natural selection.
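One common way to count how many times a trait must have changed on a given tree is Fitch parsimony. The text does not name this method, so the sketch below is only an illustration of the general idea. The tree shape, taxon names and trait states are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony for one discrete character.

    tree:   nested 2-tuples with leaf names (str) at the tips
    states: dict mapping each leaf name to its observed state
    Returns (candidate states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1

tree = (("taxon_a", "taxon_b"), ("taxon_c", "taxon_d"))
scattered = {"taxon_a": "C4", "taxon_b": "C3", "taxon_c": "C4", "taxon_d": "C3"}
clustered = {"taxon_a": "C4", "taxon_b": "C4", "taxon_c": "C3", "taxon_d": "C3"}
print(fitch(tree, scattered)[1])  # 2 changes: the trait is not confined to one clade
print(fitch(tree, clustered)[1])  # 1 change: a single origin (or loss) suffices
```

A count above one for a trait on a well-supported tree is the kind of pattern that suggests independent origins, although parsimony alone cannot say whether the changes were gains or losses.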

Pattern-based measures

Earlier methods for measuring convergence incorporate ratios of phenotypic and phylogenetic distance by simulating evolution with a Brownian motion model of trait evolution along a phylogeny. More recent methods also quantify the strength of convergence. One drawback to keep in mind is that these methods can confuse long-term stasis with convergence due to phenotypic similarities. Stasis occurs when there is little evolutionary change among taxa.

Distance-based measures assess the degree of similarity between lineages over time. Frequency-based measures assess the number of lineages that have evolved in a particular trait space.
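A distance-based measure can be sketched as follows. The index used here, one minus the ratio of the present-day distance between two lineages to the largest distance they ever reached, is one simple choice assumed for illustration; the text does not give a formula. The trait values are hypothetical, and in practice the ancestral values would themselves have to be reconstructed.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two trait vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def convergence_index(lineage_a, lineage_b):
    """Return 1 - D_tip / D_max for two lineages.

    Each lineage is a list of trait vectors sampled at the same time
    points, oldest first. Values near 1 mean the lineages ended up much
    closer than they once were; 0 means they are as far apart as ever.
    """
    if not lineage_a or len(lineage_a) != len(lineage_b):
        raise ValueError("lineages must be non-empty and of equal length")
    distances = [euclidean(p, q) for p, q in zip(lineage_a, lineage_b)]
    d_max = max(distances)
    if d_max == 0:
        return 0.0
    return 1.0 - distances[-1] / d_max

# Hypothetical two-trait histories for two lineages.
lineage_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
lineage_b = [(4.0, 0.0), (3.0, 1.0), (2.0, 1.5)]
print(convergence_index(lineage_a, lineage_b))  # 0.875
```

Two lineages that were always similar score 0 on this index, which is one way to keep long-term stasis from being counted as convergence.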

Process-based measures

Methods to infer process-based convergence fit models of selection to a phylogeny and continuous trait data to determine whether the same selective forces have acted upon lineages. This uses the Ornstein-Uhlenbeck (OU) process to test different scenarios of selection. Other methods rely on an a priori specification of where shifts in selection have occurred.
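The behaviour of an Ornstein-Uhlenbeck process can be illustrated with a minimal simulation. This is a sketch of the process itself, not of any model-fitting method. The parameter names (theta for the trait optimum, alpha for the strength of pull towards it, sigma for the random component) follow common usage but are assumptions here, as are all numeric values.

```python
import random

def simulate_ou(x0, theta, alpha, sigma, dt, steps, rng=None):
    """Simulate one trait under an OU process with Euler-Maruyama steps:
    x += alpha * (theta - x) * dt + sigma * sqrt(dt) * N(0, 1)."""
    rng = rng or random.Random(0)
    x = x0
    path = [x]
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x += alpha * (theta - x) * dt + sigma * (dt ** 0.5) * noise
        path.append(x)
    return path

# Two hypothetical lineages start far apart but share the same optimum.
rng = random.Random(42)
for start in (-2.0, 3.0):
    path = simulate_ou(start, theta=1.0, alpha=0.5, sigma=0.2,
                       dt=0.01, steps=2000, rng=rng)
    print(start, round(path[-1], 2))
```

When both lineages are pulled towards the same optimum, their trait values end up close together regardless of where they started, which is the signature that process-based tests look for.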

Catalytic triad

From Wikipedia, the free encyclopedia

The enzyme TEV protease contains an example of a catalytic triad of residues (red) in its active site. The triad consists of an aspartate (acid), histidine (base) and serine (nucleophile). The substrate (black) is bound by the binding site to orient it next to the triad. PDB: 1lvm

A catalytic triad is a set of three coordinated amino acids that can be found in the active site of some enzymes. Catalytic triads are most commonly found in hydrolase and transferase enzymes (e.g. proteases, amidases, esterases, acylases, lipases and β-lactamases). An Acid-Base-Nucleophile triad is a common motif for generating a nucleophilic residue for covalent catalysis. The residues form a charge-relay network to polarise and activate the nucleophile, which attacks the substrate, forming a covalent intermediate which is then hydrolysed to release the product and regenerate free enzyme. The nucleophile is most commonly a serine or cysteine amino acid, but occasionally threonine or even selenocysteine. The 3D structure of the enzyme brings together the triad residues in a precise orientation, even though they may be far apart in the sequence (primary structure).

As well as divergent evolution of function (and even the triad's nucleophile), catalytic triads show some of the best examples of convergent evolution. Chemical constraints on catalysis have led to the same catalytic solution independently evolving in at least 23 separate superfamilies. Their mechanism of action is consequently one of the best studied in biochemistry.

History

The enzymes trypsin and chymotrypsin were first purified in the 1930s. A serine in each of trypsin and chymotrypsin was identified as the catalytic nucleophile (by diisopropyl fluorophosphate modification) in the 1950s. The structure of chymotrypsin was solved by X-ray crystallography in the 1960s, showing the orientation of the catalytic triad in the active site. Other proteases were sequenced and aligned to reveal a family of related proteases, now called the S1 family. Simultaneously, the structures of the evolutionarily unrelated papain and subtilisin proteases were found to contain analogous triads. The 'charge-relay' mechanism for the activation of the nucleophile by the other triad members was proposed in the late 1960s. As more protease structures were solved by X-ray crystallography in the 1970s and 80s, homologous (such as TEV protease) and analogous (such as papain) triads were found. The MEROPS classification system in the 1990s and 2000s began classing proteases into structurally related enzyme superfamilies and so acts as a database of the convergent evolution of triads in over 20 superfamilies. Understanding how chemical constraints on evolution led to the convergence of so many enzyme families on the same triad geometries has developed in the 2010s.

Since the initial discovery of catalytic triads, there have been increasingly detailed investigations of their exact catalytic mechanism. Of particular contention in the 1990s and 2000s was whether low-barrier hydrogen bonding contributed to catalysis, or whether ordinary hydrogen bonding is sufficient to explain the mechanism. The massive body of work on the charge-relay, covalent catalysis used by catalytic triads has led to the mechanism being the best characterised in all of biochemistry.

Function

Enzymes that contain a catalytic triad use it for one of two reaction types: either to split a substrate (hydrolases) or to transfer one portion of a substrate over to a second substrate (transferases). Triads are an inter-dependent set of residues in the active site of an enzyme and act in concert with other residues (e.g. binding site and oxyanion hole) to achieve nucleophilic catalysis. These triad residues act together to make the nucleophile member highly reactive, generating a covalent intermediate with the substrate that is then resolved to complete catalysis.

Mechanism

Catalytic triads perform covalent catalysis using a residue as a nucleophile. The reactivity of the nucleophilic residue is increased by the functional groups of the other triad members. The nucleophile is polarised and oriented by the base, which is itself bound and stabilised by the acid.

Catalysis is performed in two stages. First, the activated nucleophile attacks the carbonyl carbon and forces the carbonyl oxygen to accept an electron, leading to a tetrahedral intermediate. The build-up of negative charge on this intermediate is typically stabilized by an oxyanion hole within the active site. The intermediate then collapses back to a carbonyl, ejecting the first half of the substrate, but leaving the second half still covalently bound to the enzyme as an acyl-enzyme intermediate. The ejection of this first leaving group is often aided by donation of a proton by the base.

The second stage of catalysis is the resolution of the acyl-enzyme intermediate by the attack of a second substrate. If this substrate is water then the result is hydrolysis; if it is an organic molecule then the result is transfer of that molecule onto the first substrate. Attack by this second substrate forms a new tetrahedral intermediate, which resolves by ejecting the enzyme's nucleophile, releasing the second product and regenerating free enzyme.
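The two stages described above can be summarised as an ordered list of states. This is a bookkeeping sketch only, with no chemistry modelled; the function name and state labels are hypothetical.

```python
def catalytic_cycle(second_substrate):
    """Return (ordered states, overall outcome) for one catalytic cycle.

    second_substrate: "water" gives hydrolysis; any other molecule is
    treated as an acceptor, giving group transfer.
    """
    states = [
        "free enzyme with bound substrate",
        "first tetrahedral intermediate",
        "acyl-enzyme intermediate (first product released)",
        "second tetrahedral intermediate",
        "free enzyme (second product released)",
    ]
    outcome = "hydrolysis" if second_substrate == "water" else "group transfer"
    return states, outcome

states, outcome = catalytic_cycle("water")
for state in states:
    print(state)
print(outcome)  # hydrolysis
```

The enzyme begins and ends the cycle in the free state, which is what makes the triad a catalyst rather than a reagent.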

General reaction mechanism catalysed by a catalytic triad (black): nucleophilic substitution at a carbonyl substrate (red) by a second substrate (blue). First, the enzyme's nucleophile (X) attacks the carbonyl to form a covalently linked acyl-enzyme intermediate. This intermediate is then attacked by the second substrate's nucleophile (X'). If the second nucleophile is the hydroxyl of water, the result is hydrolysis, otherwise the result is group transfer of X'.

Identity of triad members

A catalytic triad charge-relay system as commonly found in proteases. The acid residue (commonly glutamate or aspartate) aligns and polarises the base (usually histidine), which activates the nucleophile (often serine or cysteine, occasionally threonine). The triad reduces the pK_a of the nucleophilic residue, which then attacks the substrate. An oxyanion hole of positively charged groups, usually backbone amides (occasionally side-chains), stabilises charge build-up on the substrate transition state.

Nucleophile

The side-chain of the nucleophilic residue performs covalent catalysis on the substrate. The lone pair of electrons present on the oxygen or sulphur attacks the electropositive carbonyl carbon. The 20 naturally occurring biological amino acids do not contain any sufficiently nucleophilic functional groups for many difficult catalytic reactions. Embedding the nucleophile in a triad increases its reactivity for efficient catalysis. The most commonly used nucleophiles are the hydroxyl (OH) of serine and the thiol/thiolate ion (SH/S⁻) of cysteine. Alternatively, threonine proteases use the secondary hydroxyl of threonine; however, due to steric hindrance from the side chain's extra methyl group, such proteases use their N-terminal amide as the base, rather than a separate amino acid.

Use of oxygen or sulphur as the nucleophilic atom causes minor differences in catalysis. Compared to oxygen, sulphur’s extra d orbital makes it larger (by 0.4 Å) and softer, allows it to form longer bonds (d_C-X and d_X-H by 1.3-fold), and gives it a lower pK_a (by 5 units). Serine is therefore more dependent than cysteine on optimal orientation of the acid-base triad members to reduce its pK_a in order to achieve concerted deprotonation with catalysis. The low pK_a of cysteine works to its disadvantage in the resolution of the first tetrahedral intermediate as unproductive reversal of the original nucleophilic attack is the more favourable breakdown product. The triad base is therefore preferentially oriented to protonate the leaving group amide to ensure that it is ejected to leave the enzyme sulphur covalently bound to the substrate N-terminus. Finally, resolution of the acyl-enzyme (to release the substrate C-terminus) requires serine to be re-protonated whereas cysteine can leave as S⁻. Sterically, the sulphur of cysteine also forms longer bonds and has a bulkier van der Waals radius and if mutated to serine can be trapped in unproductive orientations in the active site.

Very rarely, the selenium atom of the uncommon amino acid selenocysteine is used as a nucleophile. The deprotonated Se⁻ state is strongly favoured when in a catalytic triad.

Base

Since no natural amino acids are strongly nucleophilic, the base in a catalytic triad polarises and deprotonates the nucleophile to increase its reactivity. Additionally, it protonates the first product to aid leaving group departure.

The base is most commonly histidine since its pK_a allows for effective base catalysis, hydrogen bonding to the acid residue, and deprotonation of the nucleophile residue. β-lactamases such as TEM-1 use a lysine residue as the base. Because lysine's pK_a is so high (pK_a=11), a glutamate and several other residues act as the acid to stabilise its deprotonated state during the catalytic cycle. Threonine proteases use their N-terminal amide as the base, since steric crowding by the catalytic threonine's methyl prevents other residues from being close enough.

Acid

The acidic triad member forms a hydrogen bond with the basic residue. This aligns the basic residue by restricting its side-chain rotation, and polarises it by stabilising its positive charge. Two amino acids have acidic side chains at physiological pH (aspartate or glutamate) and so are the most commonly used for this triad member. Cytomegalovirus protease uses a pair of histidines, one as the base, as usual, and one as the acid. The second histidine is not as effective an acid as the more common aspartate or glutamate, leading to a lower catalytic efficiency. In some enzymes, the acid member of the triad is less necessary and some act only as a dyad. For example, papain uses asparagine as its third triad member which orients the histidine base but does not act as an acid. Similarly, hepatitis A virus protease contains an ordered water in the position where an acid residue should be.

Examples of triads

The range of amino acid residues used in different combinations in different enzymes to make up a catalytic triad for hydrolysis. On the left are the nucleophile, base and acid triad members. On the right are different substrates with the cleaved bond indicated by a pair of scissors. Two different bonds in beta-lactams can be cleaved (1 by penicillin acylase and 2 by beta-lactamase).

Ser-His-Asp

The Serine-Histidine-Aspartate motif is one of the most thoroughly characterised catalytic motifs in biochemistry. The triad is exemplified by chymotrypsin, a model serine protease from the PA superfamily which uses its triad to hydrolyse protein backbones. The aspartate is hydrogen bonded to the histidine, increasing the pK_a of its imidazole nitrogen from 7 to around 12. This allows the histidine to act as a powerful general base and to activate the serine nucleophile. It also has an oxyanion hole consisting of several backbone amides which stabilises charge build-up on intermediates. The histidine base aids the first leaving group by donating a proton, and also activates the hydrolytic water substrate by abstracting a proton as the remaining OH⁻ attacks the acyl-enzyme intermediate.
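The effect of raising the imidazole pK_a from 7 to around 12 can be put in numbers with the Henderson-Hasselbalch relation. The text does not mention this relation, and the sketch assumes a simple single-site titration and a surrounding pH of 7; both are illustrative assumptions.

```python
def fraction_protonated(ph, pka):
    """Fraction of a titratable group that holds its proton at a given pH,
    from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Assumed pH of 7; the pK_a values 7 and 12 are the ones quoted in the text.
print(fraction_protonated(7.0, 7.0))   # 0.5
print(fraction_protonated(7.0, 12.0))
```

At a pK_a of 7 the histidine holds a proton only half the time, while at a pK_a of 12 it holds it almost always, which is another way of saying it has become a much stronger base.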

The same triad has also convergently evolved in α/β hydrolases such as some lipases and esterases; however, the orientation of the triad members is reversed. Additionally, brain acetyl hydrolase (which has the same fold as a small G-protein) has also been found to have this triad. The equivalent Ser-His-Glu triad is used in acetylcholinesterase.

Cys-His-Asp

The second most studied triad is the Cysteine-Histidine-Aspartate motif. Several families of cysteine proteases use this triad set, for example TEV protease and papain. The triad acts similarly to serine protease triads, with a few notable differences. Due to cysteine's low pK_a, the importance of the Asp to catalysis varies and several cysteine proteases are effectively Cys-His dyads (e.g. hepatitis A virus protease), whilst in others the cysteine is already deprotonated before catalysis begins (e.g. papain). This triad is also used by some amidases, such as N-glycanase to hydrolyse non-peptide C-N bonds.

Ser-His-His

The triad of cytomegalovirus protease uses histidine as both the acid and base triad members. Removing the acid histidine results in only a 10-fold activity loss (compared to >10,000-fold when aspartate is removed from chymotrypsin). This triad has been interpreted as a possible way of generating a less active enzyme to control cleavage rate.
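The two fold-losses quoted above can be compared on an energy scale. The sketch assumes (the text does not state this) that the activity loss reflects a change in a rate constant, so that a fold-change corresponds to a change in activation free energy of RT·ln(fold). The temperature is an assumed 298.15 K.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fold_change_to_ddg(fold, temperature_k=298.15):
    """Change in activation free energy (kcal/mol) equivalent to a given
    fold-change in rate, assuming rate is proportional to exp(-dG/(R*T))."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temperature_k * math.log(fold)

print(round(fold_change_to_ddg(10), 2))      # about 1.36 kcal/mol
print(round(fold_change_to_ddg(10_000), 2))  # about 5.46 kcal/mol
```

On this reading, the acid histidine contributes roughly a quarter of the stabilisation that aspartate provides in chymotrypsin, given that the chymotrypsin figure is a lower bound.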

Ser-Glu-Asp

An unusual triad is found in sedolisin proteases. The low pK_a of the glutamate carboxylate group means that it only acts as a base in the triad at very low pH. The triad is hypothesised to be an adaptation to specific environments like acidic hot springs (e.g. kumamolysin) or the cell lysosome (e.g. tripeptidyl peptidase).

Cys-His-Ser

The endothelial protease vasohibin uses a cysteine as the nucleophile, but a serine to coordinate the histidine base. Despite the serine being a poor acid, it is still effective in orienting the histidine in the catalytic triad. Some homologues alternatively have a threonine instead of serine at the acid location.

Thr-Nter, Ser-Nter and Cys-Nter

Threonine proteases, such as the proteasome protease subunit and ornithine acyltransferases, use the secondary hydroxyl of threonine in a manner analogous to the use of the serine primary hydroxyl. However, due to the steric interference of the extra methyl group of threonine, the base member of the triad is the N-terminal amide, which polarises an ordered water which, in turn, deprotonates the catalytic hydroxyl to increase its reactivity. Similarly, there exist equivalent 'serine only' and 'cysteine only' configurations such as penicillin acylase G and penicillin acylase V, which are evolutionarily related to the proteasome proteases. Again, these use their N-terminal amide as a base.

Ser-cisSer-Lys

This unusual triad occurs only in one superfamily of amidases. In this case, the lysine acts to polarise the middle serine. The middle serine then forms two strong hydrogen bonds to the nucleophilic serine to activate it (one with the side chain hydroxyl and the other with the backbone amide). The middle serine is held in an unusual cis orientation to facilitate precise contacts with the other two triad residues. The triad is further unusual in that the lysine and cis-serine both act as the base in activating the catalytic serine, but the same lysine also performs the role of the acid member as well as making key structural contacts.

Sec-His-Glu

The rare but naturally occurring amino acid selenocysteine (Sec) can also be found as the nucleophile in some catalytic triads. Selenocysteine is similar to cysteine, but contains a selenium atom instead of a sulphur. An example is in the active site of thioredoxin reductase, which uses the selenium for reduction of disulphide in thioredoxin.

Engineered triads

In addition to naturally occurring types of catalytic triads, protein engineering has been used to create enzyme variants with non-native amino acids, or entirely synthetic amino acids. Catalytic triads have also been inserted into otherwise non-catalytic proteins, or protein mimics.

Subtilisin (a serine protease) has had its oxygen nucleophile replaced with each of sulphur, selenium, or tellurium. Cysteine and selenocysteine were inserted by mutagenesis, whereas the non-natural amino acid, tellurocysteine, was inserted using auxotrophic cells fed with synthetic tellurocysteine. These elements are all in the 16th periodic table column (chalcogens), so have similar properties. In each case, changing the nucleophile reduced the enzyme's protease activity, but increased a different activity. A sulphur nucleophile improved the enzyme's transferase activity (sometimes called subtiligase). Selenium and tellurium nucleophiles converted the enzyme into an oxidoreductase. When the nucleophile of TEV protease was converted from cysteine to serine, its protease activity was strongly reduced, but could be restored by directed evolution.

Non-catalytic proteins have been used as scaffolds, having catalytic triads inserted into them which were then improved by directed evolution. The Ser-His-Asp triad has been inserted into an antibody, as well as a range of other proteins. Similarly, catalytic triad mimics have been created in small organic molecules like diaryl diselenide, and displayed on larger polymers like Merrifield resins, and self-assembling short peptide nanostructures.

Divergent evolution

The sophistication of the active site network causes residues involved in catalysis (and residues in contact with these) to be highly evolutionarily conserved. However, there are examples of divergent evolution in catalytic triads, both in the reaction catalysed, and the residues used in catalysis. The triad remains the core of the active site, but it is evolutionarily adapted to serve different functions. Some proteins, called pseudoenzymes, have non-catalytic functions (e.g. regulation by inhibitory binding) and have accumulated mutations that inactivate their catalytic triad.

Reaction changes

Catalytic triads perform covalent catalysis via an acyl-enzyme intermediate. If this intermediate is resolved by water, the result is hydrolysis of the substrate. However, if the intermediate is resolved by attack by a second substrate, then the enzyme acts as a transferase. For example, attack by an acyl group results in an acyltransferase reaction. Several families of transferase enzymes have evolved from hydrolases by adaptation to exclude water and favour attack of a second substrate. In different members of the α/β-hydrolase superfamily, the Ser-His-Asp triad is tuned by surrounding residues to perform at least 17 different reactions. Some of these reactions are also achieved with mechanisms that have altered formation, or resolution of the acyl-enzyme intermediate, or that don't proceed via an acyl-enzyme intermediate.

Additionally, an alternative transferase mechanism has been evolved by amidophosphoribosyltransferases, which have two active sites. In the first active site, a cysteine triad hydrolyses a glutamine substrate to release free ammonia. The ammonia then diffuses through an internal tunnel in the enzyme to the second active site, where it is transferred to a second substrate.

Nucleophile changes

Divergent evolution of PA clan proteases to use different nucleophiles in their catalytic triad. Shown are the serine triad of chymotrypsin and the cysteine triad of TEV protease.

Divergent evolution of active site residues is slow, due to strong chemical constraints. Nevertheless, some protease superfamilies have evolved from one nucleophile to another. This can be inferred when a superfamily (with the same fold) contains families that use different nucleophiles. Such nucleophile switches have occurred several times during evolutionary history; however, the mechanisms by which this happens are still unclear.

Within protease superfamilies that contain a mixture of nucleophiles (e.g. the PA clan), families are designated by their catalytic nucleophile (C=cysteine proteases, S=serine proteases).

**Superfamilies containing a mixture of families that use different nucleophiles**
Superfamily	Families	Examples
PA clan	C3, C4, C24, C30, C37, C62, C74, C99	TEV protease (Tobacco etch virus)
PA clan	S1, S3, S6, S7, S29, S30, S31, S32, S39, S46, S55, S64, S65, S75	Chymotrypsin (mammals, e.g. Bos taurus)
PB clan	C44, C45, C59, C69, C89, C95	Amidophosphoribosyltransferase precursor (Homo sapiens)
PB clan	S45, S63	Penicillin G acylase precursor (Escherichia coli)
PB clan	T1, T2, T3, T6	Archaean proteasome, beta component (Thermoplasma acidophilum)
PC clan	C26, C56	Gamma-glutamyl hydrolase (Rattus norvegicus)
PC clan	S51	Dipeptidase E (Escherichia coli)
PD clan	C46	Hedgehog protein (Drosophila melanogaster)
PD clan	N9, N10, N11	Intein-containing V-type proton ATPase catalytic subunit A (Saccharomyces cerevisiae)
PE clan	P1	DmpA aminopeptidase (Ochrobactrum anthropi)
PE clan	T5	Ornithine acetyltransferase precursor (Saccharomyces cerevisiae)
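
The letter prefix of each family identifier in the table above can be read off mechanically. The following is a minimal sketch, not an interface to any database: `CLAN_FAMILIES` copies only a subset of the table rows, and the names `NUCLEOPHILE_BY_LETTER` and `nucleophiles_in_clan` are hypothetical helpers introduced for illustration. The meanings of T (threonine), N (asparagine) and P (mixed) are assumed to follow the same convention as the C and S prefixes described in the text.

```python
# Family prefix letter -> catalytic nucleophile type (assumed convention).
NUCLEOPHILE_BY_LETTER = {
    "C": "cysteine",
    "S": "serine",
    "T": "threonine",
    "N": "asparagine",
    "P": "mixed",
}

# A subset of the table rows: clan -> family identifiers.
CLAN_FAMILIES = {
    "PA": ["C3", "C4", "S1", "S3"],
    "PB": ["C44", "S45", "T1"],
    "PC": ["C26", "C56", "S51"],
    "PD": ["C46", "N9"],
    "PE": ["P1", "T5"],
}


def nucleophiles_in_clan(clan):
    """Return the sorted nucleophile types used by the families of a clan."""
    letters = {family[0] for family in CLAN_FAMILIES[clan]}
    return sorted(NUCLEOPHILE_BY_LETTER[letter] for letter in letters)


if __name__ == "__main__":
    for clan in CLAN_FAMILIES:
        print(clan, nucleophiles_in_clan(clan))
```

A clan that returns more than one entry is one in which a nucleophile switch can be inferred, as described above.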

Pseudoenzymes

A further subclass of catalytic triad variants are pseudoenzymes, which have triad mutations that make them catalytically inactive, but able to function as binding or structural proteins.^[62] For example, the heparin-binding protein azurocidin is a homologue of the PA clan proteases, but with a glycine in place of the nucleophile and a serine in place of the histidine.^[63] Similarly, RHBDF1 is a homologue of the S54 family rhomboid proteases with an alanine in the place of the nucleophilic serine.^[64]^[65]

Convergent evolution

Evolutionary convergence of threonine proteases towards the same N-terminal active site organisation. Shown are the catalytic threonine of the proteasome and ornithine acetyltransferase.

The enzymology of proteases provides some of the clearest known examples of convergent evolution. The same geometric arrangement of triad residues occurs in over 20 separate enzyme superfamilies. Each of these superfamilies is the result of convergent evolution for the same triad arrangement within a different structural fold. This is because there are limited productive ways to arrange three triad residues, the enzyme backbone and the substrate. These examples reflect the intrinsic chemical and physical constraints on enzymes, leading evolution to repeatedly and independently converge on equivalent solutions.

Cysteine and serine hydrolases

The same triad geometries have been converged upon by serine proteases such as the chymotrypsin and subtilisin superfamilies. Similar convergent evolution has occurred with cysteine proteases such as the viral C3 protease and papain superfamilies. These triads have converged to almost the same arrangement due to the mechanistic similarities in cysteine and serine proteolysis mechanisms.

Threonine proteases

Threonine proteases use the amino acid threonine as their catalytic nucleophile. Unlike cysteine and serine, threonine is a secondary hydroxyl (i.e. has a methyl group). This methyl group greatly restricts the possible orientations of triad and substrate as the methyl clashes with either the enzyme backbone or histidine base. When the nucleophile of a serine protease was mutated to threonine, the methyl occupied a mixture of positions, most of which prevented substrate binding. Consequently, the catalytic residue of a threonine protease is located at its N-terminus.

Two evolutionarily independent enzyme superfamilies with different protein folds are known to use the N-terminal residue as a nucleophile: Superfamily PB (proteasomes using the Ntn fold) and Superfamily PE (acetyltransferases using the DOM fold). This commonality of active site structure in completely different protein folds indicates that the active site evolved convergently in those superfamilies.

Protease

From Wikipedia, the free encyclopedia

The structure of a protease (TEV protease) complexed with its peptide substrate in black with catalytic residues in red (PDB: 1LVB).

A protease (also called a peptidase or proteinase) is an enzyme that catalyses proteolysis: protein catabolism by hydrolysis of peptide bonds. Proteases have evolved multiple times, and different classes of protease can perform the same reaction by completely different catalytic mechanisms. Proteases can be found in Animalia, Plantae, Fungi, Bacteria, Archaea and viruses.

Hierarchy of proteases

Based on catalytic residue

Proteases can be classified into seven broad groups:

Serine proteases - using a serine alcohol
Cysteine proteases - using a cysteine thiol
Threonine proteases - using a threonine secondary alcohol
Aspartic proteases - using an aspartate carboxylic acid
Glutamic proteases - using a glutamate carboxylic acid
Metalloproteases - using a metal, usually zinc
Asparagine peptide lyases - using an asparagine to perform an elimination reaction (not requiring water)

Proteases were first grouped into 84 families according to their evolutionary relationship in 1993, and classified under four catalytic types: serine, cysteine, aspartic, and metallo proteases. The threonine and glutamic-acid proteases were not described until 1995 and 2004 respectively. The mechanism used to cleave a peptide bond involves making either an amino acid residue (serine, cysteine and threonine proteases) or a water molecule (aspartic, glutamic and metallo proteases) nucleophilic so that it can attack the peptide carbonyl group. One way to make a nucleophile is by a catalytic triad, where a histidine residue is used to activate serine, cysteine, or threonine as a nucleophile. This is not an evolutionary grouping, however, as the nucleophile types have evolved convergently in different superfamilies, and some superfamilies show divergent evolution to multiple different nucleophiles.

Peptide lyases

A seventh catalytic type of proteolytic enzymes, asparagine peptide lyase, was described in 2011. Its proteolytic mechanism is unusual since, rather than hydrolysis, it performs an elimination reaction. During this reaction, the catalytic asparagine forms a cyclic chemical structure that cleaves itself at asparagine residues in proteins under the right conditions. Given its fundamentally different mechanism, its inclusion as a peptidase may be debatable.

Evolutionary phylogeny

An up-to-date classification of protease evolutionary superfamilies is found in the MEROPS database. In this database, proteases are classified firstly by 'clan' (superfamily) based on structure, mechanism and catalytic residue order (e.g. the PA clan where P indicates a mixture of nucleophile families). Within each 'clan', proteases are classified into families based on sequence similarity (e.g. the S1 and C3 families within the PA clan). Each family may contain many hundreds of related proteases (e.g. trypsin, elastase, thrombin and streptogrisin within the S1 family).

Currently more than 50 clans are known, each indicating an independent evolutionary origin of proteolysis.

Classification based on optimal pH

Alternatively, proteases may be classified by the optimal pH in which they are active:

Acid proteases
Neutral proteases, which are involved in type 1 hypersensitivity. Here, they are released by mast cells and cause activation of complement and kinins. This group includes the calpains.
Basic proteases (or alkaline proteases)

Enzymatic function and mechanism

A comparison of the two hydrolytic mechanisms used for proteolysis. Enzyme is shown in black, substrate protein in red and water in blue. The top panel shows 1-step hydrolysis where the enzyme uses an acid to polarise water which then hydrolyses the substrate. The bottom panel shows 2-step hydrolysis where a residue within the enzyme is activated to act as a nucleophile (Nu) and attack the substrate. This forms an intermediate where the enzyme is covalently linked to the N-terminal half of the substrate. In a second step, water is activated to hydrolyse this intermediate and complete catalysis. Other enzyme residues (not shown) donate and accept hydrogens and electrostatically stabilise charge build-up along the reaction mechanism.

Proteases are involved in digesting long protein chains into shorter fragments by splitting the peptide bonds that link amino acid residues. Some detach the terminal amino acids from the protein chain (exopeptidases, such as aminopeptidases, carboxypeptidase A); others attack internal peptide bonds of a protein (endopeptidases, such as trypsin, chymotrypsin, pepsin, papain, elastase).

Catalysis

Catalysis is achieved by one of two mechanisms:

Aspartic, glutamic and metallo- proteases activate a water molecule which performs a nucleophilic attack on the peptide bond to hydrolyse it.
Serine, threonine and cysteine proteases use a nucleophilic residue (usually in a catalytic triad). That residue performs a nucleophilic attack to covalently link the protease to the substrate protein, releasing the first half of the product. This covalent acyl-enzyme intermediate is then hydrolysed by activated water to complete catalysis by releasing the second half of the product and regenerating the free enzyme.
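
The two routes listed above can be summarised as a lookup from catalytic type to mechanism. This is only a sketch of the classification as stated in the text; the names `MECHANISM_BY_TYPE` and `uses_covalent_intermediate` are hypothetical and introduced purely for illustration.

```python
# The two hydrolytic mechanisms described in the list above.
ACTIVATED_WATER = "activated water attacks the peptide bond (one step)"
ACYL_ENZYME = "nucleophilic residue forms a covalent acyl-enzyme intermediate (two steps)"

# Catalytic type -> mechanism, following the two list items.
MECHANISM_BY_TYPE = {
    "aspartic": ACTIVATED_WATER,
    "glutamic": ACTIVATED_WATER,
    "metallo": ACTIVATED_WATER,
    "serine": ACYL_ENZYME,
    "threonine": ACYL_ENZYME,
    "cysteine": ACYL_ENZYME,
}


def uses_covalent_intermediate(catalytic_type):
    """Return True if proteases of this type go through an acyl-enzyme intermediate."""
    return MECHANISM_BY_TYPE[catalytic_type.lower()] == ACYL_ENZYME


if __name__ == "__main__":
    for name, mechanism in MECHANISM_BY_TYPE.items():
        print(f"{name}: {mechanism}")
```

Asparagine peptide lyases are deliberately left out of the table, since they cleave by elimination rather than by either hydrolytic route.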

Specificity

Proteolysis can be highly promiscuous such that a wide range of protein substrates are hydrolysed. This is the case for digestive enzymes such as trypsin which have to be able to cleave the array of proteins ingested into smaller peptide fragments. Promiscuous proteases typically bind to a single amino acid on the substrate and so only have specificity for that residue. For example, trypsin is specific for the sequences ...K\... or ...R\... ('\'=cleavage site).

Conversely some proteases are highly specific and only cleave substrates with a certain sequence. Blood clotting (such as thrombin) and viral polyprotein processing (such as TEV protease) requires this level of specificity in order to achieve precise cleavage events. This is achieved by proteases having a long binding cleft or tunnel with several pockets along it which bind the specified residues. For example, TEV protease is specific for the sequence ...ENLYFQ\S... ('\'=cleavage site).
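
The difference between a promiscuous and a highly specific protease can be illustrated with an in-silico digest. The sketch below encodes only the two recognition sequences quoted above (cleavage after K or R for trypsin, and between Q and S in ENLYFQ\S for TEV protease) as regular expressions; real enzymes are influenced by further sequence and structural context that is not modelled here. The names `TRYPSIN_SITE`, `TEV_SITE` and `cleave` are hypothetical.

```python
import re

# Zero-width patterns that match at the cleavage position itself.
TRYPSIN_SITE = r"(?<=[KR])"       # cut after any lysine or arginine
TEV_SITE = r"(?<=ENLYFQ)(?=S)"    # cut between Q and S of ENLYFQS


def cleave(sequence, site_pattern):
    """Split a one-letter protein sequence at every match of site_pattern."""
    cuts = [match.start() for match in re.finditer(site_pattern, sequence)]
    fragments = []
    start = 0
    for cut in cuts:
        if cut > start:
            fragments.append(sequence[start:cut])
            start = cut
    if start < len(sequence):
        # Keep whatever remains after the last cleavage site.
        fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    substrate = "MKENLYFQSGGRAA"
    print(cleave(substrate, TRYPSIN_SITE))
    print(cleave(substrate, TEV_SITE))
```

On the same made-up substrate, the trypsin pattern cuts at every K or R, whereas the TEV pattern cuts only at the single full recognition sequence.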

Degradation and autolysis

Proteases, being themselves proteins, are cleaved by other protease molecules, sometimes of the same variety. This acts as a method of regulation of protease activity. Some proteases are less active after autolysis (e.g. TEV protease) whilst others are more active (e.g. trypsinogen).

Biodiversity of proteases

Proteases occur in all organisms, from prokaryotes to eukaryotes to viruses. These enzymes are involved in a multitude of physiological reactions from simple digestion of food proteins to highly regulated cascades (e.g., the blood-clotting cascade, the complement system, apoptosis pathways, and the invertebrate prophenoloxidase-activating cascade). Proteases can either break specific peptide bonds (limited proteolysis), depending on the amino acid sequence of a protein, or completely break down a peptide to amino acids (unlimited proteolysis). The activity can be a destructive change (abolishing a protein's function or digesting it to its principal components), it can be an activation of a function, or it can be a signal in a signalling pathway.

Plants

Protease-containing plant solutions called vegetarian rennet have been in use for hundreds of years in Europe and the Middle East for making kosher and halal cheeses. Vegetarian rennet from Withania coagulans has been in use for thousands of years as an Ayurvedic remedy for digestion and diabetes in the Indian subcontinent. It is also used to make paneer.

Plant genomes encode hundreds of proteases, largely of unknown function. Those with known function are largely involved in developmental regulation. Plant proteases also play a role in regulation of photosynthesis.

Animals

Proteases are used throughout an organism for various metabolic processes. Acid proteases secreted into the stomach (such as pepsin) and serine proteases present in the duodenum (trypsin and chymotrypsin) enable us to digest the protein in food. Proteases present in blood serum (thrombin, plasmin, Hageman factor, etc.) play an important role in blood-clotting, as well as lysis of the clots, and the correct action of the immune system. Other proteases are present in leukocytes (elastase, cathepsin G) and play several different roles in metabolic control. Some snake venoms are also proteases, such as pit viper haemotoxin, and interfere with the victim's blood clotting cascade. Proteases determine the lifetime of other proteins that play important physiological roles, such as hormones, antibodies, or other enzymes. This is one of the fastest "switching on" and "switching off" regulatory mechanisms in the physiology of an organism.

Through complex cooperative action, proteases may act in cascade reactions, which result in rapid and efficient amplification of an organism's response to a physiological signal.

Bacteria

Bacteria secrete proteases to hydrolyse the peptide bonds in proteins and therefore break the proteins down into their constituent amino acids. Bacterial and fungal proteases are particularly important to the global carbon and nitrogen cycles in the recycling of proteins, and such activity tends to be regulated by nutritional signals in these organisms. The net impact of nutritional regulation of protease activity among the thousands of species present in soil can be observed at the overall microbial community level as proteins are broken down in response to carbon, nitrogen, or sulfur limitation.

Bacteria contain proteases responsible for general protein quality control (e.g. the AAA+ proteasome) by degrading unfolded or misfolded proteins.

A secreted bacterial protease may also act as an exotoxin, and be an example of a virulence factor in bacterial pathogenesis (for example, exfoliative toxin). Bacterial exotoxic proteases destroy extracellular structures.

Viruses

Some viruses express their entire genome as one massive polyprotein and use a protease to cleave this into functional units (e.g. polio, norovirus, and TEV proteases). These proteases (e.g. TEV protease) have high specificity and only cleave a very restricted set of substrate sequences. They are therefore a common target for antiviral drugs.

Uses

The field of protease research is enormous. Since 2004, approximately 8000 papers related to this field have been published each year. Proteases are used in industry, medicine and as a basic biological research tool.

Digestive proteases are part of many laundry detergents and are also used extensively in the bread industry in bread improver. A variety of proteases are used medically, both for their native function (e.g. controlling blood clotting) and for completely artificial functions (e.g. for the targeted degradation of pathogenic proteins). Highly specific proteases such as TEV protease and thrombin are commonly used to cleave fusion proteins and affinity tags in a controlled fashion.

Inhibitors

The activity of proteases is inhibited by protease inhibitors. One example of protease inhibitors is the serpin superfamily. It includes alpha 1-antitrypsin (which protects the body from excessive effects of its own inflammatory proteases), alpha 1-antichymotrypsin (which does likewise), C1-inhibitor (which protects the body from excessive protease-triggered activation of its own complement system), antithrombin (which protects the body from excessive coagulation), plasminogen activator inhibitor-1 (which protects the body from inadequate coagulation by blocking protease-triggered fibrinolysis), and neuroserpin.

Natural protease inhibitors include the family of lipocalin proteins, which play a role in cell regulation and differentiation. Lipophilic ligands, attached to lipocalin proteins, have been found to possess tumor protease inhibiting properties. The natural protease inhibitors are not to be confused with the protease inhibitors used in antiretroviral therapy. Some viruses, HIV among them, depend on proteases in their reproductive cycle. Thus, protease inhibitors are developed as antiviral agents.

Other natural protease inhibitors are used as defense mechanisms. Common examples are the trypsin inhibitors found in the seeds of some plants, most notable for humans being soybeans, a major food crop, where they act to discourage predators. Raw soybeans are toxic to many animals, including humans, until the protease inhibitors they contain have been denatured.

Normal distribution