The evolution of biological complexity is one important outcome of the process of evolution.
Evolution has produced some remarkably complex organisms – although the
actual level of complexity is very hard to define or measure accurately
in biology, with properties such as gene content, the number of cell types or morphology all proposed as possible metrics.
Many biologists used to believe that evolution was progressive (orthogenesis) and had a direction that led towards so-called "higher organisms", despite a lack of evidence for this viewpoint. This idea of "progression" introduced the terms "higher animals" and "lower animals" into evolutionary thought. Many now regard this as misleading, with natural selection having no intrinsic direction and organisms being selected for either increased or decreased complexity in response to local environmental conditions. Although there has been an increase in the maximum level of complexity over the history of life, there has always been a large majority of small and simple organisms, and the most common level of complexity appears to have remained relatively constant.
Selection for simplicity and complexity
Usually
organisms that have a higher rate of reproduction than their
competitors have an evolutionary advantage. Consequently, organisms can
evolve to become simpler and thus multiply faster and produce more
offspring, as they require fewer resources to reproduce. Good examples are parasites such as Plasmodium – the parasite responsible for malaria – and Mycoplasma; these organisms often dispense with traits that are made unnecessary through parasitism on a host.
A lineage can also dispense with complexity when a particular
complex trait merely provides no selective advantage in a particular
environment. Loss of this trait need not necessarily confer a selective
advantage, but may be lost due to the accumulation of mutations if its
loss does not confer an immediate selective disadvantage. For example, a parasitic organism
may dispense with the synthetic pathway of a metabolite where it can
readily scavenge that metabolite from its host. Discarding this
synthesis may not necessarily allow the parasite to conserve significant
energy or resources and grow faster, but the loss may be fixed in the
population through mutation accumulation if no disadvantage is incurred
by loss of that pathway. Mutations causing loss of a complex trait occur
more often than mutations causing gain of a complex trait.
With selection, evolution can also produce more complex
organisms. Complexity often arises in the co-evolution of hosts and
pathogens, with each side developing ever more sophisticated adaptations, such as the immune system and the many techniques pathogens have developed to evade it. For example, the parasite Trypanosoma brucei, which causes sleeping sickness, has evolved so many copies of its major surface antigen
that about 10% of its genome is devoted to different versions of this
one gene. This tremendous complexity allows the parasite to constantly
change its surface and thus evade the immune system through antigenic variation.
More generally, the growth of complexity may be driven by the co-evolution between an organism and the ecosystem of predators, prey and parasites
to which it tries to stay adapted: as any of these become more complex
in order to cope better with the diversity of threats offered by the
ecosystem formed by the others, the others too will have to adapt by
becoming more complex, thus triggering an ongoing evolutionary arms race towards more complexity. This trend may be reinforced by the fact that ecosystems themselves tend to become more complex over time, as species diversity increases, together with the linkages or dependencies between species.
Types of trends in complexity
If evolution possessed an active trend toward complexity (orthogenesis), as was widely believed in the 19th century, then we would expect to see an active trend of increase over time in the most common value (the mode) of complexity among organisms.
However, an increase in complexity can also be explained through a passive process.
Assuming unbiased random changes of complexity and the existence of a
minimum complexity leads to an increase over time of the average
complexity of the biosphere. This involves an increase in variance,
but the mode does not change. The trend towards the creation of some
organisms with higher complexity over time exists, but it involves
increasingly small percentages of living things.
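This passive mechanism can be sketched numerically. The toy model below is a hedged illustration, not a published model: all parameter values are arbitrary. Many independent lineages each perform an unbiased random walk in "complexity" with a reflecting lower bound at the minimum; the mean drifts upward while the most common value stays pinned near the minimum.

```python
import random

def simulate_passive_trend(n_lineages=5000, n_steps=100, minimum=1, seed=42):
    """Unbiased random walk in 'complexity' for independent lineages,
    with a reflecting lower bound at the minimum complexity."""
    rng = random.Random(seed)
    complexities = [minimum] * n_lineages
    for _ in range(n_steps):
        for i in range(n_lineages):
            # Equal chance of gaining or losing one unit of complexity...
            step = rng.choice((-1, 1))
            # ...but complexity can never drop below the minimum.
            complexities[i] = max(minimum, complexities[i] + step)
    return complexities

values = simulate_passive_trend()
mean = sum(values) / len(values)
mode = max(set(values), key=values.count)
# The mean rises well above the minimum, yet the mode stays near it:
# the upper tail grows while most lineages remain simple.
```

Because variance increases over time while the lower bound holds, the right-hand tail of the distribution keeps stretching even though no lineage is biased toward greater complexity.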
In this hypothesis, any appearance of evolution acting with an
intrinsic direction towards increasingly complex organisms is a result
of people concentrating on the small number of large, complex organisms
that inhabit the right-hand tail
of the complexity distribution and ignoring simpler and much more
common organisms. This passive model predicts that the majority of species are microscopic prokaryotes, which is supported by estimates of 10⁶ to 10⁹ extant prokaryote species compared with diversity estimates of 10⁶ to 3·10⁶ for eukaryotes. Consequently, in this view, microscopic life dominates Earth, and large organisms only appear more diverse due to sampling bias.
Genome complexity has generally increased since the beginning of life on Earth. Some computer models have suggested that the generation of complex organisms is an inescapable feature of evolution. Proteins tend to become more hydrophobic over time, and to have their hydrophobic amino acids more interspersed along the primary sequence. Increases in body size over time are sometimes seen in what is known as Cope's rule.
According to the model of constructive neutral evolution, new genes are created by non-adaptive processes, such as random gene duplication.
These novel entities, although not required for viability, do give the
organism excess capacity that can facilitate the mutational decay of
functional subunits. If this decay results in a situation where all of
the genes are now required, the organism has been trapped in a new state
where the number of genes has increased. This process has been
sometimes described as a complexifying ratchet. These supplemental genes can then be co-opted by natural selection by a process called neofunctionalization.
In other instances constructive neutral evolution does not promote the
creation of new parts, but rather promotes novel interactions between
existing players, which then take on new moonlighting roles.
Constructive neutral evolution has also been used to explain how ancient complexes, such as the spliceosome and the ribosome, have gained new subunits over time, how new alternative spliced isoforms of genes arise, how gene scrambling in ciliates evolved, how pervasive pan-RNA editing may have arisen in Trypanosoma brucei, how functional lncRNAs have likely arisen from transcriptional noise, and how even useless protein complexes can persist for millions of years.
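The "complexifying ratchet" can be made concrete with a small sketch. This is a hypothetical illustration: the function set {A, B} and the gene contents are invented. A gene is modelled as the set of required functions it covers; after duplication and complementary neutral loss of subfunctions, both copies become essential even though total functionality never increased.

```python
def essential_genes(genes, required):
    """Indices of genes whose loss would leave some required function uncovered."""
    essential = []
    for i in range(len(genes)):
        rest = set()
        for j, g in enumerate(genes):
            if j != i:
                rest |= g   # functions still covered without gene i
        if not required <= rest:
            essential.append(i)
    return essential

required = {"A", "B"}                    # functions the organism must retain
ancestral = [{"A", "B"}]                 # one bifunctional gene: essential
duplicated = [{"A", "B"}, {"A", "B"}]    # after duplication: neither copy essential
partitioned = [{"A"}, {"B"}]             # after complementary decay: both essential
```

Each loss was selectively neutral at the moment it occurred, yet the end state has more genes, all of them now required – the organism is trapped in the more complex configuration.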
Mutational hazard hypothesis
The mutational hazard hypothesis is a non-adaptive theory for increased complexity in genomes. The basis of the mutational hazard hypothesis is that each mutation for non-coding DNA imposes a fitness cost. Variation in complexity can be described by 2Neu, where Ne is the effective population size and u is the mutation rate.
In this hypothesis, selection against non-coding DNA can be
reduced in three ways: random genetic drift, recombination rate, and
mutation rate. As complexity increases from prokaryotes to multicellular eukaryotes, effective population size decreases, subsequently increasing the strength of random genetic drift. This, along with low recombination rate and high mutation rate, allows non-coding DNA to proliferate without being removed by purifying selection.
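A rough numerical illustration of the drift argument (the 1/(2Ne) "drift barrier" is a standard population-genetics rule of thumb, and the Ne values below are assumed round numbers, not measurements): selection cannot efficiently remove a weakly deleterious insertion whose fitness cost is much below roughly 1/(2Ne), so the barrier sits orders of magnitude higher in a small-Ne multicellular eukaryote than in a large-Ne prokaryote.

```python
def drift_barrier(ne):
    """Approximate fitness-cost threshold below which selection is
    overwhelmed by drift (rule of thumb: s << 1/(2*Ne) is effectively neutral)."""
    return 1.0 / (2.0 * ne)

# Illustrative, assumed effective population sizes (not measured values):
ne_prokaryote = 1e8
ne_multicellular_eukaryote = 1e4

barrier_prok = drift_barrier(ne_prokaryote)               # ~5e-9
barrier_euk = drift_barrier(ne_multicellular_eukaryote)   # ~5e-5
```

Under these assumed numbers, a stretch of non-coding DNA costing, say, s = 1e-6 would be purged in the prokaryote but drift to fixation in the eukaryote.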
Accumulation of non-coding DNA in larger genomes can be seen when
comparing genome size and genome content across eukaryotic taxa. There
is a positive correlation between genome size and non-coding DNA content, with each taxonomic group staying within a characteristic range of variation.
When comparing variation in complexity in organelles, effective
population size is replaced with genetic effective population size (Ng). If looking at silent-site nucleotide diversity, then larger genomes are expected to have less diversity than more compact ones. In plant and animal mitochondria,
differences in mutation rate account for the opposite directions in
complexity, with plant mitochondria being more complex and animal
mitochondria more streamlined.
The mutational hazard hypothesis has been used to at least
partially explain expanded genomes in some species. For example, when comparing Volvox carteri to a close relative with a compact genome, Chlamydomonas reinhardtii, the former had less silent-site diversity than the latter in nuclear, mitochondrial, and plastid genomes. However, when comparing the plastid genome of Volvox carteri to that of Volvox africanus, a species in the same genus but with half the plastid genome size, there were high mutation rates in intergenic regions. In Arabidopsis thaliana, the hypothesis was used as a possible explanation for intron loss and compact genome size. When compared to Arabidopsis lyrata, researchers found a higher mutation rate overall and in lost introns (introns that are no longer transcribed or spliced) compared to conserved introns.
There are expanded genomes in other species that could not be
explained by the mutational hazard hypothesis. For example, the expanded
mitochondrial genomes of Silene noctiflora and Silene conica
have high mutation rates, lower intron lengths, and more non-coding DNA
elements compared to others in the same genus, but there was no
evidence for long-term low effective population size. The mitochondrial genomes of Citrullus lanatus and Cucurbita pepo differ in several ways: Citrullus lanatus has the smaller genome, with more introns and duplications, while Cucurbita pepo has the larger genome, with more chloroplast-derived and short repeated sequences. If RNA editing sites and mutation rate were linked as the hypothesis predicts, then Cucurbita pepo would have a lower mutation rate and more RNA editing sites. However, its mutation rate is four times higher than that of Citrullus lanatus, and the two species have a similar number of RNA editing sites. There was also an attempt to use the hypothesis to explain the large nuclear genomes of salamanders, but researchers found the opposite of the expected results, including lower long-term strength of genetic drift.
History
In the 19th century, some scientists such as Jean-Baptiste Lamarck (1744–1829) and Ray Lankester
(1847–1929) believed that nature had an innate striving to become more
complex with evolution. This belief may reflect then-current ideas of Hegel (1770–1831) and of Herbert Spencer (1820–1903) which envisaged the universe gradually evolving to a higher, more perfect state.
This view regarded the evolution of parasites from independent organisms to a parasitic species as "devolution"
or "degeneration", and contrary to nature. Social theorists have
sometimes interpreted this approach metaphorically to decry certain
categories of people as "degenerate parasites". Later scientists
regarded biological devolution as nonsense; rather, lineages become
simpler or more complicated according to whatever forms had a selective
advantage.
In a 1964 book, The Emergence of Biological Organization, Quastler
pioneered a theory of emergence, developing a model of a series of
emergences from protobiological systems to prokaryotes without the need to invoke implausibly low-probability events.
The evolution of order, manifested as biological complexity, in
living systems and the generation of order in certain non-living systems
was proposed in 1983 to obey a common fundamental principle called "the Darwinian dynamic".
The Darwinian dynamic was formulated by first considering how
microscopic order is generated in simple non-biological systems that are
far from thermodynamic equilibrium. Consideration was then extended to short, replicating RNA molecules assumed to be similar to the earliest forms of life in the RNA world.
It was shown that the underlying order-generating processes in the
non-biological systems and in replicating RNA are basically similar.
This approach helped clarify the relationship of thermodynamics to
evolution as well as the empirical content of Darwin's theory.
In 1985, Morowitz noted that the modern era of irreversible thermodynamics ushered in by Lars Onsager
in the 1930s showed that systems invariably become ordered under a flow
of energy, thus indicating that the existence of life involves no
contradiction to the laws of physics.
In chemistry, a double bond is a covalent bond between two atoms involving four bonding electrons as opposed to two in a single bond. Double bonds occur most commonly between two carbon atoms, for example in alkenes. Many double bonds exist between two different elements: for example, in a carbonyl group between a carbon atom and an oxygen atom. Other common double bonds are found in azo compounds (N=N), imines (C=N), and sulfoxides (S=O). In a skeletal formula, a double bond is drawn as two parallel lines (=) between the two connected atoms; typographically, the equals sign is used for this. Double bonds were introduced in chemical notation by Russian chemist Alexander Butlerov.
Double bonds involving carbon are stronger and shorter than single bonds. The bond order
is two. Double bonds are also electron-rich, which makes them
potentially more reactive in the presence of a strong electron acceptor
(as in addition reactions of the halogens).
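The bond-order bookkeeping mentioned here is simple arithmetic: half the difference between bonding and antibonding electrons. A minimal sketch:

```python
def bond_order(bonding_electrons, antibonding_electrons=0):
    """Bond order = (bonding electrons - antibonding electrons) / 2."""
    return (bonding_electrons - antibonding_electrons) / 2

single = bond_order(2)   # two bonding electrons, e.g. C-C in ethane
double = bond_order(4)   # four bonding electrons, e.g. C=C in ethylene
```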
The type of bonding can be explained in terms of orbital hybridisation. In ethylene each carbon atom has three sp2 orbitals and one p-orbital. The three sp2
orbitals lie in a plane with ~120° angles. The p-orbital is
perpendicular to this plane. When the carbon atoms approach each other,
two of the sp2 orbitals overlap to form a sigma bond. At the same time, the two p-orbitals approach (again in the same plane) and together they form a pi bond.
For maximum overlap, the p-orbitals have to remain parallel, and,
therefore, rotation around the central bond is not possible. This
property gives rise to cis-trans isomerism. Double bonds are shorter than single bonds because p-orbital overlap is maximized.
Two of the three sp2 orbitals on each carbon approach to form the sp2–sp2 sigma bond, while the two p-orbitals overlap to form a pi bond in a plane parallel to the sigma plane.
At 133 pm, the ethylene C=C bond length is shorter than the C−C length in ethane, at 154 pm. The double bond is also stronger, 636 kJ mol−1 versus 368 kJ mol−1, but not twice as strong, because the pi bond is weaker than the sigma bond due to less effective pi-overlap.
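Using the bond energies quoted above, the pi contribution can be estimated by subtraction, on the simplifying assumption that the sigma component of the C=C bond is comparable in strength to ethane's C−C bond:

```python
cc_double_kj_mol = 636.0  # C=C in ethylene (value from the text)
cc_single_kj_mol = 368.0  # C-C in ethane (value from the text)

# Estimated pi contribution, assuming the sigma component of C=C
# is roughly as strong as ethane's C-C sigma bond:
pi_estimate = cc_double_kj_mol - cc_single_kj_mol  # 268 kJ/mol
```

The estimated pi contribution (268 kJ/mol) is indeed smaller than the sigma contribution, consistent with the less effective sideways p-orbital overlap.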
In an alternative representation, the double bond results from two overlapping sp3 orbitals as in a bent bond.[3]
Variations
In
molecules with alternating double bonds and single bonds, p-orbital
overlap can exist over multiple atoms in a chain, giving rise to a conjugated system. Conjugation can be found in systems such as dienes and enones. In cyclic molecules, conjugation can lead to aromaticity. In cumulenes, two double bonds are adjacent.
Double-bonded compounds, alkene homologs R2E=ER2, are now known for all of the heavier group 14 elements. Unlike the alkenes, these compounds are not planar but adopt twisted and/or trans-bent structures. These effects become more pronounced for the heavier elements.
pronounced for the heavier elements. The distannene (Me3Si)2CHSn=SnCH(SiMe3)2
has a tin-tin bond length just a little shorter than a single bond, a
trans bent structure with pyramidal coordination at each tin atom, and
readily dissociates in solution to form (Me3Si)2CHSn:
(stannanediyl, a carbene analog). The bonding comprises two weak donor
acceptor bonds, the lone pair on each tin atom overlapping with the
empty p orbital on the other.
In contrast, in disilenes each silicon atom has planar coordination
but the substituents are twisted so that the molecule as a whole is not
planar. In diplumbenes, the Pb=Pb bond length can be longer than that of many corresponding single bonds. Plumbenes and stannenes generally dissociate in solution into monomers, with bond enthalpies that are just a fraction of those of the corresponding single bonds. Some double bonds in plumbenes and stannenes are similar in strength to hydrogen bonds. The Carter-Goddard-Malrieu-Trinquier model can be used to predict the nature of the bonding.
Protein–protein interactions (PPIs) are physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by interactions that include electrostatic forces, hydrogen bonding and the hydrophobic effect.
Many are physical contacts with molecular associations between chains
that occur in a cell or in a living organism in a specific biomolecular
context.
Proteins rarely act alone as their functions tend to be regulated. Many molecular processes within a cell are carried out by molecular machines
that are built from numerous protein components organized by their
PPIs. These physiological interactions make up the so-called interactome of the organism, while aberrant PPIs are the basis of multiple aggregation-related diseases, such as Creutzfeldt–Jakob disease and Alzheimer's disease.
In many metabolic reactions, a protein that acts as an electron carrier binds to an enzyme that acts as its reductase. After it receives an electron, it dissociates and then binds to the next enzyme that acts as its oxidase
(i.e. an acceptor of the electron). These interactions between proteins
are dependent on highly specific binding between proteins to ensure
efficient electron transfer. Examples include components of the mitochondrial oxidative phosphorylation chain (cytochrome c reductase / cytochrome c / cytochrome c oxidase) and the microsomal and mitochondrial P450 systems.
In the case of the mitochondrial P450 systems, the specific residues involved in the binding of the electron transfer protein adrenodoxin
to its reductase were identified as two basic Arg residues on the
surface of the reductase and two acidic Asp residues on the adrenodoxin.
More recent work on the phylogeny of the reductase has shown that these
residues involved in protein–protein interactions have been conserved
throughout the evolution of this enzyme.
The activity of the cell is regulated by extracellular signals.
Signal propagation inside and/or along the interior of cells depends on
PPIs between the various signaling molecules. The recruitment of
signaling pathways through PPIs is called signal transduction and plays a fundamental role in many biological processes and in many diseases including Parkinson's disease and cancer.
To describe the types of protein–protein interactions (PPIs) it is important to consider that proteins can interact in a "transient" way (to produce some specific effect in a short time, like signal transduction) or in a "stable" way to form complexes that become molecular machines within living systems.
A protein complex assembly can result in the formation of homo-oligomeric or hetero-oligomeric complexes.
In addition to conventional complexes, such as enzyme-inhibitor and antibody-antigen, interactions can also be established between two domains or between a domain and a peptide. Another important distinction to
identify protein–protein interactions is the way they have been
determined, since there are techniques that measure direct physical
interactions between protein pairs, named “binary” methods, while there
are other techniques that measure physical interactions among groups of
proteins, without pairwise determination of protein partners, named
“co-complex” methods.
Homo-oligomers vs. hetero-oligomers
Homo-oligomers are macromolecular complexes constituted by only one type of protein subunit. Assembly of protein subunits is guided by the establishment of non-covalent interactions in the quaternary structure of the protein. Disruption of homo-oligomers in order to return to the initial individual monomers often requires denaturation of the complex. Several enzymes, carrier proteins,
scaffolding proteins, and transcriptional regulatory factors carry out
their functions as homo-oligomers.
Distinct protein subunits interact in hetero-oligomers, which are
essential to control several cellular functions. The importance of the
communication between heterologous proteins is even more evident during
cell signaling events and such interactions are only possible due to
structural domains within the proteins (as described below).
Stable interactions vs. transient interactions
Stable interactions involve proteins that interact for a long time, forming part of permanent complexes as subunits in order to carry out functional roles. This is usually the case for homo-oligomers (e.g. cytochrome c) and some hetero-oligomeric proteins, such as the subunits of ATPase. On the other hand, a protein may interact briefly and in a reversible manner with other proteins in only certain cellular contexts – cell type, cell cycle stage, external factors, presence of other binding proteins, etc. – as happens with most of the proteins involved in biochemical cascades. These are called transient interactions. For example, some G protein–coupled receptors only transiently bind to Gi/o proteins when they are activated by extracellular ligands, while some Gq-coupled receptors, such as the muscarinic receptor M3, pre-couple with Gq proteins prior to receptor-ligand binding. Interactions between intrinsically disordered protein regions and globular protein domains (i.e. MoRFs) are also transient interactions.
Water molecules play a significant role in the interactions between proteins.
The crystal structures of complexes, obtained at high resolution from
different but homologous proteins, have shown that some interface water
molecules are conserved between homologous complexes. The majority of
the interface water molecules make hydrogen bonds with both partners of
each complex. Some interface amino acid residues or atomic groups of one
protein partner engage in both direct and water mediated interactions
with the other protein partner. Doubly indirect interactions, mediated
by two water molecules, are more numerous in the homologous complexes of
low affinity.
Carefully conducted mutagenesis experiments, e.g. changing a tyrosine
residue into a phenylalanine, have shown that water mediated
interactions can contribute to the energy of interaction. Thus, water molecules may facilitate the interactions and cross-recognitions between proteins.
The molecular structures of many protein complexes have been unlocked by the technique of X-ray crystallography. The first structure to be solved by this method was that of sperm whale myoglobin by Sir John Cowdery Kendrew.
In this technique the angles and intensities of a beam of X-rays
diffracted by crystalline atoms are detected in a film, thus producing a
three-dimensional picture of the density of electrons within the
crystal.
Later, nuclear magnetic resonance
also started to be applied with the aim of unravelling the molecular
structure of protein complexes. One of the first examples was the
structure of calmodulin-binding domains bound to calmodulin.
This technique is based on the study of magnetic properties of atomic
nuclei, thus determining physical and chemical properties of the
corresponding atoms or molecules. Nuclear magnetic resonance is
advantageous for characterizing weak PPIs.
SH2 domains are structurally composed of a three-stranded twisted beta sheet flanked by two alpha-helices. The existence of a
deep binding pocket with high affinity for phosphotyrosine, but not for phosphoserine or phosphothreonine, is essential for the recognition of tyrosine phosphorylated proteins, mainly autophosphorylated growth factor receptors. Growth factor receptor binding proteins and phospholipase Cγ are examples of proteins that have SH2 domains.
Structurally, SH3 domains are constituted by a beta barrel
formed by two orthogonal beta sheets and three anti-parallel beta
strands. These domains recognize proline-enriched sequences, such as the polyproline type II helical structure (PXXP motifs), in cell signaling proteins like protein tyrosine kinases and the growth factor receptor bound protein 2 (Grb2).
LIM domains were initially identified in three homeodomain transcription factors (lin11, isl1, and mec3). In addition to these homeodomain proteins and other proteins involved in development, LIM domains have also been identified in non-homeodomain proteins with relevant roles in cellular differentiation, association with the cytoskeleton, and senescence. These domains contain a tandem cysteine-rich Zn2+-finger motif and embrace the consensus sequence CX2CX16–23HX2CX2CX2CX16–21CX2(C/H/D). LIM domains bind to PDZ domains, bHLH transcription factors, and other LIM domains.
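The LIM consensus sequence is compact enough to check mechanically. The sketch below is an illustration only: the test sequence is synthetic, not a real protein. It transcribes the consensus into a regular expression, with X standing for any residue and C/H/D as the alternatives at the final position:

```python
import re

# The consensus from the text, CX2CX16-23HX2CX2CX2CX16-21CX2(C/H/D),
# written as a regular expression (X = any residue):
LIM_CONSENSUS = re.compile(r"C.{2}C.{16,23}H.{2}C.{2}C.{2}C.{16,21}C.{2}[CHD]")

# A synthetic sequence built to satisfy the consensus (not a real protein):
synthetic = ("C" + "AA" + "C" + "A" * 20 + "H" + "AA" +
             "C" + "AA" + "C" + "AA" + "C" + "A" * 18 + "C" + "AA" + "D")
```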
SAM domains are composed of five helices forming a compact package with a conserved hydrophobic core. These domains, which can be found in the Eph receptor and the stromal interaction molecule (STIM) for example, bind to non-SAM domain-containing proteins, and they also appear to have the ability to bind RNA.
PDZ domains were first identified in three guanylate kinases:
PSD-95, DlgA and ZO-1. These domains recognize carboxy-terminal
tri-peptide motifs (S/TXV), other PDZ domains or LIM domains and bind them through a short peptide sequence that has a C-terminal
hydrophobic residue. Some of the proteins identified as having PDZ domains are scaffolding proteins or seem to be involved in ion receptor assembly and the formation of receptor-enzyme complexes.
FERM domains contain basic residues capable of binding PtdIns(4,5)P2. Talin and focal adhesion kinase (FAK) are two of the proteins that present FERM domains.
The
study of the molecular structure can give fine details about the
interface that enables the interaction between proteins. When
characterizing PPI interfaces it is important to take into account the
type of complex.
Parameters evaluated include size (measured in absolute dimensions, Å², or in solvent-accessible surface area (SASA)),
shape, complementarity between surfaces, residue interface
propensities, hydrophobicity, segmentation and secondary structure, and
conformational changes on complex formation.
The great majority of PPI interfaces reflect the composition of protein surfaces rather than protein cores, in spite of being frequently enriched in hydrophobic residues, particularly in aromatic residues. PPI interfaces are dynamic and frequently planar, although they can be globular and protruding as well. Based on three structures – the insulin dimer, the trypsin-pancreatic trypsin inhibitor complex, and oxyhaemoglobin – Cyrus Chothia and Joel Janin found that between 1,130 and 1,720 Å² of surface area was removed from contact with water, indicating that hydrophobicity is a major factor in the stabilization of PPIs. Later studies refined the buried surface area of the majority of interactions to 1,600±350 Å². However, much larger interaction interfaces were also observed and were associated with significant changes in conformation of one of the interaction partners. PPI interfaces exhibit both shape and electrostatic complementarity.
Regulation
Protein concentration, which in turn is affected by expression levels and degradation rates;
Protein affinity for proteins or other binding ligands;
Yeast two-hybrid screening
This system was first described in 1989 by Fields and Song using Saccharomyces cerevisiae as a biological model. Yeast two-hybrid allows the identification of pairwise PPIs (binary method) in vivo,
in which the two proteins are tested for biophysically direct
interaction. The Y2H is based on the functional reconstitution of the
yeast transcription factor Gal4 and subsequent activation of a selective
reporter such as His3. To test two proteins for interaction, two
protein expression constructs are made: one protein (X) is fused to the
Gal4 DNA-binding domain (DB) and a second protein (Y) is fused to the
Gal4 activation domain (AD). In the assay, yeast cells are transformed
with these constructs. Transcription of reporter genes does not occur
unless bait (DB-X) and prey (AD-Y) interact with each other and form a
functional Gal4 transcription factor. Thus, the interaction between
proteins can be inferred by the presence of the products resultant of
the reporter gene expression.
In cases in which the reporter gene expresses enzymes that allow the
yeast to synthesize essential amino acids or nucleotides, yeast growth
under selective media conditions indicates that the two proteins tested
are interacting. Recently, software to detect and prioritize protein
interactions was published.
Despite its usefulness, the yeast two-hybrid system has
limitations. It uses yeast as main host system, which can be a problem
when studying proteins that contain mammalian-specific
post-translational modifications. The number of PPIs identified is
usually low because of a high false-negative rate, and the method underrepresents membrane proteins, for example.
In initial studies that utilized Y2H, proper controls for false
positives (e.g. when DB-X activates the reporter gene without the
presence of AD-Y) were frequently not done, leading to a higher than
normal false positive rate. An empirical framework must be implemented
to control for these false positives.
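The control logic described above can be stated as a tiny decision rule. This is a schematic sketch, not any published pipeline; the growth flags stand in for hypothetical reporter-assay readouts. An interaction is called only when the bait-prey pair activates the reporter and neither construct activates it alone:

```python
def call_interaction(pair_grows, bait_alone_grows, prey_alone_grows):
    """Call a binary Y2H interaction only when reporter-driven growth
    requires both bait (DB-X) and prey (AD-Y); growth from either
    construct alone marks an autoactivating false positive."""
    return pair_grows and not bait_alone_grows and not prey_alone_grows
```

For example, a bait that activates the reporter without any prey (`call_interaction(True, True, False)`) is rejected as an autoactivator rather than scored as a hit.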
Limitations in the lower coverage of membrane proteins have been overcome by the emergence of yeast two-hybrid variants, such as the membrane yeast two-hybrid (MYTH) and the split-ubiquitin system, which are not limited to interactions that occur in the nucleus, and the bacterial two-hybrid system, performed in bacteria.
Affinity purification coupled to mass spectrometry
Affinity purification coupled to mass spectrometry mostly detects
stable interactions and thus better indicates functional in vivo PPIs. This method starts by purification of the tagged protein, which is expressed in the cell usually at in vivo
concentrations, and its interacting proteins (affinity purification).
One of the most advantageous and widely used methods to purify proteins
with very low contaminating background is the tandem affinity purification,
developed by Bertrand Seraphin and Matthias Mann and respective
colleagues. PPIs can then be quantitatively and qualitatively analysed
by mass spectrometry using different methods: chemical incorporation,
biological or metabolic incorporation (SILAC), and label-free methods. Furthermore, network theory has been used to study the whole set of identified protein–protein interactions in cells.
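As a sketch of the network-theory viewpoint (hypothetical protein names, and plain adjacency sets rather than any particular graph library), identified pairwise interactions can be assembled into an undirected graph whose hubs are the most highly connected proteins:

```python
from collections import defaultdict

def build_network(interactions):
    """Assemble identified pairwise PPIs into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

# Hypothetical interaction pairs (invented protein names):
network = build_network([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
hub = max(network, key=lambda protein: len(network[protein]))  # most connected
```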
Nucleic acid programmable protein array (NAPPA)
This system was first developed by LaBaer and colleagues in 2004, using an in vitro transcription and translation system. They used a DNA template encoding the gene of interest fused to GST, immobilized on a solid surface. Anti-GST antibody and biotinylated plasmid DNA were bound to an aminopropyltriethoxysilane (APTES)-coated slide; BSA can improve the binding efficiency of the DNA, and the biotinylated plasmid DNA is bound by avidin. New protein was synthesized using a cell-free expression system, i.e. rabbit reticulocyte lysate (RRL), and then captured through the anti-GST antibody bound to the slide. To test a protein–protein interaction, the target protein cDNA and query protein cDNA were immobilized on the same coated slide. Using the in vitro transcription and translation system, target and query proteins were synthesized by the same extract. The target protein was bound to the array by the antibody coated on the slide, and the query protein was used to probe the array. The query protein was tagged with a hemagglutinin (HA) epitope, so the interaction between the two proteins was visualized with an antibody against HA.
Intragenic complementation
When multiple copies of a polypeptide encoded by a gene
form a complex, this protein structure is referred to as a multimer.
When a multimer is formed from polypeptides produced by two different mutant alleles
of a particular gene, the mixed multimer may exhibit greater functional
activity than the unmixed multimers formed by each of the mutants
alone. In such a case, the phenomenon is referred to as intragenic complementation
(also called inter-allelic complementation). Intragenic
complementation has been demonstrated in many different genes in a
variety of organisms, including the fungi Neurospora crassa, Saccharomyces cerevisiae and Schizosaccharomyces pombe; the bacterium Salmonella typhimurium; the bacteriophage T4; an RNA virus; and humans. In such studies, numerous mutations defective in the same gene were often isolated and mapped in a linear order on the basis of recombination frequencies to form a genetic map
of the gene. Separately, the mutants were tested in pairwise
combinations to measure complementation. An analysis of the results
from such studies led to the conclusion that intragenic complementation,
in general, arises from the interaction of differently defective
polypeptide monomers to form a multimer.
Genes that encode multimer-forming polypeptides appear to be common.
One interpretation of the data is that polypeptide monomers are often
aligned in the multimer in such a way that mutant polypeptides defective
at nearby sites in the genetic map tend to form a mixed multimer that
functions poorly, whereas mutant polypeptides defective at distant sites
tend to form a mixed multimer that functions more effectively. Direct
interaction of two nascent proteins emerging from nearby ribosomes appears to be a general mechanism for homo-oligomer (multimer) formation. Hundreds of protein oligomers were identified that assemble in human cells by such an interaction.
The most prevalent form of interaction is between the N-terminal
regions of the interacting proteins. Dimer formation appears to be able
to occur independently of dedicated assembly machines. The
intermolecular forces likely responsible for self-recognition and
multimer formation were discussed by Jehle.
The
experimental detection and characterization of PPIs is labor-intensive
and time-consuming. However, many PPIs can also be predicted
computationally, usually using experimental data as a starting point.
Methods have also been developed that allow the prediction of
PPIs de novo, that is, without prior evidence for these interactions.
Genomic context methods
The Rosetta Stone or Domain Fusion method is based on the hypothesis that interacting proteins are sometimes fused into a single protein in another genome.
Two proteins can therefore be predicted to interact if they each show
non-overlapping sequence similarity to regions of a single protein
sequence in another genome.
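As a toy illustration (hypothetical coordinates, not any specific tool's API), the non-overlap test at the heart of the Rosetta Stone method can be sketched in Python:

```python
def rosetta_stone_candidate(hit_a, hit_b, max_overlap=0):
    """Check whether proteins A and B match non-overlapping regions
    of the same putative fusion protein in another genome.

    hit_a, hit_b: (start, end) coordinates (1-based, inclusive) of the
    fusion-protein regions matched by A and B respectively.
    """
    (a_start, a_end), (b_start, b_end) = sorted([hit_a, hit_b])
    # Length of the overlap between the two matched regions
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    return overlap <= max_overlap

# A matches residues 1-120 and B matches residues 150-300 of the same
# protein elsewhere: a Rosetta Stone candidate pair.
print(rosetta_stone_candidate((1, 120), (150, 300)))  # True
print(rosetta_stone_candidate((1, 120), (100, 300)))  # False (regions overlap)
```

In practice the hit coordinates would come from a sequence-similarity search (e.g. BLAST) against the other genome, with thresholds on alignment quality rather than a strict zero-overlap rule.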
The Conserved Neighborhood method
is based on the hypothesis that if genes encoding two proteins are
neighbors on a chromosome in many genomes, then they are likely
functionally related (and possibly physically interacting).
The Phylogenetic Profile method
is based on the hypothesis that if two or more proteins are
concurrently present or absent across several genomes, then they are
likely functionally related.
Therefore, potentially interacting proteins can be identified by
determining the presence or absence of genes across many genomes and
selecting those genes which are always present or absent together.
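A minimal sketch of phylogenetic profiling, using hypothetical gene names and genome contents:

```python
from itertools import combinations

def phylogenetic_profile(gene, genomes):
    """Presence/absence vector of a gene across a list of genomes."""
    return tuple(1 if gene in genome else 0 for genome in genomes)

def co_occurring_pairs(genes, genomes):
    """Return gene pairs with identical profiles across all genomes,
    i.e. always present or absent together: candidate functional partners."""
    profiles = {g: phylogenetic_profile(g, genomes) for g in genes}
    return [(a, b) for a, b in combinations(genes, 2)
            if profiles[a] == profiles[b]]

# Hypothetical gene content of four genomes
genomes = [{"flgA", "flgB", "trpC"},
           {"flgA", "flgB"},
           {"trpC"},
           {"flgA", "flgB", "trpC"}]
print(co_occurring_pairs(["flgA", "flgB", "trpC"], genomes))
# [('flgA', 'flgB')] -- flgA and flgB share the profile (1, 1, 0, 1)
```

Real implementations relax the identity requirement, scoring profile similarity (e.g. by mutual information or Hamming distance) and correcting for phylogenetic relatedness between genomes.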
Publicly
available information from biomedical documents is readily accessible
through the internet and is becoming a powerful resource for collecting
known protein–protein interactions (PPIs), PPI prediction and protein
docking. Text mining is much less costly and time-consuming compared to
other high-throughput techniques. Currently, text mining methods
generally detect binary relations between interacting proteins from individual sentences using rule/pattern-based information extraction and machine learning approaches.
A wide variety of text mining applications for PPI extraction and/or
prediction are available for public use, as well as repositories which
often store manually validated and/or computationally predicted PPIs.
Text mining can be implemented in two stages: information retrieval, where texts containing names of either or both interacting proteins are retrieved and information extraction, where targeted information (interacting proteins, implicated residues, interaction types, etc.) is extracted.
There are also studies that use phylogenetic profiling, based on the
theory that proteins involved in common pathways co-evolve in a
correlated fashion across species. Some
more complex text mining methodologies use advanced Natural Language Processing
(NLP) techniques and build knowledge networks (for example, considering
gene names as nodes and verbs as edges). Other developments involve kernel methods to predict protein interactions.
Machine learning methods
Many computational methods have been suggested and reviewed for predicting protein–protein interactions. Prediction approaches can be grouped into categories based on predictive evidence: protein sequence, comparative genomics, protein domains, protein tertiary structure, and interaction network topology.
The construction of a positive set (known interacting protein pairs)
and a negative set (non-interacting protein pairs) is needed for the
development of a computational prediction model.
Prediction models using machine learning techniques can be broadly
classified into two main groups: supervised and unsupervised, based on
the labeling of input variables according to the expected outcome.
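As an illustration only, a toy supervised predictor can be sketched with a nearest-centroid rule over hypothetical pair features; real pipelines use richer feature sets and classifiers such as random forests or support vector machines:

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def predict(pair_features, positives, negatives):
    """Label a protein pair as interacting if its feature vector lies
    closer to the centroid of the positive (known-interacting) set
    than to that of the negative (non-interacting) set."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return math.dist(pair_features, pos_c) < math.dist(pair_features, neg_c)

# Hypothetical features per pair: (co-expression correlation, shared-domain count)
positives = [(0.8, 2), (0.9, 1), (0.7, 2)]   # positive set: known interactions
negatives = [(0.1, 0), (0.2, 0), (0.0, 1)]   # negative set: assumed non-interacting
print(predict((0.75, 1), positives, negatives))  # True
```

The construction of the negative set is the delicate step: non-interaction is rarely proven experimentally, so negatives are usually sampled from random pairs or drawn from curated resources such as Negatome.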
In 2005, integral membrane proteins of Saccharomyces cerevisiae
were analyzed using the mating-based split-ubiquitin system (mbSUS),
which detects interactions of membrane proteins with extracellular
signaling proteins.
Of the 705 integral membrane proteins tested, 1,985 different interactions
involving 536 proteins were traced. To sort and classify the interactions, a
support vector machine was used to define high-, medium- and low-confidence
interactions. The split-ubiquitin membrane yeast two-hybrid system uses
transcriptional reporters to identify yeast transformants that encode
pairs of interacting proteins.
In 2006, random forest,
an example of a supervised technique, was found to be the
most effective machine learning method for protein interaction
prediction. Such methods have been applied for discovering protein interactions in the human interactome, specifically the interactome of membrane proteins and the interactome of schizophrenia-associated proteins.
As of 2020, a model using residue cluster classes (RCCs), constructed from the 3DID and Negatome databases, resulted in 96–99% correctly classified instances of protein–protein interactions.
RCCs are a computational vector space that mimics protein fold space
and includes all simultaneously contacted residue sets, which can be
used to analyze protein structure-function relation and evolution.
Databases
Large-scale identification of PPIs has generated hundreds of thousands of
interactions, which have been collected together in specialized biological databases that are continuously updated in order to provide complete interactomes. The first of these databases was the Database of Interacting Proteins (DIP).
Primary databases collect information about published PPIs proven to exist via small-scale or large-scale experimental methods. Examples: DIP, Biomolecular Interaction Network Database (BIND), Biological General Repository for Interaction Datasets (BioGRID),
Human Protein Reference Database (HPRD), IntAct Molecular Interaction
Database, Molecular Interactions Database (MINT), MIPS Protein
Interaction Resource on Yeast (MIPS-MPact), and MIPS Mammalian
Protein–Protein Interaction Database (MIPS-MPPI).
Meta-databases normally result from the integration of information from primary databases, but can also collect some original data.
Prediction databases include many PPIs that are predicted
using several techniques (main article). Examples: Human Protein–Protein
Interaction Prediction Database (PIPs), Interlogous Interaction Database (I2D), Known and Predicted Protein–Protein Interactions (STRING-db), and Unified Human Interactive (UniHI).
The aforementioned computational methods all depend on source
databases whose data can be extrapolated to predict novel
protein–protein interactions. Coverage differs greatly between
databases. In general, primary databases have the fewest total protein
interactions recorded as they do not integrate data from multiple other
databases, while prediction databases have the most because they include
other forms of evidence in addition to experimental. For example, the
primary database IntAct has 572,063 interactions, the meta-database APID has 678,000 interactions, and the predictive database STRING has 25,914,693 interactions.
However, it is important to note that some of the interactions in the
STRING database are only predicted by computational methods such as
Genomic Context and not experimentally verified.
Information found in PPI databases supports the construction of
interaction networks. Although the PPI network of a given query protein
can be represented in textbooks, diagrams of whole-cell PPIs are highly
complex and difficult to generate.
One example of a manually produced molecular interaction map is the Kurt Kohn's 1999 map of cell cycle control.
Drawing on Kohn's map, Schwikowski et al. in 2000 published a paper on
PPIs in yeast, linking 1,548 interacting proteins determined by
two-hybrid screening. They used a layered graph drawing method to find
an initial placement of the nodes and then improved the layout using a
force-based algorithm.
Bioinformatic tools have been developed to simplify the difficult
task of visualizing molecular interaction networks and complement them
with other types of data. For instance, Cytoscape is an open-source software widely used and many plugins are currently available. Pajek software is advantageous for the visualization and analysis of very large networks.
Identification of functional modules in PPI networks is an
important challenge in bioinformatics. A functional module is a set of
proteins that are highly connected to each other in a PPI network. This is a problem closely related to community detection in social networks. Methods such as jActiveModules and MoBaS address it: jActiveModules integrates the PPI network with gene expression data, whereas MoBaS integrates the PPI network with genome-wide association studies.
Protein–protein relationships are often the result of multiple
types of interactions or are deduced from different approaches,
including co-localization, direct interaction, suppressive genetic
interaction, additive genetic interaction, physical association, and
other associations.
Signed interaction networks
Protein–protein interactions often result in one of the interacting
proteins either being 'activated' or 'repressed'. Such effects can be
indicated in a PPI network by "signs" (e.g. "activation" or
"inhibition"). Although such attributes have been added to networks for a
long time, Vinayagam et al. (2014) coined the term Signed network
for them. Signed networks are often expressed by labeling the
interaction as either positive or negative. A positive interaction is
one where the interaction results in one of the proteins being
activated. Conversely, a negative interaction indicates that one of the
proteins being inactivated.
Protein–protein interaction networks are often constructed as a
result of lab experiments such as yeast two-hybrid screens or affinity
purification followed by mass spectrometry.
However, these methods do not provide the layer of information needed
to determine what type of interaction is present, and therefore to
attribute signs to the network diagrams.
RNA interference screens
RNA interference
(RNAi) screens (repression of individual proteins between transcription
and translation) are one method that can be used to assign signs to
protein–protein interactions. Individual proteins
are repressed and the resulting phenotypes are analyzed. A correlating
phenotypic relationship (i.e. where the inhibition of either of two
proteins results in the same phenotype) indicates a positive, or
activating relationship. Phenotypes that do not correlate (i.e. where
the inhibition of either of two proteins results in two different
phenotypes) indicate a negative or inactivating relationship. If protein
A depends on protein B for activation, then inhibiting either A or B
deprives the cell of the function provided by protein A, and the
phenotypes are the same whichever protein is inhibited. If, however,
protein A is inactivated by protein B, the phenotypes differ depending
on which protein is inhibited: inhibiting B leaves A active, whereas
inhibiting A leaves B with nothing to inactivate, so the phenotype
changes. Multiple RNAi screens need to be performed in order to reliably
assign a sign to a given protein–protein interaction. Vinayagam et al.,
who devised this technique, state that a minimum of nine RNAi screens are required, with confidence increasing as one carries out more screens.
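The sign-assignment logic can be sketched with hypothetical phenotype calls from several screens; a simple majority-match rule stands in here for the statistical scoring used in practice:

```python
def infer_sign(phenotypes_a, phenotypes_b):
    """Assign a sign to an interaction from multiple RNAi screens.

    phenotypes_a, phenotypes_b: lists of observed phenotypes (one per
    screen) when protein A or protein B, respectively, is knocked down.
    Matching phenotypes suggest an activating (+) relationship;
    differing phenotypes suggest an inactivating (-) one.
    """
    matches = sum(a == b for a, b in zip(phenotypes_a, phenotypes_b))
    fraction = matches / len(phenotypes_a)
    return "+" if fraction >= 0.5 else "-"

# Hypothetical phenotype calls from nine independent screens
a = ["small"] * 7 + ["wt"] + ["small"]
b = ["small"] * 9
print(infer_sign(a, b))  # '+' -- the knockdowns mostly phenocopy each other
```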
As therapeutic targets
Modulation of PPIs is challenging and is receiving increasing attention from the scientific community. Several properties of PPIs, such as allosteric sites and hot spots, have been incorporated into drug-design strategies. Nevertheless, very few PPIs are directly targeted by FDA-approved small-molecule PPI inhibitors, emphasizing a huge untapped opportunity for drug discovery.
In 2014, Amit Jaiswal and others developed 30 peptides
that inhibit the recruitment of telomerase to telomeres, drawing on
protein–protein interaction studies. Arkin and colleagues developed antibody-fragment-based inhibitors to regulate specific protein–protein interactions.
As the "modulation" of PPIs not only includes the inhibition, but also the stabilization of quaternary protein complexes, molecules with this mechanism of action (so called molecular glues) are also intensively studied.
Examples
Tirofiban, inhibitor of glycoprotein IIb/IIIa, used as a cardiovascular drug.
Maraviroc, inhibitor of the CCR5-gp120 interaction, used as anti-HIV drug.
AMG-176, AZD5991, S64315, inhibitors of myeloid cell leukemia 1 (Mcl-1) protein and its interactions.