A Medley of Potpourri

Thursday, July 25, 2019

Protein–protein interaction

From Wikipedia, the free encyclopedia

The horseshoe-shaped ribonuclease inhibitor (shown as wireframe) forms a protein–protein interaction with the ribonuclease protein. The contacts between the two proteins are shown as coloured patches.

Protein–protein interactions (PPIs) are physical contacts of high specificity established between two or more protein molecules as a result of biochemical events steered by interactions that include electrostatic forces, hydrogen bonding and the hydrophobic effect. Many are physical contacts with molecular associations between chains that occur in a cell or in a living organism in a specific biomolecular context.

Proteins rarely act alone, as their functions tend to be regulated. Many molecular processes within a cell are carried out by molecular machines that are built from a large number of protein components organized by their PPIs. These interactions make up the so-called interactome of the organism, while aberrant PPIs are the basis of multiple aggregation-related diseases, such as Creutzfeldt–Jakob and Alzheimer's diseases, and may lead to cancer.

PPIs have been studied from different perspectives: biochemistry, quantum chemistry, molecular dynamics, signal transduction, among others. All this information enables the creation of large protein interaction networks – similar to metabolic or genetic/epigenetic networks – that empower the current knowledge on biochemical cascades and molecular etiology of disease, as well as the discovery of putative protein targets of therapeutic interest.

Examples

Electron transfer proteins

In many metabolic reactions, a protein that acts as an electron carrier binds to an enzyme that acts as its reductase. After it receives an electron, it dissociates and then binds to the next enzyme, which acts as its oxidase (i.e. an acceptor of the electron). These interactions depend on highly specific binding between the proteins to ensure efficient electron transfer. Examples: the mitochondrial oxidative phosphorylation chain components cytochrome c-reductase / cytochrome c / cytochrome c oxidase; microsomal and mitochondrial P450 systems.

In the case of the mitochondrial P450 systems, the specific residues involved in the binding of the electron transfer protein adrenodoxin to its reductase were identified as two basic Arg residues on the surface of the reductase and two acidic Asp residues on the adrenodoxin. More recent work on the phylogeny of the reductase has shown that these residues involved in protein-protein interactions have been conserved throughout the evolution of this enzyme.

Signal transduction

The activity of the cell is regulated by extracellular signals. Signal propagation inside and/or along the interior of cells depends on PPIs between the various signaling molecules. The recruitment of signaling pathways through PPIs is called signal transduction and plays a fundamental role in many biological processes and in many diseases including Parkinson's disease and cancer.

Membrane transport

A protein may be carrying another protein (for example, from cytoplasm to nucleus or vice versa in the case of the nuclear pore importins).

Cell metabolism

In many biosynthetic processes enzymes interact with each other to produce small compounds or other macromolecules.

Muscle contraction

The physiology of muscle contraction involves several interactions. Myosin filaments act as molecular motors and, by binding to actin, enable filament sliding. Furthermore, members of the skeletal muscle lipid droplet-associated protein family associate with other proteins, such as the activator of adipose triglyceride lipase and its coactivator comparative gene identification-58, to regulate lipolysis in skeletal muscle.

Types

To describe the types of protein–protein interactions (PPIs) it is important to consider that proteins can interact in a "transient" way (to produce some specific effect in a short time) or in a "stable" way, building multiprotein complexes that act as molecular machines within living systems. A protein complex assembly can result in the formation of homo-oligomeric or hetero-oligomeric complexes. In addition to conventional complexes, such as enzyme-inhibitor and antibody-antigen complexes, interactions can also be established between two domains or between a domain and a peptide. Another important distinction is the way interactions have been determined: some techniques, named "binary" methods, measure direct physical interactions between protein pairs, while others, named "co-complex" methods, measure physical interactions among groups of proteins without pairwise determination of protein partners.

Homo-oligomers vs. hetero-oligomers

Homo-oligomers are macromolecular complexes constituted by only one type of protein subunit. Protein subunit assembly is guided by the establishment of non-covalent interactions in the quaternary structure of the protein. Disruption of homo-oligomers in order to return to the initial individual monomers often requires denaturation of the complex. Several enzymes, carrier proteins, scaffolding proteins, and transcriptional regulatory factors carry out their functions as homo-oligomers. In hetero-oligomers, distinct protein subunits interact, and these complexes are essential to control several cellular functions. The importance of the communication between heterologous proteins is even more evident during cell signaling events, and such interactions are only possible due to structural domains within the proteins (as described below).

Stable interactions vs. transient interactions

Stable interactions involve proteins that interact for a long time, taking part in permanent complexes as subunits, in order to carry out structural or functional roles. This is usually the case for homo-oligomers (e.g. cytochrome c) and for some hetero-oligomeric proteins, such as the subunits of ATPase. On the other hand, a protein may interact briefly and in a reversible manner with other proteins in only certain cellular contexts (cell type, cell cycle stage, external factors, presence of other binding proteins, etc.), as happens with most of the proteins involved in biochemical cascades. These are called transient interactions. For example, some G protein-coupled receptors only transiently bind to G_i/o proteins when they are activated by extracellular ligands, while some G_q-coupled receptors, such as muscarinic receptor M3, pre-couple with G_q proteins prior to the receptor-ligand binding. Interactions between intrinsically disordered protein regions and globular protein domains (i.e. MoRFs) are transient interactions.

Covalent vs. non-covalent

Covalent interactions are those with the strongest association and are formed by disulphide bonds or electron sharing. Although rare, these interactions are decisive in some posttranslational modifications, such as ubiquitination and SUMOylation. Non-covalent bonds are usually established during transient interactions by the combination of weaker bonds, such as hydrogen bonds, ionic interactions, Van der Waals forces, or hydrophobic bonds.

Role of water

Water molecules play a significant role in the interactions between proteins. The crystal structures of complexes, obtained at high resolution from different but homologous proteins, have shown that some interface water molecules are conserved between homologous complexes. The majority of the interface water molecules make hydrogen bonds with both partners of each complex. Some interface amino acid residues or atomic groups of one protein partner engage in both direct and water mediated interactions with the other protein partner. Doubly indirect interactions, mediated by two water molecules, are more numerous in the homologous complexes of low affinity. Carefully conducted mutagenesis experiments, e.g. changing a tyrosine residue into a phenylalanine, have shown that water mediated interactions can contribute to the energy of interaction. Thus, water molecules may facilitate the interactions and cross-recognitions between proteins.

Structure

Crystal structure of modified Gramicidin S horizontally determined by X-ray crystallography

NMR structure of cytochrome C illustrating its dynamics in solution

The molecular structures of many protein complexes have been unlocked by the technique of X-ray crystallography. The first structure to be solved by this method was that of sperm whale myoglobin by Sir John Cowdery Kendrew. In this technique the angles and intensities of a beam of X-rays diffracted by crystalline atoms are detected in a film, thus producing a three-dimensional picture of the density of electrons within the crystal.

Later, nuclear magnetic resonance also started to be applied with the aim of unravelling the molecular structure of protein complexes. One of the first examples was the structure of calmodulin-binding domains bound to calmodulin. This technique is based on the study of the magnetic properties of atomic nuclei, from which physical and chemical properties of the corresponding atoms or molecules are determined. Nuclear magnetic resonance is advantageous for characterizing weak PPIs.

Domains

Proteins hold structural domains that allow them to interact with and bind to specific sequences on other proteins:

Src homology 2 (SH2) domain

SH2 domains are structurally composed of a three-stranded twisted beta sheet flanked by two alpha-helices. The existence of a deep binding pocket with high affinity for phosphotyrosine, but not for phosphoserine or phosphothreonine, is essential for the recognition of tyrosine phosphorylated proteins, mainly autophosphorylated growth factor receptors. Growth factor receptor binding proteins and phospholipase Cγ are examples of proteins that have SH2 domains.

Src homology 3 (SH3) domain

Structurally, SH3 domains are constituted by a beta barrel formed by two orthogonal beta sheets and three anti-parallel beta strands. These domains recognize proline-enriched sequences, such as the polyproline type II helical structure (PXXP motifs), in cell signaling proteins like protein tyrosine kinases and the growth factor receptor bound protein 2 (Grb2).

Phosphotyrosine-binding (PTB) domain

PTB domains interact with sequences that contain a phosphotyrosine group. These domains can be found in the insulin receptor substrate.

LIM domain

LIM domains were initially identified in three homeodomain transcription factors (lin11, isl1, and mec3). In addition to these homeodomain proteins and other proteins involved in development, LIM domains have also been identified in non-homeodomain proteins with relevant roles in cellular differentiation, association with the cytoskeleton and senescence. These domains contain a tandem cysteine-rich Zn²⁺-finger motif and embrace the consensus sequence CX2CX16-23HX2CX2CX2CX16-21CX2C/H/D. LIM domains bind to PDZ domains, bHLH transcription factors, and other LIM domains.
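The consensus sequence above can be read as a pattern in which X stands for any residue and the numbers give how many residues occur at that position. A minimal sketch, assuming one-letter amino acid codes; the names `LIM_CONSENSUS` and `find_lim_motifs` and the test sequence are illustrative only and do not come from any library or real protein:

```python
import re

# CX2CX16-23HX2CX2CX2CX16-21CX2(C/H/D), with X = any residue.
LIM_CONSENSUS = re.compile(
    r"C.{2}C.{16,23}H.{2}C.{2}C.{2}C.{16,21}C.{2}[CHD]"
)

def find_lim_motifs(sequence):
    """Return (start, end) spans of consensus matches in a one-letter protein sequence."""
    return [m.span() for m in LIM_CONSENSUS.finditer(sequence.upper())]

# Synthetic sequence built only to satisfy the pattern; it is not a real protein.
synthetic = (
    "MK"
    + "C" + "AA" + "C" + "G" * 17 + "H" + "AA" + "C" + "AA" + "C" + "AA"
    + "C" + "G" * 17 + "C" + "AA" + "C"
    + "KL"
)
print(find_lim_motifs(synthetic))
```

A pattern match of this kind only flags candidate regions; whether a candidate actually folds into a zinc-binding LIM domain cannot be decided from the sequence pattern alone.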

Sterile alpha motif (SAM) domain

SAM domains are composed of five helices forming a compact package with a conserved hydrophobic core. These domains, which can be found in the Eph receptor and the stromal interaction molecule (STIM) for example, bind to non-SAM domain-containing proteins and also appear to have the ability to bind RNA.

PDZ domain

PDZ domains were first identified in three guanylate kinases: PSD-95, DlgA and ZO-1. These domains recognize carboxy-terminal tri-peptide motifs (S/TXV), other PDZ domains or LIM domains and bind them through a short peptide sequence that has a C-terminal hydrophobic residue. Some of the proteins identified as having PDZ domains are scaffolding proteins or seem to be involved in ion receptor assembling and receptor-enzyme complexes formation.
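The carboxy-terminal tri-peptide motif S/TXV mentioned above can be checked with a short pattern anchored at the end of the sequence. This is a sketch under the assumption that the sequence is given in one-letter code with the C-terminus last; `has_pdz_binding_cterm` is a hypothetical helper name and the example strings are invented:

```python
import re

# S/T-X-V at the extreme C-terminus: Ser or Thr, any residue, then Val.
PDZ_CTERM = re.compile(r"[ST].V$")

def has_pdz_binding_cterm(sequence):
    """True if the last three residues fit the S/T-X-V motif."""
    return PDZ_CTERM.search(sequence.upper()) is not None

print(has_pdz_binding_cterm("MKLAETDV"))   # ends in T-D-V
print(has_pdz_binding_cterm("MKLAETDVA"))  # motif is no longer C-terminal
```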

FERM domain

FERM domains contain basic residues capable of binding PtdIns(4,5)P₂. Talin and focal adhesion kinase (FAK) are two of the proteins that present FERM domains.

Calponin homology (CH) domain

CH domains are mainly present in cytoskeletal proteins such as parvin.

Pleckstrin homology domain

Pleckstrin homology domains bind to phosphoinositides and acid domains in signaling proteins.

WW domain

WW domains bind to proline enriched sequences.

WSxWS motif

The WSxWS motif is found in cytokine receptors.

Properties of the interface

The study of the molecular structure can give fine details about the interface that enables the interaction between proteins. When characterizing PPI interfaces it is important to take into account the type of complex.

Parameters evaluated include size (measured in absolute dimensions Å² or in solvent-accessible surface area (SASA)), shape, complementarity between surfaces, residue interface propensities, hydrophobicity, segmentation and secondary structure, and conformational changes on complex formation.

The great majority of PPI interfaces reflect the composition of protein surfaces, rather than the protein cores, in spite of being frequently enriched in hydrophobic residues, particularly in aromatic residues. PPI interfaces are dynamic and frequently planar, although they can be globular and protruding as well. Based on three structures (insulin dimer, trypsin-pancreatic trypsin inhibitor complex, and oxyhaemoglobin) Cyrus Chothia and Joel Janin found that between 1,130 and 1,720 Å² of surface area was removed from contact with water, indicating that hydrophobicity is a major factor in the stabilization of PPIs. Later studies refined the buried surface area of the majority of interactions to 1,600±350 Å². However, much larger interaction interfaces were also observed and were associated with significant changes in conformation of one of the interaction partners. PPI interfaces exhibit both shape and electrostatic complementarity.
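The surface area removed from contact with water is commonly obtained as the difference between the solvent-accessible surface areas of the separated partners and that of the complex. The sketch below assumes the SASA values (in Å²) have already been computed by some external tool; the function names and the input numbers are illustrative and are not measurements:

```python
def buried_surface_area(sasa_a, sasa_b, sasa_complex):
    """Total surface (in square angstroms) buried on complex formation.

    Computed as SASA_A + SASA_B - SASA_AB.
    """
    buried = sasa_a + sasa_b - sasa_complex
    if buried < 0:
        raise ValueError("complex SASA exceeds the sum of its parts")
    return buried

def within_typical_range(buried, mean=1600.0, spread=350.0):
    """Check a value against the 1,600 +/- 350 range quoted in the text."""
    return abs(buried - mean) <= spread

# Made-up inputs chosen only to exercise the functions.
example = buried_surface_area(sasa_a=6000.0, sasa_b=9000.0, sasa_complex=13500.0)
print(example, within_typical_range(example))
```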

Regulation

Protein concentration, which in turn is affected by expression levels and degradation rates;
Protein affinity for proteins or other binding ligands;
Ligand concentrations (substrates, ions, etc.);
Presence of other proteins, nucleic acids, and ions;
Electric fields around proteins;
Occurrence of covalent modifications.

Experimental Methods

There are a multitude of methods to detect PPIs. Each of the approaches has its own strengths and weaknesses, especially with regard to the sensitivity and specificity of the method. The most conventional and widely used high-throughput methods are yeast two-hybrid screening and affinity purification coupled to mass spectrometry.

Principles of yeast and mammalian two-hybrid systems

Yeast two-hybrid screening

This system was first described in 1989 by Fields and Song using Saccharomyces cerevisiae as the biological model. Yeast two-hybrid (Y2H) allows the identification of pairwise PPIs (binary method) in vivo, in which the two proteins are tested for biophysically direct interaction. Y2H is based on the functional reconstitution of the yeast transcription factor Gal4 and the subsequent activation of a selective reporter such as His3. To test two proteins for interaction, two protein expression constructs are made: one protein (X) is fused to the Gal4 DNA-binding domain (DB) and a second protein (Y) is fused to the Gal4 activation domain (AD). In the assay, yeast cells are transformed with these constructs. Transcription of reporter genes does not occur unless bait (DB-X) and prey (AD-Y) interact with each other and form a functional Gal4 transcription factor. Thus, the interaction between proteins can be inferred from the presence of the products resulting from reporter gene expression. In cases in which the reporter gene expresses enzymes that allow the yeast to synthesize essential amino acids or nucleotides, yeast growth under selective media conditions indicates that the two proteins tested are interacting.

Despite its usefulness, the yeast two-hybrid system has limitations. It uses yeast as the main host system, which can be a problem when studying proteins that contain mammalian-specific post-translational modifications. The number of PPIs identified is usually low because of a high false negative rate, and membrane proteins, for example, are under-represented.

In initial studies that utilized Y2H, proper controls for false positives (e.g. when DB-X activates the reporter gene without the presence of AD-Y) were frequently not done, leading to a higher than normal false positive rate. An empirical framework must be implemented to control for these false positives. The lower coverage of membrane proteins has been overcome by the emergence of yeast two-hybrid variants, such as the membrane yeast two-hybrid (MYTH) and the split-ubiquitin system, which are not limited to interactions that occur in the nucleus, and the bacterial two-hybrid system, performed in bacteria.
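The reporter logic and the autoactivation control described above can be written out as a small truth-table model. This is only a sketch of the reasoning, not a protocol; the function names and the wording of the result labels are assumptions made for illustration:

```python
def reporter_active(bait_binds_prey, bait_autoactivates=False):
    """Reporter fires if Gal4 is reconstituted or if DB-X activates on its own."""
    return bait_binds_prey or bait_autoactivates

def call_interaction(growth_with_prey, growth_with_empty_ad):
    """Score a pair from growth on selective medium.

    growth_with_prey: growth observed with DB-X plus AD-Y
    growth_with_empty_ad: growth observed with DB-X plus AD alone (control)
    """
    if growth_with_empty_ad:
        return "uninterpretable (autoactivating bait)"
    return "interaction" if growth_with_prey else "no interaction detected"

def score_pair(bait_binds_prey, bait_autoactivates=False):
    """Simulate the test and the control for one bait/prey pair."""
    test = reporter_active(bait_binds_prey, bait_autoactivates)
    control = reporter_active(False, bait_autoactivates)  # no prey present
    return call_interaction(test, control)

for binds, auto in [(True, False), (False, False), (False, True)]:
    print(binds, auto, "->", score_pair(binds, auto))
```

Without the control, the third case would be scored as an interaction, which is the false positive described in the text.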

Principle of tandem affinity purification

Affinity purification coupled to mass spectrometry

Affinity purification coupled to mass spectrometry mostly detects stable interactions and thus better indicates functional in vivo PPIs. This method starts with the purification of the tagged protein, which is usually expressed in the cell at in vivo concentrations, together with its interacting proteins (affinity purification). One of the most advantageous and widely used methods to purify proteins with very low contaminating background is tandem affinity purification, developed by Bertrand Seraphin and Matthias Mann and their respective colleagues. PPIs can then be quantitatively and qualitatively analysed by mass spectrometry using different methods: chemical incorporation, biological or metabolic incorporation (SILAC), and label-free methods.

Nucleic acid programmable protein array

This system was first developed by LaBaer and colleagues in 2004 using an in vitro transcription and translation system. A DNA template encoding the gene of interest fused to GST was immobilized on a solid surface. Anti-GST antibody and biotinylated plasmid DNA were bound to an aminopropyltriethoxysilane (APTES)-coated slide; BSA can improve the binding efficiency of the DNA, and the biotinylated plasmid DNA was bound by avidin. New protein was synthesized using a cell-free expression system, i.e. rabbit reticulocyte lysate (RRL), and was then captured by the anti-GST antibody bound to the slide. To test a protein–protein interaction, the target protein cDNA and the query protein cDNA were immobilized on the same coated slide, and both proteins were synthesized by the same extract using the in vitro transcription and translation system. The target protein was bound to the array by the antibody coating the slide, and the query protein, tagged with a hemagglutinin (HA) epitope, was used to probe the array. The interaction between the two proteins was then visualized with an antibody against HA.

Other potential methods

Diverse techniques to identify PPIs have been emerging along with technology progression. These include co-immunoprecipitation, protein microarrays, analytical ultracentrifugation, light scattering, fluorescence spectroscopy, luminescence-based mammalian interactome mapping (LUMIER), resonance-energy transfer systems, mammalian protein–protein interaction trap, electro-switchable biosurfaces, protein-fragment complementation assay, as well as real-time label-free measurements by surface plasmon resonance, and calorimetry.

Computational methods

Computational Prediction of Protein-Protein Interactions

The experimental detection and characterization of PPIs is labor-intensive and time-consuming. However, many PPIs can also be predicted computationally, usually using experimental data as a starting point. Methods have also been developed that allow the prediction of PPIs de novo, that is, without prior evidence for these interactions.

Genomic Context Methods

The Rosetta Stone or Domain Fusion method is based on the hypothesis that interacting proteins are sometimes fused into a single protein in another genome. Therefore, we can predict whether two proteins may be interacting by determining whether they each have non-overlapping sequence similarity to a region of a single protein sequence in another genome.
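The test described above reduces to an interval check once similarity hits are available. The sketch below assumes the hits have already been produced by some sequence search and are given as regions on proteins of another genome; the input layout, the function name `rosetta_stone_pairs` and all identifiers in the example are hypothetical:

```python
from itertools import combinations

def rosetta_stone_pairs(hits):
    """Predict pairs whose hits fall on non-overlapping regions of one target.

    hits: {query: [(target_id, start, end), ...]}, inclusive target coordinates.
    """
    predicted = set()
    for a, b in combinations(sorted(hits), 2):
        for target_a, start_a, end_a in hits[a]:
            for target_b, start_b, end_b in hits[b]:
                same_target = target_a == target_b
                non_overlapping = end_a < start_b or end_b < start_a
                if same_target and non_overlapping:
                    predicted.add((a, b))
    return predicted

# Invented hits: protA and protB map to separate halves of "fusedX",
# while protC overlaps both of them.
hits = {
    "protA": [("fusedX", 1, 200)],
    "protB": [("fusedX", 230, 480)],
    "protC": [("fusedX", 150, 300)],
}
print(rosetta_stone_pairs(hits))
```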

The Conserved Neighborhood method is based on the hypothesis that if genes encoding two proteins are neighbors on a chromosome in many genomes, then they are likely functionally related (and possibly physically interacting).
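As a rough illustration of the conserved neighborhood idea, gene order in each genome can be scanned for adjacent pairs and the pairs counted across genomes. The sketch assumes genes have already been mapped to shared ortholog identifiers, treats chromosomes as linear, and uses arbitrary thresholds; `conserved_neighbors` and the toy genomes are invented for the example:

```python
from collections import Counter

def conserved_neighbors(genomes, min_genomes=2, max_gap=1):
    """Return gene pairs that are neighbors in at least min_genomes genomes.

    genomes: {genome_name: [ortholog ids in chromosomal order]}
    max_gap: how many positions apart two genes may be (1 = directly adjacent)
    """
    counts = Counter()
    for order in genomes.values():
        seen = set()  # count each pair at most once per genome
        for i, gene in enumerate(order):
            for other in order[i + 1 : i + 1 + max_gap]:
                if other != gene:
                    seen.add(frozenset((gene, other)))
        counts.update(seen)
    return {pair for pair, n in counts.items() if n >= min_genomes}

# Toy gene orders, not taken from real genomes.
genomes = {
    "g1": ["a", "b", "c", "d"],
    "g2": ["c", "a", "b", "e"],
    "g3": ["b", "a", "d", "c"],
}
print(conserved_neighbors(genomes, min_genomes=3))
```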

The Phylogenetic Profile method is based on the hypothesis that if two or more proteins are concurrently present or absent across several genomes, then they are likely functionally related. Therefore, potentially interacting proteins can be identified by determining the presence or absence of genes across many genomes and selecting those genes which are always present or absent together.
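The selection step described above can be sketched as a comparison of presence/absence vectors. The code assumes presence calls are already available for each gene and genome; it also skips genes that are present everywhere or nowhere, which is an added assumption (such profiles carry no signal) rather than something stated in the text. All names and data are illustrative:

```python
from itertools import combinations

def phylogenetic_profile_pairs(presence):
    """Return gene pairs with identical, informative presence/absence profiles.

    presence: {gene: {genome: bool}}; genomes missing from a gene count as absent.
    """
    genomes = sorted({g for profile in presence.values() for g in profile})
    vectors = {
        gene: tuple(profile.get(g, False) for g in genomes)
        for gene, profile in presence.items()
    }
    pairs = set()
    for a, b in combinations(sorted(vectors), 2):
        informative = any(vectors[a]) and not all(vectors[a])
        if informative and vectors[a] == vectors[b]:
            pairs.add((a, b))
    return pairs

# Toy presence calls across three hypothetical genomes.
presence = {
    "geneA": {"s1": True, "s2": False, "s3": True},
    "geneB": {"s1": True, "s2": False, "s3": True},
    "geneC": {"s1": True, "s2": True, "s3": True},
    "geneD": {"s1": True, "s2": True, "s3": True},
}
print(phylogenetic_profile_pairs(presence))
```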

Text mining methods

Text mining protocol.

Publicly available information from biomedical documents is readily accessible through the internet and is becoming a powerful resource for collecting known protein-protein interactions (PPIs), PPI prediction and protein docking. Text mining is much less costly and time-consuming compared to other high-throughput techniques. Currently, text mining methods generally detect binary relations between interacting proteins from individual sentences using rule/pattern-based information extraction and machine learning approaches. A wide variety of text mining applications for PPI extraction and/or prediction are available for public use, as well as repositories which often store manually validated and/or computationally predicted PPIs. Text mining can be implemented in two stages: information retrieval, where texts containing names of either or both interacting proteins are retrieved and information extraction, where targeted information (interacting proteins, implicated residues, interaction types, etc.) is extracted.
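The two stages described above can be illustrated with a deliberately simple rule-based pipeline working on single sentences. The verb list, the function names and the example sentences (which use placeholder protein names) are all assumptions made for the sketch; practical systems rely on much richer patterns or on machine learning:

```python
import re

# Small illustrative verb list; real systems use far larger vocabularies.
INTERACTION_VERBS = ("binds", "interacts with", "phosphorylates", "inhibits")

def retrieve(sentences, protein_names):
    """Information retrieval: keep sentences naming at least two known proteins."""
    kept = []
    for s in sentences:
        found = [p for p in protein_names if re.search(rf"\b{re.escape(p)}\b", s)]
        if len(found) >= 2:
            kept.append(s)
    return kept

def extract(sentences, protein_names):
    """Information extraction: match '<protein> <verb> <protein>' patterns."""
    names = "|".join(re.escape(p) for p in sorted(protein_names, key=len, reverse=True))
    verbs = "|".join(re.escape(v) for v in INTERACTION_VERBS)
    pattern = re.compile(rf"\b({names})\b\s+({verbs})\s+\b({names})\b")
    return [m.groups() for s in sentences for m in pattern.finditer(s)]

# Placeholder names and sentences, written only for this example.
proteins = {"ProtA", "ProtB", "ProtC"}
corpus = [
    "ProtA binds ProtB in stimulated cells.",
    "ProtA and ProtB were both detected.",
    "ProtC was abundant.",
]
candidates = retrieve(corpus, proteins)
print(extract(candidates, proteins))
```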

There are also studies using phylogenetic profiling, basing their functionalities on the theory that proteins involved in common pathways co-evolve in a correlated fashion across species. Some more complex text mining methodologies use advanced Natural Language Processing (NLP) techniques and build knowledge networks (for example, considering gene names as nodes and verbs as edges). Other developments involve kernel methods to predict protein interactions.

Machine learning methods

These methods use machine learning to distinguish how interacting protein pairs differ from non-interacting protein pairs in terms of pairwise features such as cellular colocalization, gene co-expression, how closely the genes that encode the two proteins are located on the DNA, and so on. Random Forest has been found to be the most effective machine learning method for protein interaction prediction. Such methods have been applied to discover protein interactions in the human interactome, specifically the interactome of membrane proteins and the interactome of schizophrenia-associated proteins.

Databases

Large scale identification of PPIs generated hundreds of thousands of interactions, which were collected together in specialized biological databases that are continuously updated in order to provide complete interactomes. The first of these databases was the Database of Interacting Proteins (DIP). Since that time, the number of public databases has been increasing. Databases can be subdivided into primary databases, meta-databases, and prediction databases.

Primary databases collect information about published PPIs proven to exist via small-scale or large-scale experimental methods. Examples: DIP, Biomolecular Interaction Network Database (BIND), Biological General Repository for Interaction Datasets (BioGRID), Human Protein Reference Database (HPRD), IntAct Molecular Interaction Database, Molecular Interactions Database (MINT), MIPS Protein Interaction Resource on Yeast (MIPS-MPact), and MIPS Mammalian Protein–Protein Interaction Database (MIPS-MPPI).

Meta-databases normally result from the integration of primary databases information, but can also collect some original data. Examples: Agile Protein Interactomes Dataserver (APID), The Microbial Protein Interaction Database (MPIDB), Protein Interaction Network Analysis (PINA) platform, GPS-Prot, and Wiki-Pi.

Prediction databases include many PPIs that are predicted using several techniques. Examples: Human Protein–Protein Interaction Prediction Database (PIPs), Interlogous Interaction Database (I2D), Known and Predicted Protein–Protein Interactions (STRING-db), and Unified Human Interactive (UniHI).

The aforementioned computational methods all depend on source databases whose data can be extrapolated to predict novel protein-protein interactions. Coverage differs greatly between databases. In general, primary databases have the fewest total protein interactions recorded as they do not integrate data from multiple other databases, while prediction databases have the most because they include other forms of evidence in addition to experimental. For example, the primary database IntAct has 572,063 interactions, the meta-database APID has 678,000 interactions, and the predictive database STRING has 25,914,693 interactions. However, it is important to note that some of the interactions in the STRING database are only predicted by computational methods such as Genomic Context and not experimentally verified.

Interaction networks

Schizophrenia PPI.

Information found in PPI databases supports the construction of interaction networks. Although the PPI network of a given query protein can be represented in textbooks, diagrams of whole-cell PPIs are complex and difficult to generate.

One example of a manually produced molecular interaction map is Kurt Kohn's 1999 map of cell cycle control. Drawing on Kohn's map, Schwikowski et al. in 2000 published a paper on PPIs in yeast, linking 1,548 interacting proteins determined by two-hybrid screening. They used a layered graph drawing method to find an initial placement of the nodes and then improved the layout using a force-based algorithm.

Bioinformatic tools have been developed to simplify the difficult task of visualizing molecular interaction networks and to complement them with other types of data. For instance, Cytoscape is a widely used open-source software package for which many plugins are currently available. Pajek software is advantageous for the visualization and analysis of very large networks.

Identification of functional modules in PPI networks is an important challenge in bioinformatics. A functional module is a set of proteins that are highly connected to each other in a PPI network. The problem is closely related to community detection in social networks. Existing methods include Jactive modules and MoBaS: Jactive modules integrates the PPI network with gene expression data, whereas MoBaS integrates the PPI network with genome-wide association studies.
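To make the notion of a highly connected set concrete, the sketch below enumerates maximal cliques, that is, groups in which every protein interacts with every other. This is a stand-in using the strictest possible reading of "highly connected" and the classic Bron-Kerbosch procedure; it is not the algorithm used by Jactive modules or MoBaS, and the toy network is invented:

```python
def build_adjacency(edges):
    """Turn an undirected edge list into {node: set(neighbors)}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def maximal_cliques(adjacency):
    """Bron-Kerbosch enumeration (no pivoting) of all maximal cliques."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored nodes
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques

# Toy network: p1, p2 and p3 all interact; p4 only touches p3.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4")]
adjacency = build_adjacency(edges)
modules = [c for c in maximal_cliques(adjacency) if len(c) >= 3]
print(modules)
```

Cliques are a harsh criterion for noisy interaction data, since a single missing edge splits a module, which is one reason practical tools use softer density or scoring schemes.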

Awareness of the major roles of PPIs in numerous physiological and pathological processes has been driving the challenge of unravelling many interactomes. Examples of published interactomes are the thyroid-specific DREAM interactome and the PP1α interactome in human brain.

Protein-protein relationships are often the result of multiple types of interactions or are deduced from different approaches, including co-localization, direct interaction, suppressive genetic interaction, additive genetic interaction, physical association, and other associations.

Signed interaction networks

The protein–protein interactions are displayed in a signed network that describes what type of interaction is taking place.

Protein–protein interactions often result in one of the interacting proteins being either 'activated' or 'repressed'. Such effects can be indicated in a PPI network by "signs" (e.g. "activation" or "inhibition"). Although such attributes have been added to networks for a long time, Vinayagam et al. (2014) coined the term signed network for them. Signed networks are often expressed by labeling each interaction as either positive or negative. A positive interaction is one where the interaction results in one of the proteins being activated. Conversely, a negative interaction indicates that one of the proteins is inactivated.

Protein–protein interaction networks are often constructed as a result of lab experiments such as yeast two-hybrid screens or affinity purification followed by mass spectrometry. However, these methods do not provide the layer of information needed to determine what type of interaction is present, and therefore cannot by themselves be used to attribute signs to the network diagrams.

RNA interference screens

RNA interference (RNAi) screens (repression of individual proteins between transcription and translation) are one method that can be used to assign signs to protein–protein interactions. Individual proteins are repressed and the resulting phenotypes are analyzed. A correlating phenotypic relationship (i.e. where the inhibition of either of two proteins results in the same phenotype) indicates a positive, or activating, relationship. Phenotypes that do not correlate (i.e. where the inhibition of either of two proteins results in two different phenotypes) indicate a negative, or inactivating, relationship. If protein A depends on protein B for activation, then inhibiting either A or B causes the cell to lose the service provided by protein A, and the phenotypes are the same for the inhibition of either protein. If, however, protein A is inactivated by protein B, then the phenotypes differ depending on which protein is inhibited: inhibiting B means it can no longer inactivate A, leaving A active, whereas inhibiting A removes the activity of A altogether, so the phenotype changes. Multiple RNAi screens need to be performed in order to reliably assign a sign to a given protein–protein interaction. Vinayagam et al., who devised this technique, state that a minimum of nine RNAi screens are required, with confidence increasing as more screens are carried out.
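The reasoning above can be sketched as a correlation test over per-screen phenotype scores. The use of Pearson correlation, the 0.5 cut-off and the score encoding are assumptions made for illustration and are not claimed to be the scoring scheme of Vinayagam et al.; only the nine-screen minimum is taken from the text:

```python
from math import sqrt

MIN_SCREENS = 9  # minimum number of screens stated in the text

def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def infer_sign(phenotypes_a, phenotypes_b, threshold=0.5):
    """Return '+', '-' or None from per-screen phenotype scores of two knockdowns."""
    if len(phenotypes_a) != len(phenotypes_b):
        raise ValueError("need one score per screen for each protein")
    if len(phenotypes_a) < MIN_SCREENS:
        return None  # too few screens to assign a sign
    r = pearson(phenotypes_a, phenotypes_b)
    if r >= threshold:
        return "+"  # correlated phenotypes: activating relationship
    if r <= -threshold:
        return "-"  # anti-correlated phenotypes: inactivating relationship
    return None

# Invented scores for nine screens (1 = phenotype up, -1 = phenotype down).
knock_a = [1, -1, 1, 1, -1, -1, 1, -1, 1]
knock_b = list(knock_a)
knock_c = [-v for v in knock_a]
print(infer_sign(knock_a, knock_b), infer_sign(knock_a, knock_c))
```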

As therapeutic targets

Modulation of PPIs is challenging and is receiving increasing attention from the scientific community. Several properties of PPIs, such as allosteric sites and hotspots, have been incorporated into drug-design strategies. The relevance of PPIs as putative therapeutic targets for the development of new treatments is particularly evident in cancer, with several ongoing clinical trials within this area. The promise of these targets is already reflected in drugs available on the market to treat a multitude of diseases. Examples are Tirofiban, an inhibitor of the glycoprotein IIb/IIIa, used as a cardiovascular drug, and Maraviroc, an inhibitor of the CCR5-gp120 interaction, used as an anti-HIV drug. Recently, Amit Jaiswal and others were able to develop 30 peptides using protein–protein interaction studies to inhibit telomerase recruitment towards telomeres.

RNA-binding protein

From Wikipedia, the free encyclopedia

RNA-binding proteins (often abbreviated as RBPs) are proteins that bind to double- or single-stranded RNA in cells and participate in forming ribonucleoprotein complexes. RBPs contain various structural motifs, such as the RNA recognition motif (RRM), the dsRNA binding domain, the zinc finger and others. They are cytoplasmic and nuclear proteins. However, since most mature RNA is exported from the nucleus relatively quickly, most RBPs in the nucleus exist as complexes of protein and pre-mRNA called heterogeneous ribonucleoprotein particles (hnRNPs). RBPs have crucial roles in various cellular processes such as cellular function, transport and localization. They especially play a major role in post-transcriptional control of RNAs, such as splicing, polyadenylation, mRNA stabilization, mRNA localization and translation. Eukaryotic cells encode diverse RBPs, approximately 500 genes, with unique RNA-binding activity and protein–protein interaction. During evolution, the diversity of RBPs greatly increased with the increase in the number of introns. Diversity enabled eukaryotic cells to utilize RNA exons in various arrangements, giving rise to a unique RNP (ribonucleoprotein) for each RNA. Although RBPs have a crucial role in post-transcriptional regulation of gene expression, relatively few RBPs have been studied systematically.

Structure

Many RBPs have modular structures and are composed of multiple repeats of just a few specific basic domains that often have limited sequences. These sequences are then arranged in varying combinations to fulfill the need for diversity. A specific protein's recognition of a specific RNA has evolved through the rearrangement of these few basic domains. Each basic domain recognizes RNA, but many of these proteins require multiple copies of one of the many common domains to function.

Diversity

As nuclear RNA emerges from RNA polymerase, RNA transcripts are immediately covered with RNA-binding proteins that regulate every aspect of RNA metabolism and function, including RNA biogenesis, maturation, transport, cellular localization and stability. All RBPs bind RNA; however, they do so with different RNA-sequence specificities and affinities, which allows the RBPs to be as diverse as their targets and functions. These targets include mRNA, which codes for proteins, as well as a number of functional non-coding RNAs (ncRNAs). ncRNAs almost always function as ribonucleoprotein complexes and not as naked RNAs. These non-coding RNAs include microRNAs, small interfering RNAs (siRNA), and spliceosomal small nuclear RNAs (snRNA).

Function

RNA processing and modification

Alternative splicing

Alternative splicing is a mechanism by which different forms of mature mRNAs (messenger RNAs) are generated from the same gene. It is a regulatory mechanism by which variations in the incorporation of exons into mRNA lead to the production of more than one related protein, thus expanding possible genomic outputs. RBPs function extensively in the regulation of this process. Some binding proteins, such as the neuron-specific RNA-binding protein NOVA1, control the alternative splicing of a subset of hnRNA by recognizing and binding to a specific sequence in the RNA (YCAY, where Y indicates a pyrimidine, U or C).^[4] These proteins then recruit spliceosomal proteins to this target site. SR proteins are also well known for their role in alternative splicing through the recruitment of snRNPs that form the spliceosome, namely U1 snRNP and U2AF snRNP. However, RBPs are also part of the spliceosome itself. The spliceosome is a complex of snRNA and protein subunits and acts as the mechanical agent that removes introns and ligates the flanking exons.^[5] In addition to the core spliceosome complex, RBPs also bind to cis-acting RNA elements that influence exon inclusion or exclusion during splicing. These sites are referred to as exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs) and intronic splicing silencers (ISSs); depending on where they bind, RBPs work as splicing silencers or enhancers.
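
The YCAY recognition rule described above can be illustrated with a short pattern search. This is only a sketch of the sequence rule, not a model of NOVA1 binding: the function name find_ycay_sites and the example sequence are invented for illustration, and real binding also depends on context that a plain motif scan ignores.

```python
import re

# Y = pyrimidine (U or C), so YCAY expands to the character class [UC]CA[UC].
# A lookahead is used so that overlapping occurrences are also reported.
YCAY = re.compile(r"(?=([UC]CA[UC]))")

def find_ycay_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched tetramer) for every YCAY site."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in YCAY.finditer(rna)]

# Hypothetical sequence, invented for illustration only.
example = "GGUCAUAAUCACGGCCAUCAU"
for pos, site in find_ycay_sites(example):
    print(pos, site)
```

The last two sites in the example share a nucleotide (the final U of CCAU is the first U of UCAU), which is why the pattern uses a lookahead instead of a plain match.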

RNA editing

ADAR : an RNA binding protein involved in RNA editing events.

The most extensively studied form of RNA editing involves the ADAR protein. This protein functions through post-transcriptional modification of mRNA transcripts by changing the nucleotide content of the RNA. This is done through the conversion of adenosine to inosine in an enzymatic reaction catalyzed by ADAR. This process effectively changes the RNA sequence from that encoded by the genome and extends the diversity of the gene products. The majority of RNA editing occurs in non-coding regions of RNA; however, some protein-encoding RNA transcripts have been shown to be subject to editing, resulting in a difference in the amino acid sequence of the encoded protein. An example of this is the glutamate receptor mRNA, where a glutamine codon is converted to an arginine codon, leading to a change in the functionality of the protein.
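
The glutamine-to-arginine change can be traced at the codon level. The sketch below assumes two details that the text does not state: that the edited glutamine codon is CAG, and that inosine is decoded like guanosine during translation. Both are standard descriptions of this editing event, but the function names and the two-entry codon table are invented for illustration.

```python
# Minimal codon table: only the two codons needed for this illustration.
CODON_TO_AA = {"CAG": "Gln", "CGG": "Arg"}

def edit_a_to_i(rna: str, position: int) -> str:
    """Replace the adenosine at a 0-based position with inosine ('I')."""
    if rna[position] != "A":
        raise ValueError("A-to-I editing acts on adenosine only")
    return rna[:position] + "I" + rna[position + 1:]

def read_as_translated(rna: str) -> str:
    """Assumption: inosine pairs like guanosine, so decode 'I' as 'G'."""
    return rna.replace("I", "G")

codon = "CAG"                          # glutamine codon (assumed)
edited = edit_a_to_i(codon, 1)         # middle adenosine becomes inosine
decoded = read_as_translated(edited)   # what the ribosome effectively reads
print(codon, "->", edited, "->", decoded)
print(CODON_TO_AA[codon], "->", CODON_TO_AA[decoded])
```

A single edited nucleotide is enough to change the encoded amino acid, which is how editing can diversify gene products without any change to the genome.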

Polyadenylation

Polyadenylation is the addition of a "tail" of adenylate residues to an RNA transcript about 20 bases downstream of the AAUAAA sequence within the three prime untranslated region. Polyadenylation of mRNA has a strong effect on its nuclear transport, translation efficiency, and stability. All of these, as well as the process of polyadenylation itself, depend on the binding of specific RBPs. With few exceptions, all eukaryotic mRNAs are processed to receive 3' poly(A) tails of about 200 nucleotides. One of the necessary protein complexes in this process is CPSF. CPSF binds to the AAUAAA sequence near the 3' end and, together with another protein called poly(A)-binding protein, recruits and stimulates the activity of poly(A) polymerase. Poly(A) polymerase is inactive on its own and requires the binding of these other proteins to function properly.
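
The positional rule in this paragraph (signal, cleavage about 20 bases downstream, tail of about 200 adenylates) can be written out as a toy string operation. This is a sketch under simplifying assumptions, not a model of the CPSF machinery: the offset and tail length are fixed at 20 and 200, the offset is counted from the end of the AAUAAA hexamer, the last AAUAAA in the transcript is taken as the signal, and the function name polyadenylate is invented for illustration.

```python
SIGNAL = "AAUAAA"
CLEAVAGE_OFFSET = 20   # "about 20 bases downstream"; fixed here for simplicity
TAIL_LENGTH = 200      # "about 200 nucleotides"; fixed here for simplicity

def polyadenylate(pre_mrna: str) -> str:
    """Cut downstream of the last AAUAAA and append a poly(A) tail."""
    start = pre_mrna.rfind(SIGNAL)
    if start == -1:
        raise ValueError("no AAUAAA signal found")
    # Assumption: the offset is measured from the end of the hexamer.
    cut = min(start + len(SIGNAL) + CLEAVAGE_OFFSET, len(pre_mrna))
    return pre_mrna[:cut] + "A" * TAIL_LENGTH

# Hypothetical transcript, invented for illustration only.
transcript = "GGCUAC" + SIGNAL + "C" * 30
mature = polyadenylate(transcript)
print(len(transcript), len(mature))
```

In the example the 30 nucleotides after the signal are trimmed to 20 before the tail is added, so everything further downstream is discarded.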

Export

After processing is complete, mRNA needs to be transported from the cell nucleus to the cytoplasm. This is a three-step process involving the generation of a cargo-carrier complex in the nucleus, followed by translocation of the complex through the nuclear pore complex, and finally release of the cargo into the cytoplasm. The carrier is subsequently recycled. The TAP/NXF1:p15 heterodimer is thought to be the key player in mRNA export. Over-expression of TAP in Xenopus laevis frogs increases the export of transcripts that are otherwise inefficiently exported. However, TAP needs adaptor proteins because it is unable to interact directly with mRNA. The Aly/REF protein binds to the mRNA and recruits TAP.

mRNA localization

mRNA localization is critical for the regulation of gene expression because it allows spatially regulated protein production. Through mRNA localization, proteins are translated at their intended target site in the cell. This is especially important during early development, when rapid cell cleavages give different cells various combinations of mRNA, which can then lead to drastically different cell fates. RBPs are critical in the localization of this mRNA, which ensures that proteins are only translated in their intended regions. One of these proteins is ZBP1. ZBP1 binds to beta-actin mRNA at the site of transcription and moves with the mRNA into the cytoplasm. It then localizes this mRNA to the lamella region of several asymmetric cell types, where it can then be translated.^[4] FMRP is another RBP involved in RNA localization. In addition to its other functions in RNA metabolism, FMRP was shown to be involved in the stimulus-induced localization of several dendritic mRNAs in neuronal dendrites.

Translation

Translational regulation provides a rapid mechanism to control gene expression. Rather than controlling gene expression at the transcriptional level, the mRNA is already transcribed and the recruitment of ribosomes is controlled. This allows rapid generation of proteins when a signal activates translation. ZBP1, in addition to its role in the localization of beta-actin mRNA, is also involved in the translational repression of beta-actin mRNA by blocking translation initiation. ZBP1 must be removed from the mRNA to allow the ribosome to bind properly and translation to begin.

Protein–RNA interactions

RNA-binding proteins exhibit highly specific recognition of their RNA targets by recognizing their sequences and structures. Specific binding allows RNA-binding proteins to distinguish their targets and regulate a variety of cellular functions via control of the generation, maturation, and lifespan of the RNA transcript. This interaction begins during transcription: some RBPs remain bound to RNA until degradation, whereas others bind only transiently to regulate RNA splicing, processing, transport, and localization. In this section, three classes of the most widely studied RNA-binding domains (RNA-recognition motif, double-stranded RNA-binding motif, zinc-finger motif) will be discussed.

RNA-recognition motif (RRM)

The RNA recognition motif, which is the most common RNA-binding motif, is a small protein domain of 75–85 amino acids that forms a four-stranded β-sheet packed against two α-helices. This recognition motif exerts its role in numerous cellular functions, especially in mRNA/rRNA processing, splicing, translation regulation, RNA export, and RNA stability. Ten structures of an RRM have been identified through NMR spectroscopy and X-ray crystallography. These structures illustrate the intricacy of protein–RNA recognition by the RRM, as it entails RNA–RNA and protein–protein interactions in addition to protein–RNA interactions. Despite their complexity, all ten structures have some common features. In every case the main protein surface, the four-stranded β-sheet, was found to interact with the RNA, usually contacting two or three nucleotides in a specific manner. In addition, strong RNA binding affinity and specificity towards variation are achieved through interactions between the inter-domain linker and the RNA and between the RRMs themselves. This plasticity of the RRM explains why it is the most abundant domain and why it plays an important role in various biological functions.

Double-stranded RNA-binding motif

dsRBD from rat ADAR2 protein (PDB: 2b7t).

The double-stranded RNA-binding motif (dsRM, dsRBM or dsRBD), a 70–75 amino-acid domain, plays a critical role in RNA processing, RNA localization, RNA interference, RNA editing, and translational repression. All three structures of the domain solved as of 2005 possess uniting features that explain how dsRBMs bind only to dsRNA and not to dsDNA. The dsRBMs were found to interact along the RNA duplex via both α-helices and the β1-β2 loop. Moreover, all three dsRBM structures make contact with the sugar-phosphate backbone of the major groove and of one minor groove, which is mediated by the β1-β2 loop along with the N-terminal region of α-helix 2. This interaction is a unique adaptation to the shape of an RNA double helix, as it involves 2'-hydroxyls and phosphate oxygens. Despite the common structural features among dsRBMs, they exhibit distinct chemical frameworks, which permits specificity for a variety of RNA structures including stem-loops, internal loops, bulges or helices containing mismatches.

Zinc fingers

"Zinc finger" : Cartoon representation of the zinc-finger motif of proteins. The zinc ion (green) is coordinated by two histidine and two cysteine amino acid residues.

CCHH-type zinc-finger domains are the most common DNA-binding domain within the eukaryotic genome. To attain highly sequence-specific recognition of DNA, several zinc fingers are used in a modular fashion. Zinc fingers exhibit a ββα protein fold in which a β-hairpin and an α-helix are joined together via a Zn²⁺ ion. Furthermore, the interaction between protein side-chains of the α-helix and the DNA bases in the major groove allows for DNA-sequence-specific recognition. Although zinc fingers are best known for recognizing DNA, recent discoveries show that they can also recognize RNA. CCHH-type zinc fingers employ two modes of RNA binding: in the first, the zinc fingers interact non-specifically with the backbone of a double helix; in the second, they specifically recognize individual bases that bulge out. CCCH-type zinc fingers display another mode of RNA binding, in which single-stranded RNA is recognized in a sequence-specific manner through intermolecular hydrogen bonds to the Watson-Crick edges of the RNA bases. Overall, zinc fingers can directly recognize DNA by binding to a dsDNA sequence and RNA by binding to an ssRNA sequence.

Role in embryonic development

Crawling C. elegans hermaphrodite worm

RNA-binding proteins' transcriptional and post-transcriptional regulation of RNA has a role in regulating the patterns of gene expression during development. Extensive research on the nematode C. elegans has identified RNA-binding proteins as essential factors during germline and early embryonic development. Their specific function involves the development of somatic tissues (neurons, hypodermis, muscles and excretory cells) as well as providing timing cues for the developmental events. Nevertheless, it is exceptionally challenging to discover the mechanism behind RBPs' function in development due to the difficulty in identifying their RNA targets. This is because most RBPs usually have multiple RNA targets. However, it is indisputable that RBPs exert a critical control in regulating developmental pathways in a concerted manner.

Germline development

In Drosophila melanogaster, Elav, Sxl and tra-2 are RNA-binding protein encoding genes that are critical in the early sex determination and the maintenance of the somatic sexual state. These genes impose effects on the post-transcriptional level by regulating sex-specific splicing in Drosophila. Sxl exerts positive regulation of the feminizing gene tra to produce a functional tra mRNA in females. In C. elegans, RNA-binding proteins including FOG-1, MOG-1/-4/-5 and RNP-4 regulate germline and somatic sex determination. Furthermore, several RBPs such as GLD-1, GLD-3, DAZ-1, PGL-1 and OMA-1/-2 exert their regulatory functions during meiotic prophase progression, gametogenesis, and oocyte maturation.

Somatic development

In addition to RBPs' functions in germline development, post-transcriptional control also plays a significant role in somatic development. Differing from RBPs that are involved in germline and early embryo development, RBPs functioning in somatic development regulate tissue-specific alternative splicing of the mRNA targets. For instance, MEC-8 and UNC-75 containing RRM domains localize to regions of hypodermis and nervous system, respectively. Furthermore, another RRM-containing RBP, EXC-7, is revealed to localize in embryonic excretory canal cells and throughout the nervous system during somatic development.

Neuronal development

ZBP1 was shown to regulate dendritogenesis (dendrite formation) in hippocampal neurons. Other RNA-binding proteins involved in dendrite formation are Pumilio and Nanos, FMRP, CPEB and Staufen 1.

Role in cancer

RBPs are emerging as crucial players in tumor development. Hundreds of RBPs are markedly dysregulated across human cancers and show predominant downregulation in tumors relative to normal tissues. Many RBPs are differentially expressed in different cancer types, for example KHDRBS1 (Sam68), ELAVL1 (HuR) and FXR1. For some RBPs, the change in expression is related to copy number variations (CNVs), for example CNV gains of BYSL in colorectal cancer cells, of ESRP1 and CELF3 in breast cancer, of RBM24 in liver cancer, and of IGF2BP2 and IGF2BP3 in lung cancer, or CNV losses of KHDRBS2 in lung cancer. Some expression changes are caused by protein-affecting mutations in these RBPs, for example NSUN6, ZC3H13, ELAC1, RBMS3, ZGPAT, SF3B1, SRSF2, RBM10, U2AF1, PPRC1, RBMXL1 and HNRNPCL1. Several studies have related this change in expression of RBPs to aberrant alternative splicing in cancer.

Current research

"CIRBP" : Structure of the CIRBP protein.

As RNA-binding proteins exert significant control over numerous cellular functions, they have been a popular area of investigation for many researchers. Because of their biological importance, numerous discoveries regarding the potential of RNA-binding proteins have recently been reported. Recent developments in the experimental identification of RNA-binding proteins have significantly extended the number of known RNA-binding proteins.

RNA-binding protein Sam68 controls the spatial and temporal compartmentalization of RNA metabolism to attain proper synaptic function in dendrites. Loss of Sam68 results in abnormal posttranscriptional regulation and ultimately leads to neurological disorders such as fragile X-associated tremor/ataxia syndrome. Sam68 was found to interact with the mRNA encoding β-actin, which regulates the synaptic formation of the dendritic spines with its cytoskeletal components. Therefore, Sam68 plays a critical role in regulating synapse number via control of postsynaptic β-actin mRNA metabolism.

"Beta-actin" : Structure of the ACTB protein.

Neuron-specific CELF family RNA-binding protein UNC-75 specifically binds to the UUGUUGUGUUGU mRNA stretch via its three RNA recognition motifs for the exon 7a selection in C. elegans' neuronal cells. As exon 7a is skipped due to its weak splice sites in non-neuronal cells, UNC-75 was found to specifically activate splicing between exon 7a and exon 8 only in the neuronal cells.

The cold inducible RNA binding protein CIRBP plays a role in controlling the cellular response upon confronting a variety of cellular stresses, including short wavelength ultraviolet light, hypoxia, and hypothermia. This research yielded potential implications for the association of disease states with inflammation.

The serine-arginine family RNA-binding protein Slr1 was found to exert control over polarized growth in Candida albicans. Slr1 mutations in mice result in decreased filamentation and reduced damage to epithelial and endothelial cells, which leads to an extended survival rate compared to the Slr1 wild-type strains. Therefore, this research reveals that the SR-like protein Slr1 plays a role in instigating hyphal formation and virulence in C. albicans.
