RNA-binding proteins (often abbreviated as RBPs) are proteins that bind double- or single-stranded RNA in cells and participate in forming ribonucleoprotein complexes. RBPs contain various structural motifs, such as the RNA recognition motif (RRM), the dsRNA-binding domain, the zinc finger and others. They are found in both the cytoplasm and the nucleus. However, since most mature RNA is exported from the nucleus relatively quickly, most RBPs in the nucleus exist as complexes of protein and pre-mRNA called heterogeneous ribonucleoprotein particles (hnRNPs). RBPs have crucial roles in various cellular processes, such as cellular function, transport and localization. They play an especially important role in the post-transcriptional control of RNAs, including splicing, polyadenylation, mRNA stabilization, mRNA localization and translation. Eukaryotic cells encode diverse RBPs (approximately 500 genes), each with unique RNA-binding activity and protein–protein interactions. During evolution, the diversity of RBPs greatly increased with the increase in the number of introns. This diversity enabled eukaryotic cells to utilize RNA exons in various arrangements, giving rise to a unique RNP (ribonucleoprotein) for each RNA. Although RBPs have a crucial role in the post-transcriptional regulation of gene expression, relatively few RBPs have been studied systematically.
Structure
Many
RBPs have modular structures and are composed of multiple repeats of
just a few specific basic domains that often have limited sequences.
These sequences are then arranged in varying combinations to fulfill the
need for diversity. A specific protein's recognition of a specific RNA
has evolved through the rearrangement of these few basic domains. Each
basic domain recognizes RNA, but many of these proteins require multiple
copies of one of the many common domains to function.
Diversity
As nuclear RNA emerges from RNA polymerase,
RNA transcripts are immediately covered with RNA-binding proteins that
regulate every aspect of RNA metabolism and function including RNA
biogenesis, maturation, transport, cellular localization and stability.
All RBPs bind RNA; however, they do so with different RNA-sequence
specificities and affinities, which allows the RBPs to be as diverse as
their targets and functions. These targets include mRNA, which codes for proteins, as well as a number of functional non-coding RNAs (ncRNAs). ncRNAs almost always function as ribonucleoprotein complexes and not as naked RNAs. These non-coding RNAs include microRNAs, small interfering RNAs (siRNAs), and spliceosomal small nuclear RNAs (snRNAs).
Function
RNA processing and modification
Alternative splicing
Alternative splicing is a mechanism by which different forms of mature mRNAs (messenger RNAs) are generated from the same gene. It is a regulatory mechanism by which variations in the incorporation of exons
into mRNA lead to the production of more than one related protein,
thus expanding possible genomic outputs. RBPs function extensively in
the regulation of this process. Some RBPs, such as the neuron-specific
RNA-binding protein NOVA1,
control the alternative splicing of a subset of hnRNA by recognizing
and binding to a specific sequence in the RNA (YCAY, where Y indicates
a pyrimidine, U or C).[4] These proteins then recruit spliceosomal proteins to this target site. SR proteins are also well known for their role in alternative splicing through the recruitment of factors that assemble the spliceosome,
namely U1 snRNP and U2AF. However, RBPs are also part of the
spliceosome itself. The spliceosome is a complex of snRNA and protein
subunits and acts as the mechanical agent that removes introns and ligates the flanking exons.[5] Beyond the core spliceosome complex, RBPs also bind to cis-acting
RNA elements that influence exon inclusion or exclusion during
splicing. These sites are referred to as exonic splicing enhancers
(ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers
(ISEs) and intronic splicing silencers (ISSs); depending on where they
bind, RBPs work as splicing silencers or enhancers.
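The YCAY recognition rule described above can be illustrated with a short sequence scan. This is only a sketch of the motif definition (Y = U or C), not a model of NOVA1 binding in cells; the function name and the example fragment are invented for illustration.

```python
import re

# Y = pyrimidine (U or C), so YCAY expands to [CU]CA[CU].
# The lookahead lets overlapping hits (e.g. UCAUCAU) both be reported.
YCAY = re.compile(r"(?=([CU]CA[CU]))")

def find_ycay_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched tetramer) for every YCAY occurrence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in YCAY.finditer(rna)]

# Hypothetical transcript fragment, made up for this example.
example = "GGUCAUCAUUCACGG"
print(find_ycay_sites(example))
# [(2, 'UCAU'), (5, 'UCAU'), (9, 'UCAC')]
```

A match only shows that the sequence fits the motif; whether a site is actually bound depends on context that this scan does not capture.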
RNA editing
The most extensively studied form of RNA editing involves the ADAR protein. This protein functions through post-transcriptional modification of mRNA transcripts by changing the nucleotide content of the RNA. This is done through the conversion of adenosine to inosine in an enzymatic reaction catalyzed by ADAR. This process effectively changes the RNA sequence from that encoded by the genome
and extends the diversity of the gene products. The majority of RNA
editing occurs on non-coding regions of RNA; however, some
protein-encoding RNA transcripts have been shown to be subject to
editing resulting in a difference in their protein's amino acid
sequence. An example of this is the glutamate receptor mRNA, in which
a glutamine codon is recoded to specify arginine, leading to a change in the
functionality of the protein.
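The recoding in the glutamate receptor example can be followed at the codon level. The sketch below assumes the commonly described case in which a CAG (glutamine) codon is edited at its middle adenosine; because inosine is read as guanosine during translation, the edited codon is decoded like CGG (arginine). The function names and the two-entry codon table are illustrative only.

```python
# Minimal codon table covering only the two codons used in this example.
CODON_TO_AA = {"CAG": "Gln", "CGG": "Arg"}

def edit_a_to_i(rna: str, position: int) -> str:
    """Replace the adenosine at a 0-based position with inosine (written 'I')."""
    if rna[position] != "A":
        raise ValueError("A-to-I editing applies only to adenosine")
    return rna[:position] + "I" + rna[position + 1:]

def as_read_in_translation(rna: str) -> str:
    """Inosine base-pairs like guanosine, so decode it as G."""
    return rna.replace("I", "G")

codon = "CAG"
edited = edit_a_to_i(codon, 1)             # "CIG"
decoded = as_read_in_translation(edited)   # "CGG"
print(CODON_TO_AA[codon], "->", CODON_TO_AA[decoded])
# Gln -> Arg
```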
Polyadenylation
Polyadenylation
is the addition of a "tail" of adenylate residues to an RNA transcript
about 20 bases downstream of the AAUAAA sequence within the three prime untranslated region. Polyadenylation of mRNA has a strong effect on its nuclear transport,
translation efficiency, and stability. All of these, as well as the
process of polyadenylation itself, depend on the binding of specific RBPs. With few
exceptions, all eukaryotic mRNAs are processed to receive 3' poly(A)
tails of about 200 nucleotides. One of the necessary protein
complexes in this process is CPSF. CPSF binds to the AAUAAA sequence near the 3' end and, together with another protein called poly(A)-binding protein, recruits and stimulates the activity of poly(A) polymerase. Poly(A) polymerase is inactive on its own and requires the binding of these other proteins to function properly.
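The positional rule above (a tail added about 20 bases downstream of AAUAAA, with a length of about 200 nucleotides) can be sketched as a toy calculation. The exact offset varies between transcripts; here it is assumed to be measured from the end of the signal, and the function names and example sequence are hypothetical.

```python
SIGNAL = "AAUAAA"

def predict_cleavage_sites(rna: str, offset: int = 20) -> list[int]:
    """Return 0-based cut positions `offset` bases past the end of each AAUAAA.

    Assumption: the offset is counted from the last base of the signal.
    Signals too close to the 3' end to leave `offset` bases are skipped.
    """
    rna = rna.upper().replace("T", "U")
    sites = []
    start = rna.find(SIGNAL)
    while start != -1:
        site = start + len(SIGNAL) + offset
        if site <= len(rna):
            sites.append(site)
        start = rna.find(SIGNAL, start + 1)
    return sites

def add_poly_a_tail(rna: str, site: int, tail_length: int = 200) -> str:
    """Cut the transcript at `site` and append a tail of adenylate residues."""
    return rna[:site] + "A" * tail_length

# Hypothetical 3' region: 4 bases, the signal, then 25 downstream bases.
example = "GCGC" + SIGNAL + "C" * 25
sites = predict_cleavage_sites(example)
print(sites)                                     # [30]
print(len(add_poly_a_tail(example, sites[0])))   # 230
```

In the cell this step is carried out by CPSF, poly(A) polymerase and their partner proteins; the code reproduces only the arithmetic of where the tail begins.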
Export
After processing is complete, mRNA needs to be transported from the cell nucleus to the cytoplasm.
This is a three-step process involving the generation of a
cargo-carrier complex in the nucleus, followed by translocation of the
complex through the nuclear pore complex
and finally release of the cargo into the cytoplasm. The carrier is then
recycled. The TAP/NXF1:p15 heterodimer is thought to be the key
player in mRNA export. Over-expression of TAP in Xenopus laevis
frogs increases the export of transcripts that are otherwise
inefficiently exported. However, TAP needs adaptor proteins because it is
unable to interact directly with mRNA. The Aly/REF protein binds
to the mRNA and recruits TAP.
mRNA localization
mRNA
localization is critical for regulation of gene expression by allowing
spatially regulated protein production. Through mRNA localization,
proteins are translated at their intended target site in the cell. This
is especially important during early development, when rapid cell
cleavages give different cells various combinations of mRNA, which can
then lead to drastically different cell fates. RBPs are critical in the
localization of this mRNA, which ensures that proteins are only translated in
their intended regions. One of these proteins is ZBP1. ZBP1 binds to beta-actin mRNA at the site of transcription and moves with the mRNA into the cytoplasm. It then localizes this mRNA to the lamella region of several asymmetric cell types, where it can then be translated.[4]
FMRP is another RBP involved in RNA localization. It was shown that in
addition to other functions for FMRP in RNA metabolism, FMRP is involved
in the stimulus-induced localization of several dendritic mRNAs in
neuronal dendrites.
Translation
Translational
regulation provides a rapid mechanism to control gene expression.
Rather than controlling gene expression at the transcriptional level,
mRNA is already transcribed but the recruitment of ribosomes is
controlled. This allows rapid generation of proteins when a signal
activates translation. ZBP1, in addition to its role in the localization
of beta-actin mRNA, is also involved in the translational repression of
beta-actin mRNA by blocking translation initiation. ZBP1 must be removed
from the mRNA to allow the ribosome to bind properly and translation to
begin.
Protein–RNA interactions
RNA-binding proteins exhibit highly specific recognition of their RNA targets by recognizing their sequences and structures.
Specific binding allows RNA-binding proteins to distinguish
their targets and regulate a variety of cellular functions via control
of the generation, maturation, and lifespan of the RNA transcript. This
interaction begins during transcription: some RBPs remain bound to RNA
until degradation, whereas others bind only transiently to
regulate RNA splicing, processing, transport, and localization.
In this section, three classes of the most widely studied RNA-binding
domains (RNA-recognition motif, double-stranded RNA-binding motif,
zinc-finger motif) will be discussed.
RNA-recognition motif (RRM)
The RNA recognition motif, which is the most common RNA-binding motif, is a small protein domain of 75–85 amino acids that forms a four-stranded β-sheet
packed against two α-helices. This recognition motif exerts its role in
numerous cellular functions, especially in mRNA/rRNA processing,
splicing, translation regulation, RNA export, and RNA stability. Ten
structures of an RRM have been identified through NMR spectroscopy and X-ray crystallography.
These structures illustrate the intricacy of protein–RNA recognition of
RRM as it entails RNA–RNA and protein–protein interactions in addition
to protein–RNA interactions. Despite their complexity, all ten
structures have some common features. In all of them, the RRM's main protein surface, the
four-stranded β-sheet, was found to interact with the RNA, usually
contacting two or three nucleotides in a specific manner. In addition,
strong RNA-binding affinity and specificity are
achieved through interactions between the inter-domain linker and the
RNA and between the RRMs themselves. This plasticity explains why
the RRM is the most abundant domain and why it plays an important role in
various biological functions.
Double-stranded RNA-binding motif
Identifier | Value |
---|---|
Symbol | drrm |
Pfam | PF14709 |
Pfam clan | CL0196 |
InterPro | IPR014720 |
CATH | 1di2 |
SCOPe | 1di2 / SUPFAM |

Use the Pfam clan for the homologous superfamily.
The double-stranded RNA-binding motif (dsRBM, also abbreviated dsRM or dsRBD), a 70–75 amino-acid domain, plays a critical role in RNA processing, RNA localization, RNA interference, RNA editing,
and translational repression. All three structures of the domain solved
as of 2005 possess uniting features that explain how dsRBMs bind only to
dsRNA and not to dsDNA. The dsRBMs were found to interact along the RNA
duplex via both α-helices and the β1-β2 loop. Moreover, all three dsRBM
structures make contact with the sugar-phosphate backbone of the major
groove and of one minor groove, which is mediated by the β1-β2 loop
along with the N-terminal region of α-helix 2. This interaction is a
unique adaptation to the shape of an RNA
double helix, as it involves 2'-hydroxyls and phosphate oxygens. Despite
the common structural features among dsRBMs, they exhibit distinct
chemical frameworks, which permit specificity for a variety of RNA
structures including stem-loops, internal loops, bulges or helices
containing mismatches.
Zinc fingers
CCHH-type zinc-finger domains are the most common DNA-binding domain within the eukaryotic genome.
To attain highly sequence-specific recognition of DNA, several
zinc fingers are utilized in a modular fashion. Zinc fingers exhibit a ββα
protein fold in which a β-hairpin and an α-helix are joined together via
a Zn2+ ion. Furthermore, the interaction between protein side-chains of the
α-helix and the DNA bases in the major groove allows for
DNA-sequence-specific recognition. Although zinc fingers are best known for recognizing DNA,
more recent discoveries show that they can also
recognize RNA. CCHH-type zinc fingers employ
two modes of RNA binding. In the first, the zinc fingers make non-specific
contacts with the backbone of a double helix,
whereas in the second they specifically recognize
individual bases that bulge out. The CCCH-type zinc finger displays a
different mode of RNA binding, in which
single-stranded RNA is recognized in a sequence-specific manner through intermolecular hydrogen bonds
to the Watson-Crick edges of the RNA bases.
Overall, zinc fingers can directly recognize DNA via binding to a dsDNA
sequence and RNA via binding to an ssRNA sequence.
Role in embryonic development
The transcriptional and post-transcriptional regulation of RNA by RNA-binding proteins has a role in regulating the patterns of gene expression during development. Extensive research on the nematode C. elegans has identified RNA-binding proteins as essential factors during germline and early embryonic development. Their specific functions include the development of somatic tissues (neurons, hypodermis, muscles
and excretory cells) as well as providing timing cues for
developmental events. Nevertheless, it is exceptionally challenging to
discover the mechanism behind RBPs' function in development because
their RNA targets are difficult to identify: most RBPs
have multiple RNA targets. Even so, RBPs clearly exert critical control over developmental pathways in a concerted manner.
Germline development
In Drosophila melanogaster, Elav, Sxl and tra-2 are genes encoding RNA-binding proteins that are critical for early sex determination and the maintenance of the somatic sexual state. These genes act at the post-transcriptional level by regulating sex-specific splicing in Drosophila. Sxl exerts positive regulation of the feminizing gene tra to produce a functional tra mRNA in females. In C. elegans,
RNA-binding proteins including FOG-1, MOG-1/-4/-5 and RNP-4 regulate
germline and somatic sex determination. Furthermore, several RBPs such
as GLD-1, GLD-3, DAZ-1, PGL-1 and OMA-1/-2 exert their regulatory
functions during meiotic prophase progression, gametogenesis, and oocyte maturation.
Somatic development
In addition to RBPs' functions in germline development,
post-transcriptional control also plays a significant role in somatic
development. Unlike RBPs that are involved in germline and early
embryo development, RBPs functioning in somatic development regulate
tissue-specific alternative splicing of their mRNA targets. For instance,
MEC-8 and UNC-75, which contain RRM domains, localize to regions of the
hypodermis and nervous system, respectively.
Furthermore, another RRM-containing RBP, EXC-7, localizes
to embryonic excretory canal cells and throughout the nervous system
during somatic development.
Neuronal development
ZBP1 was shown to regulate dendritogenesis (dendrite formation) in hippocampal neurons. Other RNA-binding proteins involved in dendrite formation are Pumilio and Nanos, FMRP, CPEB and Staufen 1.
Role in cancer
RBPs are emerging as crucial players in tumor development.
Hundreds of RBPs are markedly dysregulated across human cancers and
show predominant downregulation in tumors relative to normal tissues. Many RBPs are differentially expressed in different cancer types, for example KHDRBS1 (Sam68), ELAVL1 (HuR) and FXR1.
For some RBPs, the changes in expression are related to copy number
variations (CNVs), for example CNV gains of BYSL in colorectal cancer
cells, ESRP1 and CELF3 in breast cancer, RBM24 in liver cancer, and IGF2BP2 and
IGF2BP3 in lung cancer, or CNV losses of KHDRBS2 in lung cancer. Some expression changes are caused by protein-affecting mutations in these RBPs, for example NSUN6, ZC3H13, ELAC1, RBMS3, ZGPAT, SF3B1, SRSF2, RBM10, U2AF1, PPRC1, RBMXL1 and HNRNPCL1. Several studies have related this change in expression of RBPs to aberrant alternative splicing in cancer.
Current research
As RNA-binding proteins exert significant control over numerous
cellular functions, they have been a popular area of investigation for
many researchers. Owing to their importance in biology,
numerous discoveries regarding RNA-binding proteins have
recently been reported.
Recent developments in the experimental identification of RNA-binding
proteins have significantly extended the number of known RNA-binding proteins.
RNA-binding protein Sam68 controls the spatial and temporal compartmentalization of RNA metabolism to attain proper synaptic function in dendrites. Loss of Sam68 results in abnormal posttranscriptional regulation and ultimately leads to neurological disorders such as fragile X-associated tremor/ataxia syndrome. Sam68 was found to interact with the mRNA encoding β-actin, which regulates the synaptic formation of the dendritic spines with its cytoskeletal
components. Therefore, Sam68 plays a critical role in regulating
synapse number via control of postsynaptic β-actin mRNA metabolism.
The neuron-specific CELF family RNA-binding protein UNC-75 specifically
binds to the UUGUUGUGUUGU mRNA stretch via its three RNA recognition
motifs to select exon 7a in C. elegans neuronal cells.
Because exon 7a has weak splice sites, it is skipped in non-neuronal
cells; UNC-75 was found to specifically activate splicing between exon
7a and exon 8 only in neuronal cells.
The cold inducible RNA binding protein CIRBP plays a role in controlling the cellular response upon confronting a variety of cellular stresses, including short wavelength ultraviolet light, hypoxia, and hypothermia. This research yielded potential implications for the association of disease states with inflammation.
The serine-arginine family RNA-binding protein Slr1 was found to exert control over polarized growth in Candida albicans. Slr1 mutations result in decreased filamentation and reduced damage to epithelial and endothelial cells,
which extends the survival of infected mice compared with the Slr1 wild-type
strains. Therefore, this research reveals that the SR-like protein Slr1
plays a role in instigating hyphal formation and virulence in C. albicans.