Chemical biology is a scientific discipline spanning the fields of chemistry and biology. The discipline involves the application of chemical techniques, analysis, and often small molecules produced through synthetic chemistry, to the study and manipulation of biological systems. In contrast to biochemistry, which involves the study of the chemistry of biomolecules and regulation of biochemical pathways within and between cells, chemical biology deals with chemistry applied to biology.
Introduction
Some forms of chemical biology attempt to answer biological questions by directly probing living systems at the chemical level. In contrast to research using biochemistry, genetics, or molecular biology, where mutagenesis can provide a new version of the organism, cell, or biomolecule of interest, chemical biology probes systems in vitro and in vivo with small molecules that have been designed for a specific purpose or identified on the basis of biochemical or cell-based screening (see chemical genetics).Chemical biology is one of several interdisciplinary sciences that tend to differ from older, reductionist fields and whose goals are to achieve a description of scientific holism. Chemical biology has scientific, historical and philosophical roots in medicinal chemistry, supramolecular chemistry, bioorganic chemistry, pharmacology, genetics, biochemistry, and metabolic engineering.
Systems of interest
Proteomics
Proteomics investigates the proteome, the set of expressed proteins at a given time under defined conditions. As a discipline, proteomics has moved past rapid protein identification and has developed into a biological assay for quantitative analysis of complex protein samples by comparing protein changes in differently perturbed systems.[1] Current goals in proteomics include determining protein sequences, abundance and any post-translational modifications. Also of interest are protein–protein interactions, cellular distribution of proteins and understanding protein activity. Another important aspect of proteomics is the advancement of technology to achieve these goals.Protein levels, modifications, locations, and interactions are complex and dynamic properties. With this complexity in mind, experiments need to be carefully designed to answer specific questions especially in the face of the massive amounts of data that are generated by these analyses. The most valuable information comes from proteins that are expressed differently in a system being studied. These proteins can be compared relative to each other using quantitative proteomics, which allows a protein to be labeled with a mass tag. Proteomic technologies must be sensitive and robust, it is for these reasons, the mass spectrometer has been the workhorse of protein analysis. The high precision of mass spectrometry can distinguish between closely related species and species of interest can be isolated and fragmented within the instrument. Its applications to protein analysis was only possible in the late 1980s with the development of protein and peptide ionization with minimal fragmentation. These breakthroughs were ESI and MALDI. Mass spectrometry technologies are modular and can be chosen or optimized to the system of interest.
Chemical biologists are poised to impact proteomics through the development of techniques, probes and assays with synthetic chemistry for the characterization of protein samples of high complexity. These approaches include the development of enrichment strategies, chemical affinity tags and probes.
Enrichment techniques
Samples for Proteomics contain a myriad of peptide sequences, the sequence of interest may be highly represented or of low abundance. However, for successful MS analysis the peptide should be enriched within the sample. Reduction of sample complexity is achieved through selective enrichment using affinity chromatography techniques. This involves targeting a peptide with a distinguishing feature like a biotin label or a post translational modification.[2] Interesting methods have been developed that include the use of antibodies, lectins to capture glycoproteins, immobilized metal ions to capture phosphorylated peptides and suicide enzyme substrates to capture specific enzymes. Here, chemical biologists can develop reagents to interact with substrates, specifically and tightly, to profile a targeted functional group on a proteome scale. Development of new enrichment strategies is needed in areas like non-ser/thr/tyr phosphorylation sites and other post translational modifications. Other methods of decomplexing samples relies on upstream chromatographic separations.Affinity tags
Chemical synthesis of affinity tags has been crucial to the maturation of quantitative proteomics. iTRAQ, Tandem mass tags (TMT) and Isotope-coded affinity tag (ICAT) are protein mass-tags that consist of a covalently attaching group, a mass (isobaric or isotopic) encoded linker and a handle for isolation. Varying mass-tags bind to different proteins as a sort of footprint such that when analyzing cells of differing perturbations, the levels of each protein can be compared relatively after enrichment by the introduced handle. Other methods include SILAC and heavy isotope labeling. These methods have been adapted to identify complexing proteins by labeling a bait protein, pulling it down and analyzing the proteins it has complexed.[3] Another method creates an internal tag by introducing novel amino acids that are genetically encoded in prokaryotic and eukaryotic organisms. These modifications create a new level of control and can facilitate photocrosslinking to probe protein–protein interactions.[4] In addition, keto, acetylene, azide, thioester, boronate, and dehydroalanine- containing amino acids can be used to selectively introduce tags, and novel chemical functional groups into proteins.[5]Enzyme probes
To investigate enzymatic activity as opposed to total protein, activity-based reagents have been developed to label the enzymatically active form of proteins (see Activity-based proteomics). For example, serine hydrolase- and cysteine protease-inhibitors have been converted to suicide inhibitors.[6] This strategy enhances the ability to selectively analyze low abundance constituents through direct targeting. Structures that mimic these inhibitors could be introduced with modifications that will aid proteomic analysis- like an identification handle or mass tag.[7] Enzyme activity can also be monitored through converted substrate.[8] This strategy relies on using synthetic substrate conjugates that contain moieties that are acted upon by specific enzymes. The product conjugates are then captured by an affinity reagent and analyzed. The measured concentration of product conjugate allow the determination of the enzyme velocity. Other factors such as temperature, enzyme concentration and substrate concentration can be visualized.[9] Identification of enzyme substrates (of which there may be hundreds or thousands, many of which unknown) is a problem of significant difficulty in proteomics and is vital to the understanding of signal transduction pathways in cells; techniques for labelling cellular substrates of enzymes is an area chemical biologists can address. A method that has been developed uses "analog-sensitive" kinases to label substrates using an unnatural ATP analog, facilitating visualization and identification through a unique handle.[10]Glycobiology
While DNA, RNA and proteins are all encoded at the genetic level, there exists a separate system of trafficked molecules in the cell that are not encoded directly at any direct level: sugars. Thus, glycobiology is an area of dense research for chemical biologists. For instance, live cells can be supplied with synthetic variants of natural sugars in order to probe the function of the sugars in vivo. Carolyn Bertozzi, previously at University of California, Berkeley, has developed a method for site-specifically reacting molecules the surface of cells that have been labeled with synthetic sugars.Combinatorial chemistry
Chemical biologists used automated synthesis of many diverse compounds in order to experiment with effects of small molecules on biological processes. More specifically, they observe changes in the behaviors of proteins when small molecules bind to them. Such experiments may supposedly lead to discovery of small molecules with antibiotic or chemotherapeutic properties. These approaches are identical to those employed in the discipline of pharmacology.Molecular sensing
Chemical biologists are also interested in developing new small-molecule and biomolecule-based tools to study biological processes, often by molecular imaging techniques.[11] The field of molecular sensing was popularized by Roger Tsien's work developing calcium-sensing fluorescent compounds as well as pioneering the use of GFP, for which he was awarded the 2008 Nobel Prize in Chemistry.[12] Today, researchers continue to utilize basic chemical principles to develop new compounds for the study of biological metabolites and processes.Employing biology
Many research programs are also focused on employing natural biomolecules to perform a task or act as support for a new chemical method or material. In this regard, researchers have shown that DNA can serve as a template for synthetic chemistry, self-assembling proteins can serve as a structural scaffold for new materials, and RNA can be evolved in vitro to produce new catalytic function.Protein misfolding and aggregation as a cause of disease
A common form of aggregation is long, ordered spindles called amyloid fibrils that are implicated in Alzheimer’s disease and that have been shown to consist of cross-linked beta sheet regions perpendicular to the backbone of the polypeptide.[13] Another form of aggregation occurs with prion proteins, the glycoproteins found with Creutzfeldt–Jakob disease and bovine spongiform encephalopathy. In both structures, aggregation occurs through hydrophobic interactions and water must be excluded from the binding surface before aggregation can occur.[14] A movie of this process can be seen in "Chemical and Engineering News".[15] The diseases associated with misfolded proteins are life-threatening and extremely debilitating, which makes them an important target for chemical biology research.Through the transcription and translation process, DNA encodes for specific sequences of amino acids. The resulting polypeptides fold into more complex secondary, tertiary, and quaternary structures to form proteins. Based on both the sequence and the structure, a particular protein is conferred its cellular function. However, sometimes the folding process fails due to mutations in the genetic code and thus the amino acid sequence or due to changes in the cell environment (e.g. pH, temperature, reduction potential, etc.). Misfolding occurs more often in aged individuals or in cells exposed to a high degree of oxidative stress, but a fraction of all proteins misfold at some point even in the healthiest of cells.
Normally when a protein does not fold correctly, molecular chaperones in the cell can encourage refolding back into its active form. When refolding is not an option, the cell can also target the protein for degradation back into its component amino acids via proteolytic, lysosomal, or autophagic mechanisms. However, under certain conditions or with certain mutations, the cells can no longer cope with the misfolded protein(s) and a disease state results. Either the protein has a loss-of-function, such as in cystic fibrosis, in which it loses activity or cannot reach its target, or the protein has a gain-of-function, such as with Alzheimer's disease, in which the protein begins to aggregate causing it to become insoluble and non-functional.
Protein misfolding has previously been studied using both computational approaches as well as in vivo biological assays in model organisms such as Drosophila melanogaster and C. elegans. Computational models use a de novo process to calculate possible protein structures based on input parameters such as amino acid sequence, solvent effects, and mutations. This method has the shortcoming that the cell environment has been drastically simplified, which limits the factors that influence folding and stability. On the other hand, biological assays can be quite complicated to perform in vivo with high-throughput like efficiency and there always remains the question of how well lower organism systems approximate human systems.
Dobson et al. propose combining these two approaches such that computational models based on the organism studies can begin to predict what factors will lead to protein misfolding.[16] Several experiments have already been performed based on this strategy. In experiments on Drosophila, different mutations of beta amyloid peptides were evaluated based on the survival rates of the flies as well as their motile ability. The findings from the study show that the more a protein aggregates, the more detrimental the neurological dysfunction.[16][17][18] Further studies using transthyretin, a component of cerebrospinal fluid that binds to beta amyloid peptide deterring aggregation but can itself aggregate especially when mutated, indicate that aggregation prone proteins may not aggregate where they are secreted and rather are deposited in specific organs or tissues based on each mutation.[19] Kelly et al. have shown that the more stable, both kinetically and thermodynamically, a misfolded protein is the more likely the cell is to secrete it from the endoplasmic reticulum rather than targeting the protein for degradation.[20] In addition, the more stress that a cell feels from misfolded proteins the more probable new proteins will misfold.[21] These experiments as well as others having begun to elucidate both the intrinsic and extrinsic causes of misfolding as well as how the cell recognizes if proteins have folded correctly.
As more information is obtained on how the cell copes with misfolded proteins, new therapeutic strategies begin to emerge. An obvious path would be prevention of misfolding. However, if protein misfolding cannot be avoided, perhaps the cell's natural mechanisms for degradation can be bolstered to better deal with the proteins before they begin to aggregate.[22] Before these ideas can be realized, many more experiments need to be done to understand the folding and degradation machinery as well as what factors lead to misfolding. More information about protein misfolding and how it relates to disease can be found in the recently published book by Dobson, Kelly, and Rameriz-Alvarado entitled Protein Misfolding Diseases Current and Emerging Principles and Therapies.[23]
Chemical synthesis of peptides
In contrast to the traditional biotechnological practice of obtaining peptides or proteins by isolation from cellular hosts through cellular protein production, advances in chemical techniques for the synthesis and ligation of peptides has allowed for the total synthesis of some peptides and proteins. Chemical synthesis of proteins is a valuable tool in chemical biology as it allows for the introduction of non-natural amino acids as well as residue specific incorporation of "posttranslational modifications" such as phosphorylation, glycosylation, acetylation, and even ubiquitination. These capabilities are valuable for chemical biologists as non-natural amino acids can be used to probe and alter the functionality of proteins, while post translational modifications are widely known to regulate the structure and activity of proteins. Although strictly biological techniques have been developed to achieve these ends, the chemical synthesis of peptides often has a lower technical and practical barrier to obtaining small amounts of the desired protein. Given the widely recognized importance of proteins as cellular catalysts and recognition elements, the ability to precisely control the composition and connectivity of polypeptides is a valued tool in the chemical biology community and is an area of active research.While chemists have been making peptides for over 100 years,[24] the ability to efficiently and quickly synthesize short peptides came of age with the development of Bruce Merrifield's solid phase peptide synthesis (SPPS). Prior to the development of SPPS, the concept of step-by-step polymer synthesis on an insoluble support was without chemical precedent.[25] The use of a covalently bound insoluble polymeric support greatly simplified the process of peptide synthesis by reducing purification to a simple "filtration and wash" procedure and facilitated a boom in the field of peptide chemistry. The development and "optimization" of SPPS took peptide synthesis from the hands of the specialized peptide synthesis community and put it into the hands of the broader chemistry, biochemistry, and now chemical biology community. SPPS is still the method of choice for linear synthesis of polypeptides up to 50 residues in length[25] and has been implemented in commercially available automated peptide synthesizers. One inherent shortcoming in any procedure that calls for repeated coupling reactions is the buildup of side products resulting from incomplete couplings and side reactions. This places the upper bound for the synthesis of linear polypeptide lengths at around 50 amino acids, while the "average" protein consists of 250 amino acids.[24] Clearly, there was a need for development of "non-linear" methods to allow synthetic access to the average protein.
Although the shortcomings of linear SPPS were recognized not long after its inception, it took until the early 1990s for effective methodology to be developed to ligate small peptide fragments made by SPPS, into protein sized polypeptide chains (for recent review of peptide ligation strategies, see review by Dawson et al.[26]). The oldest and best developed of these methods is termed native chemical ligation. Native chemical ligation was unveiled in a 1994 paper from the laboratory of Stephen B. H. Kent.[27] Native chemical ligation involves the coupling of a C-terminal thioester and an N-terminal cysteine residue, ultimately resulting in formation of a "native" amide bond. Further refinements in native chemical ligation have allowed for kinetically controlled coupling of multiple peptide fragments, allowing access to moderately sized peptides such as an HIV-protease dimer[28] and human lysozyme.[29] Even with the successes and attractive features of native chemical ligation, there are still some drawbacks in the utilization of this technique. Some of these drawbacks include the installation and preservation of a reactive C-terminal thioester, the requirement of an N-terminal cysteine residue (which is the second-least-common amino acid in proteins),[30] and the requirement for a sterically unincumbering C-terminal residue.
Other strategies that have been used for the ligation of peptide fragments using the acyl transfer chemistry first introduced with native chemical ligation include expressed protein ligation,[31] sulfurization/desulfurization techniques,[32] and use of removable thiol auxiliaries.[33]
Expressed protein ligation allows for the biotechnological installation of a C-terminal thioester using intein biochemistry, thereby allowing the appendage of a synthetic N-terminal peptide to the recombinantly produced C-terminal portion. This technique allows for access to much larger proteins, as only the N-terminal portion of the resulting protein has to be chemically synthesized. Both sulfurization/desulfurization techniques and the use of removable thiol auxiliaries involve the installation of a synthetic thiol moiety to carry out the standard native chemical ligation chemistry, followed by removal of the auxiliary/thiol. These techniques help to overcome the requirement of an N-terminal cysteine needed for standard native chemical ligation, although the steric requirements for the C-terminal residue are still limiting.
A final category of peptide ligation strategies include those methods not based on native chemical ligation type chemistry. Methods that fall in this category include the traceless Staudinger ligation,[34] azide-alkyne dipolar cycloadditions,[35] and imine ligations.[36]
Major contributors in this field today include Stephen B. H. Kent, Philip E. Dawson, and Tom W. Muir, as well as many others involved in methodology development and applications of these strategies to biological problems.
Protein design by directed evolution
One of the primary goals of protein engineering is the design of novel peptides or proteins with a desired structure and chemical activity. Because our knowledge of the relationship between primary sequence, structure, and function of proteins is limited, rational design of new proteins with enzymatic activity is extremely challenging. Directed evolution, repeated cycles of genetic diversification followed by a screening or selection process, can be used to mimic Darwinian evolution in the laboratory to design new proteins with a desired activity.[37]Several methods exist for creating large libraries of sequence variants. Among the most widely used are subjecting DNA to UV radiation or chemical mutagens, error-prone PCR, degenerate codons, or recombination.[38][39] Once a large library of variants is created, selection or screening techniques are used to find mutants with a desired attribute. Common selection/screening techniques include fluorescence-activated cell sorting (FACS),[40] mRNA display,[41] phage display, or in vitro compartmentalization.[42] Once useful variants are found, their DNA sequence is amplified and subjected to further rounds of diversification and selection. Since only proteins with the desired activity are selected, multiple rounds of directed evolution lead to proteins with an accumulation beneficial traits.
There are two general strategies for choosing the starting sequence for a directed evolution experiment: de novo design and redesign. In a protein design experiment, an initial sequence is chosen at random and subjected to multiple rounds of directed evolution. For example, this has been employed successfully to create a family of ATP-binding proteins with a new folding pattern not found in nature.[43] Random sequences can also be biased towards specific folds by specifying the characteristics (such as polar vs. nonpolar) but not the specific identity of each amino acid in a sequence. Among other things, this strategy has been used to successfully design four-helix bundle proteins.[44][45] Because it is often thought that a well-defined structure is required for activity, biasing a designed protein towards adopting a specific folded structure is likely to increase the frequency of desirable variants in constructed libraries.
In a protein redesign experiment, an existing sequence serves as the starting point for directed evolution. In this way, old proteins can be redesigned for increased activity or new functions. Protein redesign has been used for protein simplification, creation of new quaternary structures, and topological redesign of a chorismate mutase.[38][46][47] To develop enzymes with new activities, one can take advantage of promiscuous enzymes or enzymes with significant side reactions. In this regard, directed evolution has been used on γ-humulene synthase, an enzyme that creates over 50 different sesquiterpenes, to create enzymes that selectively synthesize individual products.[48] Similarly, completely new functions can be selected for from existing protein scaffolds. In one example of this, an RNA ligase was created from a zinc finger scaffold after 17 rounds of directed evolution. This new enzyme catalyzes a chemical reaction not known to be catalyzed by any natural enzyme.[49]
Computational methods, when combined with experimental approaches, can significantly assist both the design and redesign of new proteins through directed evolution. Computation has been used to design proteins with unnatural folds, such as a right-handed coiled coil.[50] These computational approaches could also be used to redesign proteins to selectively bind specific target molecules. By identifying lead sequences using computational methods, the occurrence of functional proteins in libraries can be dramatically increased before any directed evolution experiments in the laboratory.
Manfred T. Reetz, Frances Arnold, Donald Hilvert, and Jack W. Szostak are significant researchers in this field.
Biocompatible click cycloaddition reactions in chemical biology
Recent advances in technology have allowed scientists to view substructures of cells at levels of unprecedented detail. Unfortunately these "aerial" pictures offer little information about the mechanics of the biological system in question. To be fully effective, precise imaging systems require a complementary technique that better elucidates the machinery of a cell. By attaching tracking devices (optical probes) to biomolecules in vivo, one can learn far more about cell metabolism, molecular transport, cell-cell interactions and many other processes[51]Bioorthogonal reactions
Successful labeling of a molecule of interest requires specific functionalization of that molecule to react chemospecifically with an optical probe. For a labeling experiment to be considered robust, that functionalization must minimally perturb the system.[52] Unfortunately, these requirements can often be extremely hard to meet. Many of the reactions normally available to organic chemists in the laboratory are unavailable in living systems. Water- and redox- sensitive reactions would not proceed, reagents prone to nucleophilic attack would offer no chemospecificity, and any reactions with large kinetic barriers would not find enough energy in the relatively low-heat environment of a living cell. Thus, chemists have recently developed a panel of bioorthogonal chemistry that proceed chemospecifically, despite the milieu of distracting reactive materials in vivo.Design of bioorthogonal reagents and bioorthogonal chemical reporters
The coupling of an optical probe to a molecule of interest must occur within a reasonably short time frame; therefore, the kinetics of the coupling reaction should be highly favorable. Click chemistry is well suited to fill this niche, since click reactions are, by definition, rapid, spontaneous, selective, and high-yielding.[53] Unfortunately, the most famous "click reaction," a [3+2] cycloaddition between an azide and an acyclic alkyne, is copper-catalyzed, posing a serious problem for use in vivo due to copper's toxicity.[54]The issue of copper toxicity can be alleviated using copper-chelating ligands, enabling copper-catalyzed labeling of the surface of live cells.[55]
To bypass the necessity for a catalyst, the lab of Dr. Carolyn Bertozzi introduced inherent strain into the alkyne species by using a cyclic alkyne. In particular, cyclooctyne reacts with azido-molecules with distinctive vigor.[56] Further optimization of the reaction led to the use of difluorinated cyclooctynes (DIFOs), which increased yield and reaction rate.[57] Other coupling partners discovered by separate labs to be analogous to cyclooctynes include trans cyclooctene,[58] norbornene,[59] and a cyclobutene-functionalized molecule.[60]
Use in biological systems
As mentioned above, the use of bioorthogonal reactions to tag biomolecules requires that one half of the reactive "click" pair is installed in the target molecule, while the other is attached to an optical probe. When the probe is added to a biological system, it will selectively conjugate with the target molecule.The most common method of installing bioorthogonal reactivity into a target biomolecule is through metabolic labeling. Cells are immersed in a medium where access to nutrients is limited to synthetically modified analogues of standard fuels such as sugars. As a consequence, these altered biomolecules are incorporated into the cells in the same manner as their wild-type brethren. The optical probe is then incorporated into the system to image the fate of the altered biomolecules. Other methods of functionalization include enzymatically inserting azides into proteins,[61] and synthesizing phospholipids conjugated to cyclooctynes.[62]
Future directions
As these bioorthogonal reactions are further optimized, they will likely be used for increasingly complex interactions involving multiple different classes of biomolecules. More complex interactions have a smaller margin for error, so increased reaction efficiency is paramount to continued success in optically probing cellular machinery. Also, by minimizing side reactions, the experimental design of a minimally perturbed living system is closer to being realized.Discovery of biomolecules through metagenomics
The advances in modern sequencing technologies in the late 1990s allowed scientists to investigate DNA of communities of organisms in their natural environments, so-called "eDNA", without culturing individual species in the lab. This metagenomic approach enabled scientists to study a wide selection of organisms that were previously not characterized due in part to an incompetent growth condition. These sources of eDNA include, but are not limited to, soils, ocean, subsurface, hot springs, hydrothermal vents, polar ice caps, hypersaline habitats, and extreme pH environments.[63] Of the many applications of metagenomics, chemical biologists and microbiologists such as Jo Handelsman, Jon Clardy, and Robert M. Goodman who are pioneers of metagenomics, explored metagenomic approaches toward the discovery of biologically active molecules such as antibiotics.[64]Functional or homology screening strategies have been used to identify genes that produce small bioactive molecules. Functional metagenomic studies are designed to search for specific phenotypes that are associated with molecules with specific characteristics. Homology metagenomic studies, on the other hand, are designed to examine genes to identify conserved sequences that are previously associated with the expression of biologically active molecules.[65]
Functional metagenomic studies enable scientists to discover novel genes that encode biologically active molecules. These assays include top agar overlay assays where antibiotics generate zones of growth inhibition against test microbes, and pH assays that can screen for pH change due to newly synthesized molecules using pH indicator on an agar plate.[66] Substrate-induced gene expression screening (SIGEX), a method to screen for the expression of genes that are induced by chemical compounds, has also been used to search genes with specific functions.[66] These led to the discovery and isolation of several novel proteins and small molecules. For example, the Schipper group identified three eDNA derived AHL lactonases that inhibit biofilm formation of Pseudomonas aeruginosa via functional metagenomic assays.[67] However, these functional screening methods require a good design of probes that detect molecules being synthesized and depend on the ability to express metagenomes in a host organism system.[66]
In contrast, homology metagenomic studies led to a faster discovery of genes that have homologous sequences as the previously known genes that are responsible for the biosynthesis of biologically active molecules. As soon as the genes are sequenced, scientists can compare thousands of bacterial genomes simultaneously.[65] The advantage over functional metagenomic assays is that homology metagenomic studies do not require a host organism system to express the metagenomes, thus this method can potentially save the time spent on analyzing nonfunctional genomes. These also led to the discovery of several novel proteins and small molecules. For example, Banik et al. screened for clones containing genes associated with the synthesis of teicoplanin and vancomycin-like glycopeptide antibiotics and found two new biosynthetic gene clusters.[68] In addition, an in silico examination from the Global Ocean Metagenomic Survey found 20 new lantibiotic cyclases.[69]
There are challenges to metagenomic approaches to discover new biologically active molecules. Only 40% of enzymatic activities present in a sample can be expressed in E. coli..[70] In addition, the purification and isolation of eDNA is essential but difficult when the sources of obtained samples are poorly understood. However, collaborative efforts from individuals from diverse fields including bacterial genetics, molecular biology, genomics, bioinformatics, robots, synthetic biology, and chemistry can solve this problem together and potentially lead to the discovery of many important biologically active molecules.[65]
Protein phosphorylation
Posttranslational modification of proteins with phosphate groups has proven to be a key regulatory step throughout all biological systems. Phosphorylation events, either phosphorylation by protein kinases or dephosphorylation by phosphatases, result in protein activation or deactivation. These events have an immense impact on the regulation of physiological pathways, which makes the ability to dissect and study these pathways integral to understanding the details of cellular processes. There exist a number of challenges—namely the sheer size of the phosphoproteome, the fleeting nature of phosphorylation events and related physical limitations of classical biological and biochemical techniques—that have limited the advancement of knowledge in this area. A recent review[71] provides a detailed examination of the impact of newly developed chemical approaches to dissecting and studying biological systems both in vitro and in vivo.Through the use of a number of classes of small molecule modulators of protein kinases, chemical biologists have been able to gain a better understanding of the effects of protein phosphorylation. For example, nonselective and selective kinase inhibitors, such as a class of pyridinylimidazole compounds described by Wilson, et al.,[72] are potent inhibitors useful in the dissection of MAP kinase signaling pathways. These pyridinylimidazole compounds function by targeting the ATP binding pocket. Although this approach, as well as related approaches,[73][74] with slight modifications, has proven effective in a number of cases, these compounds lack adequate specificity for more general applications. Another class of compounds, mechanism-based inhibitors, combines detailed knowledge of the chemical mechanism of kinase action with previously utilized inhibition motifs. For example, Parang, et al. describe the development of a "bisubstrate analog" that inhibits kinase action by binding both the conserved ATP binding pocket and a protein/peptide recognition site on the specific kinase.[75] While there is no published in vivo data on compounds of this type, the structural data acquired from in vitro studies have expanded the current understanding of how a number of important kinases recognize target substrates. Interestingly, many research groups utilized ATP analogs as a chemical probe to study kinases and identify their substrates.[76][77][78]
The development of novel chemical means of incorporating phosphomimetics into proteins has provided important insight into the effects of phosphorylation events. Historically, phosphorylation events have been studied by mutating an identified phosphorylation site (serine, threonine or tyrosine) to an amino acid, such as alanine, that cannot be phosphorylated. While this approach has been successful in some cases, mutations are permanent in vivo and can have potentially detrimental effects on protein folding and stability. Thus, chemical biologists have developed new ways of investigating protein phosphorylation. By installing phospho-serine, phospho-threonine or analogous phosphonate mimics into native proteins, researchers are able to perform in vivo studies to investigate the effects of phosphorylation by extending the amount of time a phosphorylation event occurs while minimizing the often-unfavorable effects of mutations. Protein semisynthesis, or more specifically expressed protein ligation (EPL), has proven to be successful techniques for synthetically producing proteins that contain phosphomimetic molecules at either the C- or the N-terminus.[31] In addition, researchers have built upon an established technique in which one can insert an unnatural amino acid into a peptide sequence by charging synthetic tRNA that recognizes a nonsense codon with an unnatural amino acid.[79] Recent developments indicate that this technique can also be employed in vivo, although, due to permeability issues, these in vivo experiments using phosphomimetic molecules have not yet been possible.[80]
Advances in chemical biology have also improved upon classical techniques of imaging kinase action. For example, the development of peptide biosensors—peptides containing incorporated fluorophore molecules—allowed for improved temporal resolution in in vitro binding assays.[81] Experimental limitations, however, prevent this technique from being effectively used in vivo. One of the most useful techniques to study kinase action is Fluorescence Resonance Energy Transfer (FRET). To utilize FRET for phosphorylation studies, fluorescent proteins are coupled to both a phosphoamino acid binding domain and a peptide that can by phosphorylated. Upon phosphorylation or dephosphorylation of a substrate peptide, a conformational change occurs that results in a change in fluorescence.[82] FRET has also been used in tandem with Fluorescence Lifetime Imaging Microscopy (FLIM)[83] or fluorescently conjugated antibodies and flow cytometry[84] to provide a detailed, specific, quantitative results with excellent temporal and spatial resolution.
Through the augmentation of classical biochemical methods as well as the development of new tools and techniques, chemical biologists have improved accuracy and precision in the study of protein phosphorylation.
Chemical approaches to stem-cell biology
Advances in stem-cell biology have typically been driven by discoveries in molecular biology and genetics. These have included optimization of culture conditions for the maintenance and differentiation of pluripotent and multipotent stem-cells and the deciphering of signaling circuits that control stem-cell fate. However, chemical approaches to stem-cell biology have recently received increased attention due to the identification of several small molecules capable of modulating stem-cell fate in vitro.[85] A small molecule approach offers particular advantages over traditional methods in that it allows a high degree of temporal control, since compounds can be added or removed at will, and tandem inhibition/activation of multiple cellular targets.Small molecules that modulate stem-cell behavior are commonly identified in high-throughput screens. Libraries of compounds are screened for the induction of a desired phenotypic change in cultured stem-cells. This is usually observed through activation or repression of a fluorescent reporter or by detection of specific cell surface markers by FACS or immunohistochemistry. Hits are then structurally optimized for activity by the synthesis and screening of secondary libraries. The cellular targets of the small molecule can then be identified by affinity chromatography, mass spectrometry, or DNA microarray.
A trademark of pluripotent stem-cells, such as embryonic stem-cells (ESCs), is the ability to self-renew indefinitely. The conventional use of feeder cells and various exogenous growth factors in the culture of ESCs presents a problem in that the resulting highly variable culture conditions make the long-term expansion of un-differentiated ESCs challenging.[86] Ideally, chemically defined culture conditions could be developed to maintain ESCs in a pluripotent state indefinitely. Toward this goal, the Schultz and Ding labs at the Scripps Research Institute identified a small molecule that can preserve the long-term self-renewal of ESCs in the absence of feeder cells and other exogenous growth factors.[87] This novel molecule, called pluripotin, was found to simultaneously inhibit multiple differentiation inducing pathways.
The utility of stem-cells is in their ability to differentiate into all cell types that make up an organism. Differentiation can be achieved in vitro by favoring development toward a particular cell type through the addition of lineage specific growth factors, but this process is typically non-specific and generates low yields of the desired phenotype. Alternatively, inducing differentiation by small molecules is advantageous in that it allows for the development of completely chemically defined conditions for the generation of one specific cell type. A small molecule, neuropathiazol, has been identified which can specifically direct differentiation of multipotent neural stem cells into neurons.[88] Neuropathiazol is so potent that neurons develop even in conditions that normally favor the formation of glial cells, a powerful demonstration of controlling differentiation by chemical means.
Because of the ethical issues surrounding ESC research, the generation of pluripotent cells by reprogramming existing somatic cells into a more "stem-like" state is a promising alternative to the use of standard ESCs. By genetic approaches, this has recently been achieved in the creation of ESCs by somatic cell nuclear transfer[89] and the generation of induced pluripotent stem-cells by viral transduction of specific genes.[90] From a therapeutic perspective, reprogramming by chemical means would be safer than genetic methods because induced stem-cells would be free of potentially dangerous transgenes.[91] Several examples of small molecules that can de-differentiate somatic cells have been identified. In one report, lineage-committed myoblasts were treated with a compound, named reversine, and observed to revert to a more stem-like phenotype.[92] These cells were then shown to be capable of differentiating into osteoblasts and adipocytes under appropriate conditions.[93]
Stem-cell therapies are currently the most promising treatment for many degenerative diseases. Chemical approaches to stem-cell biology support the development of cell-based therapies by enhancing stem-cell growth, maintenance, and differentiation in vitro. Small molecules that have been shown to modulate stem-cell fate are potential therapeutic candidates and provide a natural lean-in to pre-clinical drug development. Small molecule drugs could promote endogenous stem-cells to differentiate, replacing previously damaged tissues and thereby enhancing the body's own regenerative ability. Further investigation of molecules that modulate stem-cell behavior will only unveil new therapeutic targets.
Fluorescence for assessing protein location and function
Fluorophores and techniques to tag proteins
Organisms are composed of cells that, in turn, are composed of macromolecules, e.g. proteins, ribosomes, etc. These macromolecules interact with each other, changing their concentration and suffering chemical modifications. The main goal of many biologists is to understand these interactions, using MRI, ESR, electrochemistry, and fluorescence among others. The advantages of fluorescence reside in its high sensitivity, non-invasiveness, safe detection, and ability to modulate the fluorescence signal. Fluorescence was observed mainly from small organic dyes attached to antibodies to the protein of interest. Later, fluorophores could directly recognize organelles, nucleic acids, and important ions in living cells. In the past decade, the discovery of green fluorescent protein (GFP), by Roger Y. Tsien, hybrid system and quantum dots have enable assessing protein location and function more precisely.[94] Three main types of fluorophores are used: small organic dyes, green fluorescent proteins, and quantum dots. Small organic dyes usually are less than 1 kD, and have been modified to increase photostability, enhance brightness, and reduce self-quenching. Quantum dots have very sharp wavelength, high molar absorptivity and quantum yield. Both organic dyes and quantum dyes do not have the ability to recognize the protein of interest without the aid of antibodies, hence they must use immunolabeling. Since the size of the fluorophore-targeting complex typically exceeds 200 kD, it might interfere with multiprotein recognition in protein complexes, and other methods should be use in parallel. An advantage includes diversity of properties and a limitation is the ability of targeting in live cells. Green fluorescent proteins are genetically encoded and can be covalently fused to your protein of interest. A more developed genetic tagging technique is the tetracysteine biarsenical system, which requires modification of the targeted sequence that includes four cysteines, which binds membrane-permeable biarsenical molecules, the green and the red dyes "FlAsH" and "ReAsH", with picomolar affinity. Both fluorescent proteins and biarsenical tetracysteine can be expressed in live cells, but present major limitations in ectopic expression and might cause lose of function. Giepmans shows parallel applications of targeting methods and fluorophores using GFP and tetracysteine with ReAsH for α-tubulin and β-actin, respectively. After fixation, cells were immunolabeled for the Golgi matrix with QD and for the mitochondrial enzyme cytochrome with Cy5.[94]Protein dynamics
Fluorescent techniques have been used assess a number of protein dynamics including protein tracking, conformational changes, protein–protein interactions, protein synthesis and turnover, and enzyme activity, among others.Three general approaches for measuring protein net redistribution and diffusion are single-particle tracking, correlation spectroscopy and photomarking methods. In single-particle tracking, the individual molecule must be both bright and sparse enough to be tracked from one video to the other. Correlation spectroscopy analyzes the intensity fluctuations resulting from migration of fluorescent objects into and out of a small volume at the focus of a laser. In photomarking, a fluorescent protein can be dequenched in a subcellular area with the use of intense local illumination and the fate of the marked molecule can be imaged directly. Michalet and coworkers used quantum dots for single-particle tracking using biotin-quantum dots in HeLa cells.[95]
One of the best ways to detect conformational changes in proteins is to sandwich said protein between two fluorophores. FRET will respond to internal conformational changes result from reorientation of the fluorophore with respect to the other. Dumbrepatil sandwiched an estrogen receptor between a CFP (cyan fluorescent protein) and a YFP (yellow fluorescent protein) to study conformational changes of the receptor upon binding of a ligand.[96]
Fluorophores of different colors can be applied to detect their respective antigens within the cell. If antigens are located close enough to each other, they will appear colocalized and this phenomenon is known as colocalization.[97] Specialized computer software, such as CoLocalizer Pro, can be used to confirm and characterize the degree of colocalization.
FRET can detect dynamic protein–protein interaction in live cells providing the fluorophores get close enough. Galperin et al. used three fluorescent proteins to study multiprotein interactions in live cells.[98]
Tetracysteine biarsenical systems can be used to study protein synthesis and turnover, which requires discrimination of old copies from new copies. In principle, a tetracysteine-tagged protein is labeled with FlAsH for a short time, leaving green labeled proteins. The protein synthesis is then carried out in the presence of ReAsH, labeling the new proteins as red.[99]
One can also use fluorescence to see endogenous enzyme activity, typically by using a quenched activity based proteomics (qABP). Covalent binding of a qABP to the active site of the targeted enzyme will provide direct evidence concerning if the enzyme is responsible for the signal upon release of the quencher and regain of fluorescence.[100]
The unique combination of high spatial and temporal resolution, nondestructive compatibility with living cells and organisms, and molecular specificity insure that fluorescence techniques will remain central in the analysis of protein networks and systems biology.[94]