
Tuesday, May 11, 2021

Cheminformatics

From Wikipedia, the free encyclopedia

Cheminformatics (also known as chemoinformatics) refers to the use of physical chemistry theory together with computer and information science techniques (so-called "in silico" techniques) to address a range of descriptive and prescriptive problems in chemistry, including its applications to biology and related molecular fields. Such in silico techniques are used, for example, by pharmaceutical companies and in academic settings to aid and inform the process of drug discovery, for instance in the design of well-defined combinatorial libraries of synthetic compounds, or to assist in structure-based drug design. The methods can also be used in the chemical and allied industries, and in fields such as environmental science and pharmacology, where chemical processes are involved or studied.

History

Cheminformatics has been an active field in various guises since the 1970s and earlier, with activity in academic departments and commercial pharmaceutical research and development departments. The term chemoinformatics was defined, in its application to drug discovery, by F.K. Brown in 1998:

Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.

Since then, both terms, cheminformatics and chemoinformatics, have been used, although, lexicographically, cheminformatics appears to be more frequently used, despite academics in Europe declaring for the variant chemoinformatics in 2006. In 2009, a prominent Springer journal in the field, the Journal of Cheminformatics, was founded by transatlantic executive editors, giving yet further impetus to the shorter variant.

Background

Cheminformatics combines the scientific fields of chemistry, computer science, and information science, for example in the areas of topology, chemical graph theory, information retrieval and data mining in the chemical space. Cheminformatics can also be applied to data analysis in various industries, such as paper and pulp, dyes, and allied industries.

Applications

Storage and retrieval

A primary application of cheminformatics is the storage, indexing, and search of information relating to chemical compounds. The efficient search of such stored information includes topics that are dealt with in computer science, such as data mining, information retrieval, information extraction, and machine learning. Related research topics include:

File formats

The in silico representation of chemical structures uses specialized formats such as the Simplified Molecular Input Line Entry System (SMILES) or the XML-based Chemical Markup Language. These representations are often used for storage in large chemical databases. While some formats are suited to visual representations in two or three dimensions, others are better suited to studying physical interactions, modeling and docking studies.
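To make the idea of a line notation concrete, here is a minimal sketch that reads the atoms written in a SMILES string and counts them by element. It is a hypothetical helper, not a real SMILES parser: it only recognises bracket atoms and the organic-subset symbols, ignores implicit hydrogens, bonds and rings, and performs no validation. Real work should use a cheminformatics toolkit such as RDKit or Open Babel.

```python
import re
from collections import Counter

# Bracket atoms (e.g. [nH], [13C], [Na+]) or organic-subset atoms outside brackets.
# The two-letter symbols Br and Cl must be tried before B and C.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z][a-z]?)")


def count_atoms(smiles: str) -> Counter:
    """Count explicitly written atoms by element (implicit hydrogens are ignored)."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            if match is None:
                raise ValueError(f"cannot parse bracket atom {token!r}")
            symbol = match.group(1)
        else:
            symbol = token
        # Aromatic atoms are written in lower case; normalise to the element symbol.
        counts[symbol.capitalize()] += 1
    return counts


print(count_atoms("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: Counter({'C': 9, 'O': 4})
```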

Virtual libraries

Chemical data can pertain to real or virtual molecules. Virtual libraries of compounds may be generated in various ways to explore chemical space and hypothesize novel compounds with desired properties. Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented synthetic products) were recently generated using the FOG (fragment optimized growth) algorithm. This was done by using cheminformatic tools to train transition probabilities of a Markov chain on authentic classes of compounds, and then using the Markov chain to generate novel compounds that were similar to the training database.
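The train-then-sample idea behind such generators can be sketched with a toy first-order Markov chain. This is not the published FOG algorithm: the fragment labels below are hypothetical placeholders for real structural fragments, and the generated sequences are not checked for chemical validity. It only shows how transition probabilities are estimated from a training class and then used to propose new, similar sequences.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def train(sequences):
    """Estimate first-order transition probabilities from fragment sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def generate(model, rng, max_len=10):
    """Sample a new fragment sequence by walking the chain from START."""
    state, out = START, []
    while len(out) < max_len:
        followers = model[state]
        state = rng.choices(list(followers), weights=list(followers.values()))[0]
        if state == END:
            break
        out.append(state)
    return out


# Hypothetical fragment labels standing in for a training class of compounds.
training = [
    ["phenyl", "amide", "piperidine"],
    ["phenyl", "ether", "pyridine"],
    ["pyridine", "amide", "phenyl"],
]
model = train(training)
rng = random.Random(42)
print([generate(model, rng) for _ in range(3)])
```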

Virtual screening

In contrast to high-throughput screening, virtual screening involves computationally screening in silico libraries of compounds, by means of various methods such as docking, to identify members likely to possess desired properties such as biological activity against a given target. In some cases, combinatorial chemistry is used in the development of the library to increase the efficiency in mining the chemical space. More commonly, a diverse library of small molecules or natural products is screened.
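Docking does not fit in a short example, so the sketch below swaps in a different, simpler virtual screening technique: ligand-based similarity searching. Each compound is represented by a hypothetical fingerprint (a set of feature numbers), and library members are ranked by their Tanimoto coefficient to a known active query. The threshold of 0.5 is an arbitrary assumption.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(query, library, threshold=0.5):
    """Rank library members by similarity to the query; keep those at or above threshold."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each number stands for a substructure feature that is present.
query = frozenset({1, 2, 3, 4})
library = {
    "cpd_A": frozenset({1, 2, 3, 5}),
    "cpd_B": frozenset({1, 2, 3, 4, 6}),
    "cpd_C": frozenset({7, 8, 9}),
}
print(screen(query, library))  # [('cpd_B', 0.8), ('cpd_A', 0.6)]
```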

Quantitative structure-activity relationship (QSAR)

This is the calculation of quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) values, used to predict the activity of compounds from their structures. In this context there is also a strong relationship to chemometrics. Chemical expert systems are also relevant, since they represent parts of chemical knowledge in silico. A relatively new concept is matched molecular pair analysis (MMPA), or prediction-driven MMPA, which is coupled with QSAR models in order to identify activity cliffs.
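The simplest possible QSAR model is a straight line relating one structural descriptor to activity. The sketch below fits such a model by ordinary least squares and uses it to predict an untested compound. The descriptor and activity values are synthetic, chosen to lie exactly on activity = 1 + 2 × descriptor so the fit can be checked; real QSAR models typically use many descriptors and proper validation.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of activity = intercept + slope * descriptor."""
    n = len(descriptor)
    if n != len(activity) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic training set (hypothetical descriptor and activity values).
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_linear_qsar(descriptor, activity)
print(intercept, slope)         # 1.0 2.0
print(intercept + slope * 5.0)  # predicted activity for descriptor = 5.0: 11.0
```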

Combinatorial chemistry

From Wikipedia, the free encyclopedia

Combinatorial chemistry comprises chemical synthetic methods that make it possible to prepare a large number (tens to thousands or even millions) of compounds in a single process. These compound libraries can be made as mixtures, sets of individual compounds or chemical structures generated by computer software. Combinatorial chemistry can be used for the synthesis of small molecules and for peptides.

Strategies that allow identification of useful components of the libraries are also part of combinatorial chemistry. The methods used in combinatorial chemistry are applied outside chemistry, too.

History

Combinatorial chemistry was invented by Árpád Furka (Eötvös Loránd University, Budapest, Hungary), who described its principle, the combinatorial synthesis and a deconvolution procedure in a document that was notarized in 1982. The principle of the combinatorial method is to synthesize a multi-component compound mixture (combinatorial library) in a single stepwise procedure and to screen it, also in a single process, to find drug candidates or other kinds of useful compounds. The most important innovation of the combinatorial method is the use of mixtures in both synthesis and screening, which ensures the high productivity of the process. The motivations that led to the invention were published in 2002.

Introduction

Synthesis of molecules in a combinatorial fashion can quickly lead to large numbers of molecules. For example, a molecule with three points of diversity (R1, R2, and R3) can generate $N_{R_1} \times N_{R_2} \times N_{R_3}$ possible structures, where $N_{R_1}$, $N_{R_2}$, and $N_{R_3}$ are the numbers of different substituents utilized.
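The product rule can be checked by brute-force enumeration. In this sketch the substituent lists are hypothetical placeholders; `itertools.product` generates every (R1, R2, R3) combination, and its length equals the product of the list sizes.

```python
from itertools import product
from math import prod

# Hypothetical substituent sets for the three points of diversity.
r1 = ["H", "CH3", "Cl"]
r2 = ["OH", "NH2", "OCH3", "F"]
r3 = ["Ph", "Bn"]

structures = list(product(r1, r2, r3))  # every (R1, R2, R3) combination
print(len(structures), prod(map(len, (r1, r2, r3))))  # 24 24
print(prod([10, 20, 30]))  # 10, 20 and 30 substituents give 6000 structures
```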

The basic principle of combinatorial chemistry is to prepare libraries of a very large number of compounds then identify the useful components of the libraries.

Although combinatorial chemistry has only really been taken up by industry since the 1990s, its roots can be seen as far back as the 1960s when a researcher at Rockefeller University, Bruce Merrifield, started investigating the solid-phase synthesis of peptides.

In its modern form, combinatorial chemistry has probably had its biggest impact in the pharmaceutical industry. Researchers attempting to optimize the activity profile of a compound create a 'library' of many different but related compounds. Advances in robotics have led to an industrial approach to combinatorial synthesis, enabling companies to routinely produce over 100,000 new and unique compounds per year.

In order to handle the vast number of structural possibilities, researchers often create a 'virtual library', a computational enumeration of all possible structures of a given pharmacophore with all available reactants. Such a library can consist of thousands to millions of 'virtual' compounds. The researcher will select a subset of the 'virtual library' for actual synthesis, based upon various calculations and criteria.
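A virtual library and the selection of a subset for synthesis can be sketched as follows. The example assumes a hypothetical amide-coupling library: every amine is paired with every acid, the product mass is estimated as amine + acid minus water, and a single arbitrary criterion (a mass cutoff) stands in for the "various calculations and criteria" used in practice. All names and nominal masses are illustrative placeholders.

```python
from itertools import product

# Hypothetical reactants with hypothetical nominal masses (g/mol).
amines = {"A1": 93, "A2": 135, "A3": 201}
acids = {"B1": 60, "B2": 122, "B3": 250}
WATER = 18  # lost when the amide bond forms


def enumerate_amides(amines, acids):
    """Virtual library: one estimated product mass per (amine, acid) pair."""
    return {(a, b): amines[a] + acids[b] - WATER for a, b in product(amines, acids)}


def select(library, max_mass=300):
    """Choose the subset to synthesize using one simple criterion (a mass cutoff)."""
    return sorted(pair for pair, mass in library.items() if mass <= max_mass)


library = enumerate_amides(amines, acids)
print(len(library))     # 9 virtual compounds
print(select(library))  # 5 pairs pass the cutoff
```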

Polymers (peptides and oligonucleotides)

[Figure: peptides forming in cycles 3 and 4]

Combinatorial split-mix (split and pool) synthesis

Combinatorial split-mix (split and pool) synthesis is based on the solid-phase synthesis developed by Merrifield. If a combinatorial peptide library is synthesized using 20 amino acids (or other kinds of building blocks), the solid support, in bead form, is divided into 20 equal portions. Next, a different amino acid is coupled to each portion. The third step is mixing all the portions. These three steps make up one cycle, and the peptide chains are elongated simply by repeating the cycle.
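The split, couple and mix cycle can be simulated directly. In this sketch each bead is a string holding its growing chain, mixing is modelled as a random shuffle, and the single letters Y, B and R are hypothetical stand-ins for three amino acids. Because every bead is in exactly one portion during each coupling, every bead carries exactly one sequence, and with enough beads all possible sequences appear.

```python
import random


def split_mix(building_blocks, cycles, n_beads, seed=0):
    """Simulate split-mix synthesis; returns one chain (sequence) per bead."""
    k = len(building_blocks)
    if n_beads % k:
        raise ValueError("n_beads must be divisible by the number of building blocks")
    rng = random.Random(seed)
    beads = [""] * n_beads  # empty chains on the solid support
    size = n_beads // k
    for _ in range(cycles):
        rng.shuffle(beads)  # mix: homogenize the pooled support
        portions = [beads[i * size:(i + 1) * size] for i in range(k)]  # split
        # couple: every bead in portion i receives building block i
        beads = [chain + bb for bb, portion in zip(building_blocks, portions) for chain in portion]
    return beads


beads = split_mix("YBR", cycles=2, n_beads=300)
print(sorted(set(beads)))  # the 9 dipeptides YY, YB, ..., RR
```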

[Figure: flow diagram of the split-mix combinatorial synthesis]

The procedure is illustrated by the synthesis of a dipeptide library using the same three amino acids as building blocks in both cycles. Each component of this library contains two amino acids arranged in different orders. The amino acids used in couplings are represented by yellow, blue and red circles in the figure. Divergent arrows show the division of the solid support resin (green circles) into equal portions, vertical arrows represent coupling, and convergent arrows represent mixing and homogenizing the portions of the support.

The figure shows that in the two synthetic cycles 9 dipeptides are formed. In the third and fourth cycles, 27 tripeptides and 81 tetrapeptides would form, respectively.

The "split-mix synthesis" has several outstanding features:

  • It is highly efficient. As the figure demonstrates, the number of peptides formed in the synthetic process (3, 9, 27, 81) increases exponentially with the number of executed cycles. Using 20 amino acids in each synthetic cycle, the numbers of peptides formed in cycles 2 to 5 are 400, 8,000, 160,000 and 3,200,000, respectively.
  • All peptide sequences are formed in the process that can be deduced by a combination of the amino acids used in the cycles.
  • Portioning of the support into equal samples assures formation of the components of the library in nearly equal molar quantities.
  • Only a single peptide forms on each bead of the support, because only one amino acid is used in each coupling step. However, it is not known which peptide occupies any given bead.
  • The split-mix method can be used for the synthesis of organic or any other kind of library that can be prepared from its building blocks in a stepwise process.
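The growth and equal-quantity properties listed above follow from a simple power law. With $k$ building blocks used in each cycle, after $n$ cycles:

$$N(n) = k^{n}, \qquad \text{molar fraction of each component} \approx \frac{1}{k^{n}}$$

$$k = 3:\ 3,\ 9,\ 27,\ 81 \qquad k = 20:\ 20^{2} = 400,\ 20^{3} = 8\,000,\ 20^{4} = 160\,000,\ 20^{5} = 3\,200\,000$$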

In 1990, three groups described biological methods for preparing peptide libraries, and one year later Fodor et al. published a remarkable method for synthesizing peptide arrays on small glass slides.

A "parallel synthesis" method was developed by Mario Geysen and his colleagues for preparing peptide arrays. They synthesized 96 peptides on plastic rods (pins) coated at their ends with the solid support. The pins were immersed in reagent solutions placed in the wells of a microtiter plate. The method is widely applied, particularly with automatic parallel synthesizers. Although the parallel method is much slower than true combinatorial synthesis, its advantage is that it is known exactly which peptide or other compound forms on each pin.

Further procedures were developed to combine the advantages of both split-mix and parallel synthesis. In the method described by two groups, the solid support was enclosed in permeable plastic capsules together with a radiofrequency tag that carried the code of the compound to be formed in the capsule. The procedure was carried out similarly to the split-mix method. In the split step, however, the capsules were distributed among the reaction vessels according to the codes read from their radiofrequency tags.

Furka et al. developed a different method for the same purpose, named "string synthesis". In this method, the capsules carry no code. They are strung like pearls on a necklace and placed into the reaction vessels in strung form. The identity of the capsules, and hence of their contents, is recorded by their positions on the strings. After each coupling step, the capsules are redistributed among new strings according to definite rules.
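The tag-directed split step of the radiofrequency capsule method can be sketched as a sorting loop. Here each capsule's tag is modelled simply as a string holding its planned sequence (an assumption; the real tag is read by a scanner), and Y, B, R are hypothetical building blocks. In each cycle the capsules are sorted into vessels by the next letter of their code, so every capsule ends up with exactly the compound its tag names.

```python
from collections import defaultdict
from itertools import product


def directed_sort_synthesis(codes):
    """Route each capsule by its tag in every cycle; returns code -> synthesized chain."""
    capsules = {code: "" for code in codes}  # code -> chain grown so far
    length = len(next(iter(capsules)))
    for cycle in range(length):
        vessels = defaultdict(list)  # split step: sort capsules by the tag
        for code in capsules:
            vessels[code[cycle]].append(code)
        for block, members in vessels.items():  # coupling: one block per vessel
            for code in members:
                capsules[code] += block
        # the capsules are then pooled again before the next cycle
    return capsules


codes = ["".join(p) for p in product("YBR", repeat=2)]  # one capsule per planned dipeptide
result = directed_sort_synthesis(codes)
print(all(code == chain for code, chain in result.items()))  # True
```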

Small molecules

In the drug discovery process, the synthesis and biological evaluation of small molecules of interest have typically been long and laborious. Combinatorial chemistry has emerged in recent decades as an approach to quickly and efficiently synthesize large numbers of potential small-molecule drug candidates. In a typical synthesis, only a single target molecule is produced at the end of a synthetic scheme, with each step producing only a single product. In a combinatorial synthesis, even when using only a single starting material, it is possible to synthesize a large library of molecules under identical reaction conditions, which can then be screened for biological activity. For example, a starting material A can be reacted separately with three reagents B, C and D, and the three products pooled. This pool of products is then split into three equal portions, each containing all three products, and each portion is reacted with another unit of reagent B, C, or D, producing 9 unique compounds from the previous 3. This process is repeated until the desired number of building blocks has been added, generating many compounds. When synthesizing a library of compounds by a multi-step synthesis, efficient reaction methods must be employed; if traditional purification methods are used after each reaction step, yields and efficiency will suffer.

Solid-phase synthesis offers potential solutions to obviate the need for typical quenching and purification steps often used in synthetic chemistry. In general, a starting molecule is adhered to a solid support (typically an insoluble polymer), then additional reactions are performed, and the final product is purified and then cleaved from the solid support. Since the molecules of interest are attached to a solid support, it is possible to reduce the purification after each reaction to a single filtration/wash step, eliminating the need for tedious liquid-liquid extraction and solvent evaporation steps that most synthetic chemistry involves. Furthermore, by using heterogeneous reactants, excess reagents can be used to drive sluggish reactions to completion, which can further improve yields. Excess reagents can simply be washed away without the need for additional purification steps such as chromatography.

[Figure: use of a solid-supported polyamine to scavenge excess reagent]

Over the years, a variety of methods have been developed to refine the use of solid-phase organic synthesis in combinatorial chemistry, including efforts to increase the ease of synthesis and purification, as well as non-traditional methods to characterize intermediate products. Although the majority of the examples described here employ heterogeneous reaction media in every reaction step, Booth and Hodges provide an early example of using solid-supported reagents only during the purification step of traditional solution-phase syntheses. In their view, solution-phase chemistry offers the advantages of avoiding the attachment and cleavage reactions needed to anchor molecules to resins and remove them, as well as eliminating the need to recreate solid-phase analogues of established solution-phase reactions.

The single purification step at the end of a synthesis allows one or more impurities to be removed, assuming the chemical structure of the offending impurity is known. While the use of solid-supported reagents greatly simplifies the synthesis of compounds, many combinatorial syntheses require multiple steps, each of which still requires some form of purification. Armstrong et al. describe a one-pot method for generating combinatorial libraries, called multiple-component condensations (MCCs). In this scheme, three or more reagents react such that each reagent is incorporated into the final product in a single step, eliminating the need for a multi-step synthesis that involves many purification steps. In MCCs, no deconvolution is required to determine which compounds are biologically active, because each synthesis in an array has only a single product, so the identity of the compound should be unequivocally known.

[Figure: example of a solid-phase supported dye to signal ligand binding]

In another array synthesis, Still generated a large library of oligopeptides by split synthesis. The drawback of making many thousands of compounds is that it is difficult to determine the structures of the compounds formed. Their solution was to use molecular tags: a tiny amount (1 pmol/bead) of a dye is attached to the beads, and the identity of a given bead can be determined by analyzing which tags are present on it. Although attaching tags makes identification easy, it would be impractical to screen each compound individually for its receptor binding ability, so a dye was attached to each receptor, such that only those receptors that bind to their substrate produce a color change.

When many reactions need to be run in an array (such as the 96 reactions described in one of Armstrong's MCC arrays), some of the more tedious aspects of synthesis can be automated to improve efficiency. DeWitt and Czarnik detail a method called the "DIVERSOMER method", in which many miniaturized versions of chemical reactions are all run simultaneously. This method uses a device that automates the resin loading and wash cycles, as well as reaction monitoring and purification; the authors demonstrate the feasibility of their method and apparatus by using it to synthesize a variety of molecule classes, such as hydantoins and benzodiazepines, running 40 individual reactions in most cases.

Expensive equipment is often unavailable, and Schwabacher et al. describe a simple method that combines parallel synthesis of library members with evaluation of entire libraries of compounds. In their method, a thread partitioned into different regions is wrapped around a cylinder, and a different reagent is coupled to each region, which bears only a single species. The thread is then re-divided and wrapped around a cylinder of a different size, and the process is repeated. The advantage of this method is that the identity of each product is known simply from its location along the thread, and the corresponding biological activity is identified by Fourier transformation of fluorescence signals.

[Figure: use of a traceless linker]

In most of the syntheses described here, the starting reagent must be attached to and removed from a solid support. This can leave behind a hydroxyl group, which can potentially affect the biological activity of a target compound. Ellman uses solid-phase supports in a multi-step synthesis scheme to obtain 192 individual 1,4-benzodiazepine derivatives, which are well-known therapeutic agents. To eliminate potential hydroxyl-group interference, a novel linker based on silyl-aryl chemistry is used to attach the molecules to the solid support; when the product is cleaved from the support, no trace of the linker remains.

[Figure: compounds that can be synthesized from solid-phase-bound imines]

When a molecule is anchored to a solid support, intermediates cannot be isolated from one another without cleaving the molecule from the resin. Since many of the traditional characterization techniques used to track reaction progress and confirm product structure are solution-based, different techniques must be used. Gel-phase ¹³C NMR spectroscopy, MALDI mass spectrometry, and IR spectroscopy have been used to confirm structure and monitor the progress of solid-phase reactions. Gordon et al. describe several case studies that use imines and peptidyl phosphonates to generate combinatorial libraries of small molecules. To generate the imine library, an amino acid tethered to a resin is reacted with an aldehyde. The authors demonstrate the use of fast ¹³C gel-phase NMR spectroscopy and magic angle spinning ¹H NMR spectroscopy to monitor the progress of reactions, and showed that most imines could be formed in as little as 10 minutes at room temperature when trimethyl orthoformate was used as the solvent. The formed imines were then derivatized to generate 4-thiazolidinones, β-lactams, and pyrrolidines.

The use of solid-phase supports greatly simplifies the synthesis of large combinatorial libraries of compounds. This is done by anchoring a starting material to a solid support and then running subsequent reactions until a sufficiently large library is built, after which the products are cleaved from the support. The use of solid-phase purification has also been demonstrated for use in solution-phase synthesis schemes in conjunction with standard liquid-liquid extraction purification techniques.

Deconvolution and screening

Combinatorial libraries

Combinatorial libraries are special multi-component mixtures of small-molecule chemical compounds that are synthesized in a single stepwise process. They differ from collections of individual compounds as well as from series of compounds prepared by parallel synthesis. An important feature is that mixtures are used in their synthesis, which ensures the very high efficiency of the process. Both reactants can be mixtures, in which case the procedure would be even more efficient. For practical reasons, however, it is advisable to use the split-mix method, in which one of the two mixtures is replaced by single building blocks (BBs). Mixtures are so essential that no combinatorial library can be made without them, and whenever a mixture is used in the process, a combinatorial library inevitably forms. The split-mix synthesis is usually carried out on a solid support, but it can be applied in solution, too. Since the structures of the components are unknown, deconvolution methods need to be used in screening. One of the most important features of combinatorial libraries is that the whole mixture can be screened in a single process, which makes these libraries very useful in pharmaceutical research. Partial libraries of full combinatorial libraries can also be synthesized, and some of them can be used in deconvolution.

Deconvolution of libraries cleaved from the solid support

If the synthesized molecules of a combinatorial library are cleaved from the solid support, a soluble mixture forms. Such a solution may contain millions of different compounds. When this synthetic method was developed, it at first seemed impossible to identify the molecules and to find those with useful properties. Strategies for identifying the useful components were developed, however, to solve the problem. All these strategies are based on the synthesis and testing of partial libraries. The earliest iterative strategy is described in the above-mentioned document by Furka notarized in 1982. The method was later independently published by Erb et al. under the name "recursive deconvolution".

[Figure: recursive deconvolution. Blue, yellow and red circles: amino acids; green circle: solid support]

Recursive deconvolution

The method is illustrated by the figure. A 27-member peptide library is synthesized from three amino acids. After the first (A) and second (B) cycles, samples are set aside before mixing. The products of the third cycle (C) are cleaved from the support before mixing and then tested for activity. Suppose the group labeled with a + sign is active. All its members have the red amino acid at the last coupling position (CP), so the active member also has the red amino acid at the last CP. The red amino acid is then coupled to the three samples set aside after the second cycle (B) to give samples D. After cleavage, the three E samples are formed. If testing shows that the sample marked + is active, the blue amino acid occupies the second CP in the active component. Next, the blue and then the red amino acid are coupled to the three A samples (F), which are tested again after cleavage (G). If the + component proves to be active, the sequence of the active component is determined, as shown in H.
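The recursive procedure can be simulated with sets of sequences standing in for the mixtures. In this sketch the assay is a hypothetical function that reports a mixture as active if it contains the (unknown) active peptide, and exactly one active component is assumed. Positions are identified from the last coupling position backwards, each round coupling the known suffix to the samples set aside in the previous cycle, so $n$ rounds of $k$ tests are needed.

```python
from itertools import product


def sublibrary(n_random, fixed_suffix, alphabet):
    """Mixture whose first n_random positions vary and whose suffix is fixed."""
    return {"".join(p) + fixed_suffix for p in product(alphabet, repeat=n_random)}


def recursive_deconvolution(alphabet, length, assay):
    """Identify the active sequence from the last coupling position backwards."""
    suffix, tests = "", 0
    for n_random in range(length - 1, -1, -1):
        # Couple each candidate residue plus the known suffix to the set-aside samples.
        active = [r for r in alphabet if assay(sublibrary(n_random, r + suffix, alphabet))]
        tests += len(alphabet)
        if len(active) != 1:
            raise ValueError("expected exactly one active sublibrary per round")
        suffix = active[0] + suffix
    return suffix, tests


# Y, B, R = yellow, blue, red amino acids; the target is hidden from the procedure.
target = "YBR"
found, n_tests = recursive_deconvolution("YBR", 3, lambda mix: target in mix)
print(found, n_tests)  # YBR 9
```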

Positional scanning

Positional scanning was introduced independently by Furka et al. and Pinilla et al. The method is based on the synthesis and testing of a series of sublibraries in which a certain sequence position is occupied by the same amino acid. The figure shows the nine sublibraries (B1-D3) of a full peptide trimer library (A) made from three amino acids. In each sublibrary, one position is occupied by the same amino acid in all components. In the synthesis of a sublibrary, the support is not divided in the cycle corresponding to that position, and only one amino acid is coupled to the whole sample. As a result, that position is occupied by the same amino acid in all components. For example, in the B2 sublibrary, position 2 is occupied by the "yellow" amino acid in all nine components. If this sublibrary gives a positive answer in a screening test, position 2 in the active peptide is also occupied by the "yellow" amino acid. The amino acid sequence can be determined by testing all nine (or sometimes fewer) sublibraries.
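Positional scanning can be simulated in the same way. As before, the assay is a hypothetical function that reports whether a mixture contains the active peptide, and a single active component is assumed. Unlike recursive deconvolution, all $k \times n$ sublibraries are independent, so they can be prepared and tested in parallel.

```python
from itertools import product


def positional_sublibrary(position, residue, alphabet, length):
    """All sequences with `residue` fixed at `position`; the other positions vary."""
    every = ("".join(p) for p in product(alphabet, repeat=length))
    return {s for s in every if s[position] == residue}


def positional_scan(alphabet, length, assay):
    """Read the active sequence position by position from the positive sublibraries."""
    sequence = []
    for position in range(length):
        hits = [r for r in alphabet if assay(positional_sublibrary(position, r, alphabet, length))]
        if len(hits) != 1:
            raise ValueError(f"ambiguous or no result at position {position}")
        sequence.append(hits[0])
    return "".join(sequence)


target = "BYR"  # hidden active peptide (Y, B, R = yellow, blue, red)
print(positional_scan("YBR", 3, lambda mix: target in mix))  # BYR
```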

[Figure: positional scanning. Full trimer peptide library made from 3 amino acids and its 9 sublibraries; the first row shows the coupling positions]
[Figure: a 27-member tripeptide full library and the three omission libraries; the colored circles are amino acids]

Omission libraries

In omission libraries a certain amino acid is missing from all peptides of the mixture. The figure shows the full library and the three omission libraries. At the top the omitted amino acids are shown. If the omission library gives a negative test the omitted amino acid is present in the active component.
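The omission logic can be sketched as follows, again with a hypothetical assay that reports whether a mixture contains the active peptide. Each omission library is the full library rebuilt without one building block; a negative result means the omitted amino acid is present. Note that this reveals only the composition of the active peptide, not the order or number of its residues.

```python
from itertools import product


def omission_library(omitted, alphabet, length):
    """Full library built without one building block."""
    remaining = [a for a in alphabet if a != omitted]
    return {"".join(p) for p in product(remaining, repeat=length)}


def composition_by_omission(alphabet, length, assay):
    """Amino acids whose omission makes the library inactive are in the active peptide."""
    return {aa for aa in alphabet if not assay(omission_library(aa, alphabet, length))}


target = "YBY"  # hidden active peptide
print(sorted(composition_by_omission("YBR", 3, lambda mix: target in mix)))  # ['B', 'Y']
```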

Deconvolution of tethered combinatorial libraries

If the peptides are not cleaved from the solid support, we are dealing with a mixture of beads, each bead carrying a single peptide. Smith and his colleagues showed earlier that peptides could be tested in tethered form, too. This approach was also used in screening peptide libraries. The tethered peptide library was tested with a dissolved target protein. The beads to which the protein had bound were picked out, the protein was removed from them, and the tethered peptide was then identified by sequencing. A somewhat different approach was followed by Taylor and Morken, who used infrared thermography to identify catalysts in non-peptide tethered libraries. The method is based on the heat evolved in beads that carry a catalyst when the tethered library is immersed in a solution of a substrate. When the beads are examined through an infrared microscope, the catalyst-containing beads appear as bright spots and can be picked out.

Encoded combinatorial libraries

For non-peptide organic libraries, it is not as simple to determine the identity of the compound on a bead as it is for peptide libraries. To circumvent this difficulty, methods were developed to attach to the beads, in parallel with the synthesis of the library, molecules that encode the structure of the compound formed on the bead. Ohlmeyer and his colleagues published a binary encoding method. They used mixtures of 18 tagging molecules that, after being cleaved from the beads, could be identified by electron capture gas chromatography. Sarkar et al. described chiral oligomers of pentenoic amides (COPAs) that can be used to construct mass-encoded OBOC (one-bead-one-compound) libraries. Kerr et al. introduced an innovative encoding method in which an orthogonally protected, removable bifunctional linker was attached to the beads. One end of the linker was used to attach the non-natural building blocks of the library, while encoding amino acid triplets were linked to the other end. The building blocks were non-natural amino acids, and the series of their encoding amino acid triplets could be determined by Edman degradation. The important aspect of this kind of encoding was that the library members could be cleaved from the beads together with their attached encoding tags, forming a soluble library. The same approach was used by Nikolajev et al. for encoding with peptides. In 1992, Brenner and Lerner introduced DNA sequences to encode the beads of the solid support, which proved to be the most successful encoding method. Nielsen, Brenner and Janda also used the Kerr approach to implement DNA encoding. More recently there have been important advances in DNA sequencing: next-generation techniques make it possible to sequence large numbers of samples in parallel, which is very important in screening DNA-encoded libraries. There was another innovation that contributed to the success of DNA encoding.
In 2000, Halpin and Harbury omitted the solid support in the split-mix synthesis of DNA-encoded combinatorial libraries and replaced it with the encoding DNA oligomers. In solid-phase split and pool synthesis, the number of components of a library cannot exceed the number of beads of the support. The authors' novel approach entirely eliminated this restriction and made it possible to prepare new compounds in practically unlimited numbers. The Danish company Nuevolution, for example, synthesized a DNA-encoded library containing 40 trillion components. DNA-encoded libraries are soluble, which makes it possible to apply efficient affinity binding in screening. Some authors use the acronym DEL for DNA-encoded combinatorial libraries, while others use DECL. The latter seems better, since it clearly expresses the combinatorial nature of these libraries. Several types of DNA-encoded combinatorial libraries were introduced and described in the first decade of the present millennium. These libraries have been applied very successfully in drug research.

  • DNA templated synthesis of combinatorial libraries described in 2001 by Gartner et al. 
  • Dual pharmacophore DNA encoded combinatorial libraries invented in 2004 by Melkko et al.
  • Sequence encoded routing published by Halpin and Harbury in 2004.
  • Single pharmacophore DNA encoded combinatorial libraries introduced in 2008 by Mannocci et al.
  • DNA encoded combinatorial libraries formed by using yoctoliter-scale reactor published by Hansen et al. in 2009

Details of their synthesis and application can be found on the page DNA-encoded chemical library. DNA-encoded soluble combinatorial libraries have drawbacks, too. First of all, the advantage coming from the use of a solid support is completely lost. In addition, the polyionic character of the DNA encoding chains limits the use of non-aqueous solvents in the synthesis. For this reason, many laboratories have chosen to develop DNA-compatible reactions for use in the synthesis of DECLs, and quite a few such reactions have already been described.
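The capacity of the binary encoding method of Ohlmeyer and colleagues, described above, comes from treating each of the 18 tags as a bit that is either present or absent on a bead, which allows $2^{18} = 262\,144$ distinct codes. The sketch below illustrates only that principle: compounds are simply given an index that is written into the tag set, which is a simplifying assumption rather than the published scheme for assigning codes.

```python
N_TAGS = 18  # number of distinct tagging molecules


def encode(index: int) -> frozenset:
    """Tags attached to a bead: tag i is present when bit i of the compound index is 1."""
    if not 0 <= index < 2 ** N_TAGS:
        raise ValueError("index out of range for the tag set")
    return frozenset(i for i in range(N_TAGS) if (index >> i) & 1)


def decode(tags) -> int:
    """Recover the compound index from the tags detected after cleavage."""
    return sum(1 << i for i in tags)


print(2 ** N_TAGS)         # 262144 distinct codes
print(sorted(encode(37)))  # [0, 2, 5]
print(decode(encode(37)))  # 37
```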

Materials science

Materials science has applied the techniques of combinatorial chemistry to the discovery of new materials. This work was pioneered by P.G. Schultz et al. in the mid-nineties in the context of luminescent materials obtained by co-deposition of elements on a silicon substrate. His work was preceded by that of J. J. Hanak in 1970, but the computer and robotics tools needed for the method to spread were not available at the time. Work has been continued by several academic groups as well as companies with large research and development programs (Symyx Technologies, GE, Dow Chemical etc.). The technique has been used extensively for catalysis, coatings, electronics, and many other fields. The application of appropriate informatics tools is critical to handle, administer, and store the vast volumes of data produced. New types of design-of-experiments methods have also been developed to efficiently address the large experimental spaces that can be tackled using combinatorial methods.

Diversity-oriented libraries

Even though combinatorial chemistry has been an essential part of early drug discovery for more than two decades, so far only one de novo combinatorial chemistry-synthesized chemical has been approved for clinical use by the FDA (sorafenib, a multikinase inhibitor indicated for advanced renal cancer). The poor success rate of the approach has been suggested to be connected with the rather limited chemical space covered by the products of combinatorial chemistry. When comparing the properties of compounds in combinatorial chemistry libraries with those of approved drugs and natural products, Feher and Schmidt noted that combinatorial chemistry libraries suffer particularly from a lack of chirality and of structural rigidity, both of which are widely regarded as drug-like properties. Even though natural product drug discovery has probably not been the most fashionable trend in the pharmaceutical industry in recent times, a large proportion of new chemical entities are still nature-derived compounds; it has therefore been suggested that the effectiveness of combinatorial chemistry could be improved by enhancing the chemical diversity of screening libraries. As chirality and rigidity are the two most important features distinguishing approved drugs and natural products from compounds in combinatorial chemistry libraries, these are the two issues emphasized in so-called diversity-oriented libraries, i.e. compound collections that aim at coverage of chemical space rather than just huge numbers of compounds.

Patent classification subclass

In the 8th edition of the International Patent Classification (IPC), which entered into force on January 1, 2006, a special subclass, "C40B", was created for patent applications and patents related to inventions in the domain of combinatorial chemistry.

Bioprospecting


Many important medications have been discovered by bioprospecting, including the diabetes drug metformin (developed from a natural product found in Galega officinalis).

Bioprospecting (also known as biodiversity prospecting) is the exploration of natural sources for small molecules, macromolecules, and biochemical and genetic information that could be developed into commercially valuable products for the agricultural, aquaculture, bioremediation, cosmetics, nanotechnology, or pharmaceutical industries. In the pharmaceutical industry, for example, almost one third of all small-molecule drugs approved by the U.S. Food and Drug Administration (FDA) between 1981 and 2014 were either natural products or compounds derived from natural products.

Terrestrial plants, fungi and actinobacteria have been the focus of many past bioprospecting programs, but interest is growing in less explored ecosystems (e.g. seas and oceans) and organisms (e.g. myxobacteria, archaea) as a means of identifying new compounds with novel biological activities. Species may be randomly screened for bioactivity or rationally selected and screened based on ecological, ethnobiological, ethnomedical, historical or genomic information.

When a region’s biological resources or indigenous knowledge are unethically appropriated or commercially exploited without providing fair compensation, this is known as biopiracy. Various international treaties have been negotiated to provide countries legal recourse in the event of biopiracy and to offer commercial actors legal certainty for investment. These include the UN Convention on Biological Diversity and the Nagoya Protocol.

Other risks associated with bioprospecting are the overharvesting of individual species and environmental damage, but legislation has also been developed to combat these. Examples include national laws such as the US Marine Mammal Protection Act and the US Endangered Species Act, and international treaties such as the UN Convention on Biological Diversity, the UN Convention on the Law of the Sea, and the Antarctic Treaty.

Bioprospecting-derived resources and products

Agriculture

Annonin-based biopesticides, used to protect crops from beetles and other pests, were developed from the plant Annona squamosa.

Bioprospecting-derived resources and products used in agriculture include biofertilizers, biopesticides and veterinary antibiotics. Rhizobium, a genus of soil bacteria, is used in biofertilizers. Bacillus thuringiensis (also called Bt) and the annonins (obtained from seeds of the plant Annona squamosa) are examples of biopesticides. Valnemulin and tiamulin (discovered and developed from the basidiomycete fungus Clitopilus passeckerianus) are examples of veterinary antibiotics.

Bioremediation

Examples of bioprospecting products used in bioremediation include laccase enzymes derived from Coriolopsis gallica and Phanerochaete chrysosporium, which are used to treat brewery wastewater and to dechlorinate and decolorize paper mill effluent.

Cosmetics and personal care

Cosmetics and personal care products obtained from bioprospecting include Porphyridium cruentum-derived oligosaccharide and oligoelement blends used to treat erythema (rosacea, flushing and dark circles), Xanthobacter autotrophicus-derived zeaxanthin used for skin hydration and UV protection, Clostridium histolyticum-derived collagenases used for skin regeneration, and Microsporum-derived keratinases used for hair removal.

Nanotechnology and biosensors

Because microbial laccases have a broad substrate range, they can be used in biosensor technology to detect a wide range of organic compounds. For example, laccase-containing electrodes are used to detect polyphenolic compounds in wine, and lignins and phenols in wastewater.

Pharmaceuticals

Many of the antibacterial drugs in current clinical use were discovered through bioprospecting, including the β-lactam antibiotics, aminoglycosides, tetracyclines, amphenicols, polymyxins, macrolides, pleuromutilins, glycopeptides, rifamycins, lincosamides, streptogramins and phosphonic acid antibiotics. For example, the aminoglycoside antibiotic streptomycin was discovered from the soil bacterium Streptomyces griseus, the fusidane antibiotic fusidic acid was discovered from the soil fungus Acremonium fusidioides, and the pleuromutilin antibiotics (e.g. lefamulin) were discovered and developed from the basidiomycete fungus Clitopilus passeckerianus.

Other examples of bioprospecting-derived anti-infective drugs include the antifungal drug griseofulvin (discovered from the soil fungus Penicillium griseofulvum), the antifungal and antileishmanial drug amphotericin B (discovered from the soil bacterium Streptomyces nodosus), the antimalarial drug artemisinin (discovered from the plant Artemisia annua), and the antihelminthic drug ivermectin (developed from the soil bacterium Streptomyces avermitilis).

Bioprospecting-derived pharmaceuticals have been developed for the treatment of non-communicable diseases and conditions too. These include the anticancer drug bleomycin (obtained from the soil bacterium Streptomyces verticillus), the immunosuppressant drug ciclosporin used to treat autoimmune diseases such as rheumatoid arthritis and psoriasis (obtained from the soil fungus Tolypocladium inflatum), the anti-inflammatory drug colchicine used to treat and prevent gout flares (obtained from the plant Colchicum autumnale), the analgesic drug ziconotide (developed from the cone snail Conus magus), and the acetylcholinesterase inhibitor galantamine used to treat Alzheimer's disease (obtained from plants in the Galanthus genus).

Bioprospecting pitfalls

Errors and oversights can occur at different steps in the bioprospecting process. These steps include collection of source material, screening of source material for bioactivity, toxicity testing of isolated compounds, and identification of the mechanism of action.

Collection of source material

Voucher deposition allows species identity to be re-evaluated if there are problems re-isolating an active constituent from a biological source.

Before collecting biological material or traditional knowledge, the correct permissions must be obtained from the source country, landowner, etc. Failure to do so can result in criminal proceedings and the rejection of any subsequent patent applications. It is also important to collect biological material in adequate quantities, to have it formally identified, and to deposit a voucher specimen with a repository for long-term preservation and storage. These steps help ensure that any important discoveries are reproducible.

Bioactivity and toxicity testing

When testing extracts and isolated compounds for bioactivity and toxicity, the use of standard protocols (e.g. CLSI, ISO, NIH, EURL ECVAM, OECD) is desirable because it improves the accuracy and reproducibility of test results. If the source material is likely to contain known (previously discovered) active compounds (e.g. streptomycin in the case of actinomycetes), dereplication is necessary to exclude these extracts and compounds from the discovery pipeline as early as possible. In addition, it is important to:

- consider solvent effects on the cells or cell lines being tested
- include reference compounds (i.e. pure chemical compounds for which accurate bioactivity and toxicity data are available)
- set limits on cell line passage number (e.g. 10–20 passages)
- include all the necessary positive and negative controls
- be aware of assay limitations

These steps help ensure that assay results are accurate, reproducible and correctly interpreted.

Identification of mechanism of action

When attempting to elucidate the mechanism of action of an extract or isolated compound, it is important to use multiple orthogonal assays. Using just a single assay, especially a single in vitro assay, gives a very incomplete picture of an extract or compound’s effect on the human body. In the case of Valeriana officinalis root extract, for example, the sleep-inducing effects of this extract are due to multiple compounds and mechanisms including interaction with GABA receptors and relaxation of smooth muscle. The mechanism of action of an isolated compound can also be misidentified if a single assay is used because some compounds interfere with assays. For example, the sulfhydryl-scavenging assay used to detect histone acetyltransferase inhibition can give a false positive result if the test compound reacts covalently with cysteines.

Biopiracy

The term biopiracy was coined by Pat Mooney to describe a practice in which indigenous knowledge of nature, originating with indigenous peoples, is used by others for profit without authorization of, or compensation to, the indigenous people themselves. For example, bioprospectors may draw on indigenous knowledge of medicinal plants that is later patented by medical companies. If the patent does not recognize that the knowledge is neither new nor invented by the patent holder, the indigenous community loses its potential rights to the commercial product derived from knowledge it developed itself. Critics of this practice, such as Greenpeace, claim that it contributes to inequality between developing countries rich in biodiversity and developed countries hosting biotech firms.

In the 1990s many large pharmaceutical and drug discovery companies responded to charges of biopiracy by ceasing work on natural products, turning to combinatorial chemistry to develop novel compounds.

Famous cases of biopiracy

A white rosy periwinkle

The rosy periwinkle

The rosy periwinkle case dates from the 1950s. Although native to Madagascar, the rosy periwinkle had been widely introduced into other tropical countries around the world well before the discovery of vincristine. Different countries are reported to have acquired different beliefs about the plant's medical properties. As a result, researchers could obtain local knowledge from one country and plant samples from another. The plant's traditional use for diabetes was the original stimulus for research, but its effectiveness in treating both Hodgkin's disease and leukemia was discovered instead. The Hodgkin's lymphoma chemotherapeutic drug vinblastine is derivable from the rosy periwinkle.

The Maya ICBG controversy

The Maya ICBG bioprospecting controversy took place in 1999–2000. The International Cooperative Biodiversity Group (ICBG), led by ethnobiologist Brent Berlin, was accused by several NGOs and indigenous organizations of engaging in unethical forms of bioprospecting. The ICBG aimed to document the biodiversity of Chiapas, Mexico, and the ethnobotanical knowledge of the indigenous Maya people. Its goal was to ascertain whether medical products could be developed from any of the plants used by the indigenous groups.

The Maya ICBG case was among the first to draw attention to the problems of distinguishing between benign forms of bioprospecting and unethical biopiracy, and to the difficulties of securing community participation and prior informed consent for would-be bioprospectors.

The neem tree

A neem tree

In 1994, the U.S. Department of Agriculture and W. R. Grace and Company received a European patent on methods of controlling fungal infections in plants using a composition that included extracts from the neem tree (Azadirachta indica), which grows throughout India and Nepal. In 2000, the patent was successfully opposed by several groups from the EU and India, including the EU Green Party, Vandana Shiva, and the International Federation of Organic Agriculture Movements (IFOAM). Their opposition rested on the fact that the fungicidal activity of neem extract had long been known in Indian traditional medicine.[45] W. R. Grace appealed and lost in 2005.

Basmati rice

In 1997, the US corporation RiceTec (a subsidiary of RiceTec AG of Liechtenstein) attempted to patent certain hybrids of basmati rice and semidwarf long-grain rice. The Indian government challenged this patent and, in 2002, fifteen of the patent's twenty claims were invalidated.

The Enola bean


The Enola bean is a variety of Mexican yellow bean, named after the wife of the man who patented it in 1999. The variety's allegedly distinguishing feature is seeds of a specific shade of yellow. The patent holder subsequently sued a large number of importers of Mexican yellow beans, with the following result: "...export sales immediately dropped over 90% among importers that had been selling these beans for years, causing economic damage to more than 22,000 farmers in northern Mexico who depended on sales of this bean." A lawsuit was filed on behalf of the farmers, and in 2005 the USPTO ruled in their favor. In 2008, the patent was revoked.

Hoodia gordonii

The succulent Hoodia gordonii

Hoodia gordonii, a succulent plant, originates from the Kalahari Desert of South Africa. For generations it has been known to the traditionally living San people as an appetite suppressant. In 1996 South Africa's Council for Scientific and Industrial Research began working with companies, including Unilever, to develop dietary supplements based on Hoodia. Originally the San people were not scheduled to receive any benefits from the commercialization of their traditional knowledge, but in 2003 the South African San Council made an agreement with CSIR in which they would receive from 6 to 8% of the revenue from the sale of Hoodia products.

In 2008, after investing €20 million in R&D on Hoodia as a potential ingredient in weight-loss dietary supplements, Unilever terminated the project. Its clinical studies had not shown that Hoodia was safe and effective enough to bring to market.

Further cases

The following is a selection of further recent cases of biopiracy. Most of them do not relate to traditional medicines.

Legal and political aspects

Patent law

One common misunderstanding is that pharmaceutical companies patent the plants they collect. A naturally occurring organism that was previously known or used cannot be patented. However, patents may be taken out on specific chemicals isolated or developed from plants, and these patents are often obtained with a stated and researched use of those chemicals. Generally, the existence, structure and synthesis of those compounds are not part of the indigenous medical knowledge that led researchers to analyze the plant in the first place. As a result, even if the indigenous medical knowledge is taken as prior art, that knowledge does not by itself make the active chemical compound "obvious," which is the standard applied under patent law.

In the United States, patent law can be used to protect "isolated and purified" compounds – even, in one instance, a new chemical element (see USP 3,156,523). In 1873, Louis Pasteur patented a "yeast" which was "free from disease" (patent #141072). Patents covering biological inventions have been treated similarly. In the 1980 case of Diamond v. Chakrabarty, the Supreme Court upheld a patent on a bacterium that had been genetically modified to consume petroleum, reasoning that U.S. law permits patents on "anything under the sun that is made by man." The United States Patent and Trademark Office (USPTO) has observed that "a patent on a gene covers the isolated and purified gene but does not cover the gene as it occurs in nature".

Also possible under US law is patenting a cultivar, a new variety of an existing organism. The patent on the Enola bean (now revoked) was an example of this sort of patent. The intellectual property laws of the US also recognize plant breeders' rights under the Plant Variety Protection Act, 7 U.S.C. §§ 2321–2582.

Convention on Biological Diversity (CBD)


The CBD came into force in 1993. It secured rights to control access to genetic resources for the countries in which those resources are located. One objective of the CBD is to enable lesser-developed countries to better benefit from their resources and traditional knowledge. Under the rules of the CBD, bioprospectors are required to obtain informed consent to access such resources, and must share any benefits with the biodiversity-rich country. However, some critics believe that the CBD has failed to establish appropriate regulations to prevent biopiracy. Others claim that the main problem is the failure of national governments to pass appropriate laws implementing the provisions of the CBD. The Nagoya Protocol to the CBD, which came into force in 2014, provides further regulations. The CBD has been ratified, acceded or accepted by 196 countries and jurisdictions globally, with exceptions including the Holy See and United States.

Bioprospecting contracts

The requirements for bioprospecting set by the CBD have created a new branch of international patent and trade law: bioprospecting contracts. Bioprospecting contracts lay down the rules of benefit sharing between researchers and countries, and can bring royalties to lesser-developed countries. Unlike biopiracy, these contracts are based on prior informed consent and compensation. Even so, not every owner or holder of indigenous knowledge and resources is consulted or compensated, because it would be difficult to ensure that every individual is included. For this reason, some have proposed that indigenous or other communities form a type of representative micro-government. This body would negotiate contracts with researchers so that the community benefits from the arrangements. Unethical bioprospecting contracts (as distinct from ethical ones) can be viewed as a new form of biopiracy.

An example of a bioprospecting contract is the agreement between Merck and INBio of Costa Rica.

Traditional knowledge database

In response to previous cases of biopiracy, and to prevent further ones, the Government of India converted traditional Indian medicinal information from ancient manuscripts and other resources into an electronic resource. The result was the Traditional Knowledge Digital Library (TKDL), established in 2001. The texts are being recorded from Tamil, Sanskrit, Urdu, Persian and Arabic and made available to patent offices in English, German, French, Japanese and Spanish. The aim is to protect India's heritage from exploitation by foreign companies. Hundreds of yoga poses are also kept in the collection. The library has signed agreements with leading international patent offices, such as the European Patent Office (EPO), the United Kingdom Trademark & Patent Office (UKTPO) and the United States Patent and Trademark Office. These agreements protect traditional knowledge from biopiracy by allowing examiners at those offices to access TKDL databases for patent search and examination.

Gene

This article is about sequenc...