Cheminformatics has been an active field in various guises since the
1970s and earlier, with activity in academic departments and commercial
pharmaceutical research and development departments. The term chemoinformatics was defined in its application to drug discover, for instance, by F.K. Brown in 1998:
Chemoinformatics
is the mixing of those information resources to transform data into
information and information into knowledge for the intended purpose of
making better decisions faster in the area of drug lead identification
and optimization.
Since then, both terms, cheminformatics and chemoinformatics, have been used, although, lexicographically, cheminformatics appears to be more frequently used, despite academics in Europe declaring for the variant chemoinformatics in 2006. In 2009, a prominent Springer journal in the field, the Journal of Cheminformatics, was founded by transatlantic executive editors, giving yet further impetus to the shorter variant.
Background
Cheminformatics
combines the scientific working fields of chemistry, computer science,
and information science—for example in the areas of topology, chemical graph theory, information retrieval and data mining in the chemical space. Cheminformatics can also be applied to data analysis for various industries like paper and pulp, dyes and such allied industries.
Applications
Storage and retrieval
A primary application of cheminformatics is the storage, indexing, and search of information relating to chemical compounds.
The efficient search of such stored information includes topics that
are dealt with in computer science, such as data mining, information
retrieval, information extraction, and machine learning. Related research topics include:
The in silico representation of chemical structures uses specialized formats such as the Simplified molecular input line entry specifications (SMILES) or the XML-based Chemical Markup Language. These representations are often used for storage in large chemical databases.
While some formats are suited for visual representations in two- or
three-dimensions, others are more suited for studying physical
interactions, modeling and docking studies.
Virtual libraries
Chemical data can pertain to real or virtual molecules. Virtual
libraries of compounds may be generated in various ways to explore
chemical space and hypothesize novel compounds with desired properties.
Virtual libraries of classes of compounds (drugs, natural products,
diversity-oriented synthetic products) were recently generated using the
FOG (fragment optimized growth) algorithm. This was done by using cheminformatic tools to train transition probabilities of a Markov chain
on authentic classes of compounds, and then using the Markov chain to
generate novel compounds that were similar to the training database.
Virtual screening
In contrast to high-throughput screening, virtual screening involves computationally
screening in silico libraries of compounds, by means of various methods such as
docking, to identify members likely to possess desired properties
such as biological activity against a given target. In some cases, combinatorial chemistry
is used in the development of the library to increase the efficiency in
mining the chemical space. More commonly, a diverse library of small
molecules or natural products is screened.
Combinatorial chemistry comprises chemical synthetic methods that make it possible to prepare a large number (tens to thousands or even millions) of compounds in a single process. These compound libraries can be made as mixtures, sets of individual compounds or chemical structures generated by computer software. Combinatorial chemistry can be used for the synthesis of small molecules and for peptides.
Strategies that allow identification of useful components of the
libraries are also part of combinatorial chemistry. The methods used in
combinatorial chemistry are applied outside chemistry, too.
History
Combinatorial
chemistry had been invented by Furka Á (Eötvös Loránd University
Budapest Hungary) who described the principle of it, the combinatorial
synthesis and a deconvolution procedure in a document that was notarized
in 1982.
The principle of the combinatorial method is: synthesize a
multi-component compound mixture (combinatorial library) in a single
stepwise procedure and screen it to find drug candidates or other kinds
of useful compounds also in a single process. The most important
innovation of the combinatorial method is to use mixtures in the
synthesis and screening that ensures the high productivity of the
process. Motivations that led to the invention had been published in
2002.
Introduction
Synthesis of molecules in a combinatorial fashion can quickly lead to large numbers of molecules. For example, a molecule with three points of diversity (R1, R2, and R3) can generate possible structures, where , , and are the numbers of different substituents utilized.
The basic principle of combinatorial chemistry is to prepare libraries of a very large number of compounds then identify the useful components of the libraries.
In its modern form, combinatorial chemistry has probably had its biggest impact in the pharmaceutical industry. Researchers attempting to optimize the activity profile of a compound create a 'library' of many different but related compounds.Advances in robotics
have led to an industrial approach to combinatorial synthesis, enabling
companies to routinely produce over 100,000 new and unique compounds
per year.
In order to handle the vast number of structural possibilities,
researchers often create a 'virtual library', a computational
enumeration of all possible structures of a given pharmacophore with all available reactants.
Such a library can consist of thousands to millions of 'virtual'
compounds. The researcher will select a subset of the 'virtual library'
for actual synthesis, based upon various calculations and criteria.
Polymers (peptides and oligonucleotides)
Peptides forming in cycles 3 and 4
Combinatorial split-mix (split and pool) synthesis
Combinatorial split-mix (split and pool) synthesis is based on the solid-phase synthesis developed by Merrifield. If a combinatorial peptide library is synthesized using 20 amino acids
(or other kinds of building blocks) the bead form solid support is
divided into 20 equal portions. This is followed by coupling a different
amino acid to each portion. The third step is the mixing of all
portions. These three steps comprise a cycle. Elongation of the peptide
chains can be realized by simply repeating the steps of the cycle.
Flow diagram of the split-mix combinatorial synthesis
The procedure is illustrated by the synthesis of a dipeptide
library using the same three amino acids as building blocks in both
cycles. Each component of this library contains two amino acids arranged
in different orders. The amino acids used in couplings are represented
by yellow, blue and red circles in the figure. Divergent arrows show
dividing solid support resin (green circles) into equal portions,
vertical arrows mean coupling and convergent arrows represent mixing and
homogenizing the portions of the support.
The figure shows that in the two synthetic cycles 9 dipeptides
are formed. In the third and fourth cycles, 27 tripeptides and 81
tetrapeptides would form, respectively.
The "split-mix synthesis" has several outstanding features:
It is highly efficient. As the figure demonstrates the number of
peptides formed in the synthetic process (3, 9, 27, 81) increases
exponentially with the number of executed cycles. Using 20 amino acids
in each synthetic cycle the number of formed peptides are: 400, 8,000,
160,000 and 3,200,000, respectively. This means that the number of
peptides increases exponentially with the number of the executed cycles.
All peptide sequences are formed in the process that can be deduced by a combination of the amino acids used in the cycles.
Portioning of the support into equal samples assures formation of
the components of the library in nearly equal molar quantities.
Only a single peptide forms on each bead of the support. This is the
consequence of using only one amino acid in the coupling steps. It is
completely unknown, however, which is the peptide that occupies a
selected bead.
The split-mix method can be used for the synthesis of organic or any
other kind of library that can be prepared from its building blocks in a
stepwise process.
In 1990 three groups described methods for preparing peptide libraries by biological methods and one year later Fodor et al. published a remarkable method for synthesis of peptide arrays on small glass slides.
A "parallel synthesis" method was developed by Mario Geysen and his colleagues for preparation of peptide arrays.
They synthesized 96 peptides on plastic rods (pins) coated at their
ends with the solid support. The pins were immersed into the solution of
reagents placed in the wells of a microtiter plate.
The method is widely applied particularly by using automatic parallel
synthesizers. Although the parallel method is much slower than the real
combinatorial one, its advantage is that it is exactly known which
peptide or other compound forms on each pin.
Further procedures were developed to combine the advantages of
both split-mix and parallel synthesis. In the method described by two
groups
the solid support was enclosed into permeable plastic capsules together
with a radiofrequency tag that carried the code of the compound to be
formed in the capsule. The procedure was carried out similar to the
split-mix method. In the split step, however, the capsules were
distributed among the reaction vessels according to the codes read from
the radiofrequency tags of the capsules.
A different method for the same purpose was developed by Furka et al.
is named "string synthesis". In this method, the capsules carried no
code. They are strung like the pearls in a necklace and placed into the
reaction vessels in stringed form. The identity of the capsules, as well
as their contents, are stored by their position occupied on the
strings. After each coupling step, the capsules are redistributed among
new strings according to definite rules.
Small molecules
In the drug discovery process, the synthesis and biological evaluation of small molecules
of interest have typically been a long and laborious process.
Combinatorial chemistry has emerged in recent decades as an approach to
quickly and efficiently synthesize large numbers of potential
small molecule drug candidates. In a typical synthesis, only a single
target molecule is produced at the end of a synthetic scheme, with each
step in a synthesis producing only a single product. In a combinatorial synthesis,
when using only single starting material, it is possible to synthesize a
large library of molecules using identical reaction conditions that can
then be screened for their biological activity.
This pool of products is then split into three equal portions
containing each of the three products, and then each of the three
individual pools is then reacted with another unit of reagent B, C, or
D, producing 9 unique compounds from the previous 3. This process is
then repeated until the desired number of building blocks is added,
generating many compounds. When synthesizing a library of compounds by a
multi-step synthesis, efficient reaction methods must be employed, and
if traditional purification methods are used after each reaction step,
yields and efficiency will suffer.
Solid-phase synthesis offers potential solutions to obviate the
need for typical quenching and purification steps often used in
synthetic chemistry. In general, a starting molecule is adhered to a
solid support (typically an insoluble polymer),
then additional reactions are performed, and the final product is
purified and then cleaved from the solid support. Since the molecules of
interest are attached to a solid support, it is possible to reduce the
purification after each reaction to a single filtration/wash step,
eliminating the need for tedious liquid-liquid extraction and solvent
evaporation steps that most synthetic chemistry involves. Furthermore,
by using heterogeneous reactants, excess reagents can be used to drive
sluggish reactions to completion, which can further improve yields.
Excess reagents can simply be washed away without the need for
additional purification steps such as chromatography.
Use of a solid-supported polyamine to scavenge excess reagent
Over the years, a variety of methods have been developed to refine
the use of solid-phase organic synthesis in combinatorial chemistry,
including efforts to increase the ease of synthesis and purification, as
well as non-traditional methods to characterize intermediate products.
Although
the majority of the examples described here will employ heterogeneous
reaction media in every reaction step, Booth and Hodges provide an early
example of using solid-supported reagents only during the purification
step of traditional solution-phase syntheses.
In their view,
solution-phase chemistry offers the advantages of avoiding attachment
and cleavage reactions necessary to anchor and remove molecules to
resins as well as eliminating the need to recreate solid-phase analogues
of established solution-phase reactions.
The single purification step at the end of a synthesis allows one
or more impurities to be removed, assuming the chemical structure of
the offending impurity is known. While the use of solid-supported
reagents greatly simplifies the synthesis of compounds, many
combinatorial syntheses require multiple steps, each of which still
requires some form of purification. Armstrong, et al. describe a one-pot
method for generating combinatorial libraries, called
multiple-component condensations (MCCs).
In this scheme, three or more reagents react such that each reagent is
incorporated into the final product in a single step, eliminating the
need for a multi-step synthesis that involves many purification steps.
In MCCs, there is no deconvolution required to determine which compounds
are biologically-active because each synthesis in an array has only a
single product, thus the identity of the compound should be
unequivocally known.
Example of a solid-phase supported dye to signal ligand binding
In another array synthesis, Still generated a large library of oligopeptides by split synthesis.
The drawback to making many thousands of compounds is that it is
difficult to determine the structure of the formed compounds. Their
solution is to use molecular tags, where a tiny amount (1 pmol/bead) of a
dye is attached to the beads, and the identity of a certain bead can be
determined by analyzing which tags are present on the bead. Despite how
easy attaching tags makes identification of receptors, it would be
quite impossible to individually screen each compound for its receptor
binding ability, so a dye was attached to each receptor, such that only
those receptors that bind to their substrate produce a color change.
When many reactions need to be run in an array (such as the 96
reactions described in one of Armstrong's MCC arrays), some of the more
tedious aspects of synthesis can be automated to improve efficiency.
DeWitt and Czarnik detail a method called the "DIVERSOMER method," in which many miniaturized versions of chemical reactions are all run simultaneously.
This method uses a device that automates the resin loading and wash
cycles, as well as the reaction cycle monitoring and purification, and
demonstrate the feasibility of their method and apparatus by using it to
synthesize a variety of molecule classes, such as hydantoins and benzodiazepines, running 40 individual reactions in most cases.
Oftentimes, it is not possible to use expensive equipment, and
Schwabacher, et al. describe a simple method of combining parallel
synthesis of library members and evaluation of entire libraries of
compounds.
In their method, a thread that is partitioned into different regions is
wrapped around a cylinder, where a different reagent is then coupled to
each region which bears only a single species. The thread is then
re-divided and wrapped around a cylinder of a different size, and this
process is then repeated. The beauty of this method is that the identity
of each product can be known simply by its location along the thread,
and the corresponding biological activity is identified by Fourier transformation of fluorescence signals.
Use of a traceless linker
In most of the syntheses described here, it is necessary to attach
and remove the starting reagent to/from a solid support. This can lead
to the generation of a hydroxyl group, which can potentially affect the
biological activity of a target compound. Ellman uses solid phase
supports in a multi-step synthesis scheme to obtain 192 individual
1,4-benzodiazepine derivatives, which are well-known therapeutic agents.
To eliminate the possibility of potential hydroxyl group interference, a
novel method using silyl-aryl chemistry is used to link the molecules
to the solid support which cleaves from the support and leaves no trace
of the linker.
Compounds that can be synthesized from solid-phase bound imines
When anchoring a molecule to a solid support, intermediates cannot be
isolated from one another without cleaving the molecule from the resin.
Since many of the traditional characterization techniques used to track
reaction progress and confirm product structure are solution-based,
different techniques must be used. Gel-phase 13 C NMR spectroscopy,
MALDI mass spectrometry, and IR spectroscopy have been used to confirm
structure and monitor the progress of solid-phase reactions.
Gordon et al., describe several case studies that utilize imines and
peptidyl phosphonates to generate combinatorial libraries of small
molecules.
To generate the imine library, an amino acid tethered to a resin is
reacted in the presence of an aldehyde. The authors demonstrate the use
of fast 13 C gel phase NMR spectroscopy and magic angle spinning 1 H NMR
spectroscopy to monitor the progress of reactions and showed that most
imines could be formed in as little as 10 minutes at room temperature
when trimethyl orthoformate was used as the solvent. The formed imines
were then derivatized to generate 4-thiazolidinones, B-lactams, and
pyrrolidines.
The use of solid-phase supports greatly simplifies the synthesis
of large combinatorial libraries of compounds. This is done by anchoring
a starting material to a solid support and then running subsequent
reactions until a sufficiently large library is built, after which the
products are cleaved from the support. The use of solid-phase
purification has also been demonstrated for use in solution-phase
synthesis schemes in conjunction with standard liquid-liquid extraction
purification techniques.
Deconvolution and screening
Combinatorial libraries
Combinatorial
libraries are special multi-component mixtures of small-molecule
chemical compounds that are synthesized in a single stepwise process.
They differ from collection of individual compounds as well as from
series of compounds prepared by parallel synthesis.
It is an important feature that mixtures are used in their synthesis.
The use of mixtures ensures the very high efficiency of the process.
Both reactants can be mixtures and in this case the procedure would be
even more efficient. For practical reasons however, it is advisable to
use the split-mix method in which one of two mixtures is replaced by
single building blocks (BBs). The mixtures are so important that there
are no combinatorial libraries without using mixture in the synthesis,
and if a mixture is used in a process inevitably combinatorial library
forms. The split-mix synthesis is usually realized using solid support
but it is possible to apply it in solution, too. Since he structures the
components are unknown deconvolution methods need to be used in
screening.
One of the most important features of combinatorial libraries is that
the whole mixture can be screened in a single process. This makes these
libraries very useful in pharmaceutical research.
Partial libraries of full combinatorial libraries can also be
synthesized. Some of them can be used in deconvolution.
Deconvolution of libraries cleaved from the solid support
If
the synthesized molecules of a combinatorial library are cleaved from
the solid support a soluble mixture forms. In such solution, millions of
different compounds may be found. When this synthetic method was
developed, it first seemed impossible to identify the molecules, and to
find molecules with useful properties. Strategies for identification of
the useful components had been developed, however, to solve the problem.
All these strategies are based on synthesis and testing of partial
libraries.
The earliest iterative strategy is described in the above mentioned
document of Furka notarized in 1982 and.The method was later independently published by Erb et al. under the name „Recursive deconvolution”
Recursive deconvolution. Blue, yellow and red circles: amino acids, Green circle: solid support
Recursive deconvolution
The
method is made understandable by the figure. A 27 member peptide
library is synthesized from three amino acids. After the first (A) and
second (B) cycles samples were set aside before mixing them. The
products of the third cycle (C) are cleaved down before mixing then are
tested for activity. Suppose the group labeled by + sign is active. All
members have the red amino acid at the last coupling position (CP).
Consequently the active member also has the red amino acid at the last
CP. Then the red amino acid is coupled to the three samples set aside
after the second cycle (B) to get samples D. After cleaving, the three E
samples are formed. If after testing the sample marked by + is the
active one it shows that the blue amino acid occupies the second CP in
the active component. Then to the three A samples first the blue then
the red amino acid is coupled (F) then tested again after cleaving (G).
If the + component proves to be active, the sequence of the active
component is determined and shown in H.
Positional scanning
Positional scanning was introduced independently by Furka et al. and Pinilla et al.
The method is based on the synthesis and testing of series of
sublibraries. in which a certain sequence position is occupied by the
same amino acid. The figure shows the nine sublibraries (B1-D3) of a
full peptide trimer library (A) made from three amino acids. In
sublibraries there is a position which is occupied by the same amino
acid in all components. In the synthesis of a sublibrary the support is
not divided and only one amino acid is coupled to the whole sample. As a
result one position is really occupied by the same amino acid in all
components. For example in the B2 sublibrary position 2 is occupied by
the „yellow” amino acid in all the nine components. If in a screening
test this sublibrary gives positive answer it means that position 2 in
the active peptide is also occupied by the „yellow” amino acid. The
amino acid sequence can be determined by testing all the nine (or
sometime less) sublibraries.
Positional
scanning. Full trimer peptide library made from 3 amino acids and its 9
sublibraries. The first row shows the coupling positions
A 27 member tripeptide full library and the three omission libraries. The color circles are amino acids
Omission libraries
In omission libraries
a certain amino acid is missing from all peptides of the mixture. The
figure shows the full library and the three omission libraries. At the
top the omitted amino acids are shown. If the omission library gives a
negative test the omitted amino acid is present in the active component.
Deconvolution of tethered combinatorial libraries
If
the peptides are not cleaved from the solid support we deal with a
mixture of beads, each bead containing a single peptide. Smith and his
colleagues
showed earlier that peptides could be tested in tethered form, too.
This approach was also used in screening peptide libraries. The tethered
peptide library was tested with a dissolved target protein. The beads
to which the protein was attached were picked out, removed the protein
from the bead then the tethered peptide was identified by sequencing.
A somewhat different approach was followed by Taylor and Morken.
They used infrared thermography to identify catalysts in non-peptide
tethered libraries. The method is based on the heat that is evolved in
the beads that contain a catalyst when the tethered library immersed
into a solution of a substrate. When the beads are examined through an
infrared microscope the catalyst containing beads appear as bright spots
and can be picked out.
Encoded combinatorial libraries
If
we deal with a non-peptide organic libraries library it is not as
simple to determine the identity of the content of a bead as in the case
of a peptide one. In order to circumvent this difficulty methods had
been developed to attach to the beads, in parallel with the synthesis of
the library, molecules that encode the structure of the compound formed
in the bead.
Ohlmeyer and his colleagues published a binary encoding method
They used mixtures of 18 tagging molecules that after cleaving them
from the beads could be identified by Electron Capture Gas
Chromatography. Sarkar et al. described chiral oligomers of pentenoic
amides (COPAs) that can be used to construct mass encoded OBOC
libraries.
Kerr et al. introduced an innovative encoding method
An orthogonally protected removable bifunctional linker was attached to
the beads. One end of the linker was used to attach the non-natural
building blocks of the library while to the other end encoding amino
acid triplets were linked. The building blocks were non-natural amino
acids and the series of their encoding amino acid triplets could be
determined by Edman degradation. The important aspect of this kind of
encoding was the possibility to cleave down from the beads the library
members together with their attached encoding tags forming a soluble
library. The same approach was used by Nikolajev et al. for encoding
with peptides.
In 1992 by Brenner and Lerner introduced DNA sequences to encode the
beads of the solid support that proved to be the most successful
encoding method. Nielsen, Brenner and Janda also used the Kerr approach for implementing the DNA encoding
In the latest period of time there were important advancements in DNA
sequencing. The next generation techniques make it possible to sequence
large number of samples in parallel that is very important in screening
of DNA encoded libraries. There was another innovation that contributed
to the success of DNA encoding. In 2000 Halpin and Harbury omitted the
solid support in the split-mix synthesis of the DNA encoded
combinatorial libraries and replaced it by the encoding DNA oligomers.
In solid phase split and pool synthesis the number of components of
libraries can’t exceed the number of the beads of the support. By the
novel approach of the authors, this restraint was entirely eliminated
and made it possible to prepare new compounds in practically unlimited
number. The Danish company Nuevolution for example synthesized a DNA encoded library containing 40 trillion! components
The DNA encoded libraries are soluble that makes possible to apply the
efficient affinity binding in screening. Some authors apply the DEL for
acromim of DNA encoded combinatorial libraries others are using DECL.
The latter seems better since in this name the combinatorial nature of
these libraries is clearly expressed.
Several types of DNA encoded combinatorial libraries had been introduced
and described in the first decade of the present millennium. These
libraries are very successfully applied in drug research.
DNA templated synthesis of combinatorial libraries described in 2001 by Gartner et al.
Dual pharmacophore DNA encoded combinatorial libraries invented in 2004 by Mlecco et al.
Sequence encoded routing published by Harbury Halpin and Harbury in 2004.
Single pharmacophore DNA encoded combinatorial libraries introduced in 2008 by Manocci et al.
DNA encoded combinatorial libraries formed by using yoctoliter-scale reactor published by Hansen et al. in 2009
Details are found about their synthesis and application in the page DNA-encoded chemical library.
The DNA encoded soluble combinatorial libraries have drawbacks, too.
First of all the advantage coming from the use of solid support is
completely lost. In addition, the polyionic character of DNA encoding
chains limits the utility of non-aqueous solvents in the synthesis. For
this reason many laboratories choose to develop DNA compatible reactions
for use in the synthesis of DECLs. Quite a few of available ones are
already described
Materials science
Materials science has applied the techniques of combinatorial chemistry to the discovery of new materials. This work was pioneered by P.G. Schultz et al. in the mid-nineties in the context of luminescent materials obtained by co-deposition of
elements on a silicon substrate. His work was preceded by J. J. Hanak in
1970
but the computer and robotics tools were not available for the method
to spread at the time. Work has been continued by several academic
groups as well as companies with large research and development programs (Symyx Technologies, GE, Dow Chemical etc.). The technique has been used extensively for catalysis, coatings, electronics, and many other fields.
The application of appropriate informatics tools is critical to handle,
administer, and store the vast volumes of data produced. New types of Design of experiments
methods have also been developed to efficiently address the large
experimental spaces that can be tackled using combinatorial methods.
Diversity-oriented libraries
Even
though combinatorial chemistry has been an essential part of early drug
discovery for more than two decades, so far only one de novo
combinatorial chemistry-synthesized chemical has been approved for
clinical use by FDA (sorafenib, a multikinase inhibitor indicated for advanced renal cancer). The analysis of the poor success rate of the approach has been suggested to connect with the rather limited chemical space covered by products of combinatorial chemistry.
When comparing the properties of compounds in combinatorial chemistry
libraries to those of approved drugs and natural products, Feher and
Schmidt noted that combinatorial chemistry libraries suffer particularly from the lack of chirality, as well as structure rigidity, both of which are widely regarded as drug-like properties. Even though natural product drug discovery has not probably been the most fashionable trend in the pharmaceutical industry in recent times, a large proportion of new chemical entities still are nature-derived compounds,
and thus, it has been suggested that effectiveness of combinatorial
chemistry could be improved by enhancing the chemical diversity of
screening libraries.
As chirality and rigidity are the two most important features
distinguishing approved drugs and natural products from compounds in
combinatorial chemistry libraries, these are the two issues emphasized
in so-called diversity oriented libraries, i.e. compound collections
that aim at coverage of the chemical space, instead of just huge numbers
of compounds.
Many important medications have been discovered by bioprospecting including the diabetes drug metformin (developed from a natural product found in Galega officinalis).
When a region’s biological resources or indigenous knowledge are unethically appropriated or commercially exploited without providing fair compensation, this is known as biopiracy. Various international treaties have been negotiated to provide
countries legal recourse in the event of biopiracy and to offer
commercial actors legal certainty for investment. These include the UNConvention on Biological Diversity and the Nagoya Protocol.
Other risks associated with bioprospecting are the overharvesting
of individual species and environmental damage, but legislation has
been developed to combat these also. Examples include national laws
such as the US Marine Mammal Protection Act and US Endangered Species Act, and international treaties such as the UN Convention on Biological Diversity, the UN Convention on the Law of the Sea, and the UN Antarctic Treaty.
Bioprospecting-derived resources and products
Agriculture
Annonin-based biopesticides, used to protect crops from beetles and other pests, were developed from the plant Annona squamosa.
Errors
and oversights can occur at different steps in the bioprospecting
process including collection of source material, screening source
material for bioactivity, testing isolated compounds for toxicity, and identification of mechanism of action.
Collection of source material
Voucher
deposition allows species identity to be re-evaluated if there are
problems re-isolating an active constituent from a biological source.
Prior to collecting biological material or traditional knowledge, the correct permissions must be obtained from the source country, land owner etc. Failure to do so can result in criminal proceedings and rejection of any subsequent patent
applications. It is also important to collect biological material in
adequate quantities, to have biological material formally identified, and to deposit a voucher specimen with a repository for long-term preservation and storage. This helps ensure any important discoveries are reproducible.
Bioactivity and toxicity testing
When testing extracts and isolated compounds for bioactivity and toxicity, the use of standard protocols (eg. CLSI, ISO, NIH, EURL ECVAM, OECD)
is desirable because this improves test result accuracy and
reproducibility. Also, if the source material is likely to contain
known (previously discovered) active compounds (eg. streptomycin in the
case of actinomycetes), then dereplication is necessary to exclude these
extracts and compounds from the discovery pipeline as early as
possible. In addition, it is important to consider solvent effects on the cells or cell lines being tested, to include reference compounds (ie. pure chemical compounds
for which accurate bioactivity and toxicity data are available), to set
limits on cell line passage number (eg. 10-20 passages), to include all
the necessary positive and negative controls,
and to be aware of assay limitations. These steps help ensure assay
results are accurate, reproducible and interpreted correctly.
Identification of mechanism of action
When
attempting to elucidate the mechanism of action of an extract or
isolated compound, it is important to use multiple orthogonal assays.
Using just a single assay, especially a single in vitro assay, gives a very incomplete picture of an extract or compound’s effect on the human body. In the case of Valeriana officinalis root extract, for example, the sleep-inducing effects of this extract are due to multiple compounds and mechanisms including interaction with GABA receptors and relaxation of smooth muscle. The mechanism of action of an isolated compound can also be misidentified if a single assay is used because some compounds interfere with assays. For example, the sulfhydryl-scavenging assay used to detect histone acetyltransferase inhibition can give a false positive result if the test compound reacts covalently with cysteines.
Biopiracy
The term biopiracy was coined by Pat Mooney, to describe a practice in which indigenous knowledge of nature, originating with indigenous peoples, is used by others for profit, without authorization or compensation to the indigenous people themselves. For example, when bioprospectors draw on indigenous knowledge of medicinal plants which is later patented
by medical companies without recognizing the fact that the knowledge is
not new or invented by the patenter, this deprives the indigenous
community of their potential rights to the commercial product derived
from the technology that they themselves had developed. Critics of this practice, such as Greenpeace, claim these practices contribute to inequality between developing countries rich in biodiversity, and developed countries hosting biotech firms.
In the 1990s many large pharmaceutical and drug discovery
companies responded to charges of biopiracy by ceasing work on natural
products, turning to combinatorial chemistry to develop novel compounds.
Famous cases of biopiracy
A white rosy periwinkle
The rosy periwinkle
The rosy periwinkle case dates from the 1950s. The rosy periwinkle, while native to Madagascar, had been widely introduced into other tropical countries around the world well before the discovery of vincristine. Different countries are reported as having acquired different beliefs about the medical properties of the plant.
This meant that researchers could obtain local knowledge from one
country and plant samples from another. The use of the plant for diabetes was the original stimulus for research. Effectiveness in the treatment of both Hodgkin's Disease and leukemia were discovered instead. The Hodgkin's lymphoma chemotherapeutic drug vinblastine is derivable from the rosy periwinkle.
The Maya ICBG controversy
The Maya ICBG bioprospecting controversy took place in 1999–2000, when the International Cooperative Biodiversity Group led by ethnobiologistBrent Berlin was accused of being engaged in unethical forms of bioprospecting by several NGOs and indigenous organizations. The ICBG aimed to document the biodiversity of Chiapas, Mexico and the ethnobotanical knowledge of the indigenous Maya people
– in order to ascertain whether there were possibilities of developing
medical products based on any of the plants used by the indigenous
groups.
The Maya ICBG case was among the first to draw attention to the
problems of distinguishing between benign forms of bioprospecting and
unethical biopiracy, and to the difficulties of securing community
participation and prior informed consent for would-be bioprospectors.
In 1997, the US corporation RiceTec (a subsidiary of RiceTec AG of Liechtenstein) attempted to patent certain hybrids of basmati rice and semidwarf long-grain rice. The Indian government challenged this patent and, in 2002, fifteen of the patent's twenty claims were invalidated.
The Enola bean
The Enola bean
The Enola bean is a variety of Mexican yellow bean, so called after the wife of the man who patented it in 1999.
The allegedly distinguishing feature of the variety is seeds of a
specific shade of yellow. The patent-holder subsequently sued a large
number of importers of Mexican yellow beans with the following result:
"...export sales immediately dropped over 90% among importers that had
been selling these beans for years, causing economic damage to more than
22,000 farmers in northern Mexico who depended on sales of this bean."
A lawsuit was filed on behalf of the farmers and, in 2005, the US-PTO
ruled in favor of the farmers. In 2008, the patent was revoked.
Hoodia gordonii
The succulent Hoodia gordonii
Hoodia gordonii, a succulent plant, originates from the Kalahari Desert of South Africa. For generations it has been known to the traditionally living San people as an appetite suppressant. In 1996 South Africa's Council for Scientific and Industrial Research began working with companies, including Unilever, to develop dietary supplements based on Hoodia.
Originally the San people were not scheduled to receive any benefits
from the commercialization of their traditional knowledge, but in 2003
the South African San Council made an agreement with CSIR in which they
would receive from 6 to 8% of the revenue from the sale of Hoodia products.
In 2008 after having invested €20 million in R&D on Hoodia as a potential ingredient in dietary supplements for weight loss, Unilever terminated the project because their clinical studies did not show that Hoodia was safe and effective enough to bring to market.
Further cases
The following is a selection of further recent cases of biopiracy. Most of them do not relate to traditional medicines.
One common misunderstanding is that pharmaceutical companies patent
the plants they collect. While obtaining a patent on a naturally
occurring organism as previously known or used is not possible, patents
may be taken out on specific chemicals isolated or developed from
plants. Often these patents are obtained with a stated and researched
use of those chemicals.
Generally the existence, structure and synthesis of those compounds is
not a part of the indigenous medical knowledge that led researchers to
analyze the plant in the first place. As a result, even if the
indigenous medical knowledge is taken as prior art, that knowledge does
not by itself make the active chemical compound "obvious," which is the
standard applied under patent law.
In the United States, patent law
can be used to protect "isolated and purified" compounds – even, in one
instance, a new chemical element (see USP 3,156,523). In 1873, Louis Pasteur
patented a "yeast" which was "free from disease" (patent #141072).
Patents covering biological inventions have been treated similarly. In
the 1980 case of Diamond v. Chakrabarty, the Supreme Court
upheld a patent on a bacterium that had been genetically modified to
consume petroleum, reasoning that U.S. law permits patents on "anything
under the sun that is made by man." The United States Patent and Trademark Office
(USPTO) has observed that "a patent on a gene covers the isolated and
purified gene but does not cover the gene as it occurs in nature".
The CBD came into force in 1993. It secured rights to control access
to genetic resources for the countries in which those resources are
located. One objective of the CBD is to enable lesser-developed
countries to better benefit from their resources and traditional
knowledge. Under the rules of the CBD, bioprospectors are required to
obtain informed consent to access such resources, and must share any benefits with the biodiversity-rich country. However, some critics believe that the CBD has failed to establish appropriate regulations to prevent biopiracy.
Others claim that the main problem is the failure of national
governments to pass appropriate laws implementing the provisions of the
CBD. The Nagoya Protocol to the CBD, which came into force in 2014, provides further regulations. The CBD has been ratified, acceded or accepted by 196 countries and jurisdictions globally, with exceptions including the Holy See and United States.
Bioprospecting contracts
The requirements for bioprospecting as set by CBD has created a new branch of international patent and trade law, bioprospecting contracts. Bioprospecting contracts lay down the rules of benefit sharing between researchers and countries, and can bring royalties to lesser-developed countries.
However, although these contracts are based on prior informed consent
and compensation (unlike biopiracy), every owner or carrier of an
indigenous knowledge and resources are not always consulted or
compensated, as it would be difficult to ensure every individual is included.
Because of this, some have proposed that the indigenous or other
communities form a type of representative micro-government that would
negotiate with researchers to form contracts in such a way that the
community benefits from the arrangements. Unethical bioprospecting contracts (as distinct from ethical ones) can be viewed as a new form of biopiracy.
An example of a bioprospecting contract is the agreement between Merck and INBio of Costa Rica.