High-content screening (HCS), also known as high-content analysis (HCA) or cellomics, is a method used in biological research and drug discovery to identify substances such as small molecules, peptides, or RNAi that alter the phenotype of a cell in a desired manner. Hence high-content screening is a type of phenotypic screen
conducted in cells, involving the analysis of whole cells or components
of cells with simultaneous readout of several parameters. HCS is related to high-throughput screening
(HTS), in which thousands of compounds are tested in parallel for their
activity in one or more biological assays, but involves assays of more
complex cellular phenotypes as outputs. Phenotypic changes may include increases or decreases in the production of cellular products such as proteins and/or changes in the morphology (visual appearance) of the cell. Hence HCA typically involves automated microscopy and image analysis.
Unlike high-content analysis, high-content screening implies a level of
throughput; the term "screening" differentiates HCS from
HCA, which may be high in content but low in throughput.
In high-content screening, cells are first incubated
with the substance; after a period of time, structures and molecular
components of the cells are analyzed. The most common analysis
involves labeling proteins with fluorescent tags; changes in cell phenotype are then measured using automated image analysis.
Through the use of fluorescent tags with different absorption and
emission maxima, it is possible to measure several different cell
components in parallel. Furthermore, the imaging is able to detect
changes at a subcellular level (e.g., cytoplasm vs. nucleus vs. other organelles).
Therefore, a large number of data points can be collected per cell. In
addition to fluorescent labeling, various label-free assays have been
used in high-content screening.
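The workflow above (label, image, quantify per cell) can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and SciPy are available; the two-channel "images", threshold, and intensity values are synthetic, not from a real assay.

```python
import numpy as np
from scipy import ndimage

def per_cell_features(nuclei_img, marker_img, threshold):
    """Segment nuclei by thresholding one channel, then measure each
    cell's mean intensity in a second channel (a typical HCS readout)."""
    mask = nuclei_img > threshold
    labels, n_cells = ndimage.label(mask)  # one integer label per nucleus
    means = ndimage.mean(marker_img, labels, index=range(1, n_cells + 1))
    return n_cells, np.asarray(means)

# Synthetic two-channel field of view: two "cells" with different
# marker levels (hypothetical data for illustration only)
nuclei = np.zeros((20, 20))
marker = np.zeros((20, 20))
nuclei[2:6, 2:6] = 1.0
marker[2:6, 2:6] = 0.8        # cell 1: high marker signal
nuclei[12:16, 12:16] = 1.0
marker[12:16, 12:16] = 0.2    # cell 2: low marker signal

n, intensities = per_cell_features(nuclei, marker, threshold=0.5)
print(n, intensities)  # number of cells, per-cell mean marker intensities
```

Real pipelines add illumination correction, splitting of touching nuclei, and dozens of morphology features per cell, but the per-object measurement loop is the same idea.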
General principles
High-content screening (HCS) in cell-based systems uses living cells
as tools in biological research to elucidate the workings of normal and
diseased cells. HCS is also used to discover and optimize new drug
candidates. High-content screening combines modern cell biology and its molecular tools with automated high-resolution microscopy and robotic handling. Cells are first exposed to chemicals or RNAi reagents. Changes in cell morphology are then detected using image analysis. Changes in the amounts of proteins synthesized by cells are measured using a variety of techniques, such as green fluorescent protein fused to endogenous proteins, or fluorescent antibodies.
The technology may be used to determine whether a potential drug is disease modifying. For example, in humans G-protein coupled receptors
(GPCRs) are a large family of around 880 cell surface proteins that
transduce extra-cellular changes in the environment into a cell
response, like triggering an increase in blood pressure because of the
release of a regulatory hormone into the bloodstream. Activation of
these GPCRs can involve their internalization into cells; when this can be
visualised, it can serve as the basis for systematic analysis of receptor
function through chemical genetics, systematic genome-wide screening, or physiological manipulation.
At the cellular level, parallel acquisition of data on different cell properties, for example the activity of signal transduction cascades and cytoskeleton integrity, is the main advantage of this method compared with the faster but less detailed high-throughput screening. While HCS is slower, the wealth of acquired data allows a more profound understanding of drug effects.
Automated image based screening permits the identification of small compounds altering cellular phenotypes and is of interest for the discovery of new pharmaceuticals
and new cell biological tools for modifying cell function. The
selection of molecules based on a cellular phenotype does not require a
prior knowledge of the biochemical targets that are affected by
compounds. However, identification of the biological target
will make subsequent preclinical optimization and clinical development
of the compound hit significantly easier. Given the increase in the use
of phenotypic/visual screening as a cell biological tool, methods are
required that permit systematic biochemical target identification if
these molecules are to be of broad use. Target identification has been described as the rate-limiting step in chemical genetics/high-content screening.
Instrumentation
An automated confocal image reader
High-content screening technology is mainly based on automated digital microscopy and flow cytometry,
in combination with IT-systems for the analysis and storage of the
data.
“High-content” or visual biology technology has two purposes, first to
acquire spatially or temporally resolved information on an event and
second to automatically quantify it. Spatially resolved instruments are
typically automated microscopes,
and temporal resolution still requires some form of fluorescence
measurement in most cases. This means that many HCS instruments are fluorescence
microscopes connected to some form of image analysis package.
These take care of all the steps in taking fluorescent images of cells
and provide rapid, automated and unbiased assessment of experiments.
HCS instruments on the market today can be differentiated by an
array of specifications that significantly influence an instrument's
versatility and overall cost. These include speed, a live-cell chamber
that includes temperature and CO2 control (some also have humidity
control for longer term live cell imaging), a built in pipettor or
injector for fast kinetic assays, and additional imaging modes such as
confocal, bright field, phase contrast and FRET. One of the most
decisive differences is whether an instrument is confocal or not. Confocal microscopy
can be summarized as imaging a thin slice through an object while
rejecting out-of-focus light that comes from outside this slice.
Confocal imaging enables a higher signal-to-noise ratio and higher
resolution than the more commonly applied epi-fluorescence microscopy.
Depending on the instrument, confocality is achieved via laser
scanning, a single spinning disk with pinholes or slits, a dual spinning
disk, or a virtual slit. These various confocal techniques trade off
sensitivity, resolution, speed, photo-toxicity, photo-bleaching,
instrument complexity, and price.
What all instruments share is the ability to take, store and
interpret images automatically and to integrate with large robotic
cell/medium-handling platforms.
Software
Many
screens are analyzed using the image analysis software that accompanies
the instrument, providing a turn-key solution. Third-party software
alternatives are often used for particularly challenging screens or
where a laboratory or facility has multiple instruments and wishes to
standardize to a single analysis platform. Some instrument software
provides bulk importing and exporting of images and data, for users who
want to do such standardization on a single analysis platform without
the use of third-party software, however.
Applications
This
technology allows a very large number of experiments to be performed,
enabling explorative screening. Cell-based systems are mainly used in
chemical genetics where large, diverse small molecule collections are
systematically tested for their effect on cellular model systems. Novel
drugs can be found using screens of tens of thousands of molecules, and
these have promise for the future of drug development.
Beyond drug discovery, chemical genetics aims to functionalize the
genome by identifying small molecules that act on most of the 21,000
gene products in a cell. High-content technology will be part of this
effort, which could provide useful tools for learning where and when
proteins act by knocking them out chemically. This would be most useful
for genes for which knockout mice (missing one or several genes) cannot
be made because the protein is required for development or growth, or
because its absence is otherwise lethal. Chemical knockout could address
how and where these genes work.
Further, the technology is used in combination with RNAi
to identify sets of genes involved in specific mechanisms, for example
cell division. Here, RNAi libraries covering a whole set of
predicted genes inside the target organism's genome can be used to
identify relevant subsets, facilitating the annotation of genes for
which no clear role has been established beforehand.
The large datasets produced by automated cell biology contain spatially
resolved, quantitative data that can be used to build systems-level
models and simulations of how cells and organisms function.
Systems biology models of cell function would permit prediction of why,
where and how the cell responds to external changes, growth and disease.
History
High-content
screening technology allows for the evaluation of multiple biochemical
and morphological parameters in intact biological systems.
For cell-based approaches the utility of automated cell biology
requires an examination of how automation and objective measurement can
improve the experimentation and the understanding of disease. First, it
removes the influence of the investigator in most, but not all, aspects
of cell biology research and second it makes entirely new approaches
possible.
In review, classical 20th-century cell biology used cell lines
grown in culture, where experiments were measured using methods very
similar to those described here, but the investigator chose what was
measured and how. In the early 1990s, the development of CCD (charge-coupled
device) cameras for research created the opportunity to measure features
in pictures of cells, such as how much protein is in the nucleus and how
much is outside.
Sophisticated measurements soon followed using new fluorescent
molecules, which are used to measure cell properties like second messenger
concentrations or the pH of internal cell compartments. The wide use of
the green fluorescent protein, a natural fluorescent protein molecule
from jellyfish, then accelerated the trend toward cell imaging as a
mainstream technology in cell biology. Despite these advances, the
choice of which cell to image, which data to present, and how to
analyze them was still made by the investigator.
By analogy, imagine a football field with dinner plates
laid across it: instead of looking at all of them, the investigator
would choose a handful near the goal line and leave the rest. In
this analogy the field is a tissue culture dish and the plates are the
cells growing on it. While this was a reasonable and pragmatic approach,
automation of the whole process and analysis makes it possible to
measure the whole population of living cells, i.e., the whole football
field.
Modern drug discovery involves the identification of screening hits, medicinal chemistry, and optimization of those hits to increase affinity, selectivity (to reduce the potential for side effects), efficacy/potency, metabolic stability (to increase the half-life), and oral bioavailability. Once a compound that fulfills all of these requirements has been identified, the process of drug development can continue and, if successful, clinical trials follow.
Modern drug discovery is thus usually a capital-intensive process that involves large investments by pharmaceutical industry corporations as well as national governments (who provide grants and loan guarantees).
Despite advances in technology and understanding of biological systems,
drug discovery is still a lengthy, "expensive, difficult, and
inefficient process" with a low rate of new therapeutic discovery. In 2010, the research and development cost of each new molecular entity was about US$1.8 billion. In the 21st century,
basic discovery research is funded primarily by governments and by
philanthropic organizations, while late-stage development is funded
primarily by pharmaceutical companies or venture capitalists.
To be allowed to come to market, drugs must undergo several successful
phases of clinical trials, and pass through a new drug approval process,
called the New Drug Application in the United States.
Discovering drugs that may be a commercial success, or a public
health success, involves a complex interaction between investors,
industry, academia, patent laws, regulatory exclusivity, marketing and the need to balance secrecy with communication. Meanwhile, for disorders whose rarity means that no large commercial success or public health effect can be expected, the orphan drug funding process ensures that people who experience those disorders can have some hope of pharmacotherapeutic advances.
History
The
idea that the effect of a drug in the human body is mediated by
specific interactions of the drug molecule with biological
macromolecules, (proteins or nucleic acids
in most cases) led scientists to the conclusion that individual
chemicals are required for the biological activity of the drug. This
marked the beginning of the modern era in pharmacology, as pure chemicals, instead of crude extracts of medicinal plants, became the standard drugs. Examples of drug compounds isolated from crude preparations are morphine, the active agent in opium, and digoxin, a heart stimulant originating from Digitalis lanata. Organic chemistry also led to the synthesis of many of the natural products isolated from biological sources.
Historically, substances, whether crude extracts or purified
chemicals, were screened for biological activity without knowledge of
the biological target. Only after an active substance was identified was an effort made to identify the target. This approach is known as classical pharmacology, forward pharmacology, or phenotypic drug discovery.
Later, small molecules were synthesized to specifically target a
known physiological/pathological pathway, avoiding the mass screening of
banks of stored compounds. This led to great success, such as the work
of Gertrude Elion and George H. Hitchings on purine metabolism, the work of James Black on beta blockers and cimetidine, and the discovery of statins by Akira Endo. Another champion of the approach of developing chemical analogues of known active substances was Sir David Jack at Allen and Hanbury's, later Glaxo, who pioneered the first inhaled selective beta2-adrenergic agonist for asthma, the first inhaled steroid for asthma, ranitidine as a successor to cimetidine, and supported the development of the triptans.
Gertrude Elion, working mostly with a group of fewer than 50
people on purine analogues, contributed to the discovery of the first
anti-viral; the first immunosuppressant (azathioprine)
that allowed human organ transplantation; the first drug to induce
remission of childhood leukemia; pivotal anti-cancer treatments; an
anti-malarial; an anti-bacterial; and a treatment for gout.
Cloning of human proteins made possible the screening of large
libraries of compounds against specific targets thought to be linked to
specific diseases. This approach is known as reverse pharmacology and is the most frequently used approach today.
Targets
A "target" is the term used within the pharmaceutical industry for the
naturally existing cellular or molecular structure involved in the
pathology of interest on which the drug in development is meant to act.
However, the distinction between a "new" and "established" target can
be made without a full understanding of just what a "target" is. This
distinction is typically made by pharmaceutical companies engaged in the
discovery and development of therapeutics. In an estimate from 2011, 435 human genome products were identified as therapeutic drug targets of FDA-approved drugs.
"Established targets" are those for which there is a good
scientific understanding, supported by a lengthy publication history, of
both how the target functions in normal physiology and how it is
involved in human pathology. This does not imply that the mechanism of action of drugs that are thought to act through a particular established target is fully understood.
Rather, "established" relates directly to the amount of background
information available on a target, in particular functional information.
In general, "new targets" are all those targets that are not
"established targets" but which have been or are the subject of drug
discovery efforts. The majority of targets selected for drug discovery
efforts are proteins, such as G-protein-coupled receptors (GPCRs) and protein kinases.
Screening and design
The process of finding a new drug against a chosen target for a particular disease usually involves high-throughput screening
(HTS), wherein large libraries of chemicals are tested for their
ability to modify the target. For example, if the target is a novel GPCR, compounds will be screened for their ability to inhibit or stimulate that receptor (see antagonist and agonist): if the target is a protein kinase, the chemicals will be tested for their ability to inhibit that kinase.
Another important function of HTS is to show how selective the
compounds are for the chosen target, as one wants to find a molecule
which will interfere with only the chosen target, but not other, related
targets.
To this end, other screening runs will be made to see whether the
"hits" against the chosen target will interfere with other related
targets – this is the process of cross-screening. Cross-screening is important, because the more unrelated targets a compound hits, the more likely that off-target toxicity will occur with that compound once it reaches the clinic.
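The outcome of such a cross-screen is often summarized as a fold-selectivity index. A minimal sketch follows; the kinase names and IC50 values are invented for illustration.

```python
def selectivity_index(ic50_on_target, ic50_off_target):
    """Fold-selectivity: ratio of off-target to on-target IC50 (nM).
    Higher values mean the hit is more specific for the chosen target."""
    return ic50_off_target / ic50_on_target

# Hypothetical cross-screening panel: IC50 (nM) of one hit compound
# against the chosen kinase and two related kinases
on_target_ic50 = 15.0
off_targets = {"kinase_B": 2200.0, "kinase_C": 480.0}

for name, ic50 in off_targets.items():
    fold = selectivity_index(on_target_ic50, ic50)
    print(f"{name}: {fold:.0f}-fold selective")
```

A compound with low fold-selectivity against many unrelated targets would be flagged as a likely source of off-target toxicity.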
It is unlikely that a perfect drug candidate will emerge from
these early screening runs. One of the first steps is to screen out
compounds that are unlikely to be developed into drugs; for example,
compounds that are hits in almost every assay, classified by medicinal
chemists as "pan-assay interference compounds", are removed at this stage if they were not already removed from the chemical library. It is often observed that several compounds have some degree of activity, and if these compounds share common chemical features, one or more pharmacophores can then be developed. At this point, medicinal chemists will attempt to use structure–activity relationships (SAR) to improve certain features of the lead compound.
This process will require several iterative screening runs, during
which, it is hoped, the properties of the new molecular entities will
improve, and allow the favoured compounds to go forward to in vitro and in vivo testing for activity in the disease model of choice.
Physicochemical properties associated with drug absorption include ionization (pKa) and solubility; permeability can be determined by PAMPA and Caco-2.
PAMPA is attractive as an early screen due to its low consumption of
drug and low cost compared to tests such as Caco-2, gastrointestinal
tract (GIT), and blood–brain barrier (BBB) assays, with which it correlates highly.
A range of parameters can be used to assess the quality of a compound, or a series of compounds, as proposed in Lipinski's Rule of Five. Such parameters include calculated properties, such as cLogP to estimate lipophilicity, molecular weight, and polar surface area, and measured properties, such as potency and in-vitro measurement of enzymatic clearance. Some descriptors, such as ligand efficiency (LE) and lipophilic efficiency (LiPE), combine such parameters to assess druglikeness.
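These rules and descriptors are straightforward to compute. Below is a minimal sketch with an invented example compound; the Rule-of-Five cutoffs and the commonly quoted definitions LE ≈ 1.37·pIC50/heavy atoms and LiPE = pIC50 − cLogP are assumed.

```python
import math

def lipinski_violations(mw, clogp, h_donors, h_acceptors):
    """Count Rule-of-Five violations; orally active drugs usually have <= 1."""
    return sum([mw > 500, clogp > 5, h_donors > 5, h_acceptors > 10])

def ligand_efficiency(pic50, heavy_atoms):
    """LE ~= 1.37 * pIC50 / heavy-atom count (kcal/mol per heavy atom)."""
    return 1.37 * pic50 / heavy_atoms

def lipe(pic50, clogp):
    """Lipophilic efficiency: potency not obtained merely via lipophilicity."""
    return pic50 - clogp

# Hypothetical hit: MW 420, cLogP 3.1, 2 donors, 6 acceptors,
# IC50 = 50 nM, 30 heavy atoms (all values invented for illustration)
pic50 = -math.log10(50e-9)                  # ~7.3
print(lipinski_violations(420, 3.1, 2, 6))  # 0 -> Rule-of-Five compliant
print(round(ligand_efficiency(pic50, 30), 2))
print(round(lipe(pic50, 3.1), 2))
```

In practice such descriptors are computed from structures with a cheminformatics toolkit rather than typed in by hand; the arithmetic, however, is exactly this simple.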
While HTS is a commonly used method for novel drug discovery, it
is not the only method. It is often possible to start from a molecule
which already has some of the desired properties. Such a molecule might
be extracted from a natural product or even be a drug on the market
which could be improved upon (so-called "me too" drugs). Other methods,
such as virtual high throughput screening,
where screening is done using computer-generated models and attempting
to "dock" virtual libraries to a target, are also often used.
Another important method for drug discovery is de novo drug design, in which a prediction is made of the sorts of chemicals that might (e.g.) fit into an active site of the target enzyme. For example, virtual screening and computer-aided drug design are often used to identify new chemical moieties that may interact with a target protein. Molecular modelling and molecular dynamics simulations can be used as a guide to improve the potency and properties of new drug leads.
There is also a paradigm shift in the drug discovery community
away from HTS, which is expensive and may cover only limited chemical space, toward the screening of smaller libraries (a few thousand compounds at most). These include fragment-based lead discovery (FBDD) and protein-directed dynamic combinatorial chemistry. The ligands in these approaches are usually much smaller, and they bind to the target protein with weaker binding affinity than hits identified from HTS. Further modification through organic synthesis into lead compounds is often required. Such modifications are often guided by protein X-ray crystallography of the protein–fragment complex.
The advantages of these approaches are that they allow more efficient
screening and the compound library, although small, typically covers a
large chemical space when compared to HTS.
Phenotypic screens have also provided new chemical starting points in drug discovery.
A variety of models have been used including yeast, zebrafish, worms,
immortalized cell lines, primary cell lines, patient-derived cell lines
and whole animal models. These screens are designed to find compounds
which reverse a disease phenotype, such as death, protein aggregation,
mutant protein expression, or cell proliferation, in a more
holistic cell model or organism. Smaller screening sets are often used
for these screens, especially when the models are expensive or
time-consuming to run.
In many cases, the exact mechanism of action of hits from these screens
is unknown and may require extensive target deconvolution experiments
to ascertain.
Once a lead compound series has been established with sufficient
target potency and selectivity and favourable drug-like properties, one
or two compounds will then be proposed for drug development. The best of these is generally called the lead compound, while the other will be designated as the "backup". These important decisions are generally supported by computational modelling innovations.
Traditionally, many drugs and other chemicals with biological
activity have been discovered by studying chemicals that organisms
create to affect the activity of other organisms for survival.
Despite the rise of combinatorial chemistry as an integral part
of the lead discovery process, natural products still play a major role
as starting material for drug discovery. A 2007 report found that of the 974 small-molecule new chemical entities developed between 1981 and 2006, 63% were naturally derived or semisynthetic derivatives of natural products.
For certain therapy areas, such as antimicrobials, antineoplastics,
antihypertensive and anti-inflammatory drugs, the numbers were higher. In many cases, these products have been used traditionally for many years.
Natural products may be useful as a source of novel chemical
structures for modern techniques of development of antibacterial
therapies.
Plant-derived
Many secondary metabolites
produced by plants have potential therapeutic medicinal properties.
These secondary metabolites contain, bind to, and modify the function of
proteins (receptors, enzymes, etc.). Consequently, plant derived natural products have often been used as the starting point for drug discovery.
History
Until the Renaissance, the vast majority of drugs in Western medicine were plant-derived extracts.
This has resulted in a pool of information about the potential of plant
species as important sources of starting materials for drug discovery. Botanical knowledge about different metabolites and hormones
that are produced in different anatomical parts of the plant (e.g.
roots, leaves, and flowers) are crucial for correctly identifying
bioactive and pharmacological plant properties.
Identifying new drugs and getting them approved for market has proved
to be a stringent process due to regulations set by national drug regulatory agencies.
Jasmonates
Chemical structure of methyl jasmonate (JA).
Jasmonates are important in responses to injury and intracellular signals. They induce apoptosis and protein cascade via proteinase inhibitor, have defense functions, and regulate plant responses to different biotic and abiotic stresses. Jasmonates also have the ability to directly act on mitochondrial membranes by inducing membrane depolarization via release of metabolites.
Jasmonate derivatives (JADs) are also important in wound response and tissue regeneration in plant cells. They have also been identified as having anti-aging effects on the human epidermal layer. They are suspected to interact with proteoglycans (PGs) and glycosaminoglycan (GAG) polysaccharides, essential extracellular matrix (ECM) components, to help remodel the ECM.
The discovery of the effect of JADs on skin repair has generated newfound interest
in the effects of these plant hormones in therapeutic medicinal
applications.
Salicylates
Chemical structure of acetylsalicylic acid, more commonly known as Aspirin.
Salicylic acid (SA), a phytohormone, was initially derived from willow bark and has since been identified in many species. It is an important player in plant immunity, although its role is still not fully understood by scientists.
Salicylates are involved in disease and immunity responses in plant and
animal tissues. Salicylic acid binding proteins (SABPs) have been
shown to affect multiple animal tissues.
The first medicinal properties discovered for the isolated compound were in pain and fever management. Salicylates also play an active role in
the suppression of cell proliferation and can induce death in lymphoblastic leukemia and other human cancer cells. One of the most common drugs derived from salicylates is aspirin, also known as acetylsalicylic acid, which has anti-inflammatory and anti-pyretic properties.
Microbial metabolites
Microbes
compete for living space and nutrients. To survive in these conditions,
many microbes have developed abilities to prevent competing species
from proliferating. Microbes are the main source of antimicrobial drugs.
Streptomyces isolates
have been such a valuable source of antibiotics that they have been
called medicinal molds. The classic example of an antibiotic discovered
as a defense mechanism against another microbe is the 1928 discovery of penicillin in bacterial cultures contaminated by Penicillium fungi.
Marine invertebrates
Marine environments are potential sources of new bioactive agents. Arabinose nucleosides,
discovered in marine invertebrates in the 1950s, demonstrated for the
first time that sugar moieties other than ribose and deoxyribose can
yield bioactive nucleoside structures. Nevertheless, it took until 2004 for the first
marine-derived drug to be approved: the cone snail toxin ziconotide,
also known as Prialt, which treats severe neuropathic pain. Several other
marine-derived agents are now in clinical trials for indications such as
cancer, anti-inflammatory use, and pain. One class of these agents is the bryostatin-like compounds, under investigation as anti-cancer therapy.
Chemical diversity
As
mentioned above, combinatorial chemistry was a key technology enabling
the efficient generation of large screening libraries for the needs of
high-throughput screening. However, after two decades of
combinatorial chemistry, it has been pointed out that despite the
increased efficiency in chemical synthesis, no corresponding increase in
lead or drug candidates has resulted.
This has led to analysis of chemical characteristics of combinatorial
chemistry products, compared to existing drugs or natural products. The chemoinformatics concept chemical diversity, depicted as distribution of compounds in the chemical space
based on their physicochemical characteristics, is often used to
describe the difference between the combinatorial chemistry libraries
and natural products. The synthetic, combinatorial library compounds
seem to cover only a limited and quite uniform chemical space, whereas
existing drugs and particularly natural products, exhibit much greater
chemical diversity, distributing more evenly to the chemical space.
The most prominent differences between natural products and compounds
in combinatorial chemistry libraries are the number of chiral centers
(much higher in natural compounds), structural rigidity (higher in
natural compounds), and number of aromatic moieties (higher in
combinatorial chemistry libraries). Other chemical differences between
these two groups include the nature of heteroatoms (O and N enriched in
natural products, and S and halogen atoms more often present in
synthetic compounds), as well as level of non-aromatic unsaturation
(higher in natural products). As both structural rigidity and chirality are well-established factors in medicinal chemistry
known to enhance a compound's specificity and efficacy as a drug, it has
been suggested that natural products compare favourably to today's
combinatorial chemistry libraries as potential lead molecules.
Screening
Two main approaches exist for finding new bioactive chemical entities from natural sources.
The first is sometimes referred to as random collection and
screening of material, but the collection is far from random.
Biological (often botanical) knowledge is often used to identify
families that show promise. This approach is effective because only a
small part of the earth's biodiversity has ever been tested for
pharmaceutical activity. Also, organisms living in a species-rich
environment need to evolve defensive and competitive mechanisms to
survive. Those mechanisms might be exploited in the development of
beneficial drugs.
A collection of plant, animal and microbial samples from rich
ecosystems can potentially give rise to novel biological activities
worth exploiting in the drug development process. One example of
successful use of this strategy is the screening for antitumor agents by
the National Cancer Institute, which started in the 1960s. Paclitaxel was identified in the Pacific yew tree, Taxus brevifolia.
Paclitaxel showed anti-tumour activity by a previously undescribed
mechanism (stabilization of microtubules) and is now approved for
clinical use for the treatment of lung, breast, and ovarian cancer, as
well as for Kaposi's sarcoma. Early in the 21st century, cabazitaxel (made by Sanofi, a French firm), another relative of taxol, was shown to be effective against prostate cancer,
also because it works by preventing the formation of microtubules,
which pull the chromosomes apart in dividing cells (such as cancer
cells). Other examples are: 1. Camptotheca (Camptothecin · Topotecan · Irinotecan · Rubitecan · Belotecan); 2. Podophyllum (Etoposide · Teniposide); 3a. Anthracyclines (Aclarubicin · Daunorubicin · Doxorubicin · Epirubicin · Idarubicin · Amrubicin · Pirarubicin · Valrubicin · Zorubicin); 3b. Anthracenediones (Mitoxantrone · Pixantrone).
The second main approach involves ethnobotany, the study of the general use of plants in society, and ethnopharmacology, an area inside ethnobotany, which is focused specifically on medicinal uses.
The
elucidation of the chemical structure is critical to avoid the
re-discovery of a chemical agent that is already known for its structure
and chemical activity. Mass spectrometry
is a method in which individual compounds are identified based on their
mass/charge ratio, after ionization. Chemical compounds exist in nature
as mixtures, so the combination of liquid chromatography and mass
spectrometry (LC-MS) is often used to separate the individual chemicals.
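As a toy illustration of how an observed m/z value can be matched against a table of known masses, consider the following sketch; the compound names and masses are hypothetical placeholders, not real reference data:

```python
# Toy sketch: match an observed m/z value against a small reference table
# of monoisotopic masses (values are illustrative, not authoritative).
# Assumes singly protonated ions [M+H]+, so observed m/z ~ M + 1.00728.

PROTON_MASS = 1.00728  # Da

reference_masses = {
    "compound_A": 853.33,   # hypothetical monoisotopic mass, Da
    "compound_B": 306.14,
    "compound_C": 179.08,
}

def match_mz(observed_mz, tolerance_ppm=10.0):
    """Return reference compounds whose [M+H]+ falls within a ppm window."""
    hits = []
    for name, mass in reference_masses.items():
        expected = mass + PROTON_MASS
        ppm_error = abs(observed_mz - expected) / expected * 1e6
        if ppm_error <= tolerance_ppm:
            hits.append((name, ppm_error))
    return hits

print(match_mz(854.337))  # close to compound_A's [M+H]+
```

Real identification pipelines compare full fragmentation spectra, not just a single mass, but the windowed lookup above captures the basic matching step.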
Databases of mass spectra for known compounds are available and can be
used to assign a structure to an unknown mass spectrum. Nuclear
magnetic resonance spectroscopy is the primary technique for determining
chemical structures of natural products. NMR yields information about
individual hydrogen and carbon atoms in the structure, allowing detailed
reconstruction of the molecule's architecture.
New Drug Application
When a drug has been developed with evidence throughout its research history showing that it is safe and effective for its intended use in the United States, the company can file an application – the New Drug Application (NDA) – to have the drug commercialized and made available for clinical application.
NDA status enables the FDA to examine all submitted data on the drug and to reach a decision on whether or not to approve the drug candidate based on its safety, specificity of effect, and efficacy of doses.
Drug design, often referred to as rational drug design or simply rational design, is the inventive process of finding new medications based on the knowledge of a biological target. The drug is most commonly an organic small molecule that activates or inhibits the function of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient. In the most basic sense, drug design involves the design of molecules that are complementary in shape and charge
to the biomolecular target with which they interact and therefore will
bind to it. Drug design frequently but not necessarily relies on computer modeling techniques. This type of modeling is sometimes referred to as computer-aided drug design. Finally, drug design that relies on the knowledge of the three-dimensional structure of the biomolecular target is known as structure-based drug design. In addition to small molecules, biopharmaceuticals including peptides and especially therapeutic antibodies
are an increasingly important class of drugs and computational methods
for improving the affinity, selectivity, and stability of these
protein-based therapeutics have also been developed.
The phrase "drug design" is to some extent a misnomer. A more accurate term is ligand design (i.e., design of a molecule that will bind tightly to its target).
Although design techniques for prediction of binding affinity are
reasonably successful, there are many other properties, such as bioavailability, metabolic half-life, side effects,
etc., that first must be optimized before a ligand can become a safe
and efficacious drug. These other characteristics are often difficult to
predict with rational design techniques. Nevertheless, due to high
attrition rates, especially during clinical phases of drug development, more attention is being focused early in the drug design process on selecting candidate drugs whose physicochemical
properties are predicted to result in fewer complications during
development and hence more likely to lead to an approved, marketed drug. Furthermore, in vitro experiments complemented with computational methods are increasingly used in early drug discovery to select compounds with more favorable ADME (absorption, distribution, metabolism, and excretion) and toxicological profiles.
Drug targets
A biomolecular target (most commonly a protein or a nucleic acid) is a key molecule involved in a particular metabolic or signaling pathway that is associated with a specific disease condition or pathology, or with the infectivity or survival of a microbial pathogen. Potential drug targets are not necessarily disease-causing but must by definition be disease-modifying. In some cases, small molecules
will be designed to enhance or inhibit the target function in the
specific disease modifying pathway. Small molecules (for example
receptor agonists, antagonists, inverse agonists, or modulators; enzyme activators or inhibitors; or ion channel openers or blockers) will be designed that are complementary to the binding site of the target. Small molecules (drugs) can be designed so as not to affect any other important "off-target" molecules (often referred to as antitargets), since drug interactions with off-target molecules may lead to undesirable side effects. Due to similarities in binding sites, closely related targets identified through sequence homology have the highest chance of cross-reactivity and hence the highest side effect potential.
Most commonly, drugs are organic small molecules produced through chemical synthesis, but biopolymer-based drugs produced through biological processes are becoming increasingly common. In addition, mRNA-based gene silencing technologies may have therapeutic applications.
Rational drug discovery
In contrast to traditional methods of drug discovery (known as forward pharmacology), which rely on trial-and-error testing of chemical substances on cultured cells or animals, and matching the apparent effects to treatments, rational drug design (also called reverse pharmacology)
begins with a hypothesis that modulation of a specific biological
target may have therapeutic value. In order for a biomolecule to be
selected as a drug target, two essential pieces of information are
required. The first is evidence that modulation of the target will be
disease modifying. This knowledge may come from, for example, disease
linkage studies that show an association between mutations in the
biological target and certain disease states. The second is that the target is "druggable". This means that it is capable of binding to a small molecule and that its activity can be modulated by the small molecule.
Once a suitable target has been identified, the target is normally cloned, produced, and purified. The purified protein is then used to establish a screening assay. In addition, the three-dimensional structure of the target may be determined.
The search for small molecules that bind to the target is begun
by screening libraries of potential drug compounds. This may be done by
using the screening assay (a "wet screen"). In addition, if the
structure of the target is available, a virtual screen may be performed of candidate drugs. Ideally the candidate drug compounds should be "drug-like", that is they should possess properties that are predicted to lead to oral bioavailability, adequate chemical and metabolic stability, and minimal toxic effects. Several methods are available to estimate druglikeness such as Lipinski's Rule of Five and a range of scoring methods such as lipophilic efficiency. Several methods for predicting drug metabolism have also been proposed in the scientific literature.
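Lipinski's Rule of Five can be expressed as a simple count of violated property thresholds. The sketch below assumes the property values are supplied directly; in practice they would be calculated by a cheminformatics toolkit such as RDKit:

```python
# Minimal sketch of Lipinski's Rule of Five: a compound is flagged as
# likely orally bioavailable if it violates at most one of the four rules.

def lipinski_violations(mw, logp, h_donors, h_acceptors):
    """Count Rule of Five violations for one compound."""
    violations = 0
    if mw > 500:          # molecular weight over 500 Da
        violations += 1
    if logp > 5:          # octanol-water partition coefficient over 5
        violations += 1
    if h_donors > 5:      # more than 5 hydrogen bond donors
        violations += 1
    if h_acceptors > 10:  # more than 10 hydrogen bond acceptors
        violations += 1
    return violations

def is_druglike(mw, logp, h_donors, h_acceptors):
    return lipinski_violations(mw, logp, h_donors, h_acceptors) <= 1

# Aspirin-like values: MW ~180, logP ~1.2, 1 donor, 4 acceptors
print(is_druglike(180.2, 1.2, 1, 4))  # True
```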
Due to the large number of drug properties that must be simultaneously optimized during the design process, multi-objective optimization techniques are sometimes employed.
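One common multi-objective technique is Pareto-front selection, which keeps every candidate that no other candidate beats on all objectives at once. A minimal sketch with made-up potency and stability scores:

```python
# Toy multi-objective selection: find the Pareto front over two objectives
# that should both be maximized (e.g. predicted potency and predicted
# metabolic stability). All values are illustrative.

def pareto_front(candidates):
    """Return candidates not dominated by any other candidate.
    Each candidate is (name, potency, stability); higher is better."""
    front = []
    for c in candidates:
        dominated = any(
            o[1] >= c[1] and o[2] >= c[2] and (o[1] > c[1] or o[2] > c[2])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

library = [
    ("cpd1", 8.2, 0.3),
    ("cpd2", 7.5, 0.9),   # less potent but far more stable
    ("cpd3", 6.0, 0.5),   # dominated by cpd2 on both objectives
    ("cpd4", 9.0, 0.1),
]
print([c[0] for c in pareto_front(library)])  # cpd1, cpd2, cpd4 remain
```

A medicinal chemist would then choose among the non-dominated compounds, trading one objective against the other.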
Finally because of the limitations in the current methods for
prediction of activity, drug design is still very much reliant on serendipity and bounded rationality.
Computer-aided drug design
The most fundamental goal in drug design is to predict whether a given molecule will bind to a target and if so how strongly. Molecular mechanics or molecular dynamics is most often used to estimate the strength of the intermolecular interaction between the small molecule and its biological target. These methods are also used to predict the conformation of the small molecule and to model conformational changes in the target that may occur when the small molecule binds to it. Semi-empirical, ab initio quantum chemistry methods, or density functional theory
are often used to provide optimized parameters for the molecular
mechanics calculations and also provide an estimate of the electronic
properties (electrostatic potential, polarizability, etc.) of the drug candidate that will influence binding affinity.
Molecular mechanics methods may also be used to provide
semi-quantitative prediction of the binding affinity. Also,
knowledge-based scoring functions may be used to provide binding affinity estimates. These methods use linear regression, machine learning, neural nets
or other statistical techniques to derive predictive binding affinity
equations by fitting experimental affinities to computationally derived
interaction energies between the small molecule and the target.
Ideally, the computational method will be able to predict
affinity before a compound is synthesized and hence in theory only one
compound needs to be synthesized, saving enormous time and cost. The
reality is that present computational methods are imperfect and provide,
at best, only qualitatively accurate estimates of affinity. In practice
it still takes several iterations of design, synthesis, and testing
before an optimal drug is discovered. Computational methods have
accelerated discovery by reducing the number of iterations required and
have often provided novel structures.
Drug design with the help of computers may be used at any of the following stages of drug discovery:
hit identification using virtual screening (structure- or ligand-based design)
hit-to-lead optimization of affinity and selectivity (structure-based design, QSAR, etc.)
lead optimization of other pharmaceutical properties while maintaining affinity
Flowchart of a Usual Clustering Analysis for Structure-Based Drug Design
In order to overcome the insufficient prediction of binding affinity
calculated by recent scoring functions, the protein-ligand interaction
and compound 3D structure information are used for analysis. For
structure-based drug design, several post-screening analyses focusing on
protein-ligand interaction have been developed for improving enrichment
and effectively mining potential candidates:
Consensus scoring
Selecting candidates by voting of multiple scoring functions
May lose the relationship between protein-ligand structural information and scoring criterion
Cluster analysis
Represent and cluster candidates according to protein-ligand 3D information
Needs meaningful representation of protein-ligand interactions.
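A minimal sketch of consensus scoring by voting, assuming lower scores are better (as for predicted binding energies) and that a candidate earns a vote whenever it ranks in the top half of a scoring function's list (one of several possible voting rules):

```python
# Sketch of rank-by-vote consensus scoring across multiple scoring
# functions. Scores are illustrative; lower = better.

def consensus_votes(scores_by_function, top_fraction=0.5):
    """scores_by_function: {function_name: {ligand: score}}, lower better.
    Returns {ligand: number_of_votes}."""
    votes = {}
    for scores in scores_by_function.values():
        ranked = sorted(scores, key=scores.get)          # best (lowest) first
        cutoff = max(1, int(len(ranked) * top_fraction))
        for ligand in ranked[:cutoff]:
            votes[ligand] = votes.get(ligand, 0) + 1
    return votes

scores = {
    "score_fn_A": {"lig1": -9.1, "lig2": -7.0, "lig3": -8.5, "lig4": -5.2},
    "score_fn_B": {"lig1": -8.0, "lig2": -8.8, "lig3": -6.1, "lig4": -5.9},
    "score_fn_C": {"lig1": -7.7, "lig2": -6.5, "lig3": -8.9, "lig4": -6.0},
}
print(consensus_votes(scores))
```

Ligands favored by several independent scoring functions (here lig1, with three votes) are promoted, which tends to improve enrichment over any single function.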
Types
Drug discovery cycle highlighting both ligand-based (indirect) and structure-based (direct) drug design strategies.
There are two major types of drug design. The first is referred to as ligand-based drug design and the second, structure-based drug design.
Ligand-based
Ligand-based drug design (or indirect drug design)
relies on knowledge of other molecules that bind to the biological
target of interest. These other molecules may be used to derive a pharmacophore model that defines the minimum necessary structural characteristics a molecule must possess in order to bind to the target.
In other words, a model of the biological target may be built based on
the knowledge of what binds to it, and this model in turn may be used to
design new molecular entities that interact with the target.
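Such a pharmacophore model can be approximated as a set of required distances between labeled feature points. The sketch below, with invented coordinates and distance constraints, checks whether a molecule's features satisfy the model:

```python
# Toy pharmacophore check: the model is a list of distance constraints
# (in angstroms, with a tolerance) between labeled feature points such as
# a hydrogen-bond donor and an aromatic ring centroid. All numbers are
# illustrative.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (feature1, feature2, ideal_distance, tolerance)
pharmacophore = [
    ("donor", "aromatic", 5.0, 0.5),
    ("donor", "acceptor", 3.5, 0.5),
]

def matches(features, model):
    """features: {feature_name: (x, y, z)}. True if all constraints hold."""
    for f1, f2, ideal, tol in model:
        if f1 not in features or f2 not in features:
            return False
        if abs(dist(features[f1], features[f2]) - ideal) > tol:
            return False
    return True

mol = {"donor": (0.0, 0.0, 0.0),
       "aromatic": (5.1, 0.0, 0.0),
       "acceptor": (0.0, 3.4, 0.0)}
print(matches(mol, pharmacophore))  # True
```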
Alternatively, a quantitative structure-activity relationship (QSAR), in which a correlation is established between calculated properties of molecules and their experimentally determined biological activity, may be derived. These QSAR relationships in turn may be used to predict the activity of new analogs.
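A minimal single-descriptor QSAR sketch: fit a linear relationship between one calculated property (here logP) and measured activity (pIC50) for a training series, then predict a new analog. All values are synthetic:

```python
# Toy QSAR: ordinary least squares fit of pIC50 = a*logP + b over a small
# training series, then prediction for a new analog. Data are made up.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

logp  = [1.0, 2.0, 3.0, 4.0]    # descriptor values for the training series
pic50 = [5.1, 5.9, 7.1, 7.9]    # measured activities

a, b = fit_line(logp, pic50)
print(round(a * 2.5 + b, 2))    # predicted pIC50 for a new analog, logP 2.5
```

Real QSAR models use many descriptors and careful validation; this shows only the core fit-then-predict step.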
Structure-based
Structure-based drug design (or direct drug design) relies on knowledge of the three dimensional structure of the biological target obtained through methods such as x-ray crystallography or NMR spectroscopy. If an experimental structure of a target is not available, it may be possible to create a homology model
of the target based on the experimental structure of a related protein.
Using the structure of the biological target, candidate drugs that are
predicted to bind with high affinity and selectivity to the target may be designed using interactive graphics and the intuition of a medicinal chemist. Alternatively various automated computational procedures may be used to suggest new drug candidates.
Current methods for structure-based drug design can be divided roughly into three main categories.
The first method is identification of new ligands for a given receptor
by searching large databases of 3D structures of small molecules to find
those fitting the binding pocket of the receptor using fast approximate
docking programs. This method is known as virtual screening.
A second category is de novo design of new ligands. In this method,
ligand molecules are built up within the constraints of the binding
pocket by assembling small pieces in a stepwise manner. These pieces can
be either individual atoms or molecular fragments. The key advantage of
such a method is that novel structures, not contained in any database,
can be suggested. A third method is the optimization of known ligands by evaluating proposed analogs within the binding cavity.
Binding site identification
Binding site identification is the first step in structure based design. If the structure of the target or a sufficiently similar homolog
is determined in the presence of a bound ligand, then the ligand should
be observable in the structure in which case location of the binding
site is trivial. However, there may be unoccupied allosteric binding sites that may be of interest. Furthermore, it may be that only apoprotein
(protein without ligand) structures are available and the reliable
identification of unoccupied sites that have the potential to bind
ligands with high affinity is non-trivial. In brief, binding site
identification usually relies on identification of concave surfaces on the protein that can accommodate drug sized molecules that also possess appropriate "hot spots" (hydrophobic surfaces, hydrogen bonding sites, etc.) that drive ligand binding.
Scoring functions
Structure-based drug design attempts to use the structure of proteins
as a basis for designing new ligands by applying the principles of molecular recognition. Selective high affinity binding to the target is generally desirable since it leads to more efficacious
drugs with fewer side effects. Thus, one of the most important
principles for designing or obtaining potential new ligands is to
predict the binding affinity of a certain ligand to its target (and
known antitargets) and use the predicted affinity as a criterion for selection.
One early general-purpose empirical scoring function to describe the binding energy of ligands to receptors was developed by Böhm. This empirical scoring function took the form:

ΔGbind = ΔG0 + ΔGhb Σhb + ΔGionic Σionic + ΔGlip |Alipo| + ΔGrot NROT
where:
ΔG0 – empirically derived offset that in part
corresponds to the overall loss of translational and rotational entropy
of the ligand upon binding.
ΔGhb – contribution from hydrogen bonding
ΔGionic – contribution from ionic interactions
ΔGlip – contribution from lipophilic interactions where |Alipo| is surface area of lipophilic contact between the ligand and receptor
ΔGrot – entropy penalty due to freezing a rotatable bond in the ligand upon binding
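An empirical scoring function of this kind can be evaluated as a weighted sum of interaction counts and contact areas. In the sketch below the weights are illustrative placeholders, not Böhm's published parameters:

```python
# Sketch of a Böhm-style empirical scoring function: a weighted sum of
# interaction terms. Weight values below are illustrative placeholders.

def empirical_dg(n_hbonds, n_ionic, lipo_area, n_rot_bonds,
                 dg0=5.4, dg_hb=-4.7, dg_ionic=-8.3,
                 dg_lip=-0.17, dg_rot=1.4):
    """Estimated binding free energy in kJ/mol (lower = tighter binding)."""
    return (dg0                         # offset: translational/rotational entropy loss
            + dg_hb * n_hbonds          # hydrogen bonds
            + dg_ionic * n_ionic        # ionic interactions
            + dg_lip * lipo_area        # lipophilic contact area (A^2)
            + dg_rot * n_rot_bonds)     # frozen rotatable bonds

print(empirical_dg(n_hbonds=3, n_ionic=1, lipo_area=120.0, n_rot_bonds=4))
```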
A more general thermodynamic "master" equation is as follows:

ΔGbind = ΔGdesolvation + ΔGmotion + ΔGconfiguration + ΔGinteraction
where:
desolvation – enthalpic penalty for removing the ligand from solvent
motion – entropic penalty for reducing the degrees of freedom when a ligand binds to its receptor
configuration – conformational strain energy required to put the ligand in its "active" conformation
interaction – enthalpic gain for "resolvating" the ligand with its receptor
The basic idea is that the overall binding free energy can be
decomposed into independent components that are known to be important
for the binding process. Each component reflects a certain kind of free
energy alteration during the binding process between a ligand and its
target receptor. The Master Equation is the linear combination of these
components. According to the Gibbs free energy equation, the relation between the dissociation equilibrium constant, Kd, and the components of free energy can be established.
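The Gibbs relation ΔG = RT ln Kd (so Kd = exp(ΔG/RT)) lets a predicted binding free energy be converted into a dissociation constant, as in this small sketch:

```python
# Convert a binding free energy to a dissociation constant via
# Kd = exp(dG / RT), the Gibbs free energy relation.
import math

R = 8.314        # gas constant, J/(mol*K)
T = 298.15       # temperature, K

def kd_from_dg(dg_joules_per_mol):
    return math.exp(dg_joules_per_mol / (R * T))

# A binding free energy of about -51.4 kJ/mol corresponds to ~1 nM affinity
print(kd_from_dg(-51.4e3))
```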
Various computational methods are used to estimate each of the
components of the master equation. For example, the change in polar
surface area upon ligand binding can be used to estimate the desolvation
energy. The number of rotatable bonds frozen upon ligand binding is
proportional to the motion term. The configurational or strain energy
can be estimated using molecular mechanics
calculations. Finally the interaction energy can be estimated using
methods such as the change in non polar surface, statistically derived potentials of mean force,
the number of hydrogen bonds formed, etc. In practice, the components
of the master equation are fit to experimental data using multiple
linear regression. This can be done with a diverse training set
including many types of ligands and receptors to produce a less accurate
but more general "global" model or a more restricted set of ligands and
receptors to produce a more accurate but less general "local" model.
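The multiple linear regression described above can be sketched with two components and synthetic data; the normal equations for two weights are solved with Cramer's rule, and the fit exactly recovers the weights used to generate the data:

```python
# Sketch of fitting scoring-function weights to experimental affinities by
# multiple linear regression (two-term normal equations, Cramer's rule).
# All data are synthetic.

def fit_two_weights(x1, x2, y):
    """Least squares for y = w1*x1 + w2*x2 (no intercept)."""
    a11 = sum(v * v for v in x1)
    a12 = sum(u * v for u, v in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    b1 = sum(u * v for u, v in zip(x1, y))
    b2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

# per-complex term values: hydrogen bond count, lipophilic area (A^2)
hbonds = [2, 3, 1, 4]
areas  = [100.0, 150.0, 80.0, 200.0]
# synthetic "experimental" dG generated as -5*hbonds - 0.1*area (kJ/mol)
dg     = [-20.0, -30.0, -13.0, -40.0]

w_hb, w_lip = fit_two_weights(hbonds, areas, dg)
print(round(w_hb, 2), round(w_lip, 3))  # recovers -5.0 and -0.1
```

With noisy experimental affinities the recovered weights are only approximate, which is the "global" versus "local" model trade-off described above.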
Examples
A
particular example of rational drug design involves the use of
three-dimensional information about biomolecules obtained from such
techniques as X-ray crystallography and NMR spectroscopy. Computer-aided
drug design in particular becomes much more tractable when there is a
high-resolution structure of a target protein bound to a potent ligand.
This approach to drug discovery is sometimes referred to as
structure-based drug design. The first unequivocal example of the
application of structure-based drug design leading to an approved drug is the carbonic anhydrase inhibitor dorzolamide, which was approved in 1995.
It has been argued that the highly rigid and focused nature of rational drug design suppresses serendipity in drug discovery.
Because many of the most significant medical discoveries have been
inadvertent, the recent focus on rational drug design may limit the
progress of drug discovery. Furthermore, the rational design of a drug
may be limited by a crude or incomplete understanding of the underlying
molecular processes of the disease it is intended to treat.