Virtual screening (VS) is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme.
Virtual screening has been defined as the "automatically evaluating very large libraries of compounds" using computer programs.[4] As this definition suggests, VS has largely been a numbers game focusing on how the enormous chemical space of over 1060 conceivable compounds[5] can be filtered to a manageable number that can be synthesized, purchased, and tested. Although searching the entire chemical universe may be a theoretically interesting problem, more practical VS scenarios focus on designing and optimizing targeted combinatorial libraries and enriching libraries of available compounds from in-house compound repositories or vendor offerings. As the accuracy of the method has increased, virtual screening has become an integral part of the drug discovery process.[6][1] Virtual Screening can be used to select in house database compounds for screening, choose compounds that can be purchased externally, and to choose which compound should be synthesized next.
Methods
There are two broad categories of screening techniques: ligand-based and structure-based.[7] The remainder of this page will reflect Figure 1 Flow Chart of Virtual Screening.Ligand-based
Given a set of structurally diverse ligands that binds to a receptor, a model of the receptor can be built by exploiting the collective information contained in such set of ligands. These are known as pharmacophore models. A candidate ligand can then be compared to the pharmacophore model to determine whether it is compatible with it and therefore likely to bind.[8]Another approach to ligand-based virtual screening is to use 2D chemical similarity analysis methods[9] to scan a database of molecules against one or more active ligand structure.
A popular approach to ligand-based virtual screening is based on searching molecules with shape similar to that of known actives, as such molecules will fit the target's binding site and hence will be likely to bind the target. There are a number of prospective applications of this class of techniques in the literature.[10][11] Pharmacophoric extensions of these 3D methods are also freely-available as webservers.[12][13]
Structure-based
Structure-based virtual screening involves docking of candidate ligands into a protein target followed by applying a scoring function to estimate the likelihood that the ligand will bind to the protein with high affinity.[14][15][16] Webservers oriented to prospective virtual screening are available to all.[17][18]Computing Infrastructure
The computation of pair-wise interactions between atoms, which is a prerequisite for the operation of many virtual screening programs, is of computational complexity, where N is the number of atoms in the system. Because of the quadratic scaling with respect to the number of atoms, the computing infrastructure may vary from a laptop computer for a ligand-based method to a mainframe for a structure-based method.Ligand-based
Ligand-based methods typically require a fraction of a second for a single structure comparison operation. A single CPU is enough to perform a large screening within hours. However, several comparisons can be made in parallel in order to expedite the processing of a large database of compounds.Structure-based
The size of the task requires a parallel computing infrastructure, such as a cluster of Linux systems, running a batch queue processor to handle the work, such as Sun Grid Engine or Torque PBS.A means of handling the input from large compound libraries is needed. This requires a form of compound database that can be queried by the parallel cluster, delivering compounds in parallel to the various compute nodes. Commercial database engines may be too ponderous, and a high speed indexing engine, such as Berkeley DB, may be a better choice. Furthermore, it may not be efficient to run one comparison per job, because the ramp up time of the cluster nodes could easily outstrip the amount of useful work. To work around this, it is necessary to process batches of compounds in each cluster job, aggregating the results into some kind of log file. A secondary process, to mine the log files and extract high scoring candidates, can then be run after the whole experiment has been run.
Accuracy
The aim of virtual screening is to identify molecules of novel chemical structure that bind to the macromolecular target of interest. Thus, success of a virtual screen is defined in terms of finding interesting new scaffolds rather than the total number of hits. Interpretations of virtual screening accuracy should therefore be considered with caution. Low hit rates of interesting scaffolds are clearly preferable over high hit rates of already known scaffolds.Most tests of virtual screening studies in the literature are retrospective. In these studies, the performance of a VS technique is measured by its ability to retrieve a small set of previously known molecules with affinity to the target of interest (active molecules or just actives) from a library containing a much higher proportion of assumed inactives or decoys. By contrast, in prospective applications of virtual screening, the resulting hits are subjected to experimental confirmation (e.g., IC50 measurements). There is consensus that retrospective benchmarks are not good predictors of prospective performance and consequently only prospective studies constitute conclusive proof of the suitability of a technique for a particular target.[19][20][21][22]
Application to drug discovery
Virtual screening is a very useful application when it comes to identifying hit molecules as a beginning for medicinal chemistry. As the virtual screening approach begins to become a more vital and substantial technique within the medicinal chemistry industry the approach has had an expeditious increase.[23]Ligand-based methods
While not knowing the structure trying to predict how the ligands will bind to the receptor. With the use of pharmacophore features each ligand identified donor, and acceptors. Equating features are overlaid, however given it is unlikely there is a single correct solution.[1]Pharmacophore models
This technique is used when merging the results of searches by using unlike reference compounds, same descriptors and coefficient, but different active compounds. This technique is beneficial because it is more efficient than just using a single reference structure along with the most accurate performance when it comes to diverse actives.[1]Pharmacophore is an ensemble of steric and electronic features that are needed to have an optimal supramolecular interaction or interactions witha biological target structure in order to precipitate it's biological response. Choose a representative as a set of actives, most methods will look for similar bindings. It is preferred to have multiple rigid molecules and the ligands should be diversified, in other words ensure to have different features that don't occur during the binding phase.[1]