Tuesday, November 6, 2018

History of genetic engineering

From Wikipedia, the free encyclopedia
 
Herbert Boyer and Stanley Cohen created the first genetically modified organism in 1972

Genetic recombination caused by human activity has been occurring since around 12,000 BC, when humans first began to domesticate organisms. Genetic engineering as the direct transfer of DNA from one organism to another was first accomplished by Herbert Boyer and Stanley Cohen in 1972. It was the result of a series of advancements in techniques that allowed the direct modification of the genome. Important advances included the discovery of restriction enzymes and DNA ligases, the ability to design plasmids and technologies like polymerase chain reaction and sequencing. Transformation of the DNA into a host organism was accomplished with the invention of biolistics, Agrobacterium-mediated recombination and microinjection.

The first genetically modified animal was a mouse created in 1974 by Rudolf Jaenisch. In 1976 the technology was commercialised, with the advent of genetically modified bacteria that produced somatostatin, followed by insulin in 1978. In 1983 an antibiotic resistance gene was inserted into tobacco, leading to the first genetically engineered plant. Advances followed that allowed scientists to manipulate and add genes to a variety of different organisms and induce a range of different effects. Plants were first commercialized with virus-resistant tobacco released in China in 1992. The first genetically modified food was the Flavr Savr tomato marketed in 1994. By 2010, 29 countries had planted commercialized biotech crops. In 2000 a paper published in Science introduced golden rice, the first food developed with increased nutrient value.

Agriculture

DNA studies suggested that the dog most likely arose from a common ancestor with the grey wolf.
 
Genetic engineering is the direct manipulation of an organism's genome using certain biotechnology techniques that have only existed since the 1970s. Human-directed genetic manipulation was occurring much earlier, beginning with the domestication of plants and animals through artificial selection. The dog is believed to be the first animal domesticated, possibly arising from a common ancestor of the grey wolf, with archeological evidence dating to about 12,000 BC. Other carnivores domesticated in prehistoric times include the cat, which cohabited with humans 9,500 years ago. Archeological evidence suggests sheep, cattle, pigs and goats were domesticated between 9,000 BC and 8,000 BC in the Fertile Crescent.

The first evidence of plant domestication comes from emmer and einkorn wheat found in pre-Pottery Neolithic A villages in Southwest Asia dated about 10,500 to 10,100 BC. The Fertile Crescent of Western Asia, Egypt, and India were sites of the earliest planned sowing and harvesting of plants that had previously been gathered in the wild. Independent development of agriculture occurred in northern and southern China, Africa's Sahel, New Guinea and several regions of the Americas. The eight Neolithic founder crops (emmer wheat, einkorn wheat, barley, peas, lentils, bitter vetch, chickpeas and flax) had all appeared by about 7,000 BC. Horticulture first appears in the Levant during the Chalcolithic period about 6,800 to 6,300 BC. Because vegetables consist of soft tissues, archeological evidence for early vegetables is scarce. The earliest vegetable remains have been found in Egyptian caves that date back to the 2nd millennium BC.

Selective breeding of domesticated plants was once the main way early farmers shaped organisms to suit their needs. Charles Darwin described three types of selection: methodical selection, wherein humans deliberately select for particular characteristics; unconscious selection, wherein a characteristic is selected simply because it is desirable; and natural selection, wherein a trait that helps an organism survive better is passed on. Early breeding relied on unconscious and natural selection. It is not known when methodical selection was introduced. Common characteristics that were bred into domesticated plants include grains that did not shatter to allow easier harvesting, uniform ripening, shorter lifespans that translate to faster growing, loss of toxic compounds, and productivity. Some plants, like the banana, were able to be propagated by vegetative cloning. Offspring often did not contain seeds, and were therefore sterile. However, these offspring were usually juicier and larger. Propagation through cloning allows these mutant varieties to be cultivated despite their lack of seeds.

Hybridization was another way that rapid changes in plants' makeup were introduced. It often increased vigor in plants, and combined desirable traits together. Hybridization most likely first occurred when humans first grew similar, yet slightly different plants in close proximity. Triticum aestivum, wheat used in baking bread, is an allopolyploid. Its creation is the result of two separate hybridization events.

Grafting can transfer chloroplasts (specialised organelles in plants that conduct photosynthesis), mitochondrial DNA and the entire cell nucleus containing the genome, potentially making a new species, which makes grafting a form of natural genetic engineering.

X-rays were first used to deliberately mutate plants in 1927. Between 1927 and 2007, more than 2,540 genetically mutated plant varieties were produced using X-rays.

Genetics

Griffith proved the existence of a "transforming principle", which Avery, MacLeod and McCarty later showed to be DNA
 
The bacterium Agrobacterium tumefaciens inserts T-DNA into infected plant cells, which is then incorporated into the plant's genome.
 
Various genetic discoveries have been essential in the development of genetic engineering. Genetic inheritance was first discovered by Gregor Mendel in 1865 following experiments crossing peas. Although his work was largely ignored for 34 years, he provided the first evidence of hereditary segregation and independent assortment. In 1889 Hugo de Vries came up with the name "(pan)gene" after postulating that particles are responsible for inheritance of characteristics, and the term "genetics" was coined by William Bateson in 1905. In 1928 Frederick Griffith proved the existence of a "transforming principle" involved in inheritance, which Avery, MacLeod and McCarty later (1944) identified as DNA. Edward Lawrie Tatum and George Wells Beadle developed the central dogma that genes code for proteins in 1941. The double helix structure of DNA was identified by James Watson and Francis Crick in 1953.

In addition to the discovery of how DNA works, tools had to be developed that allowed it to be manipulated. In 1970 Hamilton Smith's lab discovered restriction enzymes that allowed DNA to be cut at specific places and separated out on an electrophoresis gel. This enabled scientists to isolate genes from an organism's genome. DNA ligases, which join broken DNA together, had been discovered earlier in 1967, and by combining the two enzymes it was possible to "cut and paste" DNA sequences to create recombinant DNA. Plasmids, discovered in 1952, became important tools for transferring information between cells and replicating DNA sequences. Frederick Sanger developed a method for sequencing DNA in 1977, greatly increasing the genetic information available to researchers. Polymerase chain reaction (PCR), developed by Kary Mullis in 1983, allowed small sections of DNA to be amplified and aided identification and isolation of genetic material.

In addition to tools for manipulating DNA, techniques had to be developed for its insertion (known as transformation) into an organism's genome. Griffith's experiment had already shown that some bacteria had the ability to naturally take up and express foreign DNA. Artificial competence was induced in Escherichia coli in 1970 when Morton Mandel and Akiko Higa showed that it could take up bacteriophage λ after treatment with calcium chloride solution (CaCl2). Two years later, Stanley Cohen showed that CaCl2 treatment was also effective for uptake of plasmid DNA. Transformation using electroporation was developed in the late 1980s, increasing the efficiency and bacterial range. In 1907 a bacterium that caused plant tumors, Agrobacterium tumefaciens, was discovered, and in the early 1970s the tumor-inducing agent was found to be a DNA plasmid called the Ti plasmid. By removing the genes in the plasmid that caused the tumor and adding in novel genes, researchers were able to infect plants with A. tumefaciens and let the bacteria insert their chosen DNA into the genomes of the plants.

Early genetically modified organisms

Paul Berg created the first recombinant DNA molecules in 1972.

In 1972 Paul Berg used restriction enzymes and DNA ligases to create the first recombinant DNA molecules. He combined DNA from the monkey virus SV40 with that of the lambda virus. Herbert Boyer and Stanley Norman Cohen took Berg's work a step further and introduced recombinant DNA into a bacterial cell. Cohen was researching plasmids, while Boyer's work involved restriction enzymes. They recognised the complementary nature of their work and teamed up in 1972. Together they found a restriction enzyme that cut the pSC101 plasmid at a single point and were able to insert and ligate a gene that conferred resistance to the antibiotic kanamycin into the gap. Cohen had previously devised a method by which bacteria could be induced to take up a plasmid, and using this they were able to create a bacterium that survived in the presence of kanamycin. This represented the first genetically modified organism. In further experiments they showed that other genes could be expressed in bacteria, including one from the toad Xenopus laevis, the first cross-kingdom transformation.

In 1974 Rudolf Jaenisch created the first GM animal.

In 1974 Rudolf Jaenisch created a transgenic mouse by introducing foreign DNA into its embryo, making it the world’s first transgenic animal. Jaenisch was studying mammalian cells infected with simian virus 40 (SV40) when he happened to read a paper from Beatrice Mintz describing the generation of chimera mice. He took his SV40 samples to Mintz's lab and injected them into early mouse embryos, expecting tumours to develop. The mice appeared normal, but after using radioactive probes he discovered that the virus had integrated itself into the mouse genome. However, the mice did not pass the transgene to their offspring. In 1981 the laboratories of Frank Ruddle, Frank Costantini and Elizabeth Lacy injected purified DNA into a single-cell mouse embryo and showed transmission of the genetic material to subsequent generations.

The first genetically engineered plant was tobacco, reported in 1983. It was developed by Michael W. Bevan, Richard B. Flavell and Mary-Dell Chilton by creating a chimeric gene that joined an antibiotic resistance gene to the Ti plasmid from Agrobacterium. The tobacco was infected with Agrobacterium transformed with this plasmid, resulting in the chimeric gene being inserted into the plant. Through tissue culture techniques a single tobacco cell that contained the gene was selected and a new plant grown from it.

Regulation

The development of genetic engineering technology led to concerns in the scientific community about potential risks. The development of a regulatory framework concerning genetic engineering began in 1975, at Asilomar, California. The Asilomar meeting recommended a set of guidelines regarding the cautious use of recombinant technology and any products resulting from that technology. The Asilomar recommendations were voluntary, but in 1976 the US National Institutes of Health (NIH) formed a recombinant DNA advisory committee. This was followed by other regulatory offices (the United States Department of Agriculture (USDA), Environmental Protection Agency (EPA) and Food and Drug Administration (FDA)), effectively making all recombinant DNA research tightly regulated in the USA.

In 1982 the Organization for Economic Co-operation and Development (OECD) released a report into the potential hazards of releasing genetically modified organisms into the environment as the first transgenic plants were being developed. As the technology improved and genetically modified organisms moved from model organisms to potential commercial products, the USA established a committee at the Office of Science and Technology Policy (OSTP) to develop mechanisms to regulate the developing technology. In 1986 the OSTP assigned regulatory approval of genetically modified plants in the US to the USDA, FDA and EPA. In the late 1980s and early 1990s, guidance on assessing the safety of genetically engineered plants and food emerged from organizations including the FAO and WHO.

The European Union first introduced laws requiring GMOs to be labelled in 1997. In 2013 Connecticut became the first state to enact a labeling law in the USA, although it would not take effect until other states followed suit.

Research and medicine

A laboratory mouse in which a gene affecting hair growth has been knocked out (left), is shown next to a normal lab mouse.

The ability to insert, alter or remove genes in model organisms allowed scientists to study the genetic elements of human diseases. Genetically modified mice were created in 1984 that carried cloned oncogenes that predisposed them to developing cancer. The technology has also been used to generate mice with genes knocked out. The first recorded knockout mouse was created by Mario R. Capecchi, Martin Evans and Oliver Smithies in 1989. In 1992 oncomice with tumor suppressor genes knocked out were generated. Creating knockout rats is much harder and only became possible in 2003.

After the discovery of microRNA in 1993, RNA interference (RNAi) has been used to silence an organism's genes. By modifying an organism to express microRNA targeted to its endogenous genes, researchers have been able to knock out or partially reduce gene function in a range of species. The ability to partially reduce gene function has allowed the study of genes that are lethal when completely knocked out. Other advantages of using RNAi include the availability of inducible and tissue-specific knockout. In 2007 microRNA targeted to insect and nematode genes was expressed in plants, leading to suppression when they fed on the transgenic plant, potentially creating a new way to control pests. Targeting endogenous microRNA expression has allowed further fine-tuning of gene expression, supplementing the more traditional gene knockout approach.

Genetic engineering has been used to produce proteins derived from humans and other sources in organisms that normally cannot synthesize these proteins. Human insulin-synthesising bacteria were developed in 1979 and were first used as a treatment in 1982. In 1988 the first human antibodies were produced in plants. In 2000 vitamin A-enriched golden rice was the first food with increased nutrient value.

Further advances

As not all plant cells were susceptible to infection by A. tumefaciens other methods were developed, including electroporation, micro-injection and particle bombardment with a gene gun (invented in 1987). In the 1980s techniques were developed to introduce isolated chloroplasts back into a plant cell that had its cell wall removed. With the introduction of the gene gun in 1987 it became possible to integrate foreign genes into a chloroplast.

Genetic transformation has become very efficient in some model organisms. In 2008 genetically modified seeds were produced in Arabidopsis thaliana by simply dipping the flowers in an Agrobacterium solution. The range of plants that can be transformed has increased as tissue culture techniques have been developed for different species.

The first transgenic livestock were produced in 1985, by micro-injecting foreign DNA into rabbit, sheep and pig eggs. The first animals to synthesise transgenic proteins in their milk were mice, engineered to produce human tissue plasminogen activator. This technology was applied to sheep, pigs, cows and other livestock.

In 2010 scientists at the J. Craig Venter Institute announced that they had created the first synthetic bacterial genome. The researchers added the new genome to bacterial cells and selected for cells that contained the new genome. To do this the cells undergo a process called resolution, where during bacterial cell division one new cell receives the original DNA genome of the bacteria, whilst the other receives the new synthetic genome. When this cell replicates it uses the synthetic genome as its template. The resulting bacterium the researchers developed, named Synthia, was the world's first synthetic life form.

In 2014 a bacterium was developed that replicated a plasmid containing an unnatural base pair. This required altering the bacterium so it could import the unnatural nucleotides and then efficiently replicate them. The plasmid retained the unnatural base pairs when it doubled an estimated 99.4% of the time. This is the first organism engineered to use an expanded genetic alphabet.

In 2015 CRISPR and TALENs were used to modify plant genomes. Chinese labs used these tools to create a fungus-resistant wheat and boost rice yields, while a U.K. group used them to tweak a barley gene that could help produce drought-resistant varieties. When used to precisely remove material from DNA without adding genes from other species, the result is not subject to the lengthy and expensive regulatory process associated with GMOs. While CRISPR may use foreign DNA to aid the editing process, the second generation of edited plants contains none of that DNA. Researchers celebrated the acceleration because it may allow them to "keep up" with rapidly evolving pathogens. The U.S. Department of Agriculture stated that some examples of gene-edited corn, potatoes and soybeans are not subject to existing regulations. As of 2016 other review bodies had yet to make statements.

Commercialisation

In 1976 Genentech, the first genetic engineering company, was founded by Herbert Boyer and Robert Swanson, and a year later the company produced a human protein (somatostatin) in E. coli. Genentech announced the production of genetically engineered human insulin in 1978. In 1980 the U.S. Supreme Court in the Diamond v. Chakrabarty case ruled that genetically altered life could be patented. The insulin produced by bacteria, branded Humulin, was approved for release by the Food and Drug Administration in 1982.

In 1983 a biotech company, Advanced Genetic Sciences (AGS) applied for U.S. government authorization to perform field tests with the ice-minus strain of P. syringae to protect crops from frost, but environmental groups and protestors delayed the field tests for four years with legal challenges. In 1987 the ice-minus strain of P. syringae became the first genetically modified organism (GMO) to be released into the environment when a strawberry field and a potato field in California were sprayed with it. Both test fields were attacked by activist groups the night before the tests occurred: "The world's first trial site attracted the world's first field trasher".

The first genetically modified crop plant was produced in 1982, an antibiotic-resistant tobacco plant. The first field trials of genetically engineered plants occurred in France and the USA in 1986, when tobacco plants were engineered to be resistant to herbicides. In 1987 Plant Genetic Systems, founded by Marc Van Montagu and Jeff Schell, was the first company to genetically engineer insect-resistant plants by incorporating genes that produced insecticidal proteins from Bacillus thuringiensis (Bt) into tobacco.

Genetically modified microbial enzymes were the first application of genetically modified organisms in food production and were approved in 1988 by the US Food and Drug Administration. In the early 1990s, recombinant chymosin was approved for use in several countries. Cheese had typically been made using the enzyme complex rennet that had been extracted from cows' stomach lining. Scientists modified bacteria to produce chymosin, which was also able to clot milk, resulting in cheese curds. The People’s Republic of China was the first country to commercialize transgenic plants, introducing a virus-resistant tobacco in 1992. In 1994 Calgene attained approval to commercially release the Flavr Savr tomato, a tomato engineered to have a longer shelf life. Also in 1994, the European Union approved tobacco engineered to be resistant to the herbicide bromoxynil, making it the first genetically engineered crop commercialized in Europe. In 1995 Bt Potato was approved safe by the Environmental Protection Agency, after having been approved by the FDA, making it the first pesticide producing crop to be approved in the USA. In 1996 a total of 35 approvals had been granted to commercially grow 8 transgenic crops and one flower crop (carnation), with 8 different traits in 6 countries plus the EU.

By 2010, 29 countries had planted commercialized biotech crops and a further 31 countries had granted regulatory approval for transgenic crops to be imported. In 2013 Robert Fraley (Monsanto’s executive vice president and chief technology officer), Marc Van Montagu and Mary-Dell Chilton were awarded the World Food Prize for improving the "quality, quantity or availability" of food in the world.

The first genetically modified animal to be commercialised was the GloFish, a zebrafish with a fluorescent gene added that allows it to glow in the dark under ultraviolet light. The first genetically modified animal to be approved for food use was AquAdvantage salmon in 2015. The salmon were transformed with a growth hormone-regulating gene from a Pacific Chinook salmon and a promoter from an ocean pout, enabling them to grow year-round instead of only during spring and summer.

Opposition

Opposition and support for the use of genetic engineering has existed since the technology was developed. After Arpad Pusztai went public with research he was conducting in 1998 the public opposition to genetically modified food increased. Opposition continued following controversial and publicly debated papers published in 1999 and 2013 that claimed negative environmental and health impacts from genetically modified crops.

Pharmacogenomics

From Wikipedia, the free encyclopedia

Pharmacogenomics is the study of the role of the genome in drug response. Its name (pharmaco- + genomics) reflects its combining of pharmacology and genomics. Pharmacogenomics analyzes how the genetic makeup of an individual affects his/her response to drugs. It deals with the influence of acquired and inherited genetic variation on drug response in patients by correlating gene expression or single-nucleotide polymorphisms with pharmacokinetics (drug absorption, distribution, metabolism, and elimination) and pharmacodynamics (effects mediated through a drug's biological targets). The term pharmacogenomics is often used interchangeably with pharmacogenetics. Although both terms relate to drug response based on genetic influences, pharmacogenetics focuses on single drug-gene interactions, while pharmacogenomics encompasses a more genome-wide association approach, incorporating genomics and epigenetics while dealing with the effects of multiple genes on drug response.

Pharmacogenomics aims to develop rational means to optimize drug therapy, with respect to the patient's genotype, to ensure maximum efficiency with minimal adverse effects. Through the utilization of pharmacogenomics, it is hoped that pharmaceutical drug treatments can deviate from what is dubbed the "one-dose-fits-all" approach. Pharmacogenomics also attempts to eliminate the trial-and-error method of prescribing, allowing physicians to take into consideration their patient's genes, the functionality of these genes, and how this may affect the efficacy of the patient's current or future treatments (and where applicable, provide an explanation for the failure of past treatments). Such approaches promise the advent of precision medicine and even personalized medicine, in which drugs and drug combinations are optimized for narrow subsets of patients or even for each individual's unique genetic makeup. Whether used to explain a patient's response or lack thereof to a treatment, or to act as a predictive tool, pharmacogenomics aims to achieve better treatment outcomes, greater efficacy, and fewer drug toxicities and adverse drug reactions (ADRs). For patients who lack a therapeutic response to a treatment, alternative therapies can be prescribed that would best suit their requirements. In order to provide pharmacogenomic recommendations for a given drug, two possible types of input can be used: genotyping, or exome or whole-genome sequencing. Sequencing provides many more data points, including detection of mutations that prematurely terminate the synthesized protein (early stop codon).

History

Pharmacogenomics was first recognized by Pythagoras around 510 BC when he made a connection between the dangers of fava bean ingestion and hemolytic anemia and oxidative stress. This identification was later validated and attributed to deficiency of G6PD in the 1950s and called favism. Although the first official publication dates back to 1961, the 1950s marked the unofficial beginnings of this science. Prolonged paralysis and fatal reactions following administration of succinylcholine injection during anesthesia, linked to genetic variants in patients who lacked butyryl-cholinesterase (‘pseudocholinesterase’), were first reported in 1956. The term pharmacogenetics was first coined in 1959 by Friedrich Vogel of Heidelberg, Germany (although some papers suggest it was 1957 or 1958). In the late 1960s, twin studies supported the inference of genetic involvement in drug metabolism, with identical twins sharing remarkable similarities in drug response compared to fraternal twins. The term pharmacogenomics first began appearing around the 1990s.

The first FDA approval of a pharmacogenetic test was in 2005 (for alleles in CYP2D6 and CYP2C19).

Drug-metabolizing enzymes

Several known genes are largely responsible for variation in drug metabolism and response. For brevity, this article focuses on the genes that are most widely accepted and used clinically.
  • Cytochrome P450s
  • VKORC1
  • TPMT

Cytochrome P450

The most prevalent drug-metabolizing enzymes (DME) are the Cytochrome P450 (CYP) enzymes. The term Cytochrome P450 was coined by Omura and Sato in 1962 to describe the membrane-bound, heme-containing protein characterized by a 450 nm spectral peak when complexed with carbon monoxide. The human CYP family consists of 57 genes, with 18 families and 44 subfamilies. CYP proteins are conveniently arranged into these families and subfamilies on the basis of similarities identified between the amino acid sequences. Enzymes that share 35-40% identity are assigned to the same family, designated by an Arabic numeral, and those that share 55-70% identity make up a particular subfamily, designated by a letter. For example, CYP2D6 refers to family 2, subfamily D, and gene number 6.
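
The naming convention described above can be illustrated with a minimal Python sketch that splits a CYP gene name into its family, subfamily and gene number. The function name parse_cyp and the CypName type are hypothetical helpers introduced only for this illustration, and the pattern assumes the simple "CYP + number + letter(s) + number" form used by the genes named in this article.

```python
import re
from typing import NamedTuple


class CypName(NamedTuple):
    """Parts of a CYP gene name (hypothetical helper type)."""
    family: int      # Arabic numeral, e.g. 2
    subfamily: str   # letter, e.g. "D"
    gene: int        # gene number within the subfamily, e.g. 6


# Assumed pattern: "CYP", family number, subfamily letter(s), gene number.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> CypName:
    """Split a name such as 'CYP2D6' into family, subfamily and gene number."""
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised CYP gene name: {name!r}")
    family, subfamily, gene = match.groups()
    return CypName(int(family), subfamily, int(gene))


print(parse_cyp("CYP2D6"))   # CypName(family=2, subfamily='D', gene=6)
print(parse_cyp("CYP2C19"))  # CypName(family=2, subfamily='C', gene=19)
```

Star-allele suffixes such as *1 or *4 are deliberately not handled here; they describe variants of a gene rather than the gene's place in the family hierarchy.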

From a clinical perspective, the most commonly tested CYPs include: CYP2D6, CYP2C19, CYP2C9, CYP3A4 and CYP3A5. These genes account for the metabolism of approximately 80-90% of currently available prescription drugs. The table below provides a summary for some of the medications that take these pathways.

CYP2D6

Also known as debrisoquine hydroxylase (named after the drug that led to its discovery), CYP2D6 is the most well-known and extensively studied CYP gene. It is also of great interest because of its highly polymorphic nature and its involvement in the metabolism of a large number of medications (as both a major and a minor pathway). More than 100 CYP2D6 genetic variants have been identified.

CYP2C19

Discovered in the early 1980s, CYP2C19 is the second most extensively studied and well understood gene in pharmacogenomics. Over 28 genetic variants have been identified for CYP2C19, which affect the metabolism of several classes of drugs, such as antidepressants and proton pump inhibitors.

CYP2C9

CYP2C9 constitutes the majority of the CYP2C subfamily, representing approximately 20% of the liver content. It is involved in the metabolism of approximately 10% of all drugs, which include medications with narrow therapeutic windows such as warfarin and tolbutamide. There are approximately 57 genetic variants associated with CYP2C9.

CYP3A4 and CYP3A5

The CYP3A family is the most abundant in the liver, with CYP3A4 accounting for 29% of the liver content. These enzymes also cover between 40 and 50% of current prescription drugs, with CYP3A4 accounting for 40-45% of these medications. CYP3A5 has over 11 genetic variants identified at the time of this publication.

VKORC1

The vitamin K epoxide reductase complex subunit 1 (VKORC1) is responsible for the pharmacodynamics of warfarin. VKORC1, along with CYP2C9, is useful for identifying the risk of bleeding during warfarin administration. Warfarin works by inhibiting VKOR, which is encoded by the VKORC1 gene. Individuals with polymorphisms in this gene have an altered response to warfarin treatment.

TPMT

Thiopurine methyltransferase (TPMT) catalyzes the S-methylation of thiopurines, thereby regulating the balance between cytotoxic thioguanine nucleotide and inactive metabolites in hematopoietic cells. TPMT is highly involved in 6-MP metabolism, and TPMT activity and genotype are known to affect the risk of toxicity. Excessive levels of 6-MP can cause myelosuppression and myelotoxicity.

Codeine, clopidogrel, tamoxifen, and warfarin are a few examples of medications that follow the above metabolic pathways.

Predictive prescribing

Patient genotypes are usually categorized into the following predicted phenotypes:
  • Ultra-rapid metabolizer: patients with substantially increased metabolic activity;
  • Extensive metabolizer: normal metabolic activity;
  • Intermediate metabolizer: patients with reduced metabolic activity; and
  • Poor metabolizer: patients with little to no functional metabolic activity.
The two extremes of this spectrum are the poor metabolizers and ultra-rapid metabolizers. Efficacy of a medication is not only based on the above metabolic statuses, but also the type of drug consumed. Drugs can be classified into two main groups: active drugs and prodrugs. Active drugs refer to drugs that are inactivated during metabolism, and prodrugs are inactive until they are metabolized.
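
How metabolizer status and drug type combine can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a clinical rule: the names Phenotype, DrugType and expected_active_exposure are hypothetical, and the sketch assumes a single metabolic pathway, so that reduced metabolism raises exposure to an active drug but lowers exposure to the active form of a prodrug.

```python
from enum import Enum


class Phenotype(Enum):
    ULTRA_RAPID = "ultra-rapid metabolizer"
    EXTENSIVE = "extensive metabolizer"
    INTERMEDIATE = "intermediate metabolizer"
    POOR = "poor metabolizer"


class DrugType(Enum):
    ACTIVE = "active drug"  # inactivated by metabolism
    PRODRUG = "prodrug"     # inactive until metabolized


# Direction of metabolic activity relative to normal (assumed coarse scale):
# +1 increased, 0 normal, -1 reduced or absent.
_ACTIVITY = {
    Phenotype.ULTRA_RAPID: 1,
    Phenotype.EXTENSIVE: 0,
    Phenotype.INTERMEDIATE: -1,
    Phenotype.POOR: -1,
}


def expected_active_exposure(phenotype: Phenotype, drug_type: DrugType) -> str:
    """Return the expected exposure to the active compound at a standard dose."""
    direction = _ACTIVITY[phenotype]
    if drug_type is DrugType.ACTIVE:
        # More metabolism removes an active drug faster, so the sign flips.
        direction = -direction
    return {1: "higher", 0: "typical", -1: "lower"}[direction]


for phenotype in Phenotype:
    for drug_type in DrugType:
        exposure = expected_active_exposure(phenotype, drug_type)
        print(f"{phenotype.value}, {drug_type.value}: {exposure}")
```

In this simplified view, "higher" exposure suggests a greater risk of toxicity and "lower" exposure a risk of reduced efficacy; the coarse three-level scale does not distinguish intermediate from poor metabolizers.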

 
An overview of how pharmacogenomics functions in clinical practice. The raw genotype results are translated into a predicted phenotype (the physical trait), and optimal dosing is then evaluated on that basis.
 
For example, consider two patients who are taking codeine for pain relief. Codeine is a prodrug, so it requires conversion from its inactive form to its active form. The active form of codeine is morphine, which provides the therapeutic effect of pain relief. If person A receives one *1 allele of the CYP2D6 gene from each parent, then that person is considered to have an extensive metabolizer (EM) phenotype, as allele *1 is considered to have normal function (this would be represented as CYP2D6 *1/*1). If person B, on the other hand, received one *1 allele from the mother and a *4 allele from the father, that individual would be an intermediate metabolizer (IM) (the genotype would be CYP2D6 *1/*4). Although both individuals are taking the same dose of codeine, person B could potentially lack the therapeutic benefits of codeine due to the decreased conversion rate of codeine to its active counterpart morphine.
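
The codeine example can be expressed as a small lookup, shown here as a hedged Python sketch. The table ALLELE_FUNCTION and the function predict_phenotype are hypothetical names; the text above only states that *1 has normal function, so treating *4 as a no-function allele is an assumption made for illustration. The counting rule is a simplification chosen to reproduce the two patients in the example, and clinical classification schemes are more detailed.

```python
# Assumed allele functions for illustration only. The article states that *1
# has normal function; *4 is treated here as a no-function allele.
ALLELE_FUNCTION = {"*1": "normal", "*4": "none"}

# Simplified rule: the phenotype depends on how many normal-function
# copies of the gene a patient carries.
PHENOTYPE_BY_NORMAL_COPIES = {
    2: "extensive metabolizer",
    1: "intermediate metabolizer",
    0: "poor metabolizer",
}


def predict_phenotype(diplotype: str) -> str:
    """Map a diplotype such as '*1/*4' to a predicted metabolizer phenotype."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got: {diplotype!r}")
    unknown = [a for a in alleles if a not in ALLELE_FUNCTION]
    if unknown:
        raise ValueError(f"allele function not in table: {unknown}")
    normal_copies = sum(ALLELE_FUNCTION[a] == "normal" for a in alleles)
    return PHENOTYPE_BY_NORMAL_COPIES[normal_copies]


print("Person A:", predict_phenotype("*1/*1"))  # extensive metabolizer
print("Person B:", predict_phenotype("*1/*4"))  # intermediate metabolizer
```

Alleles with decreased rather than absent function, and gene duplications that produce ultra-rapid metabolizers, are outside the scope of this sketch and would raise an error rather than return a guess.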

Each phenotype is based upon the allelic variation within the individual genotype. However, several genetic events can influence the same phenotypic trait, and genotype-to-phenotype relationships can thus remain far from a matter of consensus for many enzymatic patterns. For instance, the influence of the CYP2D6 *1/*4 allelic variant on the clinical outcome in patients treated with tamoxifen remains debated today. In oncology, genes coding for DPD, UGT1A1, TPMT and CDA, involved in the pharmacokinetics of 5-FU/capecitabine, irinotecan, 6-mercaptopurine and gemcitabine/cytarabine, respectively, have all been described as being highly polymorphic. A strong body of evidence suggests that patients affected by these genetic polymorphisms will experience severe or lethal toxicities upon drug intake, and that pre-therapeutic screening does help to reduce the risk of treatment-related toxicities through adaptive dosing strategies.

Applications

The list below provides a few more commonly known applications of pharmacogenomics:
  • Improve drug safety, and reduce ADRs;
  • Tailor treatments to meet patients' unique genetic pre-disposition, identifying optimal dosing;
  • Improve drug discovery targeted to human disease; and
  • Improve proof of principle for efficacy trials.
Pharmacogenomics may be applied to several areas of medicine, including pain management, cardiology, oncology, and psychiatry. A place may also exist in forensic pathology, in which pharmacogenomics can be used to determine the cause of death in drug-related deaths where no findings emerge from autopsy.

In cancer treatment, pharmacogenomics tests are used to identify which patients are most likely to respond to certain cancer drugs. In behavioral health, pharmacogenomic tests provide tools for physicians and caregivers to better manage medication selection and side effect amelioration. Pharmacogenomic tests are also used as companion diagnostics, meaning tests bundled with drugs. Examples include the KRAS test with cetuximab and the EGFR test with gefitinib. Besides efficacy, germline pharmacogenetics can help to identify patients likely to experience severe toxicities when given cytotoxic drugs whose detoxification is impaired by a genetic polymorphism, such as 5-FU.

In cardiovascular disorders, the main concern is response to drugs including warfarin, clopidogrel, beta blockers, and statins.

Example case studies

Case A – Antipsychotic adverse reaction

Patient A suffers from schizophrenia. Their treatment included a combination of ziprasidone, olanzapine, trazodone and benztropine. The patient experienced dizziness and sedation, so they were tapered off ziprasidone and olanzapine and transitioned to quetiapine. Trazodone was discontinued. The patient then experienced excessive sweating, tachycardia and neck pain, gained considerable weight and had hallucinations. Five months later, quetiapine was tapered and discontinued, and ziprasidone was reintroduced because of the excessive weight gain. Although the patient lost the excess weight, they then developed muscle stiffness, cogwheeling, tremor and night sweats. When benztropine was added they experienced blurry vision. After an additional five months, the patient was switched from ziprasidone to aripiprazole. Over the course of 8 months, patient A gradually experienced more weight gain and sedation, and developed difficulty with their gait, stiffness, cogwheeling and dyskinetic ocular movements. A pharmacogenomics test later showed that the patient had a CYP2D6 *1/*41 genotype, with a predicted phenotype of IM, and a CYP2C19 *1/*2 genotype, also with a predicted phenotype of IM.

Case B – Pain Management

Patient B is a woman who gave birth by caesarian section. Her physician prescribed codeine for post-caesarian pain. She took the standard prescribed dose, but experienced nausea and dizziness while she was taking codeine. She also noticed that her breastfed infant was lethargic and feeding poorly. When the patient mentioned these symptoms to her physician, they recommended that she discontinue codeine use. Within a few days, both the patient's and her infant's symptoms were no longer present. It is assumed that if the patient had undergone a pharmacogenomic test, it would have revealed a duplication of the CYP2D6 gene, placing her in the ultra-rapid metabolizer (UM) category and explaining her ADRs to codeine use.

Case C – FDA Warning on Codeine Overdose for Infants

On February 20, 2013, the FDA released a statement addressing a serious concern regarding the connection between children who are CYP2D6 UMs and fatal reactions to codeine following tonsillectomy and/or adenoidectomy (surgery to remove the tonsils and/or adenoids). The agency issued its strongest Boxed Warning to make clear the dangers of codeine use by CYP2D6 UMs. Codeine is converted to morphine by CYP2D6, and those who have UM phenotypes are at risk of producing large amounts of morphine due to the increased function of the gene. The morphine can rise to life-threatening or fatal levels, as became evident with the death of three children in August 2012.

Polypharmacy

A potential role pharmacogenomics may play would be to reduce the occurrence of polypharmacy. It is theorized that with tailored drug treatments, patients will not have the need to take several medications that are intended to treat the same condition. In doing so, they could potentially minimize the occurrence of ADRs, have improved treatment outcomes, and can save costs by avoiding purchasing extraneous medications. An example of this can be found in psychiatry, where patients tend to be receiving more medications than even age-matched non-psychiatric patients. This has been associated with an increased risk of inappropriate prescribing.

The need for pharmacogenomics-tailored drug therapies may be most evident in a survey conducted by the Slone Epidemiology Center at Boston University from February 1998 to April 2007. The study found that an average of 82% of adults in the United States are taking at least one medication (prescription or nonprescription drug, vitamin/mineral, herbal/natural supplement), and 29% are taking five or more. The study suggested that those aged 65 years or older continue to be the biggest consumers of medications, with 17 to 19% in this age group taking at least ten medications in a given week. Polypharmacy has also been shown to have increased since 2000, from 23% to 29%.

Drug labeling

The U.S. Food and Drug Administration (FDA) appears to be very invested in the science of pharmacogenomics, as demonstrated by the more than 120 FDA-approved drugs that include pharmacogenomic biomarkers in their labels. This number has varied over the years. A study of the labels of FDA-approved drugs as of 20 June 2014 found that there were 140 different drugs with a pharmacogenomic biomarker in their label. Because a drug can have different biomarkers, this corresponded to 158 drug–biomarker pairs. Only 29% stated a requirement or recommendation for genetic biomarker testing, but this was higher for oncology drugs (62%). On May 22, 2005, the FDA issued its first Guidance for Industry: Pharmacogenomic Data Submissions, which clarified the type of pharmacogenomic data required to be submitted to the FDA and when. Experts recognized the importance of the FDA's acknowledgement that pharmacogenomics experiments will not bring negative regulatory consequences. The FDA released its latest guide, Clinical Pharmacogenomics (PGx): Premarket Evaluation in Early-Phase Clinical Studies and Recommendations for Labeling, in January 2013. The guide is intended to address the use of genomic information during drug development and regulatory review processes.

Challenges

Consecutive phases and associated challenges in Pharmacogenomics.
 
Although there appears to be a general acceptance of the basic tenet of pharmacogenomics amongst physicians and healthcare professionals, several challenges exist that slow the uptake, implementation, and standardization of pharmacogenomics. Some of the concerns raised by physicians include:
  • Limitation on how to apply the test into clinical practices and treatment;
  • A general feeling of lack of availability of the test;
  • The understanding and interpretation of evidence-based research; and
  • Ethical, legal and social issues.
Issues surrounding the availability of the test include:
  • The lack of availability of scientific data: Although a considerable number of drug-metabolizing enzymes (DMEs) are involved in the metabolic pathways of drugs, only a fraction have sufficient scientific data to validate their use within a clinical setting; and
  • Demonstrating the cost-effectiveness of pharmacogenomics: Publications for the pharmacoeconomics of pharmacogenomics are scarce, therefore sufficient evidence does not at this time exist to validate the cost-effectiveness and cost-consequences of the test.
Although other factors contribute to the slow progression of pharmacogenomics (such as developing guidelines for clinical use), the above factors appear to be the most prevalent.

Controversies

Some alleles that vary in frequency between specific populations have been shown to be associated with differential responses to specific drugs. The beta blocker atenolol is an anti-hypertensive medication that is shown to more significantly lower the blood pressure of Caucasian patients than African American patients in the United States. This observation suggests that Caucasian and African American populations have different alleles governing oleic acid biochemistry, which react differentially with atenolol. Similarly, hypersensitivity to the antiretroviral drug abacavir is strongly associated with a single-nucleotide polymorphism that varies in frequency between populations.

The FDA approval of the drug BiDil (isosorbide dinitrate/hydralazine), with a label specifying African-Americans with congestive heart failure, produced a storm of controversy over race-based medicine and fears of genetic stereotyping, even though the label for BiDil did not specify any genetic variants but was based on racial self-identification.

Future

Computational advances have proven to be a blessing in pharmacogenomics research. As a simple example, for nearly a decade the ability to store more information on a hard drive has made it possible to investigate a human genome sequence more cheaply and in more detail with regard to the effects, risks and safety concerns of drugs and other such substances. Such computational advances are expected to continue in the future. The aim is to use genome sequence data to make effective decisions that minimise negative impacts on, say, a patient or the health industry in general. A large amount of recent research in the biomedical sciences regarding pharmacogenomics stems from combinatorial chemistry, genomic mining, omic technologies and high-throughput screening. In order for the field to grow, knowledge-rich enterprises and businesses must work more closely together and adopt simulation strategies. Consequently, more importance must be placed on the role of computational biology with regard to safety and risk assessments. Here lies the growing need to manage large, complex data sets and to extract information by integrating disparate data, so that developments can be made in improving human health.

History of statistics

From Wikipedia, the free encyclopedia

The history of statistics in the modern sense dates from the mid-17th century, with the term statistics itself coined in 1749 in German, although there have been changes to the interpretation of the word over time. The development of statistics is intimately connected on the one hand with the development of sovereign states, particularly European states following the Peace of Westphalia (1648), and on the other hand with the development of probability theory, which put statistics on a firm theoretical basis.

In early times, the meaning was restricted to information about states, particularly demographics such as population. This was later extended to include all collections of information of all types, and later still it was extended to include the analysis and interpretation of such data. In modern terms, "statistics" means both sets of collected information, as in national accounts and temperature records, and analytical work which requires statistical inference. Statistical activities are often associated with models expressed using probabilities, hence the connection with probability theory. The large requirements of data processing have made statistics a key application of computing; see history of computing hardware. A number of statistical concepts have an important impact on a wide range of sciences. These include the design of experiments and approaches to statistical inference such as Bayesian inference, each of which can be considered to have its own sequence in the development of the ideas underlying modern statistics.

Introduction

By the 18th century, the term "statistics" designated the systematic collection of demographic and economic data by states. For at least two millennia, these data were mainly tabulations of human and material resources that might be taxed or put to military use. In the early 19th century, collection intensified, and the meaning of "statistics" broadened to include the discipline concerned with the collection, summary, and analysis of data. Today, data are collected and statistics are computed and widely distributed in government, business, most of the sciences and sports, and even for many pastimes. Electronic computers have expedited more elaborate statistical computation even as they have facilitated the collection and aggregation of data. A single data analyst may have available a set of data files with millions of records, each with dozens or hundreds of separate measurements. These are collected over time from computer activity (for example, a stock exchange) or from computerized sensors, point-of-sale registers, and so on. Computers then produce simple, accurate summaries, and allow more tedious analyses, such as those that require inverting a large matrix or performing hundreds of steps of iteration, that would never be attempted by hand. Faster computing has allowed statisticians to develop "computer-intensive" methods which may look at all permutations, or use randomization to look at 10,000 permutations of a problem, to estimate answers that are not easy to quantify by theory alone.
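As an illustration of such a "computer-intensive" method, the following Python sketch runs a randomization test on the difference between two group means using 10,000 random permutations. The function name, the made-up measurements and the choice of test statistic are assumptions for the example, not something taken from the sources described here.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the pooled values to the two groups at random.
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        # Small tolerance guards against floating-point rounding in the sums.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups (illustrative only).
group_a = [12.1, 11.8, 13.0, 12.6, 12.9]
group_b = [10.2, 10.9, 11.1, 10.5, 10.8]
p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

With only ten observations all 252 possible splits could be enumerated exactly; random sampling becomes necessary when the number of permutations is too large to list.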

The term "mathematical statistics" designates the mathematical theories of probability and statistical inference, which are used in statistical practice. The relation between statistics and probability theory developed rather late, however. In the 19th century, statistics increasingly used probability theory, whose initial results were found in the 17th and 18th centuries, particularly in the analysis of games of chance (gambling). By 1800, astronomy used probability models and statistical theories, particularly the method of least squares. Early probability theory and statistics were systematized in the 19th century, and statistical reasoning and probability models were used by social scientists to advance the new sciences of experimental psychology and sociology, and by physical scientists in thermodynamics and statistical mechanics. The development of statistical reasoning was closely associated with the development of inductive logic and the scientific method, which are concerns that move statisticians away from the narrower area of mathematical statistics. Much of the theoretical work was readily available by the time computers were available to exploit it. By the early 1970s, Johnson and Kotz had produced a four-volume Compendium on Statistical Distributions (1st ed., 1969-1972), which is still an invaluable resource.

Applied statistics can be regarded as not a field of mathematics but an autonomous mathematical science, like computer science and operations research. Unlike mathematics, statistics had its origins in public administration. Applications arose early in demography and economics; large areas of micro- and macro-economics today are "statistics" with an emphasis on time-series analyses. With its emphasis on learning from data and making best predictions, statistics also has been shaped by areas of academic research including psychological testing, medicine and epidemiology. The ideas of statistical testing have considerable overlap with decision science. With its concerns with searching and effectively presenting data, statistics has overlap with information science and computer science.

Etymology

The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of state" (then called political arithmetic in English). It acquired the meaning of the collection and classification of data generally in the early 19th century. It was introduced into English in 1791 by Sir John Sinclair when he published the first of 21 volumes titled Statistical Account of Scotland.

Thus, the original principal purpose of Statistik was data to be used by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services. In particular, censuses provide frequently updated information about the population.

The first book to have 'statistics' in its title was "Contributions to Vital Statistics" (1845) by Francis GP Neison, actuary to the Medical Invalid and General Life Office.

Origins in probability theory

Basic forms of statistics have been used since the beginning of civilization. Early empires often collated censuses of the population or recorded the trade in various commodities. The Roman Empire was one of the first states to extensively gather data on the size of the empire's population, geographical area and wealth.

The use of statistical methods dates back at least to the 5th century BCE. The historian Thucydides in his History of the Peloponnesian War describes how the Athenians calculated the height of the wall of Plataea by counting the number of bricks in an unplastered section of the wall sufficiently near them to be able to count them. The count was repeated several times by a number of soldiers. The most frequent value (in modern terminology, the mode) so determined was taken to be the most likely value of the number of bricks. Multiplying this value by the height of the bricks used in the wall allowed the Athenians to determine the height of the ladders necessary to scale the walls.
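The procedure described by Thucydides amounts to taking the mode of repeated counts and multiplying it by a known unit height. A minimal Python sketch follows; the brick counts and the brick height are invented for illustration, since the account gives no figures.

```python
from collections import Counter


def most_frequent(counts):
    """Return the mode (most frequent value) of a list of repeated counts."""
    value, _ = Counter(counts).most_common(1)[0]
    return value


# Hypothetical brick counts reported by different soldiers.
brick_counts = [61, 60, 60, 62, 60, 59, 60, 61]
brick_height_m = 0.1  # hypothetical brick height in metres

layers = most_frequent(brick_counts)
wall_height_m = layers * brick_height_m
print(layers, round(wall_height_m, 2))
```

Using the mode rather than the mean means that a few soldiers who miscounted badly do not shift the estimate at all, as long as the most common count is correct.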

The earliest writing on statistics was found in a 9th-century book entitled: "Manuscript on Deciphering Cryptographic Messages", written by Al-Kindi (801–873 CE). In his book, Al-Kindi gave a detailed description of how to use statistics and frequency analysis to decipher encrypted messages. This text arguably gave rise to the birth of both statistics and cryptanalysis.

The Trial of the Pyx is a test of the purity of the coinage of the Royal Mint which has been held on a regular basis since the 12th century. The Trial itself is based on statistical sampling methods. After minting a series of coins (originally from ten pounds of silver), a single coin was placed in the Pyx, a box in Westminster Abbey. After a given period (now once a year), the coins are removed and weighed. A sample of the coins removed from the box is then tested for purity.

The Nuova Cronica, a 14th-century history of Florence by the Florentine banker and official Giovanni Villani, includes much statistical information on population, ordinances, commerce and trade, education, and religious facilities and has been described as the first introduction of statistics as a positive element in history, though neither the term nor the concept of statistics as a specific field yet existed. But this was proven to be incorrect after the rediscovery of Al-Kindi's book on frequency analysis.

The arithmetic mean, although a concept known to the Greeks, was not generalised to more than two values until the 16th century. The invention of the decimal system by Simon Stevin in 1585 seems likely to have facilitated these calculations. This method was first adopted in astronomy by Tycho Brahe who was attempting to reduce the errors in his estimates of the locations of various celestial bodies.

The idea of the median originated in Edward Wright's book on navigation (Certaine Errors in Navigation) in 1599 in a section concerning the determination of location with a compass. Wright felt that this value was the most likely to be the correct value in a series of observations.

Sir William Petty, a 17th-century economist who used early statistical methods to analyse demographic data.

The birth of statistics is often dated to 1662, when John Graunt, along with William Petty, developed early human statistical and census methods that provided a framework for modern demography. He produced the first life table, giving probabilities of survival to each age. His book Natural and Political Observations Made upon the Bills of Mortality used analysis of the mortality rolls to make the first statistically based estimation of the population of London. He knew that there were around 13,000 funerals per year in London and that three people died per eleven families per year. He estimated from the parish records that the average family size was 8 and calculated that the population of London was about 384,000; this is the first known use of a ratio estimator. Laplace in 1802 estimated the population of France with a similar method.
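Graunt's ratio estimate can be retraced as simple arithmetic. The Python sketch below uses only the figures quoted above; rounding the number of families to the nearest thousand is an assumption introduced here to reconcile the calculation with the quoted total of 384,000.

```python
funerals_per_year = 13_000
deaths_per_family = 3 / 11   # three deaths per eleven families per year
average_family_size = 8

# Number of families implied by the funerals and the death rate per family.
families = funerals_per_year / deaths_per_family      # about 47,667
# Assumed rounding step: to the nearest thousand families.
families_rounded = round(families, -3)                # 48,000
population = families_rounded * average_family_size

print(round(families), int(families_rounded), int(population))
```

Without the rounding step the same inputs give roughly 381,000, so the quoted figure is consistent with an intermediate value of 48,000 families.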

Although the original scope of statistics was limited to data useful for governance, the approach was extended to many fields of a scientific or commercial nature during the 19th century. The mathematical foundations for the subject heavily drew on the new probability theory, pioneered in the 16th and 17th centuries by Gerolamo Cardano, Pierre de Fermat and Blaise Pascal. Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi (posthumous, 1713) and Abraham de Moivre's The Doctrine of Chances (1718) treated the subject as a branch of mathematics. In his book Bernoulli introduced the idea of representing complete certainty as one and probability as a number between zero and one.

A key early application of statistics in the 18th century was to the human sex ratio at birth. John Arbuthnot studied this question in 1710. Arbuthnot examined birth records in London for each of the 82 years from 1629 to 1710. In every year, the number of males born in London exceeded the number of females. Considering more male or more female births as equally likely, the probability of the observed outcome is 0.5^82, or about 1 in 4,836,000,000,000,000,000,000,000 (roughly 4.8 × 10^24); in modern terms, this is the p-value. It is vanishingly small, leading Arbuthnot to conclude that the result was due not to chance but to divine providence: "From whence it follows, that it is Art, not Chance, that governs." This and other work by Arbuthnot is credited as "the first use of significance tests", the first example of reasoning about statistical significance and moral certainty, and "… perhaps the first published report of a nonparametric test …", specifically the sign test.
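Arbuthnot's probability is easy to check exactly. The short Python sketch below computes the one-sided sign-test probability of 82 male-excess years out of 82 under the assumption that either outcome is equally likely each year; the variable names are illustrative.

```python
from fractions import Fraction

years = 82
# Under the null hypothesis each year shows a male or a female excess
# with probability 1/2, independently of the other years.
p_value = Fraction(1, 2) ** years

print(p_value.denominator)  # the exact value of 2**82
print(float(p_value))       # on the order of 2e-25
```

Exact rational arithmetic avoids any rounding, so the denominator can be compared directly with the "1 in ..." figure quoted in the text.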

The formal study of the theory of errors may be traced back to Roger Cotes' Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given. Simpson discussed several possible distributions of error. He first considered the uniform distribution and then the discrete symmetric triangular distribution followed by the continuous symmetric triangle distribution. Tobias Mayer, in his study of the libration of the moon (Kosmographische Nachrichten, Nuremberg, 1750), invented the first formal method for estimating the unknown quantities by generalizing the averaging of observations under identical circumstances to the averaging of groups of similar equations.

Roger Joseph Boscovich, in his 1755 work on the shape of the earth, proposed in the book De Litteraria expeditione per pontificiam ditionem ad dimetiendos duos meridiani gradus a PP. Maire et Boscovich that the true value of a series of observations would be that which minimises the sum of absolute errors. In modern terminology this value is the median. The first example of what later became known as the normal curve was studied by Abraham de Moivre, who plotted this curve on November 12, 1733. De Moivre was studying the number of heads that occurred when a 'fair' coin was tossed.
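Boscovich's criterion can be checked numerically: among candidate values, the one with the smallest sum of absolute errors coincides with the median. The Python sketch below uses made-up observations and a simple grid search, both chosen only for illustration.

```python
import statistics


def sum_abs_errors(candidate, observations):
    """Total absolute deviation of the observations from a candidate value."""
    return sum(abs(x - candidate) for x in observations)


observations = [2.0, 3.0, 3.5, 4.0, 10.0]  # made-up data with one outlier
median = statistics.median(observations)

# Scan candidate values 0.00, 0.01, ..., 12.00 and keep the one
# with the smallest total absolute error.
grid = [i / 100 for i in range(0, 1201)]
best = min(grid, key=lambda c: sum_abs_errors(c, observations))
print(median, best)
```

The arithmetic mean of these observations is 4.5, pulled upward by the outlier, whereas the value minimising the absolute errors stays at the middle observation.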

In 1761 Thomas Bayes proved Bayes' theorem and in 1765 Joseph Priestley invented the first timeline charts.

Johann Heinrich Lambert in his 1765 book Anlage zur Architectonic proposed the semicircle as a distribution of errors, that is, a density proportional to \sqrt{1 - x^2} with -1 < x < 1.

Probability density plots for the Laplace distribution.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve and deduced a formula for the mean of three observations.

Laplace in 1774 noted that the frequency of an error could be expressed as an exponential function of its magnitude once its sign was disregarded. This distribution is now known as the Laplace distribution. Lagrange proposed a parabolic distribution of errors in 1776.

Laplace in 1778 published his second law of errors wherein he noted that the frequency of an error was proportional to the exponential of the square of its magnitude. This was subsequently rediscovered by Gauss (possibly in 1795) and is now best known as the normal distribution which is of central importance in statistics. This distribution was first referred to as the normal distribution by C. S. Peirce in 1873 who was studying measurement errors when an object was dropped onto a wooden base. He chose the term normal because of its frequent occurrence in naturally occurring variables.

Lagrange also suggested in 1781 two other distributions for errors - a raised cosine distribution and a logarithmic distribution.

Laplace gave (1781) a formula for the law of facility of error (a term due to Joseph Louis Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

In 1786 William Playfair (1759-1823) introduced the idea of graphical representation into statistics. He invented the line chart, bar chart and histogram and incorporated them into his work on economics, the Commercial and Political Atlas. This was followed in 1795 by his invention of the pie chart and circle chart, which he used to display the evolution of England's imports and exports. These latter charts came to general attention when he published examples in his Statistical Breviary in 1801.

Laplace, in an investigation of the motions of Saturn and Jupiter in 1787, generalized Mayer's method by using different linear combinations of a single group of equations.

In 1791 Sir John Sinclair introduced the term 'statistics' into English in his Statistical Accounts of Scotland.

In 1802 Laplace estimated the population of France to be 28,328,612. He calculated this figure using the number of births in the previous year and census data for three communities. The census data of these communities showed that they had 2,037,615 persons and that the number of births was 71,866. Assuming that these samples were representative of France, Laplace produced his estimate for the entire population.
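The ratio behind Laplace's estimate can be computed from the sample figures quoted above. In the Python sketch below, the function name and its argument are illustrative; the national birth count that Laplace multiplied by this ratio is not given in the text, so no value is supplied for it.

```python
sample_population = 2_037_615
sample_births = 71_866

# Persons per annual birth in the sampled communities.
persons_per_birth = sample_population / sample_births


def ratio_estimate(national_births):
    """Scale a national birth count by the sample ratio (hypothetical helper)."""
    return national_births * persons_per_birth


print(round(persons_per_birth, 3))
```

The sample implies a little over 28 inhabitants per annual birth; multiplying a national birth count by this ratio gives a population estimate of the kind Laplace reported.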

Carl Friedrich Gauss, mathematician who developed the method of least squares in 1809.

The method of least squares, which was used to minimize errors in data measurement, was published independently by Adrien-Marie Legendre (1805), Robert Adrain (1808), and Carl Friedrich Gauss (1809). Gauss had used the method in his famous 1801 prediction of the location of the dwarf planet Ceres. The observations that Gauss based his calculations on were made by the Italian monk Piazzi.

The term probable error (der wahrscheinliche Fehler), the median deviation from the mean, was introduced in 1815 by the German astronomer Friedrich Wilhelm Bessel. Antoine Augustin Cournot in 1843 was the first to use the term median (valeur médiane) for the value that divides a probability distribution into two equal halves.

Other contributors to the theory of errors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters's (1856) formula for the "probable error" of a single observation was widely used and inspired early robust statistics.

In the 19th century authors on statistical theory included Laplace, S. Lacroix (1816), Littrow (1833), Dedekind (1860), Helmert (1872), Laurent (1873), Liagre, Didion, De Morgan and Boole.

Gustav Theodor Fechner used the median (Centralwerth) in sociological and psychological phenomena. It had earlier been used only in astronomy and related fields. Francis Galton used the English term median for the first time in 1881 having earlier used the terms middle-most value in 1869 and the medium in 1880.

Adolphe Quetelet (1796–1874), another important founder of statistics, introduced the notion of the "average man" (l'homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates, and suicide rates.

The first tests of the normal distribution were invented by the German statistician Wilhelm Lexis in the 1870s. The only data sets available to him that he was able to show were normally distributed were birth rates.

Development of modern statistics

Although the origins of statistical theory lie in the 18th-century advances in probability, the modern field of statistics only emerged in the late-19th and early-20th century in three stages. The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. The second wave of the 1910s and 20s was initiated by William Gosset, and reached its culmination in the insights of Ronald Fisher. This involved the development of better design of experiments models, hypothesis testing and techniques for use with small data samples. The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology.

The original logo of the Royal Statistical Society, founded in 1834.

The first statistical bodies were established in the early 19th century. The Royal Statistical Society was founded in 1834 and Florence Nightingale, its first female member, pioneered the application of statistical analysis to health problems for the furtherance of epidemiological understanding and public health practice. However, the methods then used would not be considered as modern statistics today.

The Oxford scholar Francis Ysidro Edgeworth's book, Metretike: or The Method of Measuring Probability and Utility (1887) dealt with probability as the basis of inductive reasoning, and his later works focused on the 'philosophy of chance'. His first paper on statistics (1883) explored the law of error (normal distribution), and his Methods of Statistics (1885) introduced an early version of the t distribution, the Edgeworth expansion, the Edgeworth series, the method of variate transformation and the asymptotic theory of maximum likelihood estimates.

The Norwegian Anders Nicolai Kiær introduced the concept of stratified sampling in 1895. Arthur Lyon Bowley introduced new methods of data sampling in 1906 when working on social statistics. Although statistical surveys of social conditions had started with Charles Booth's "Life and Labour of the People in London" (1889-1903) and Seebohm Rowntree's "Poverty, A Study of Town Life" (1901), Bowley's key innovation consisted of the use of random sampling techniques. His efforts culminated in his New Survey of London Life and Labour.

Francis Galton is credited as one of the principal founders of statistical theory. His contributions to the field included introducing the concepts of standard deviation, correlation, regression and the application of these methods to the study of the variety of human characteristics - height, weight, eyelash length among others. He found that many of these could be fitted to a normal curve distribution.

Galton submitted a paper to Nature in 1907 on the usefulness of the median. He examined the accuracy of 787 guesses of the weight of an ox at a country fair. The actual weight was 1198 pounds; the median guess was 1207. The guesses were markedly non-normally distributed.


Galton's publication of Natural Inheritance in 1889 sparked the interest of a brilliant mathematician, Karl Pearson, then working at University College London, and he went on to found the discipline of mathematical statistics. He emphasised the statistical foundation of scientific laws and promoted its study, and his laboratory drew students from around the world, including Udny Yule, who were attracted by his new methods of analysis. His work grew to encompass the fields of biology, epidemiology, anthropometry, medicine and social history. In 1901, with Walter Weldon, founder of biometry, and Galton, he founded the journal Biometrika as the first journal of mathematical statistics and biometry.

His work, and that of Galton, underpins many of the 'classical' statistical methods which are in common use today, including the correlation coefficient, defined as a product-moment; the method of moments for the fitting of distributions to samples; Pearson's system of continuous curves, which forms the basis of the now conventional continuous probability distributions; the chi distance, a precursor and special case of the Mahalanobis distance; and the p-value, defined as the probability measure of the complement of the ball with the hypothesized value as center point and chi distance as radius. He also introduced the term 'standard deviation'.

He also founded statistical hypothesis testing theory and introduced Pearson's chi-squared test and principal component analysis. In 1911 he founded the world's first university statistics department at University College London.

Ronald Fisher, "A genius who almost single-handedly created the foundations for modern statistical science"

The second wave of mathematical statistics was pioneered by Ronald Fisher, who wrote two textbooks, Statistical Methods for Research Workers (1925) and The Design of Experiments (1935), that were to define the academic discipline in universities around the world. He also systematized previous results, putting them on a firm mathematical footing. His seminal 1918 paper The Correlation between Relatives on the Supposition of Mendelian Inheritance was the first to use the statistical term variance. In 1919, at Rothamsted Experimental Station, he started a major study of the extensive collections of data recorded over many years. This resulted in a series of reports under the general title Studies in Crop Variation. In 1930 he published The Genetical Theory of Natural Selection, where he applied statistics to evolution.

Over the next seven years, he pioneered the principles of the design of experiments (see below) and elaborated his studies of analysis of variance. He furthered his studies of the statistics of small samples. Perhaps even more important, he began his systematic approach of the analysis of real data as the springboard for the development of new statistical methods. He developed computational algorithms for analyzing data from his balanced experimental designs. In 1925, this work resulted in the publication of his first book, Statistical Methods for Research Workers. This book went through many editions and translations in later years, and it became the standard reference work for scientists in many disciplines. In 1935, this book was followed by The Design of Experiments, which was also widely used.

In addition to analysis of variance, Fisher named and promoted the method of maximum likelihood estimation. Fisher also originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminant and Fisher information. His article On a distribution yielding the error functions of several well known statistics (1924) presented Pearson's chi-squared test and William Gosset's t in the same framework as the Gaussian distribution and his own parameter in the analysis of variance, Fisher's z-distribution (more commonly used decades later in the form of the F distribution). The 5% level of significance appears to have been introduced by Fisher in 1925. Fisher stated that deviations exceeding twice the standard deviation are regarded as significant. Before this, deviations exceeding three times the probable error were considered significant. For a symmetrical distribution the probable error is half the interquartile range. For a normal distribution the probable error is approximately 2/3 the standard deviation, so three probable errors amount to roughly two standard deviations. It appears that Fisher's 5% criterion was rooted in previous practice.
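The arithmetic linking the older "three probable errors" rule to the "two standard deviations" rule can be checked directly. The following is a minimal sketch, assuming a standard normal distribution; the variable and function names are illustrative and do not come from Fisher's text.

```python
from statistics import NormalDist

# Assumption: deviations follow a standard normal distribution N(0, 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# The probable error is half the interquartile range. For N(0, 1) the
# quartiles sit at -q and +q, so the probable error is the 75th percentile.
probable_error = std_normal.inv_cdf(0.75)


def two_sided_tail(z):
    """Probability that a standard normal deviate exceeds z in absolute value."""
    return 2.0 * (1.0 - std_normal.cdf(z))


# The older rule, three probable errors, expressed in standard deviations.
old_threshold_in_sd = 3.0 * probable_error

p_old = two_sided_tail(old_threshold_in_sd)  # tail probability of the older rule
p_new = two_sided_tail(2.0)                  # tail probability of the two-SD rule

print(f"probable error      = {probable_error:.4f} SD")
print(f"3 x probable error  = {old_threshold_in_sd:.4f} SD")
print(f"two-sided tail, old = {p_old:.4f}")
print(f"two-sided tail, new = {p_new:.4f}")
```

Under these assumptions, three probable errors come to a little over two standard deviations, and both rules leave a two-sided tail probability between 4% and 5%, which is consistent with the 5% criterion continuing earlier practice.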

Other important contributions at this time included Charles Spearman's rank correlation coefficient, which was a useful extension of the Pearson correlation coefficient. William Sealy Gosset, the English statistician better known under his pseudonym of Student, introduced Student's t-distribution, a continuous probability distribution useful in situations where the sample size is small and the population standard deviation is unknown.
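One way to see how the rank correlation coefficient extends the Pearson coefficient is to compute it as the Pearson coefficient of the ranks. The sketch below does this in plain Python. The data are hypothetical, the helper names are illustrative, and the ranking function assumes there are no tied values.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Rank correlation: the Pearson coefficient applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y increases with x, but not linearly (y = x ** 3).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson  r   = {r_pearson:.4f}")
print(f"Spearman rho = {r_spearman:.4f}")
```

For this monotonic but non-linear relationship the rank coefficient equals 1, while the product-moment coefficient falls short of 1, because only the ordering of the values matters for the former.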

Egon Pearson (Karl's son) and Jerzy Neyman introduced the concepts of "Type II" error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.

Design of experiments

James Lind carried out the first ever clinical trial in 1747, in an effort to find a treatment for scurvy.

In 1747, while serving as surgeon on HM Bark Salisbury, James Lind carried out a controlled experiment to develop a cure for scurvy. In this study his subjects' cases "were as similar as I could have them", that is, he provided strict entry requirements to reduce extraneous variation. The men were paired, which provided blocking. From a modern perspective, the main thing that is missing is randomized allocation of subjects to treatments.

Lind is today often described as a one-factor-at-a-time experimenter. Similar one-factor-at-a-time (OFAT) experimentation was performed at the Rothamsted Research Station in the 1840s by Sir John Lawes to determine the optimal inorganic fertilizer for use on wheat.

A theory of statistical inference was developed by Charles S. Peirce in "Illustrations of the Logic of Science" (1877–1878) and "A Theory of Probable Inference" (1883), two publications that emphasized the importance of randomization-based inference in statistics. In another study, Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights.

Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s. Peirce also contributed the first English-language publication on an optimal design for regression models in 1876. A pioneering optimal design for polynomial regression was suggested by Gergonne in 1815. In 1918 Kirstine Smith published optimal designs for polynomials of degree six (and less).

The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, was pioneered by Abraham Wald in the context of sequential tests of statistical hypotheses. Surveys are available of optimal sequential designs, and of adaptive designs. One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.

The term "design of experiments" (DOE) derives from early statistical work performed by Sir Ronald Fisher. He was described by Anders Hald as "a genius who almost single-handedly created the foundations for modern statistical science." Fisher initiated the principles of design of experiments and elaborated on his studies of "analysis of variance". Perhaps even more important, Fisher began his systematic approach to the analysis of real data as the springboard for the development of new statistical methods. He began to pay particular attention to the labour involved in the necessary computations performed by hand, and developed methods that were as practical as they were founded in rigour. In 1925, this work culminated in the publication of his first book, Statistical Methods for Research Workers. This went into many editions and translations in later years, and became a standard reference work for scientists in many disciplines.

A methodology for designing experiments was proposed by Ronald A. Fisher in his innovative book The Design of Experiments (1935), which also became a standard. As an example, he described how to test the hypothesis that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important ideas of experimental design: see Lady tasting tea.
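The logic of the tea-tasting test can be illustrated with a short calculation. The sketch below assumes the design as it is usually described, which is not spelled out in the text above: eight cups in random order, four with milk poured first and four with tea poured first, and the lady told to pick the four milk-first cups. Under the hypothesis that she is only guessing, every choice of four cups is equally likely. The function and variable names are illustrative.

```python
from math import comb


def prob_exactly_k_correct(k, n_cups=8, n_milk_first=4):
    """Chance of picking exactly k true milk-first cups by pure guessing.

    Assumes the taster selects n_milk_first cups out of n_cups, so the
    count of correct picks follows a hypergeometric distribution.
    """
    n_tea_first = n_cups - n_milk_first
    favourable = comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
    return favourable / comb(n_cups, n_milk_first)


# Probability of a perfect score if the lady is only guessing: 1 in 70.
p_all_correct = prob_exactly_k_correct(4)

# Probability of three or more correct picks by guessing: 17 in 70.
p_at_least_three = prob_exactly_k_correct(3) + prob_exactly_k_correct(4)

print(f"P(all four correct | guessing) = {p_all_correct:.4f}")
print(f"P(three or more    | guessing) = {p_at_least_three:.4f}")
```

With these assumed numbers, only a perfect score is rare enough under guessing to fall below the 5% level; three correct out of four would not be convincing evidence on its own.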

Agricultural science advances served to meet the combination of larger city populations and fewer farms. But for crop scientists to take due account of widely differing geographical growing climates and needs, it was important to differentiate local growing conditions. To extrapolate experiments on local crops to a national scale, they had to extend crop sample testing economically to overall populations. As statistical methods advanced (primarily the efficacy of designed experiments instead of one-factor-at-a-time experimentation), representative factorial design of experiments began to enable the meaningful extension, by inference, of experimental sampling results to the population as a whole. But it was hard to decide how representative the chosen crop sample was. Factorial design methodology showed how to estimate and correct for any random variation within the sample and also in the data collection procedures.

Bayesian statistics

Pierre-Simon, marquis de Laplace, one of the main early developers of Bayesian statistics.

The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. However, it was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and applied it to celestial mechanics, medical statistics, reliability, and jurisprudence. When insufficient knowledge was available to specify an informed prior, Laplace used uniform priors, according to his "principle of insufficient reason". Laplace assumed uniform priors for mathematical simplicity rather than for philosophical reasons. Laplace also introduced primitive versions of conjugate priors and the theorem of von Mises and Bernstein, according to which the posteriors corresponding to initially differing priors ultimately agree as the number of observations increases. This early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes).
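The agreement of posteriors that start from different priors can be illustrated with a conjugate example. The sketch below is only an illustration under stated assumptions: the data are binomial (successes and failures), the priors are Beta distributions, with Beta(1, 1) playing the role of the uniform prior, and the observed counts are hypothetical, fixed at 30% successes. None of the names or numbers come from Laplace.

```python
def beta_posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a success probability under a Beta(prior_a, prior_b) prior.

    The Beta family is conjugate to the binomial likelihood, so the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    post_a = prior_a + successes
    post_b = prior_b + failures
    return post_a / (post_a + post_b)


uniform_prior = (1, 1)      # uniform prior, in the spirit of insufficient reason
opinionated_prior = (20, 2)  # hypothetical prior that strongly favours success

gaps = []
for n in (10, 100, 10_000):
    successes = 3 * n // 10  # hypothetical data: exactly 30% successes
    failures = n - successes
    mean_uniform = beta_posterior_mean(*uniform_prior, successes, failures)
    mean_opinionated = beta_posterior_mean(*opinionated_prior, successes, failures)
    gaps.append(abs(mean_uniform - mean_opinionated))
    print(f"n={n:>6}: uniform prior -> {mean_uniform:.4f}, "
          f"opinionated prior -> {mean_opinionated:.4f}")
```

With the uniform prior the posterior mean reduces to (successes + 1) / (n + 2). As n grows, the gap between the two posterior means shrinks towards zero in this example, which is the behaviour the paragraph above describes.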

After the 1920s, inverse probability was largely supplanted by a collection of methods that were developed by Ronald A. Fisher, Jerzy Neyman and Egon Pearson. Their methods came to be called frequentist statistics. Fisher rejected the Bayesian view, writing that "the theory of inverse probability is founded upon an error, and must be wholly rejected". At the end of his life, however, Fisher expressed greater respect for the essay of Bayes, which Fisher believed to have anticipated his own fiducial approach to probability; Fisher still maintained that Laplace's views on probability were "fallacious rubbish". Neyman started out as a "quasi-Bayesian", but subsequently developed confidence intervals (a key method in frequentist statistics) because "the whole theory would look nicer if it were built from the start without reference to Bayesianism and priors". The word Bayesian appeared around 1950, and by the 1960s it became the term preferred by those dissatisfied with the limitations of frequentist statistics.

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objectivist stream, the statistical analysis depends on only the model assumed and the data analysed. No subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.

In the further development of Laplace's ideas, subjective ideas predate objectivist positions. The idea that 'probability' should be interpreted as 'subjective degree of belief in a proposition' was proposed, for example, by John Maynard Keynes in the early 1920s. This idea was taken further by Bruno de Finetti in Italy (Fondamenti Logici del Ragionamento Probabilistico, 1930) and Frank Ramsey in Cambridge (The Foundations of Mathematics, 1931). The approach was devised to solve problems with the frequentist definition of probability but also with the earlier, objectivist approach of Laplace. The subjective Bayesian methods were further developed and popularized in the 1950s by L.J. Savage.

Objective Bayesian inference was further developed by Harold Jeffreys at the University of Cambridge. His seminal book "Theory of probability" first appeared in 1939 and played an important role in the revival of the Bayesian view of probability. In 1957, Edwin Jaynes promoted the concept of maximum entropy for constructing priors, which is an important principle in the formulation of objective methods, mainly for discrete problems. In 1965, Dennis Lindley's 2-volume work "Introduction to Probability and Statistics from a Bayesian Viewpoint" brought Bayesian methods to a wide audience. In 1979, José-Miguel Bernardo introduced reference analysis, which offers a generally applicable framework for objective analysis. Other well-known proponents of Bayesian probability theory include I.J. Good, B.O. Koopman, Howard Raiffa, Robert Schlaifer and Alan Turing.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications. Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.
