A Medley of Potpourri

Thursday, March 11, 2021

Computational biology

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Computational_biology

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioural, and social systems. The field is broadly defined and includes foundations in biology, applied mathematics, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, computer science, and evolution.

Computational biology is different from biological computing, which is a subfield of computer engineering using bioengineering and biology to build computers.

Introduction

Computational biology, which includes many aspects of bioinformatics, is the science of using biological data to develop algorithms or models in order to understand biological systems and relationships. Until recently, biologists did not have access to very large amounts of data. This data has now become commonplace, particularly in molecular biology and genomics. Researchers were able to develop analytical methods for interpreting biological information, but were unable to share them quickly among colleagues.

Bioinformatics began to develop in the early 1970s. It was considered the science of analyzing informatics processes of various biological systems. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data to develop other fields pushed biological researchers to revisit the idea of using computers to evaluate and compare large data sets. By 1982, information was being shared among researchers through the use of punch cards. The amount of data being shared began to grow exponentially by the end of the 1980s. This required the development of new computational methods in order to quickly analyze and interpret relevant information.

Since the late 1990s, computational biology has become an important part of developing emerging technologies for the field of biology. The terms computational biology and evolutionary computation have similar names but should not be confused. Unlike computational biology, evolutionary computation is not concerned with modeling and analyzing biological data. It instead creates algorithms based on the ideas of evolution across species. Research in this field, sometimes referred to as genetic algorithms, can be applied to computational biology. While evolutionary computation is not inherently a part of computational biology, computational evolutionary biology is a subfield of it.
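To make the idea of an algorithm "based on the ideas of evolution" concrete, the following is a minimal sketch of a genetic algorithm in Python. It is an illustration only: the toy problem (maximizing the number of 1 bits in a bit string), the function name and all parameter values are assumptions chosen for brevity and do not come from any particular study.

```python
import random

def evolve_bitstring(length=20, pop_size=30, generations=60,
                     mutation_rate=0.02, seed=0):
    """Toy genetic algorithm: evolve bit strings toward all 1s.

    Returns the fittest individual seen during the run.
    """
    rng = random.Random(seed)
    fitness = sum  # fitness of an individual = number of 1 bits
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = []
        for _ in range(pop_size):
            # Selection: the fitter of two random individuals becomes a parent.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # Recombination: one-point crossover between the two parents.
            cut = rng.randint(1, length - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population + [best], key=fitness)
    return best

result = evolve_bitstring()
print(sum(result), "of", len(result), "bits set")
```

Selection, recombination and mutation are the three borrowed evolutionary ideas; the data being optimized here are not biological, which is the distinction the paragraph above draws.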

Computational biology has been used to help sequence the human genome, create accurate models of the human brain, and assist in modeling biological systems.

Subfields

Computational anatomy

Computational anatomy is a discipline focusing on the study of anatomical shape and form at the visible or gross anatomical $50$-$100\,\mu\mathrm{m}$ scale of morphology. It involves the development and application of computational, mathematical and data-analytical methods for modeling and simulation of biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging (MRI), computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D.

The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations. The diffeomorphism group is used to study different coordinate systems via coordinate transformations as generated via the Lagrangian and Eulerian velocities of flow from one anatomical configuration in ${\mathbb {R} }^{3}$ to another. It relates with shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, whose study is known as diffeomorphometry.
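As a sketch of what "generated via velocities of flow" means, the flow can be written as an ordinary differential equation. The symbols $\varphi_t$ (the coordinate transformation at time $t$) and $v_t$ (the velocity field) are notation assumed here for illustration and are not taken from the text above:

$$\frac{d}{dt}\varphi_t(x) = v_t\big(\varphi_t(x)\big), \qquad \varphi_0(x) = x, \qquad x \in {\mathbb {R} }^{3},\ t \in [0,1].$$

In this notation $v_t$ is the Eulerian velocity (the velocity observed at a fixed point in space), while $\frac{d}{dt}\varphi_t(x)$ is the Lagrangian velocity of the particle that started at $x$. Integrating from $t=0$ to $t=1$ gives the transformation $\varphi_1$ that carries one anatomical configuration to another.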

Computational biomodeling

Computational biomodeling is a field concerned with building computer models of biological systems. Computational biomodeling aims to develop and use visual simulations in order to assess the complexity of biological systems. This is accomplished through the use of specialized algorithms and visualization software. These models allow for prediction of how systems will react under different environments. This is useful for determining if a system is robust. Robust biological systems are those that “maintain their state and functions against external and internal perturbations”, which is essential for a biological system to survive. Computational biomodeling generates a large archive of such data, allowing for analysis from multiple users. While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe that this will be essential in developing modern medical approaches to creating new drugs and gene therapy. A useful modelling approach is to use Petri nets via tools such as esyN.
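The Petri net approach can be illustrated with a small sketch. The code below is a hand-written toy and does not use or imitate esyN; the data layout, the function names and the enzyme reaction are all hypothetical. Places hold tokens (here, molecule counts), and a transition may fire only when each of its input places holds enough tokens.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["inputs"].items())

def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n   # consume tokens
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n   # produce tokens
    return new_marking

# Hypothetical toy reaction: enzyme + substrate -> enzyme + product
catalysis = {"inputs": {"enzyme": 1, "substrate": 1},
             "outputs": {"enzyme": 1, "product": 1}}

marking = {"enzyme": 1, "substrate": 3, "product": 0}
while enabled(marking, catalysis):
    marking = fire(marking, catalysis)
print(marking)  # prints {'enzyme': 1, 'substrate': 0, 'product': 3}
```

The enzyme token is consumed and returned on every firing, so the net stops only when the substrate runs out. Changing the initial marking is a simple way to ask how the modeled system reacts under different conditions.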

Computational genomics

A partially sequenced genome.

Computational genomics is a field within genomics that studies the genomes of cells and organisms. It is sometimes referred to as computational and statistical genetics and encompasses much of bioinformatics. The Human Genome Project, which set out to sequence the entire human genome into a set of data, is one example of computational genomics. Genome data of this kind could allow doctors to analyze the genome of an individual patient. This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. The project has inspired many similar programs, and researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life.

One of the main ways that genomes are compared is by sequence homology. Homology is similarity between biological structures or nucleotide sequences in different organisms that results from descent from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way.
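In practice, homology is usually inferred from sequence similarity, and one classical way to quantify similarity is a global alignment score. The sketch below implements the Needleman-Wunsch dynamic programming recurrence in Python. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; real tools use tuned substitution matrices and gap penalties, and a high score is evidence for homology rather than proof of it.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b.

    Only the previous row of the dynamic programming table is kept,
    so memory use is proportional to len(b).
    """
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current
    return previous[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # prints 7
print(global_alignment_score("ACGT", "AGT"))         # prints 2
```

The second call scores three matches and one gap (ACGT against A-GT), which gives 3 - 1 = 2.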

This field is still in development. A largely unexplored area in the development of computational genomics is the analysis of intergenic regions. Studies show that roughly 97% of the human genome consists of these regions. Researchers in computational genomics are working on understanding the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE (The Encyclopedia of DNA Elements) and the Roadmap Epigenomics Project.

Computational neuroscience

Computational neuroscience is the study of brain function in terms of the information processing properties of the structures that make up the nervous system. It is a subset of the field of neuroscience, and looks to analyze brain data to create practical applications. It looks to model the brain in order to examine specific aspects of the neurological system. Various types of models of the brain include:

Realistic Brain Models: These models look to represent every aspect of the brain, including as much detail at the cellular level as possible. Realistic models provide the most information about the brain, but also have the largest margin for error. More variables in a brain model create the possibility for more error to occur. These models do not account for parts of the cellular structure that scientists do not know about. Realistic brain models are the most computationally heavy and the most expensive to implement.
Simplifying Brain Models: These models look to limit the scope of a model in order to assess a specific physical property of the neurological system. This allows for the intensive computational problems to be solved, and reduces the amount of potential error from a realistic brain model.
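As an example of the simplifying end of this spectrum, the leaky integrate-and-fire neuron reduces a cell to a single membrane voltage. The Python sketch below integrates it with the forward Euler method. All parameter values and units (milliseconds, millivolts) are illustrative assumptions and are not fitted to any real neuron.

```python
def simulate_lif(input_current, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_reset=-65.0, v_threshold=-50.0, resistance=10.0):
    """Leaky integrate-and-fire neuron integrated with forward Euler.

    input_current: one current value per time step.
    Returns (voltages, spike_times), with times in the same unit as dt.
    """
    v = v_rest
    voltages, spike_times = [], []
    for step, current in enumerate(input_current):
        # Membrane equation: dv/dt = (-(v - v_rest) + R * I) / tau
        v += dt * (-(v - v_rest) + resistance * current) / tau
        if v >= v_threshold:
            spike_times.append(step * dt)  # record the spike ...
            v = v_reset                    # ... and reset the membrane
        voltages.append(v)
    return voltages, spike_times

_, weak = simulate_lif([1.0] * 5000)    # settles below threshold
_, strong = simulate_lif([2.0] * 5000)  # drives the voltage past threshold
print(len(weak), len(strong))
```

With these assumed parameters the weaker input settles at -55 mV, below the -50 mV threshold, so it never fires, while the stronger input fires repeatedly. A realistic model would replace the single equation with detailed channel and compartment dynamics at much higher computational cost.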

It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.

Computational pharmacology

Computational pharmacology (from a computational biology perspective) is “the study of the effects of genomic data to find links between specific genotypes and diseases and then screening drug data”. The pharmaceutical industry requires a shift in methods to analyze drug data. Pharmacologists were able to use Microsoft Excel to compare chemical and genomic data related to the effectiveness of drugs. However, the industry has reached what is referred to as the Excel barricade. This arises from the limited number of cells accessible on a spreadsheet. This development led to the need for computational pharmacology. Scientists and researchers develop computational methods to analyze these massive data sets. This allows for an efficient comparison between the notable data points and allows for more accurate drugs to be developed.

Analysts project that if major medications fail due to patents, computational biology will be necessary to replace current drugs on the market. Doctoral students in computational biology are being encouraged to pursue careers in industry rather than take postdoctoral positions. This is a direct result of major pharmaceutical companies needing more qualified analysts of the large data sets required for producing new drugs.

Computational evolutionary biology

Computational biology has assisted the field of evolutionary biology in many capacities. This includes:

Using DNA data to reconstruct the tree of life with computational phylogenetics
Fitting population genetics models (either forward time or backward time) to DNA data to make inferences about demographic or selective history
Building population genetics models of evolutionary systems from first principles in order to predict what is likely to evolve.
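A forward-time population genetics model can be very small. The Python sketch below simulates neutral genetic drift at a single locus with two alleles under the Wright-Fisher model. The function name and parameters are assumptions for illustration, and selection, mutation and migration are deliberately left out.

```python
import random

def wright_fisher(pop_size, initial_freq, generations, seed=None):
    """Forward-time Wright-Fisher simulation of neutral drift.

    pop_size: number of gene copies in each generation.
    Returns the allele frequency in every generation, starting value included.
    """
    rng = random.Random(seed)
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation independently inherits
        # the allele with probability equal to its current frequency.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory

trajectory = wright_fisher(pop_size=100, initial_freq=0.5,
                           generations=200, seed=42)
print(trajectory[0], trajectory[-1])
```

Running many such trajectories and comparing their summary statistics with observed DNA data is one simple way to make the kind of demographic inference described in the list above. Backward-time (coalescent) models reach similar goals by tracing sampled lineages back to their common ancestors.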

Cancer computational biology

Cancer computational biology is a field that aims to determine the future mutations in cancer through an algorithmic approach to analyzing data. Research in this field has led to the use of high-throughput measurement, which allows for the gathering of millions of data points using robotics and other sensing devices. This data is collected from DNA, RNA, and other biological structures. Areas of focus include determining the characteristics of tumors, analyzing molecules that are deterministic in causing cancer, and understanding how the human genome relates to the causation of tumors and cancer.

Computational neuropsychiatry

Computational neuropsychiatry is an emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. Several initiatives have already demonstrated that computational modeling is an important contribution to understanding neuronal circuits that could generate mental functions and dysfunctions.

Software and tools

Computational biologists use a wide range of software, from command-line programs to graphical and web-based programs.

Open source software

Open source software provides a platform to develop computational biological methods. Specifically, open source means that every person and/or entity can access and benefit from software developed in research. PLOS cites four main reasons for the use of open source software including:

Reproducibility: This allows for researchers to use the exact methods used to calculate the relations between biological data.
Faster development: Developers and researchers do not have to reinvent existing code for minor tasks. Instead they can use pre-existing programs to save time on the development and implementation of larger projects.
Increased quality: Having input from multiple researchers studying the same topic provides a layer of assurance that errors will not be in the code.
Long-term availability: Open source programs are not tied to any businesses or patents. This allows for them to be posted to multiple web pages and ensure that they are available in the future.

Conferences

There are several large conferences that are concerned with computational biology. Some notable examples are Intelligent Systems for Molecular Biology (ISMB), European Conference on Computational Biology (ECCB) and Research in Computational Molecular Biology (RECOMB).

Journals

There are numerous journals dedicated to computational biology. Some notable examples include Journal of Computational Biology and PLOS Computational Biology. PLOS Computational Biology is a peer-reviewed, open access journal that has published many notable research projects in the field of computational biology. It provides reviews of software, tutorials for open source software, and information on upcoming computational biology conferences. Its publications may be openly used provided the author is cited.

Related fields

Computational biology, bioinformatics and mathematical biology are all interdisciplinary approaches to the life sciences that draw from quantitative disciplines such as mathematics and information science. The NIH describes computational/mathematical biology as the use of computational/mathematical approaches to address theoretical and experimental questions in biology and, by contrast, bioinformatics as the application of information science to understand complex life-sciences data.

Specifically, the NIH defines

Computational biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

While each field is distinct, there may be significant overlap at their interface.

Superorganism

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Superorganism

A mound built by cathedral termites

A coral colony

A superorganism or supraorganism is a group of synergetically interacting organisms of the same species. A community of synergetically interacting organisms of different species is called a holobiont.

Concept

The term superorganism is used most often to describe a social unit of eusocial animals, where division of labour is highly specialised and where individuals are not able to survive by themselves for extended periods. Ants are the best-known example of such a superorganism. A superorganism can be defined as "a collection of agents which can act in concert to produce phenomena governed by the collective", phenomena being any activity "the hive wants" such as ants collecting food and avoiding predators, or bees choosing a new nest site. Superorganisms tend to exhibit homeostasis, power law scaling, persistent disequilibrium and emergent behaviours.

The term was coined in 1789 by James Hutton, the "father of geology", to refer to Earth in the context of geophysiology. The Gaia hypothesis of James Lovelock and Lynn Margulis, together with the work of Hutton, Vladimir Vernadsky and Guy Murchie, has suggested that the biosphere itself can be considered a superorganism, although this has been disputed. This view relates to systems theory and the dynamics of a complex system.

The concept of a superorganism raises the question of what is to be considered an individual. Toby Tyrrell's critique of the Gaia hypothesis argues that Earth's climate system does not resemble an animal's physiological system. Planetary biospheres are not tightly regulated in the same way that animal bodies are: "planets, unlike animals, are not products of evolution. Therefore we are entitled to be highly skeptical (or even outright dismissive) about whether to expect something akin to a 'superorganism'". He concludes that "the superorganism analogy is unwarranted".

Some scientists have suggested that individual human beings can be thought of as "superorganisms", as a typical human digestive system contains 10¹³ to 10¹⁴ microorganisms whose collective genome, the microbiome studied by the Human Microbiome Project, contains at least 100 times as many genes as the human genome itself. Salvucci wrote that the superorganism is another level of integration observed in nature. These levels include the genomic, the organismal and the ecological levels. The genomic structure of organisms reveals the fundamental role of integration and gene shuffling during evolution.

In social theory

The nineteenth century thinker Herbert Spencer coined the term super-organic to focus on social organization (the first chapter of his Principles of Sociology is entitled "Super-organic Evolution"), though this was apparently a distinction between the organic and the social, not an identity: Spencer explored the holistic nature of society as a social organism while distinguishing the ways in which society did not behave like an organism. For Spencer, the super-organic was an emergent property of interacting organisms, that is, human beings. And, as has been argued by D. C. Phillips, there is a "difference between emergence and reductionism".

The economist Carl Menger expanded upon the evolutionary nature of much social growth, but without ever abandoning methodological individualism. Many social institutions arose, Menger argued, not as "the result of socially teleological causes, but the unintended result of innumerable efforts of economic subjects pursuing 'individual' interests".

Spencer and Menger both argued that because it is individuals who choose and act, any social whole should be considered less than an organism, though Menger stressed this more emphatically. Spencer used the organistic idea to engage in extended analysis of social structure, conceding that it was primarily an analogy. So, for Spencer, the idea of the super-organic best designated a distinct level of social reality above that of biology and psychology, and not a one-to-one identity with an organism. Nevertheless, Spencer maintained that "every organism of appreciable size is a society", which has suggested to some that the issue may be terminological.

The term superorganic was adopted by the anthropologist Alfred L. Kroeber in 1917. Social aspects of the superorganism concept are analysed by Alan Marshall in his 2002 book "The Unity of Nature". Finally, recent work in social psychology has offered the superorganism metaphor as a unifying framework to understand diverse aspects of human sociality, such as religion, conformity, and social identity processes.

In cybernetics

Superorganisms are important in cybernetics, particularly biocybernetics. They are capable of so-called "distributed intelligence": a system composed of individual agents that have limited intelligence and information. These agents are able to pool resources so that they can complete goals that are beyond the reach of the individuals on their own. The existence of such behavior in organisms has many implications for military and management applications, and is being actively researched.

Superorganisms are also considered dependent upon cybernetic governance and processes. This is based on the idea that a biological system – in order to be effective – needs a sub-system of cybernetic communications and control. This is demonstrated in the way a mole rat colony uses functional synergy and cybernetic processes together.

Joel de Rosnay also introduced a concept called "cybionte" to describe a cybernetic superorganism. This notion associates the superorganism with chaos theory, multimedia technology, and other new developments.

Human Microbiome Project

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Human_Microbiome_Project

Human Microbiome Project (HMP)

Owner	US National Institutes of Health
Established	2007
Disestablished	2016
Website	hmpdacc.org

The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbial flora involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbial flora. The second phase, known as the Integrative Human Microbiome Project (iHMP), launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding from the NIH Common Fund from 2007 to 2016.

Important components of the HMP were culture-independent methods of microbial community characterization, such as metagenomics (which provides a broad genetic perspective on a single microbial community), as well as extensive whole genome sequencing (which provides a "deep" genetic perspective on certain aspects of a given microbial community, i.e. of individual bacterial species). The latter served as reference genomic sequences (3000 such sequences of individual bacterial isolates were planned) for comparison purposes during subsequent metagenomic analysis. The project also financed deep sequencing of bacterial 16S rRNA sequences amplified by polymerase chain reaction from human subjects.

Introduction

Depiction of prevalences of various classes of bacteria at selected sites on human skin

Prior to the HMP launch, it was often reported in popular media and scientific literature that there are about 10 times as many microbial cells and 100 times as many microbial genes in the human body as there are human cells; this figure was based on estimates that the human microbiome includes around 100 trillion bacterial cells and an adult human typically has around 10 trillion human cells. In 2014 the American Academy of Microbiology published a FAQ that emphasized that the number of microbial cells and the number of human cells are both estimates, and noted that recent research had arrived at a new estimate of around 37 trillion human cells, meaning that the ratio of microbial to human cells is probably about 3:1. In 2016 another group published a new estimate of the ratio as being roughly 1:1 (1.3:1, with "an uncertainty of 25% and a variation of 53% over the population of standard 70 kg males").
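The first two ratios follow directly from the cell counts quoted above, as the short Python check below shows. It uses only figures stated in the paragraph; the variable names are illustrative. The 2016 ratio of 1.3:1 rests on revised cell counts that are not given here, so it is not recomputed.

```python
# Cell-count estimates quoted in the paragraph above.
microbial_cells = 100 * 10**12   # about 100 trillion bacterial cells
human_cells_old = 10 * 10**12    # older estimate: about 10 trillion human cells
human_cells_2014 = 37 * 10**12   # estimate cited in 2014: about 37 trillion

ratio_old = microbial_cells / human_cells_old     # 10.0, the popular "10 to 1"
ratio_2014 = microbial_cells / human_cells_2014   # about 2.7, reported as ~3:1

print(f"{ratio_old:.1f}:1 and {ratio_2014:.1f}:1")  # prints 10.0:1 and 2.7:1
```

The drop from 10:1 to about 3:1 therefore comes entirely from the larger estimate of human cells, with the microbial estimate unchanged.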

Despite the staggering number of microbes in and on the human body, little was known about their roles in human health and disease. Many of the organisms that make up the microbiome have not been successfully cultured, identified, or otherwise characterized. Organisms thought to be found in the human microbiome, however, may generally be categorized as bacteria, members of domain Archaea, yeasts, and single-celled eukaryotes as well as various helminth parasites and viruses, the latter including viruses that infect the cellular microbiome organisms (e.g., bacteriophages). The HMP set out to discover and characterize the human microbiome, emphasizing oral, skin, vaginal, gastrointestinal, and respiratory sites.

The HMP was intended to address some of the most inspiring, vexing and fundamental scientific questions of its time. Importantly, it also had the potential to break down the artificial barriers between medical microbiology and environmental microbiology. It was hoped that the HMP would not only identify new ways to determine health and predisposition to diseases but also define the parameters needed to design, implement and monitor strategies for intentionally manipulating the human microbiota, to optimize its performance in the context of an individual's physiology.

The HMP has been described as "a logical conceptual and experimental extension of the Human Genome Project." In 2007 the HMP was listed on the NIH Roadmap for Medical Research as one of the New Pathways to Discovery. Organized characterization of the human microbiome is also being done internationally under the auspices of the International Human Microbiome Consortium. The Canadian Institutes of Health Research, through the CIHR Institute of Infection and Immunity, is leading the Canadian Microbiome Initiative to develop a coordinated and focused research effort to analyze and characterize the microbes that colonize the human body and their potential alteration during chronic disease states.

Contributing Institutions

The HMP involved participation from many research institutions, including Stanford University, the Broad Institute, Virginia Commonwealth University, Washington University, Northeastern University, MIT, the Baylor College of Medicine, and many others. Contributions included data evaluation, construction of reference sequence data sets, ethical and legal studies, technology development, and more.

Phase One (2007-2014)

The HMP1 included research efforts from many institutions. The HMP1 set the following goals:

Develop a reference set of microbial genome sequences and to perform preliminary characterization of the human microbiome
Explore the relationship between disease and changes in the human microbiome
Develop new technologies and tools for computational analysis
Establish a resource repository
Study the ethical, legal, and social implications of human microbiome research

Phase Two (2014-2016)

In 2014, the NIH launched the second phase of the project, known as the Integrative Human Microbiome Project (iHMP). The goal of the iHMP was to produce resources to create a complete characterization of the human microbiome, with a focus on understanding the presence of microbiota in health and disease states. The project mission, as stated by the NIH, was as follows:

The iHMP will create integrated longitudinal datasets of biological properties from both the microbiome and host from three different cohort studies of microbiome-associated conditions using multiple "omics" technologies.

The project encompassed three sub-projects carried out at multiple institutions. Study methods included 16S rRNA gene profiling, whole metagenome shotgun sequencing, whole genome sequencing, metatranscriptomics, metabolomics/lipidomics, and immunoproteomics. The key findings of the iHMP were published in 2019.

Pregnancy & Preterm Birth

The Vaginal Microbiome Consortium team at Virginia Commonwealth University led research on the Pregnancy & Preterm Birth project with a goal of understanding how the microbiome changes during the gestational period and influences the neonatal microbiome. The project was also concerned with the role of the microbiome in the occurrence of preterm births, which, according to the CDC, account for nearly 10% of all births and constitute the second leading cause of neonatal death. The project received $7.44 million in NIH funding.

Onset of Inflammatory Bowel Disease (IBD)

The Inflammatory Bowel Disease Multi'omics Data (IBDMDB) team was a multi-institution group of researchers focused on understanding how the gut microbiome changes longitudinally in adults and children suffering from IBD. IBD is an inflammatory autoimmune disorder that manifests as either Crohn's disease or ulcerative colitis and affects about one million Americans. Research participants included cohorts from Massachusetts General Hospital, Emory University Hospital/Cincinnati Children's Hospital, and Cedars-Sinai Medical Center.

Onset of Type 2 Diabetes (T2D)

Researchers from Stanford University and the Jackson Laboratory of Genomic Medicine worked together to perform a longitudinal analysis on the biological processes that occur in the microbiome of patients at risk for Type 2 Diabetes. T2D affects nearly 20 million Americans with at least 79 million pre-diabetic patients, and is partially characterized by marked shifts in the microbiome compared to healthy individuals. The project aimed to identify molecules and signaling pathways that play a role in the etiology of the disease.

Achievements

The impact to date of the HMP may be partially assessed by examination of research sponsored by the HMP. Over 650 peer-reviewed publications were listed on the HMP website from June 2009 to the end of 2017, and had been cited over 70,000 times. At this point the website was archived and is no longer updated, although datasets do continue to be available.

Major categories of work funded by HMP included:

Development of new database systems allowing efficient organization, storage, access, search and annotation of massive amounts of data. These include IMG, the Integrated Microbial Genomes database and comparative analysis system; IMG/M, a related system that integrates metagenome data sets with isolate microbial genomes from the IMG system; CharProtDB, a database of experimentally characterized protein annotations; and the Genomes OnLine Database (GOLD), for monitoring the status of genomic and metagenomic projects worldwide and their associated metadata.
Development of tools for comparative analysis that facilitate the recognition of common patterns, major themes and trends in complex data sets. These include RAPSearch2, a fast and memory-efficient protein similarity search tool for next-generation sequencing data; Boulder ALignment Editor (ALE), a web-based RNA alignment tool; WebMGA, a customizable web server for fast metagenomic sequence analysis; and DNACLUST, a tool for accurate and efficient clustering of phylogenetic marker genes.
Development of new methods and systems for assembly of massive sequence data sets. No single assembly algorithm addresses all the known problems of assembling short-length sequences, so next-generation assembly programs such as AMOS are modular, offering a wide range of tools for assembly. Novel algorithms have been developed for improving the quality and utility of draft genome sequences.
Assembly of a catalog of sequenced reference genomes of pure bacterial strains from multiple body sites, against which metagenomic results can be compared. The original goal of 600 genomes was far surpassed; the revised goal was for 3000 genomes to be in this reference catalog, sequenced to at least a high-quality draft stage. As of March 2012, 742 genomes had been cataloged.
Establishment of the Data Analysis and Coordination Center (DACC), which serves as the central repository for all HMP data.
Various studies exploring legal and ethical issues associated with whole genome sequencing research.

Developments funded by HMP included:

New predictive methods for identifying active transcription factor binding sites.
Identification, on the basis of bioinformatic evidence, of a widely distributed, ribosomally produced electron carrier precursor.
Time-lapse "moving pictures" of the human microbiome.
Identification of unique adaptations adopted by segmented filamentous bacteria (SFB) in their role as gut commensals. SFB are medically important because they stimulate T helper 17 cells, thought to play a key role in autoimmune disease.
Identification of factors distinguishing the microbiota of healthy and diseased gut.
Identification of a hitherto unrecognized dominant role of Verrucomicrobia in soil bacterial communities.
Identification of factors determining the virulence potential of Gardnerella vaginalis strains in vaginosis.
Identification of a link between oral microbiota and atherosclerosis.
Demonstration that pathogenic species of Neisseria involved in meningitis, sepsis, and sexually transmitted disease exchange virulence factors with commensal species.

Milestones

Reference database established

On 13 June 2012, a major milestone of the HMP was announced by the NIH director Francis Collins. The announcement was accompanied by a series of coordinated articles published on the same day in Nature and several other journals, including those of the Public Library of Science (PLoS). By mapping the normal microbial make-up of healthy humans using genome sequencing techniques, the researchers of the HMP created a reference database and established the boundaries of normal microbial variation in humans.

From 242 healthy U.S. volunteers, more than 5,000 samples were collected from tissues from 15 (men) to 18 (women) body sites such as mouth, nose, skin, lower intestine (stool) and vagina. All the DNA, human and microbial, was analyzed with DNA sequencing machines. The microbial genome data were extracted by identifying the bacteria-specific ribosomal RNA, 16S rRNA. The researchers calculated that more than 10,000 microbial species occupy the human ecosystem, and they have identified 81 – 99% of the genera. In addition to establishing the human microbiome reference database, the HMP also discovered several "surprises", which include:

Microbes contribute more genes responsible for human survival than humans' own genes. It is estimated that bacterial protein-coding genes are 360 times more abundant than human genes.
Microbial metabolic activities, for example digestion of fats, are not always provided by the same bacterial species. The presence of the activities seems to matter more.
Components of the human microbiome change over time, affected by a patient's disease state and medication. However, the microbiome eventually returns to a state of equilibrium, even though the composition of bacterial types has changed.

Clinical application

Among the first clinical applications utilizing the HMP data, as reported in several PLoS papers, the researchers found a shift to less species diversity in the vaginal microbiome of pregnant women in preparation for birth, and a high viral DNA load in the nasal microbiome of children with unexplained fevers. Other studies using the HMP data and techniques have examined the role of the microbiome in various diseases of the digestive tract, skin and reproductive organs, and in childhood disorders.

Pharmaceutical application

Pharmaceutical microbiologists have considered the implications of the HMP data in relation to the presence / absence of 'objectionable' microorganisms in non-sterile pharmaceutical products and in relation to the monitoring of microorganisms within the controlled environments in which products are manufactured. The latter also has implications for media selection and disinfectant efficacy studies.

Epidemiology

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Epidemiology

Epidemiology is the study and analysis of the distribution (who, when, and where), patterns and determinants of health and disease conditions in defined populations.

It is a cornerstone of public health, and shapes policy decisions and evidence-based practice by identifying risk factors for disease and targets for preventive healthcare. Epidemiologists help with study design, collection, and statistical analysis of data, and interpretation and dissemination of results (including peer review and occasional systematic review). Epidemiology has helped develop methodology used in clinical research, public health studies, and, to a lesser extent, basic research in the biological sciences.

Major areas of epidemiological study include disease causation, transmission, outbreak investigation, disease surveillance, environmental epidemiology, forensic epidemiology, occupational epidemiology, screening, biomonitoring, and comparisons of treatment effects such as in clinical trials. Epidemiologists rely on other scientific disciplines like biology to better understand disease processes, statistics to make efficient use of the data and draw appropriate conclusions, social sciences to better understand proximate and distal causes, and engineering for exposure assessment.

Epidemiology, literally meaning "the study of what is upon the people", is derived from Greek epi 'upon, among', demos 'people, district', and logos 'study, word, discourse', suggesting that it applies only to human populations. However, the term is widely used in studies of zoological populations (veterinary epidemiology), although the term "epizoology" is available, and it has also been applied to studies of plant populations (botanical or plant disease epidemiology).

The distinction between "epidemic" and "endemic" was first drawn by Hippocrates, to distinguish diseases that are "visited upon" a population (epidemic) from those that "reside within" a population (endemic). The term "epidemiology" appears to have first been used to describe the study of epidemics in 1802 by the Spanish physician Villalba in Epidemiología Española. Epidemiologists also study the interaction of diseases in a population, a condition known as a syndemic.

The term epidemiology is now widely applied to cover the description and causation of not only epidemic disease, but of disease in general, and even many non-disease, health-related conditions, such as high blood pressure, depression and obesity. Epidemiology is therefore concerned with how patterns of disease cause changes in the function of human beings.

History

The Greek physician Hippocrates, known as the father of medicine, sought a logic to sickness; he is the first person known to have examined the relationships between the occurrence of disease and environmental influences. Hippocrates believed sickness of the human body to be caused by an imbalance of the four humors (black bile, yellow bile, blood, and phlegm). The cure to the sickness was to remove or add the humor in question to balance the body. This belief led to the application of bloodletting and dieting in medicine. He coined the terms endemic (for diseases usually found in some places but not in others) and epidemic (for diseases that are seen at some times but not others).

Modern era

In the middle of the 16th century, a doctor from Verona named Girolamo Fracastoro was the first to propose a theory that the very small, unseeable particles that cause disease were alive. They were considered to be able to spread by air, to multiply by themselves and to be destroyable by fire. In this way he refuted Galen's miasma theory (poison gas in sick people). In 1546 he wrote a book, De contagione et contagiosis morbis, in which he was the first to promote personal and environmental hygiene to prevent disease. The development of a sufficiently powerful microscope by Antonie van Leeuwenhoek in 1675 provided visual evidence of living particles consistent with a germ theory of disease.

A physician ahead of his time, Quinto Tiberio Angelerio, managed the 1582 plague in the town of Alghero, Sardinia. He was fresh from Sicily, which had endured a plague epidemic of its own in 1575. Later he published a manual, "ECTYPA PESTILENSIS STATUS ALGHERIAE SARDINIAE", detailing the 57 rules he had imposed upon the city. A second edition, "EPIDEMIOLOGIA, SIVE TRACTATUS DE PESTE", was published in 1598. The rules he instituted, several as unpopular then as they are today, included lockdowns, physical distancing, washing groceries and textiles, restricting shopping to one person per household, quarantines, and health passports. (Source: Zaria Gorvett, BBC Future, 8 January 2021.)

During the Ming Dynasty, Wu Youke (1582–1652) developed the idea that some diseases were caused by transmissible agents, which he called Li Qi (戾气 or pestilential factors), when he observed various epidemics rage around him between 1641 and 1644. His book Wen Yi Lun (瘟疫论, Treatise on Pestilence/Treatise of Epidemic Diseases) can be regarded as the main etiological work that brought forward the concept. His concepts were still being considered by the WHO in 2004, in the context of traditional Chinese medicine, when analysing the SARS outbreak.

Another pioneer, Thomas Sydenham (1624–1689), was the first to distinguish the fevers of Londoners in the later 1600s. His theories on cures of fevers met with much resistance from traditional physicians at the time. He was not able to find the initial cause of the smallpox fever he researched and treated.

John Graunt, a haberdasher and amateur statistician, published Natural and Political Observations ... upon the Bills of Mortality in 1662. In it, he analysed the mortality rolls in London before the Great Plague, presented one of the first life tables, and reported time trends for many diseases, new and old. He provided statistical evidence for many theories on disease, and also refuted some widespread ideas on them.

Original map by John Snow showing the clusters of cholera cases in the London epidemic of 1854.

John Snow is famous for his investigations into the causes of the 19th-century cholera epidemics, and is also known as the father of (modern) epidemiology. He began by noticing the significantly higher death rates in two areas supplied by the Southwark Company. His identification of the Broad Street pump as the cause of the Soho epidemic is considered the classic example of epidemiology. Snow used chlorine in an attempt to clean the water and removed the handle; this ended the outbreak. This has been perceived as a major event in the history of public health and regarded as the founding event of the science of epidemiology, having helped shape public health policies around the world. However, Snow's research and preventive measures to avoid further outbreaks were not fully accepted or put into practice until after his death, owing to the prevailing miasma theory of the time, a model of disease in which poor air quality was blamed for illness. This theory was used to rationalize high rates of infection in impoverished areas instead of addressing the underlying issues of poor nutrition and sanitation, and it was proven false by his work.

Other pioneers include Danish physician Peter Anton Schleisner, who in 1849 related his work on the prevention of the epidemic of neonatal tetanus on the Vestmanna Islands in Iceland. Another important pioneer was Hungarian physician Ignaz Semmelweis, who in 1847 brought down infant mortality at a Vienna hospital by instituting a disinfection procedure. His findings were published in 1850, but his work was ill-received by his colleagues, who discontinued the procedure. Disinfection did not become widely practiced until British surgeon Joseph Lister 'discovered' antiseptics in 1865 in light of the work of Louis Pasteur.

In the early 20th century, mathematical methods were introduced into epidemiology by Ronald Ross, Janet Lane-Claypon, Anderson Gray McKendrick, and others.

Another breakthrough was the 1954 publication of the results of the British Doctors Study, led by Richard Doll and Austin Bradford Hill, which lent very strong statistical support to the link between tobacco smoking and lung cancer.

In the late 20th century, with the advancement of biomedical sciences, a number of molecular markers in blood, other biospecimens and environment were identified as predictors of development or risk of a certain disease. Epidemiology research to examine the relationship between these biomarkers analyzed at the molecular level and disease was broadly named "molecular epidemiology". Specifically, "genetic epidemiology" has been used for epidemiology of germline genetic variation and disease. Genetic variation is typically determined using DNA from peripheral blood leukocytes.

21st century

Since the 2000s, genome-wide association studies (GWAS) have been commonly performed to identify genetic risk factors for many diseases and health conditions.

While most molecular epidemiology studies still use conventional disease diagnosis and classification systems, it is increasingly recognized that disease progression represents inherently heterogeneous processes differing from person to person. Conceptually, each individual has a unique disease process different from that of any other individual ("the unique disease principle"), considering the uniqueness of the exposome (the totality of endogenous and exogenous / environmental exposures) and its unique influence on the molecular pathologic process in each individual. Studies to examine the relationship between an exposure and the molecular pathologic signature of disease (particularly cancer) became increasingly common throughout the 2000s. However, the use of molecular pathology in epidemiology posed unique challenges, including a lack of research guidelines and standardized statistical methodologies, and a paucity of interdisciplinary experts and training programs. Furthermore, the concept of disease heterogeneity appears to conflict with the long-standing premise in epidemiology that individuals with the same disease name have similar etiologies and disease processes. To resolve these issues and advance population health science in the era of molecular precision medicine, "molecular pathology" and "epidemiology" were integrated to create a new interdisciplinary field of "molecular pathological epidemiology" (MPE), defined as "epidemiology of molecular pathology and heterogeneity of disease". In MPE, investigators analyze the relationships between (A) environmental, dietary, lifestyle and genetic factors; (B) alterations in cellular or extracellular molecules; and (C) evolution and progression of disease. A better understanding of the heterogeneity of disease pathogenesis will further contribute to elucidating the etiologies of disease. The MPE approach can be applied not only to neoplastic diseases but also to non-neoplastic diseases. The concept and paradigm of MPE became widespread in the 2010s.

By 2012 it was recognized that many pathogens' evolution is rapid enough to be highly relevant to epidemiology, and that therefore much could be gained from an interdisciplinary approach to infectious disease integrating epidemiology and molecular evolution to "inform control strategies, or even patient treatment."

Modern epidemiological studies can use advanced statistics and machine learning to create predictive models as well as to define treatment effects.

Types of studies

Epidemiologists employ a range of study designs, from the observational to the experimental, which are generally categorized as descriptive (involving the assessment of data covering time, place, and person), analytic (aiming to further examine known associations or hypothesized relationships), and experimental (a term often equated with clinical or community trials of treatments and other interventions). In observational studies, nature is allowed to "take its course," as epidemiologists observe from the sidelines. Conversely, in experimental studies, the epidemiologist is the one in control of all of the factors entering a certain case study. Epidemiological studies are aimed, where possible, at revealing unbiased relationships between exposures such as alcohol or smoking, biological agents, stress, or chemicals and mortality or morbidity. The identification of causal relationships between these exposures and outcomes is an important aspect of epidemiology. Modern epidemiologists use informatics as a tool.

Observational studies have two components, descriptive and analytical. Descriptive observations pertain to the "who, what, where and when of health-related state occurrence". Analytical observations, by contrast, deal more with the ‘how’ of a health-related event. Experimental epidemiology contains three case types: randomized controlled trials (often used for new medicine or drug testing), field trials (conducted on those at a high risk of contracting a disease), and community trials (research on diseases of social origin).

The term 'epidemiologic triad' is used to describe the intersection of Host, Agent, and Environment in analyzing an outbreak.

Case series

Case-series may refer to the qualitative study of the experience of a single patient, or small group of patients with a similar diagnosis, or to a statistical technique that compares periods during which patients are exposed to some factor with the potential to produce illness with periods when they are unexposed.

The former type of study is purely descriptive and cannot be used to make inferences about the general population of patients with that disease. These types of studies, in which an astute clinician identifies an unusual feature of a disease or a patient's history, may lead to a formulation of a new hypothesis. Using the data from the series, analytic studies could be done to investigate possible causal factors. These can include case-control studies or prospective studies. A case-control study would involve matching comparable controls without the disease to the cases in the series. A prospective study would involve following the case series over time to evaluate the disease's natural history.

The latter type, more formally described as the self-controlled case-series study, divides individual patient follow-up time into exposed and unexposed periods and uses fixed-effects Poisson regression to compare the incidence rate of a given outcome between exposed and unexposed periods. This technique has been extensively used in the study of adverse reactions to vaccination and has been shown in some circumstances to provide statistical power comparable to that available in cohort studies.

Case-control studies

Case-control studies select subjects based on their disease status. It is a retrospective study. A group of individuals that are disease positive (the "case" group) is compared with a group of disease negative individuals (the "control" group). The control group should ideally come from the same population that gave rise to the cases. The case-control study looks back through time at potential exposures that both groups (cases and controls) may have encountered. A 2×2 table is constructed, displaying exposed cases (A), exposed controls (B), unexposed cases (C) and unexposed controls (D). The statistic generated to measure association is the odds ratio (OR), which is the ratio of the odds of exposure in the cases (A/C) to the odds of exposure in the controls (B/D), i.e. OR = (AD/BC).

	Cases	Controls
Exposed	A	B
Unexposed	C	D

If the OR is significantly greater than 1, then the conclusion is "those with the disease are more likely to have been exposed," whereas if it is close to 1 then the exposure and disease are not likely associated. If the OR is far less than one, then this suggests that the exposure is a protective factor in the causation of the disease. Case-control studies are usually faster and more cost-effective than cohort studies but are sensitive to bias (such as recall bias and selection bias). The main challenge is to identify the appropriate control group; the distribution of exposure among the control group should be representative of the distribution in the population that gave rise to the cases. This can be achieved by drawing a random sample from the original population at risk. This has as a consequence that the control group can contain people with the disease under study when the disease has a high attack rate in a population.
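The odds ratio calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `odds_ratio` and the counts in the example are hypothetical and do not come from any study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 case-control table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("OR is undefined when B or C is zero")
    # Odds of exposure in cases (A/C) over odds of exposure in controls (B/D),
    # which simplifies to AD/BC.
    return (a * d) / (b * c)


# Hypothetical counts, chosen only to illustrate the arithmetic.
or_value = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_value:.2f}")
```

With these made-up counts the OR is well above 1, which would be read as cases being more likely than controls to have been exposed.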

A major drawback of case-control studies is that, for a result to be considered statistically significant at the 95% confidence level, the minimum number of cases required is related to the odds ratio by the equation:

\[ \text{total cases} = A + C = 1.96^{2}\,(1+N)\left(\frac{1}{\ln(OR)}\right)^{2}\left(\frac{OR + 2\sqrt{OR} + 1}{\sqrt{OR}}\right) \approx 15.5\,(1+N)\left(\frac{1}{\ln(OR)}\right)^{2} \]

where N is the ratio of cases to controls. As the odds ratio approaches 1, ln(OR) approaches 0, so the required number of cases grows without bound, rendering case-control studies all but useless for odds ratios close to 1. For instance, for an odds ratio of 1.5 and cases = controls, the table shown above would look like this:

	Cases	Controls
Exposed	103	84
Unexposed	84	103

For an odds ratio of 1.1:

	Cases	Controls
Exposed	1732	1652
Unexposed	1652	1732
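The case-count equation above can be evaluated directly. The sketch below is a plain transcription of that formula into Python; the function name `min_total_cases` and its parameter names are assumptions made for illustration.

```python
import math


def min_total_cases(odds_ratio, n_ratio=1.0, z=1.96):
    """Minimum total cases (A + C) from the equation in the text.

    odds_ratio: the odds ratio to be detected (must be positive and not 1).
    n_ratio:    N, the ratio of cases to controls.
    z:          normal quantile; 1.96 corresponds to the 95% level.
    """
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds_ratio must be positive and different from 1")
    log_or = math.log(odds_ratio)
    root = math.sqrt(odds_ratio)
    # (OR + 2*sqrt(OR) + 1) / sqrt(OR); close to 4 when OR is near 1,
    # which is where the 15.5 approximation (1.96^2 * 4) comes from.
    shape = (odds_ratio + 2 * root + 1) / root
    return z ** 2 * (1 + n_ratio) * shape / log_or ** 2


for or_value in (1.5, 1.1):
    print(f"OR = {or_value}: about {min_total_cases(or_value):.0f} cases")
```

For equal numbers of cases and controls (N = 1), this gives roughly 190 cases at OR = 1.5 and roughly 3,400 at OR = 1.1, in line with the column totals of the two tables above (187 and 3,384).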

Cohort studies

Cohort studies select subjects based on their exposure status. The study subjects should be at risk of the outcome under investigation at the beginning of the cohort study; this usually means that they should be disease free when the cohort study starts. The cohort is followed through time to assess their later outcome status. An example of a cohort study would be the investigation of a cohort of smokers and non-smokers over time to estimate the incidence of lung cancer. The same 2×2 table is constructed as with the case control study. However, the point estimate generated is the relative risk (RR), which is the probability of disease for a person in the exposed group, P_e = A / (A + B) over the probability of disease for a person in the unexposed group, P_u = C / (C + D), i.e. RR = P_e / P_u.

	Case	Non-case	Total
Exposed	A	B	(A + B)
Unexposed	C	D	(C + D)

As with the OR, an RR greater than 1 shows association, where the conclusion can be read "those with the exposure were more likely to develop disease."
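As a minimal sketch of the relative risk calculation, using the cell labels from the cohort table, the Python below computes P_e, P_u and their ratio. The function name and the cohort counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a cohort 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    p_exposed = a / (a + b)      # P_e = A / (A + B)
    p_unexposed = c / (c + d)    # P_u = C / (C + D)
    if p_unexposed == 0:
        raise ValueError("RR is undefined when there are no unexposed cases")
    return p_exposed / p_unexposed


# Hypothetical cohort: 30 cases among 1,000 exposed people,
# 10 cases among 1,000 unexposed people.
rr = relative_risk(a=30, b=970, c=10, d=990)
print(f"RR = {rr:.2f}")
```

In this invented cohort the exposed group has about three times the risk of the unexposed group.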

Prospective studies have many benefits over case-control studies. The RR is a more powerful effect measure than the OR, as the OR is just an estimation of the RR, since true incidence cannot be calculated in a case-control study where subjects are selected based on disease status. Temporality can be established in a prospective study, and confounders are more easily controlled for. However, they are more costly, and there is a greater chance of losing subjects to follow-up because of the long time period over which the cohort is followed.

Cohort studies are also limited by the same equation for number of cases as case-control studies, but, if the base incidence rate in the study population is very low, the number of cases required is reduced by ½.

Causal inference

Although epidemiology is sometimes viewed as a collection of statistical tools used to elucidate the associations of exposures to health outcomes, a deeper understanding of this science is that of discovering causal relationships.

"Correlation does not imply causation" is a common theme for much of the epidemiological literature. For epidemiologists, the key is in the term inference. Correlation, or at least association between two variables, is a necessary but not sufficient criterion for inference that one variable causes the other. Epidemiologists use gathered data and a broad range of biomedical and psychosocial theories in an iterative way to generate or expand theory, to test hypotheses, and to make educated, informed assertions about which relationships are causal, and about exactly how they are causal.

Epidemiologists emphasize that the "one cause – one effect" understanding is a simplistic mis-belief. Most outcomes, whether disease or death, are caused by a chain or web consisting of many component causes. Causes can be distinguished as necessary, sufficient or probabilistic conditions. If a necessary condition can be identified and controlled (e.g., antibodies to a disease agent, energy in an injury), the harmful outcome can be avoided (Robertson, 2015). One tool regularly used to conceptualize the multicausality associated with disease is the causal pie model.

Bradford Hill criteria

In 1965, Austin Bradford Hill proposed a series of considerations to help assess evidence of causation, which have come to be commonly known as the "Bradford Hill criteria". In contrast to the explicit intentions of their author, Hill's considerations are now sometimes taught as a checklist to be implemented for assessing causality. Hill himself said "None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required sine qua non."

Strength of Association: A small association does not mean that there is not a causal effect, though the larger the association, the more likely that it is causal.
Consistency of Data: Consistent findings observed by different persons in different places with different samples strengthens the likelihood of an effect.
Specificity: Causation is likely if there is a very specific population at a specific site and a disease with no other likely explanation. The more specific an association between a factor and an effect is, the bigger the probability of a causal relationship.
Temporality: The effect has to occur after the cause (and if there is an expected delay between the cause and expected effect, then the effect must occur after that delay).
Biological gradient: Greater exposure should generally lead to greater incidence of the effect. However, in some cases, the mere presence of the factor can trigger the effect. In other cases, an inverse proportion is observed: greater exposure leads to lower incidence.
Plausibility: A plausible mechanism between cause and effect is helpful (but Hill noted that knowledge of the mechanism is limited by current knowledge).
Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect. However, Hill noted that "... lack of such [laboratory] evidence cannot nullify the epidemiological effect on associations".
Experiment: "Occasionally it is possible to appeal to experimental evidence".
Analogy: The effect of similar factors may be considered.

Legal interpretation

Epidemiological studies can only go to prove that an agent could have caused, but not that it did cause, an effect in any particular case:

"Epidemiology is concerned with the incidence of disease in populations and does not address the question of the cause of an individual's disease. This question, sometimes referred to as specific causation, is beyond the domain of the science of epidemiology. Epidemiology has its limits at the point where an inference is made that the relationship between an agent and a disease is causal (general causation) and where the magnitude of excess risk attributed to the agent has been determined; that is, epidemiology addresses whether an agent can cause a disease, not whether an agent did cause a specific plaintiff's disease."

In United States law, epidemiology alone cannot prove that a causal association does not exist in general. Conversely, it can be (and is in some circumstances) taken by US courts, in an individual case, to justify an inference that a causal association does exist, based upon a balance of probability.

The subdiscipline of forensic epidemiology is directed at the investigation of specific causation of disease or injury in individuals or groups of individuals in instances in which causation is disputed or is unclear, for presentation in legal settings.

Population-based health management

Epidemiological practice and the results of epidemiological analysis make a significant contribution to emerging population-based health management frameworks.

Population-based health management encompasses the ability to:

Assess the health states and health needs of a target population;
Implement and evaluate interventions that are designed to improve the health of that population; and
Efficiently and effectively provide care for members of that population in a way that is consistent with the community's cultural, policy and health resource values.

Modern population-based health management is complex, requiring multiple sets of skills (medical, political, technological, mathematical, etc.), of which epidemiological practice and analysis is a core component, unified with management science to provide efficient and effective health care and health guidance to a population. This task requires the forward-looking ability of modern risk management approaches that transform health risk factors, incidence, prevalence and mortality statistics (derived from epidemiological analysis) into management metrics that guide not only how a health system responds to current population health issues but also how it can be managed to better respond to future potential population health issues.

Examples of organizations that use population-based health management and leverage the work and results of epidemiological practice include the Canadian Strategy for Cancer Control, Health Canada Tobacco Control Programs, the Rick Hansen Foundation, and the Canadian Tobacco Control Research Initiative.

Each of these organizations uses a population-based health management framework called Life at Risk that combines epidemiological quantitative analysis with demographics, health agency operational research and economics to perform:

Population Life Impacts Simulations: Measurement of the future potential impact of disease upon the population with respect to new disease cases, prevalence, premature death as well as potential years of life lost from disability and death;
Labour Force Life Impacts Simulations: Measurement of the future potential impact of disease upon the labour force with respect to new disease cases, prevalence, premature death and potential years of life lost from disability and death;
Economic Impacts of Disease Simulations: Measurement of the future potential impact of disease upon private sector disposable income impacts (wages, corporate profits, private health care costs) and public sector disposable income impacts (personal income tax, corporate income tax, consumption taxes, publicly funded health care costs).

Applied field epidemiology

Applied epidemiology is the practice of using epidemiological methods to protect or improve the health of a population. Applied field epidemiology can include investigating communicable and non-communicable disease outbreaks, mortality and morbidity rates, and nutritional status, among other indicators of health, with the purpose of communicating the results to those who can implement appropriate policies or disease control measures.

Humanitarian context

As the surveillance and reporting of diseases and other health factors become increasingly difficult in humanitarian crisis situations, the methodologies used to report the data are compromised. One study found that less than half (42.4%) of nutrition surveys sampled from humanitarian contexts correctly calculated the prevalence of malnutrition and only one-third (35.3%) of the surveys met the criteria for quality. Among the mortality surveys, only 3.2% met the criteria for quality. As nutritional status and mortality rates help indicate the severity of a crisis, the tracking and reporting of these health factors is crucial.

Vital registries are usually the most effective ways to collect data, but in humanitarian contexts these registries can be non-existent, unreliable, or inaccessible. As such, mortality is often inaccurately measured using either prospective demographic surveillance or retrospective mortality surveys. Prospective demographic surveillance requires much manpower and is difficult to implement in a spread-out population. Retrospective mortality surveys are prone to selection and reporting biases. Other methods are being developed, but are not common practice yet.

Validity: precision and bias

Different fields in epidemiology have different levels of validity. One way to assess the validity of findings is the ratio of false-positives (claimed effects that are not correct) to false-negatives (studies which fail to support a true effect). In genetic epidemiology, for example, candidate-gene studies produced over 100 false-positive findings for each false-negative. By contrast, genome-wide association studies appear close to the reverse, with only one false-positive for every 100 or more false-negatives. This ratio has improved over time in genetic epidemiology as the field has adopted stringent criteria. By contrast, other epidemiological fields have not required such rigorous reporting and are much less reliable as a result.

Random error

Random error is the result of fluctuations around a true value because of sampling variability. Random error is just that: random. It can occur during data collection, coding, transfer, or analysis. Examples of random error include: poorly worded questions, a misunderstanding in interpreting an individual answer from a particular respondent, or a typographical error during coding. Random error affects measurement in a transient, inconsistent manner and it is impossible to correct for random error.

There is random error in all sampling procedures. This is called sampling error.

Precision in epidemiological variables is a measure of random error. Precision is also inversely related to random error, so that to reduce random error is to increase precision. Confidence intervals are computed to demonstrate the precision of relative risk estimates. The narrower the confidence interval, the more precise the relative risk estimate.

There are two basic ways to reduce random error in an epidemiological study. The first is to increase the sample size of the study. In other words, add more subjects to your study. The second is to reduce the variability in measurement in the study. This might be accomplished by using a more precise measuring device or by increasing the number of measurements.

Note that if the sample size or the number of measurements is increased, or a more precise measuring tool is purchased, the cost of the study usually increases. There is usually an uneasy balance between the need for adequate precision and the practical issue of study cost.
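The trade-off between precision and cost can be made concrete with the standard error of an estimated proportion. The sketch below is only an illustration: the prevalence of 10% and the sample sizes are assumed values, and the helper name is hypothetical.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

assumed_prevalence = 0.10  # hypothetical value chosen for illustration

for n in (100, 400, 1600):
    se = standard_error_of_proportion(assumed_prevalence, n)
    print(f"n={n:5d}  standard error={se:.4f}")
```

Because the standard error shrinks with the square root of the sample size, halving the random error requires roughly four times as many subjects, which is one reason precision becomes expensive.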

Systematic error

A systematic error or bias occurs when there is a difference between the true value (in the population) and the observed value (in the study) from any cause other than sampling variability. An example of systematic error is if, unknown to you, the pulse oximeter you are using is set incorrectly and adds two points to the true value each time a measurement is taken. The measuring device could be precise but not accurate. Because the error happens in every instance, it is systematic. Conclusions you draw based on that data will still be incorrect. But the error can be reproduced in the future (e.g., by using the same mis-set instrument).

A mistake in coding that affects all responses for that particular question is another example of a systematic error.
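The pulse oximeter example can be simulated to show why collecting more data does not remove a systematic error. This is a minimal sketch: the true saturation of 97, the noise level, and the function name are assumptions made for illustration, while the two-point offset comes from the example above.

```python
import random

TRUE_SATURATION = 97.0    # hypothetical true value
INSTRUMENT_OFFSET = 2.0   # the mis-set device adds two points to every reading
NOISE_SD = 1.0            # assumed random measurement noise

def simulate_mean_errors(n, seed=0):
    """Return (error of the mean with a correct device,
               error of the mean with the mis-set device)."""
    rng = random.Random(seed)
    correct = [rng.gauss(TRUE_SATURATION, NOISE_SD) for _ in range(n)]
    mis_set = [reading + INSTRUMENT_OFFSET for reading in correct]
    error_correct = sum(correct) / n - TRUE_SATURATION
    error_mis_set = sum(mis_set) / n - TRUE_SATURATION
    return error_correct, error_mis_set

for n in (10, 1000, 100000):
    random_only, with_bias = simulate_mean_errors(n)
    print(f"n={n:6d}  random error only={random_only:+.3f}  with bias={with_bias:+.3f}")
```

In this simulation the error from the correctly set device tends toward zero as the number of readings grows, while the error from the mis-set device stays close to the two-point offset.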

The validity of a study is dependent on the degree of systematic error. Validity is usually separated into two components:

Internal validity is dependent on the amount of error in measurements, including exposure, disease, and the associations between these variables. Good internal validity implies a lack of error in measurement and suggests that inferences may be drawn at least as they pertain to the subjects under study.
External validity pertains to the process of generalizing the findings of the study to the population from which the sample was drawn (or even beyond that population to a more universal statement). This requires an understanding of which conditions are relevant (or irrelevant) to the generalization. Internal validity is clearly a prerequisite for external validity.

Selection bias

Selection bias occurs when study subjects are selected or become part of the study as a result of a third, unmeasured variable which is associated with both the exposure and outcome of interest. For instance, it has repeatedly been noted that cigarette smokers and non-smokers tend to differ in their study participation rates. (Sackett D cites the example of Seltzer et al., in which 85% of non-smokers and 67% of smokers returned mailed questionnaires.) It is important to note that such a difference in response will not lead to bias if it is not also associated with a systematic difference in outcome between the two response groups.
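The last point can be checked with a small calculation. The sketch below uses the 85% and 67% response rates cited above; the population sizes, the disease risks, the 50% response rate among diseased smokers, and all names in the code are hypothetical values chosen for illustration.

```python
def observed_risk_ratio(population, response_prob):
    """Risk ratio (smokers vs non-smokers) computed among respondents only."""
    def risk_among_respondents(group):
        cases = population[group]["cases"] * response_prob[(group, "case")]
        noncases = population[group]["noncases"] * response_prob[(group, "noncase")]
        return cases / (cases + noncases)
    return risk_among_respondents("smoker") / risk_among_respondents("nonsmoker")

# Hypothetical source population: true risk 20% in smokers, 10% in non-smokers.
population = {
    "smoker": {"cases": 200, "noncases": 800},
    "nonsmoker": {"cases": 100, "noncases": 900},
}

# Scenario A: response depends on smoking status only (67% vs 85%).
response_by_exposure_only = {
    ("smoker", "case"): 0.67, ("smoker", "noncase"): 0.67,
    ("nonsmoker", "case"): 0.85, ("nonsmoker", "noncase"): 0.85,
}

# Scenario B: diseased smokers respond even less often (assumed 50%).
response_also_by_outcome = dict(response_by_exposure_only)
response_also_by_outcome[("smoker", "case")] = 0.50

rr_a = observed_risk_ratio(population, response_by_exposure_only)
rr_b = observed_risk_ratio(population, response_also_by_outcome)
print(f"true RR=2.00  scenario A RR={rr_a:.2f}  scenario B RR={rr_b:.2f}")
```

Under these assumptions, scenario A recovers the true risk ratio despite the unequal response rates, whereas scenario B underestimates it because response is also tied to the outcome.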

Information bias

Information bias is bias arising from systematic error in the assessment of a variable. An example of this is recall bias. A typical example is again provided by Sackett in his discussion of a study examining the effect of specific exposures on fetal health: "in questioning mothers whose recent pregnancies had ended in fetal death or malformation (cases) and a matched group of mothers whose pregnancies ended normally (controls) it was found that 28% of the former, but only 20% of the latter, reported exposure to drugs which could not be substantiated either in earlier prospective interviews or in other health records". In this example, recall bias probably occurred because women whose pregnancies had ended in fetal death or malformation tended to recall, and therefore report, previous exposures more readily.

Confounding

Confounding has traditionally been defined as bias arising from the co-occurrence or mixing of effects of extraneous factors, referred to as confounders, with the main effect(s) of interest. A more recent definition of confounding invokes the notion of counterfactual effects. According to this view, when one observes an outcome of interest, say Y=1 (as opposed to Y=0), in a given population A which is entirely exposed (i.e. exposure X = 1 for every unit of the population) the risk of this event will be R_A1. The counterfactual or unobserved risk R_A0 corresponds to the risk which would have been observed if these same individuals had been unexposed (i.e. X = 0 for every unit of the population). The true effect of exposure therefore is: R_A1 − R_A0 (if one is interested in risk differences) or R_A1/R_A0 (if one is interested in relative risk). Since the counterfactual risk R_A0 is unobservable we approximate it using a second population B and we actually measure the following relations: R_A1 − R_B0 or R_A1/R_B0. In this situation, confounding occurs when R_A0 ≠ R_B0. (NB: Example assumes binary outcome and exposure variables.)
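The counterfactual definition can be illustrated numerically. In the sketch below the three risk values are hypothetical, and in practice R_A0 is unobservable; it is given a value here only to show how the measured effect departs from the true one when R_A0 ≠ R_B0.

```python
def effect_measures(risk_exposed, risk_reference):
    """Return (risk difference, relative risk) for a pair of risks."""
    return risk_exposed - risk_reference, risk_exposed / risk_reference

# Hypothetical risks, following the notation in the text.
R_A1 = 0.30  # observed risk in population A, everyone exposed
R_A0 = 0.20  # counterfactual risk in A had no one been exposed (unobservable)
R_B0 = 0.10  # observed risk in unexposed population B, used as a substitute

true_rd, true_rr = effect_measures(R_A1, R_A0)
measured_rd, measured_rr = effect_measures(R_A1, R_B0)
is_confounded = R_A0 != R_B0

print(f"true effect:     RD={true_rd:.2f}  RR={true_rr:.2f}")
print(f"measured effect: RD={measured_rd:.2f}  RR={measured_rr:.2f}")
print(f"confounding present: {is_confounded}")
```

With these assumed values the substitute population B has a lower baseline risk than A would have had, so both the risk difference and the relative risk are overstated.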

Some epidemiologists prefer to think of confounding separately from common categorizations of bias since, unlike selection and information bias, confounding stems from real causal effects.

The profession

Few universities have offered epidemiology as a course of study at the undergraduate level. One notable undergraduate program exists at Johns Hopkins University, where students who major in public health can take graduate level courses, including epidemiology, during their senior year at the Bloomberg School of Public Health.

Although epidemiologic research is conducted by individuals from diverse disciplines, including clinically trained professionals such as physicians, formal training is available through Masters or Doctoral programs including Master of Public Health (MPH), Master of Science of Epidemiology (MSc.), Doctor of Public Health (DrPH), Doctor of Pharmacy (PharmD), Doctor of Philosophy (PhD), and Doctor of Science (ScD). Many other graduate programs, e.g., Doctor of Social Work (DSW), Doctor of Clinical Practice (DClinP), Doctor of Podiatric Medicine (DPM), Doctor of Veterinary Medicine (DVM), Doctor of Nursing Practice (DNP), Doctor of Physical Therapy (DPT), or for clinically trained physicians, Doctor of Medicine (MD) or Bachelor of Medicine and Surgery (MBBS or MBChB) and Doctor of Osteopathic Medicine (DO), include some training in epidemiologic research or related topics, but this training is generally substantially less than that offered in training programs focused on epidemiology or public health. Reflecting the strong historical tie between epidemiology and medicine, formal training programs may be set in either schools of public health or medical schools.

As public health/health protection practitioners, epidemiologists work in a number of different settings. Some epidemiologists work 'in the field'; i.e., in the community, commonly in a public health/health protection service, and are often at the forefront of investigating and combating disease outbreaks. Others work for non-profit organizations, universities, hospitals and larger government entities such as state and local health departments, various Ministries of Health, Doctors without Borders, the Centers for Disease Control and Prevention (CDC), the Health Protection Agency, the World Health Organization (WHO), or the Public Health Agency of Canada. Epidemiologists can also work in for-profit organizations such as pharmaceutical and medical device companies in groups such as market research or clinical development.

COVID-19

An April 2020 University of Southern California article noted that "The coronavirus epidemic... thrust epidemiology – the study of the incidence, distribution and control of disease in a population – to the forefront of scientific disciplines across the globe and even made temporary celebrities out of some of its practitioners."

On June 8, 2020, The New York Times published results of its survey of 511 epidemiologists asked "when they expect to resume 20 activities of daily life"; 52% of those surveyed expected to stop "routinely wearing a face covering" in one year or more.

Cognitive biology