A Medley of Potpourri

Thursday, March 11, 2021

Biostatistics

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Biostatistics

Biostatistics are the development and application of statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experiments and the interpretation of the results.

History

Biostatistics and Genetics

Biostatistical modeling forms an important part of numerous modern biological theories. Genetics studies, since its beginning, used statistical concepts to understand observed experimental results. Some genetics scientists even contributed with statistical advances with the development of methods and tools. Gregor Mendel started the genetics studies investigating genetics segregation patterns in families of peas and used statistics to explain the collected data. In the early 1900s, after the rediscovery of Mendel's Mendelian inheritance work, there were gaps in understanding between genetics and evolutionary Darwinism. Francis Galton tried to expand Mendel's discoveries with human data and proposed a different model with fractions of the heredity coming from each ancestral composing an infinite series. He called this the theory of "Law of Ancestral Heredity". His ideas were strongly disagreed by William Bateson, who followed Mendel's conclusions, that genetic inheritance were exclusively from the parents, half from each of them. This led to a vigorous debate between the biometricians, who supported Galton's ideas, as Walter Weldon, Arthur Dukinfield Darbishire and Karl Pearson, and Mendelians, who supported Bateson's (and Mendel's) ideas, such as Charles Davenport and Wilhelm Johannsen. Later, biometricians could not reproduce Galton conclusions in different experiments, and Mendel's ideas prevailed. By the 1930s, models built on statistical reasoning had helped to resolve these differences and to produce the neo-Darwinian modern evolutionary synthesis.

Solving these differences also allowed to define the concept of population genetics and brought together genetics and evolution. The three leading figures in the establishment of population genetics and this synthesis all relied on statistics and developed its use in biology.

Ronald Fisher developed several basic statistical methods in support of his work studying the crop experiments at Rothamsted Research, including in his books Statistical Methods for Research Workers (1925) end The Genetical Theory of Natural Selection (1930). He gave many contributions to genetics and statistics. Some of them include the ANOVA, p-value concepts, Fisher's exact test and Fisher's equation for population dynamics. He is credited for the sentence “Natural selection is a mechanism for generating an exceedingly high degree of improbability”.
Sewall G. Wright developed F-statistics and methods of computing them and defined inbreeding coefficient.
J. B. S. Haldane's book, The Causes of Evolution, reestablished natural selection as the premier mechanism of evolution by explaining it in terms of the mathematical consequences of Mendelian genetics. Also developed the theory of primordial soup.

These and other biostatisticians, mathematical biologists, and statistically inclined geneticists helped bring together evolutionary biology and genetics into a consistent, coherent whole that could begin to be quantitatively modeled.

In parallel to this overall development, the pioneering work of D'Arcy Thompson in On Growth and Form also helped to add quantitative discipline to biological study.

Despite the fundamental importance and frequent necessity of statistical reasoning, there may nonetheless have been a tendency among biologists to distrust or deprecate results which are not qualitatively apparent. One anecdote describes Thomas Hunt Morgan banning the Friden calculator from his department at Caltech, saying "Well, I am like a guy who is prospecting for gold along the banks of the Sacramento River in 1849. With a little intelligence, I can reach down and pick up big nuggets of gold. And as long as I can do that, I'm not going to let any people in my department waste scarce resources in placer mining."

Research planning

Any research in life sciences is proposed to answer a scientific question we might have. To answer this question with a high certainty, we need accurate results. The correct definition of the main hypothesis and the research plan will reduce errors while taking a decision in understanding a phenomenon. The research plan might include the research question, the hypothesis to be tested, the experimental design, data collection methods, data analysis perspectives and costs evolved. It is essential to carry the study based on the three basic principles of experimental statistics: randomization, replication, and local control.

Research question

The research question will define the objective of a study. The research will be headed by the question, so it needs to be concise, at the same time it is focused on interesting and novel topics that may improve science and knowledge and that field. To define the way to ask the scientific question, an exhaustive literature review might be necessary. So, the research can be useful to add value to the scientific community.

Hypothesis definition

Once the aim of the study is defined, the possible answers to the research question can be proposed, transforming this question into a hypothesis. The main propose is called null hypothesis (H₀) and is usually based on a permanent knowledge about the topic or an obvious occurrence of the phenomena, sustained by a deep literature review. We can say it is the standard expected answer for the data under the situation in test. In general, H_O assumes no association between _treatments. On the other hand, the alternative hypothesis is the denial of H_O. It assumes some degree of association between the treatment and the outcome. Although, the hypothesis is sustained by question research and its expected and unexpected answers.

As an example, consider groups of similar animals (mice, for example) under two different diet systems. The research question would be: what is the best diet? In this case, H₀ would be that there is no difference between the two diets in mice metabolism (H₀: μ₁ = μ₂) and the alternative hypothesis would be that the diets have different effects over animals metabolism (H₁: μ₁ ≠ μ₂).

The hypothesis is defined by the researcher, according to his/her interests in answering the main question. Besides that, the alternative hypothesis can be more than one hypothesis. It can assume not only differences across observed parameters, but their degree of differences (i.e. higher or shorter).

Sampling

Usually, a study aims to understand an effect of a phenomenon over a population. In biology, a population is defined as all the individuals of a given species, in a specific area at a given time. In biostatistics, this concept is extended to a variety of collections possible of study. Although, in biostatistics, a population is not only the individuals, but the total of one specific component of their organisms, as the whole genome, or all the sperm cells, for animals, or the total leaf area, for a plant, for example.

It is not possible to take the measures from all the elements of a population. Because of that, the sampling process is very important for statistical inference. Sampling is defined as to randomly get a representative part of the entire population, to make posterior inferences about the population. So, the sample might catch the most variability across a population. The sample size is determined by several things, since the scope of the research to the resources available. In clinical research, the trial type, as inferiority, equivalence, and superiority is a key in determining sample size.

Experimental design

Experimental designs sustain those basic principles of experimental statistics. There are three basic experimental designs to randomly allocate treatments in all plots of the experiment. They are completely randomized design, randomized block design, and factorial designs. Treatments can be arranged in many ways inside the experiment. In agriculture, the correct experimental design is the root of a good study and the arrangement of treatments within the study is essential because environment largely affects the plots (plants, livestock, microorganisms). These main arrangements can be found in the literature under the names of “lattices”, “incomplete blocks”, “split plot”, “augmented blocks”, and many others. All of the designs might include control plots, determined by the researcher, to provide an error estimation during inference.

In clinical studies, the samples are usually smaller than in other biological studies, and in most cases, the environment effect can be controlled or measured. It is common to use randomized controlled clinical trials, where results are usually compared with observational study designs such as case–control or cohort.

Data collection

Data collection methods must be considered in research planning, because it highly influences the sample size and experimental design.

Data collection varies according to type of data. For qualitative data, collection can be done with structured questionnaires or by observation, considering presence or intensity of disease, using score criterion to categorize levels of occurrence. For quantitative data, collection is done by measuring numerical information using instruments.

In agriculture and biology studies, yield data and its components can be obtained by metric measures. However, pest and disease injuries in plats are obtained by observation, considering score scales for levels of damage. Especially, in genetic studies, modern methods for data collection in field and laboratory should be considered, as high-throughput platforms for phenotyping and genotyping. These tools allow bigger experiments, while turn possible evaluate many plots in lower time than a human-based only method for data collection. Finally, all data collected of interest must be stored in an organized data frame for further analysis.

Analysis and data interpretation

Descriptive Tools

Data can be represented through tables or graphical representation, such as line charts, bar charts, histograms, scatter plot. Also, measures of central tendency and variability can be very useful to describe an overview of the data. Follow some examples:

Frequency tables

One type of tables are the frequency table, which consists of data arranged in rows and columns, where the frequency is the number of occurrences or repetitions of data. Frequency can be:

Absolute: represents the number of times that a determined value appear;

$N=f_{1}+f_{2}+f_{3}+...+f_{n}$

Relative: obtained by the division of the absolute frequency by the total number;

$n_{i}={\frac {f_{i}}{N}}$

In the next example, we have the number of genes in ten operons of the same organism.

$Genes=2,3,3,4,5,3,3,3,3,4$


Genes number	Absolute frequency	Relative frequency
1	0	0
2	1	0.1
3	6	0.6
4	2	0.2
5	1	0.1

Line graph

Figure A: Line graph example. The birth rate in Brazil (2010–2016); Figure B: Bar chart example. The birth rate in Brazil for the December months from 2010 to 2016; Figure C: Example of Box Plot: number of glycines in the proteome of eight different organisms (A-H); Figure D: Example of a scatter plot.

Line graphs represent the variation of a value over another metric, such as time. In general, values are represented in the vertical axis, while the time variation is represented in the horizontal axis.

Bar chart

A bar chart is a graph that shows categorical data as bars presenting heights (vertical bar) or widths (horizontal bar) proportional to represent values. Bar charts provide an image that could also be represented in a tabular format.

In the bar chart example, we have the birth rate in Brazil for the December months from 2010 to 2016. The sharp fall in December 2016 reflects the outbreak of Zika virus in the birth rate in Brazil.

Histograms

Example of a histogram.

The histogram (or frequency distribution) is a graphical representation of a dataset tabulated and divided into uniform or non-uniform classes. It was first introduced by Karl Pearson.

Scatter Plot

A scatter plot is a mathematical diagram that uses Cartesian coordinates to display values of a dataset. A scatter plot shows the data as a set of points, each one presenting the value of one variable determining the position on the horizontal axis and another variable on the vertical axis. They are also called scatter graph, scatter chart, scattergram, or scatter diagram.

Mean

The arithmetic mean is the sum of a collection of values ( ${x_{1}+x_{2}+x_{3}+\cdots +x_{n}}$ ) divided by the number of items of this collection ( ${n}$ ).

${\bar {x}}={\frac {1}{n}}\left(\sum _{i=1}^{n}{x_{i}}\right)={\frac {x_{1}+x_{2}+\cdots +x_{n}}{n}}$

Median

The median is the value in the middle of a dataset.

Mode

The mode is the value of a set of data that appears most often.

Comparison among mean, median and mode Values = { 2,3,3,3,3,3,4,4,11 }
Type	Example	Result
Mean	( 2 + 3 + 3 + 3 + 3 + 3 + 4 + 4 + 11 ) / 9	4
Median	2, 3, 3, 3, 3, 3, 4, 4, 11	3
Mode	2, 3, 3, 3, 3, 3, 4, 4, 11	3

Box Plot

Box plot is a method for graphically depicting groups of numerical data. The maximum and minimum values are represented by the lines, and the interquartile range (IQR) represent 25–75% of the data. Outliers may be plotted as circles.

Correlation Coefficients

Although correlations between two different kinds of data could be inferred by graphs, such as scatter plot, it is necessary validate this though numerical information. For this reason, correlation coefficients are required. They provide a numerical value that reflects the strength of an association.

Pearson Correlation Coefficient

Scatter diagram that demonstrates the Pearson correlation for different values of ρ.

Pearson correlation coefficient is a measure of association between two variables, X and Y. This coefficient, usually represented by ρ (rho) for the population and r for the sample, assumes values between −1 and 1, where ρ = 1 represents a perfect positive correlation, ρ = -1 represents a perfect negative correlation, and ρ = 0 is no linear correlation.

Inferential Statistics

It is used to make inferences about an unknown population, by estimation and/or hypothesis testing. In other words, it is desirable to obtain parameters to describe the population of interest, but since the data is limited, it is necessary to make use of a representative sample in order to estimate them. With that, it is possible to test previously defined hypotheses and apply the conclusions to the entire population. The standard error of the mean is a measure of variability that is crucial to do inferences.

Hypothesis testing

Hypothesis testing is essential to make inferences about populations aiming to answer research questions, as settled in "Research planning" section. Authors defined four steps to be set:

The hypothesis to be tested: as stated earlier, we have to work with the definition of a null hypothesis (H₀), that is going to be tested, and an alternative hypothesis. But they must be defined before the experiment implementation.
Significance level and decision rule: A decision rule depends on the level of significance, or in other words, the acceptable error rate (α). It is easier to think that we define a critical value that determines the statistical significance when a test statistic is compared with it. So, α also has to be predefined before the experiment.
Experiment and statistical analysis: This is when the experiment is really implemented following the appropriate experimental design, data is collected and the more suitable statistical tests are evaluated.
Inference: Is made when the null hypothesis is rejected or not rejected, based on the evidence that the comparison of p-values and α brings. It is pointed that the failure to reject H₀ just means that there is not enough evidence to support its rejection, but not that this hypothesis is true.

Confidence intervals

A confidence interval is a range of values that can contain the true real parameter value in given a certain level of confidence. The first step is to estimate the best-unbiased estimate of the population parameter. The upper value of the interval is obtained by the sum of this estimate with the multiplication between the standard error of the mean and the confidence level. The calculation of lower value is similar, but instead of a sum, a subtraction must be applied.

Statistical considerations

Power and statistical error

When testing a hypothesis, there are two types of statistic errors possible: Type I error and Type II error. The type I error or false positive is the incorrect rejection of a true null hypothesis and the type II error or false negative is the failure to reject a false null hypothesis. The significance level denoted by α is the type I error rate and should be chosen before performing the test. The type II error rate is denoted by β and statistical power of the test is 1 − β.

p-value

The p-value is the probability of obtaining results as extreme as or more extreme than those observed, assuming the null hypothesis (H₀) is true. It is also called the calculated probability. It is common to confuse the p-value with the significance level (α), but, the α is a predefined threshold for calling significant results. If p is less than α, the null hypothesis (H₀) is rejected.

Multiple testing

In multiple tests of the same hypothesis, the probability of the occurrence of falses positives (familywise error rate) increase and some strategy are used to control this occurrence. This is commonly achieved by using a more stringent threshold to reject null hypotheses. The Bonferroni correction defines an acceptable global significance level, denoted by α* and each test is individually compared with a value of α = α*/m. This ensures that the familywise error rate in all m tests, is less than or equal to α*. When m is large, the Bonferroni correction may be overly conservative. An alternative to the Bonferroni correction is to control the false discovery rate (FDR). The FDR controls the expected proportion of the rejected null hypotheses (the so-called discoveries) that are false (incorrect rejections). This procedure ensures that, for independent tests, the false discovery rate is at most q*. Thus, the FDR is less conservative than the Bonferroni correction and have more power, at the cost of more false positives.

Mis-specification and robustness checks

The main hypothesis being tested (e.g., no association between treatments and outcomes) is often accompanied by other technical assumptions (e.g., about the form of the probability distribution of the outcomes) that are also part of the null hypothesis. When the technical assumptions are violated in practice, then the null may be frequently rejected even if the main hypothesis is true. Such rejections are said to be due to model mis-specification. Verifying whether the outcome of a statistical test does not change when the technical assumptions are slightly altered (so-called robustness checks) is the main way of combating mis-specification.

Model selection criteria

Model criteria selection will select or model that more approximate true model. The Akaike's Information Criterion (AIC) and The Bayesian Information Criterion (BIC) are examples of asymptotically efficient criteria.

Developments and Big Data

Recent developments have made a large impact on biostatistics. Two important changes have been the ability to collect data on a high-throughput scale, and the ability to perform much more complex analysis using computational techniques. This comes from the development in areas as sequencing technologies, Bioinformatics and Machine learning.

Use in high-throughput data

New biomedical technologies like microarrays, next-generation sequencers (for genomics) and mass spectrometry (for proteomics) generate enormous amounts of data, allowing many tests to be performed simultaneously. Careful analysis with biostatistical methods is required to separate the signal from the noise. For example, a microarray could be used to measure many thousands of genes simultaneously, determining which of them have different expression in diseased cells compared to normal cells. However, only a fraction of genes will be differentially expressed.

Multicollinearity often occurs in high-throughput biostatistical settings. Due to high intercorrelation between the predictors (such as gene expression levels), the information of one predictor might be contained in another one. It could be that only 5% of the predictors are responsible for 90% of the variability of the response. In such a case, one could apply the biostatistical technique of dimension reduction (for example via principal component analysis). Classical statistical techniques like linear or logistic regression and linear discriminant analysis do not work well for high dimensional data (i.e. when the number of observations n is smaller than the number of features or predictors p: n < p). As a matter of fact, one can get quite high R²-values despite very low predictive power of the statistical model. These classical statistical techniques (esp. least squares linear regression) were developed for low dimensional data (i.e. where the number of observations n is much larger than the number of predictors p: n >> p). In cases of high dimensionality, one should always consider an independent validation test set and the corresponding residual sum of squares (RSS) and R² of the validation test set, not those of the training set.

Often, it is useful to pool information from multiple predictors together. For example, Gene Set Enrichment Analysis (GSEA) considers the perturbation of whole (functionally related) gene sets rather than of single genes. These gene sets might be known biochemical pathways or otherwise functionally related genes. The advantage of this approach is that it is more robust: It is more likely that a single gene is found to be falsely perturbed than it is that a whole pathway is falsely perturbed. Furthermore, one can integrate the accumulated knowledge about biochemical pathways (like the JAK-STAT signaling pathway) using this approach.

Bioinformatics advances in databases, data mining, and biological interpretation

The development of biological databases enables storage and management of biological data with the possibility of ensuring access for users around the world. They are useful for researchers depositing data, retrieve information and files (raw or processed) originated from other experiments or indexing scientific articles, as PubMed. Another possibility is search for the desired term (a gene, a protein, a disease, an organism, and so on) and check all results related to this search. There are databases dedicated to SNPs (dbSNP), the knowledge on genes characterization and their pathways (KEGG) and the description of gene function classifying it by cellular component, molecular function and biological process (Gene Ontology). In addition to databases that contain specific molecular information, there are others that are ample in the sense that they store information about an organism or group of organisms. As an example of a database directed towards just one organism, but that contains much data about it, is the Arabidopsis thaliana genetic and molecular database – TAIR. Phytozome, in turn, stores the assemblies and annotation files of dozen of plant genomes, also containing visualization and analysis tools. Moreover, there is an interconnection between some databases in the information exchange/sharing and a major initiative was the International Nucleotide Sequence Database Collaboration (INSDC) which relates data from DDBJ, EMBL-EBI, and NCBI.

Nowadays, increase in size and complexity of molecular datasets leads to use of powerful statistical methods provided by computer science algorithms which are developed by machine learning area. Therefore, data mining and machine learning allow detection of patterns in data with a complex structure, as biological ones, by using methods of supervised and unsupervised learning, regression, detection of clusters and association rule mining, among others. To indicate some of them, self-organizing maps and k-means are examples of cluster algorithms; neural networks implementation and support vector machines models are examples of common machine learning algorithms.

Collaborative work among molecular biologists, bioinformaticians, statisticians and computer scientists is important to perform an experiment correctly, going from planning, passing through data generation and analysis, and ending with biological interpretation of the results.

Use of computationally intensive methods

On the other hand, the advent of modern computer technology and relatively cheap computing resources have enabled computer-intensive biostatistical methods like bootstrapping and re-sampling methods.

In recent times, random forests have gained popularity as a method for performing statistical classification. Random forest techniques generate a panel of decision trees. Decision trees have the advantage that you can draw them and interpret them (even with a basic understanding of mathematics and statistics). Random Forests have thus been used for clinical decision support systems.

Applications

Public health

Public health, including epidemiology, health services research, nutrition, environmental health and health care policy & management. In these medicine contents, it's important to consider the design and analysis of the clinical trials. As one example, there is the assessment of severity state of a patient with a prognosis of an outcome of a disease.

With new technologies and genetics knowledge, biostatistics are now also used for Systems medicine, which consists in a more personalized medicine. For this, is made an integration of data from different sources, including conventional patient data, clinico-pathological parameters, molecular and genetic data as well as data generated by additional new-omics technologies.

Quantitative genetics

The study of Population genetics and Statistical genetics in order to link variation in genotype with a variation in phenotype. In other words, it is desirable to discover the genetic basis of a measurable trait, a quantitative trait, that is under polygenic control. A genome region that is responsible for a continuous trait is called Quantitative trait locus (QTL). The study of QTLs become feasible by using molecular markers and measuring traits in populations, but their mapping needs the obtaining of a population from an experimental crossing, like an F2 or Recombinant inbred strains/lines (RILs). To scan for QTLs regions in a genome, a gene map based on linkage have to be built. Some of the best-known QTL mapping algorithms are Interval Mapping, Composite Interval Mapping, and Multiple Interval Mapping.

However, QTL mapping resolution is impaired by the amount of recombination assayed, a problem for species in which it is difficult to obtain large offspring. Furthermore, allele diversity is restricted to individuals originated from contrasting parents, which limit studies of allele diversity when we have a panel of individuals representing a natural population. For this reason, the Genome-wide association study was proposed in order to identify QTLs based on linkage disequilibrium, that is the non-random association between traits and molecular markers. It was leveraged by the development of high-throughput SNP genotyping.

In animal and plant breeding, the use of markers in selection aiming for breeding, mainly the molecular ones, collaborated to the development of marker-assisted selection. While QTL mapping is limited due resolution, GWAS does not have enough power when rare variants of small effect that are also influenced by environment. So, the concept of Genomic Selection (GS) arises in order to use all molecular markers in the selection and allow the prediction of the performance of candidates in this selection. The proposal is to genotype and phenotype a training population, develop a model that can obtain the genomic estimated breeding values (GEBVs) of individuals belonging to a genotyped and but not phenotyped population, called testing population. This kind of study could also include a validation population, thinking in the concept of cross-validation, in which the real phenotype results measured in this population are compared with the phenotype results based on the prediction, what used to check the accuracy of the model.

As a summary, some points about the application of quantitative genetics are:

This has been used in agriculture to improve crops (Plant breeding) and livestock (Animal breeding).
In biomedical research, this work can assist in finding candidates gene alleles that can cause or influence predisposition to diseases in human genetics

Expression data

Studies for differential expression of genes from RNA-Seq data, as for RT-qPCR and microarrays, demands comparison of conditions. The goal is to identify genes which have a significant change in abundance between different conditions. Then, experiments are designed appropriately, with replicates for each condition/treatment, randomization and blocking, when necessary. In RNA-Seq, the quantification of expression uses the information of mapped reads that are summarized in some genetic unit, as exons that are part of a gene sequence. As microarray results can be approximated by a normal distribution, RNA-Seq counts data are better explained by other distributions. The first used distribution was the Poisson one, but it underestimate the sample error, leading to false positives. Currently, biological variation is considered by methods that estimate a dispersion parameter of a negative binomial distribution. Generalized linear models are used to perform the tests for statistical significance and as the number of genes is high, multiple tests correction have to be considered. Some examples of other analysis on genomics data comes from microarray or proteomics experiments. Often concerning diseases or disease stages.

Other studies

Ecology, ecological forecasting
Biological sequence analysis
Systems biology for gene network inference or pathways analysis.
Population dynamics, especially in regards to fisheries science.
Phylogenetics and evolution

Tools

There are a lot of tools that can be used to do statistical analysis in biological data. Most of them are useful in other areas of knowledge, covering a large number of applications (alphabetical). Here are brief descriptions of some of them:

ASReml: Another software developed by VSNi that can be used also in R environment as a package. It is developed to estimate variance components under a general linear mixed model using restricted maximum likelihood (REML). Models with fixed effects and random effects and nested or crossed ones are allowed. Gives the possibility to investigate different variance-covariance matrix structures.
CycDesigN: A computer package developed by VSNi that helps the researchers create experimental designs and analyze data coming from a design present in one of three classes handled by CycDesigN. These classes are resolvable, non-resolvable, partially replicated and crossover designs. It includes less used designs the Latinized ones, as t-Latinized design.
Orange: A programming interface for high-level data processing, data mining and data visualization. Include tools for gene expression and genomics.
R: An open source environment and programming language dedicated to statistical computing and graphics. It is an implementation of S language maintained by CRAN. In addition to its functions to read data tables, take descriptive statistics, develop and evaluate models, its repository contains packages developed by researchers around the world. This allows the development of functions written to deal with the statistical analysis of data that comes from specific applications. In the case of Bioinformatics, for example, there are packages located in the main repository (CRAN) and in others, as Bioconductor. It is also possible to use packages under development that are shared in hosting-services as GitHub.
SAS: A data analysis software widely used, going through universities, services and industry. Developed by a company with the same name (SAS Institute), it uses SAS language for programming.
PLA 3.0: Is a biostatistical analysis software for regulated environments (e.g. drug testing) which supports Quantitative Response Assays (Parallel-Line, Parallel-Logistics, Slope-Ratio) and Dichotomous Assays (Quantal Response, Binary Assays). It also supports weighting methods for combination calculations and the automatic data aggregation of independent assay data.
Weka: A Java software for machine learning and data mining, including tools and methods for visualization, clustering, regression, association rule, and classification. There are tools for cross-validation, bootstrapping and a module of algorithm comparison. Weka also can be run in other programming languages as Perl or R.

Scope and training programs

Almost all educational programmes in biostatistics are at postgraduate level. They are most often found in schools of public health, affiliated with schools of medicine, forestry, or agriculture, or as a focus of application in departments of statistics.

In the United States, where several universities have dedicated biostatistics departments, many other top-tier universities integrate biostatistics faculty into statistics or other departments, such as epidemiology. Thus, departments carrying the name "biostatistics" may exist under quite different structures. For instance, relatively new biostatistics departments have been founded with a focus on bioinformatics and computational biology, whereas older departments, typically affiliated with schools of public health, will have more traditional lines of research involving epidemiological studies and clinical trials as well as bioinformatics. In larger universities around the world, where both a statistics and a biostatistics department exist, the degree of integration between the two departments may range from the bare minimum to very close collaboration. In general, the difference between a statistics program and a biostatistics program is twofold: (i) statistics departments will often host theoretical/methodological research which are less common in biostatistics programs and (ii) statistics departments have lines of research that may include biomedical applications but also other areas such as industry (quality control), business and economics and biological areas other than medicine.

Specialized journals

Biostatistics
International Journal of Biostatistics
Journal of Epidemiology and Biostatistics
Biostatistics and Public Health
Biometrics
Biometrika
Biometrical Journal
Communications in Biometry and Crop Science
Statistical Applications in Genetics and Molecular Biology
Statistical Methods in Medical Research
Pharmaceutical Statistics
Statistics in Medicine

Wednesday, March 10, 2021

Mathematical and theoretical biology

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Mathematical_and_theoretical_biology

Yellow chamomile head showing the Fibonacci numbers in spirals consisting of 21 (blue) and 13 (aqua). Such arrangements have been noticed since the Middle Ages and can be used to make mathematical models of a wide variety of plants.

Mathematical and theoretical biology or, Biomathematics, is a branch of biology which employs theoretical analysis, mathematical models and abstractions of the living organisms to investigate the principles that govern the structure, development and behavior of the systems, as opposed to experimental biology which deals with the conduction of experiments to prove and validate the scientific theories. The field is sometimes called mathematical biology or biomathematics to stress the mathematical side, or theoretical biology to stress the biological side. Theoretical biology focuses more on the development of theoretical principles for biology while mathematical biology focuses on the use of mathematical tools to study biological systems, even though the two terms are sometimes interchanged.

Mathematical biology aims at the mathematical representation and modeling of biological processes, using techniques and tools of applied mathematics. It can be useful in both theoretical and practical research. Describing systems in a quantitative manner means their behavior can be better simulated, and hence properties can be predicted that might not be evident to the experimenter. This requires precise mathematical models.

Because of the complexity of the living systems, theoretical biology employs several fields of mathematics, and has contributed to the development of new techniques.

History

Early history

Mathematics has been used in biology as early as the 13th century, when Fibonacci used the famous Fibonacci series to describe a growing population of rabbits. In the 18th century Daniel Bernoulli applied mathematics to describe the effect of smallpox on the human population. Thomas Malthus' 1789 essay on the growth of the human population was based on the concept of exponential growth. Pierre François Verhulst formulated the logistic growth model in 1836.

Fritz Müller described the evolutionary benefits of what is now called Müllerian mimicry in 1879, in an account notable for being the first use of a mathematical argument in evolutionary ecology to show how powerful the effect of natural selection would be, unless one includes Malthus's discussion of the effects of population growth that influenced Charles Darwin: Malthus argued that growth would be exponential (he uses the word "geometric") while resources (the environment's carrying capacity) could only grow arithmetically.

The term "theoretical biology" was first used by Johannes Reinke in 1901. One founding text is considered to be On Growth and Form (1917) by D'Arcy Thompson, and other early pioneers include Ronald Fisher, Hans Leo Przibram, Nicolas Rashevsky and Vito Volterra.

Recent growth

Interest in the field has grown rapidly from the 1960s onwards. Some reasons for this include:

The rapid growth of data-rich information sets, due to the genomics revolution, which are difficult to understand without the use of analytical tools
Recent development of mathematical tools such as chaos theory to help understand complex, non-linear mechanisms in biology
An increase in computing power, which facilitates calculations and simulations not previously possible
An increasing interest in in silico experimentation due to ethical considerations, risk, unreliability and other complications involved in human and animal research

Areas of research

Several areas of specialized research in mathematical and theoretical biology as well as external links to related projects in various universities are concisely presented in the following subsections, including also a large number of appropriate validating references from a list of several thousands of published authors contributing to this field. Many of the included examples are characterised by highly complex, nonlinear, and supercomplex mechanisms, as it is being increasingly recognised that the result of such interactions may only be understood through a combination of mathematical, logical, physical/chemical, molecular and computational models.

Abstract relational biology

Abstract relational biology (ARB) is concerned with the study of general, relational models of complex biological systems, usually abstracting out specific morphological, or anatomical, structures. Some of the simplest models in ARB are the Metabolic-Replication, or (M,R)--systems introduced by Robert Rosen in 1957-1958 as abstract, relational models of cellular and organismal organization.

Other approaches include the notion of autopoiesis developed by Maturana and Varela, Kauffman's Work-Constraints cycles, and more recently the notion of closure of constraints.

Algebraic biology

Algebraic biology (also known as symbolic systems biology) applies the algebraic methods of symbolic computation to the study of biological problems, especially in genomics, proteomics, analysis of molecular structures and study of genes.

Complex systems biology

An elaboration of systems biology to understanding the more complex life processes was developed since 1970 in connection with molecular set theory, relational biology and algebraic biology.

Computer models and automata theory

A monograph on this topic summarizes an extensive amount of published research in this area up to 1986, including subsections in the following areas: computer modeling in biology and medicine, arterial system models, neuron models, biochemical and oscillation networks, quantum automata, quantum computers in molecular biology and genetics, cancer modelling, neural nets, genetic networks, abstract categories in relational biology, metabolic-replication systems, category theory applications in biology and medicine, automata theory, cellular automata, tessellation models and complete self-reproduction, chaotic systems in organisms, relational biology and organismic theories.

Modeling cell and molecular biology

This area has received a boost due to the growing importance of molecular biology.

Mechanics of biological tissues
Theoretical enzymology and enzyme kinetics
Cancer modelling and simulation
Modelling the movement of interacting cell populations
Mathematical modelling of scar tissue formation
Mathematical modelling of intracellular dynamics
Mathematical modelling of the cell cycle
Mathematical modelling of apoptosis

Modelling physiological systems

Modelling of arterial disease
Multi-scale modelling of the heart
Modelling electrical properties of muscle interactions, as in bidomain and monodomain models

Computational neuroscience

Computational neuroscience (also known as theoretical neuroscience or mathematical neuroscience) is the theoretical study of the nervous system.

Evolutionary biology

Ecology and evolutionary biology have traditionally been the dominant fields of mathematical biology.

Evolutionary biology has been the subject of extensive mathematical theorizing. The traditional approach in this area, which includes complications from genetics, is population genetics. Most population geneticists consider the appearance of new alleles by mutation, the appearance of new genotypes by recombination, and changes in the frequencies of existing alleles and genotypes at a small number of gene loci. When infinitesimal effects at a large number of gene loci are considered, together with the assumption of linkage equilibrium or quasi-linkage equilibrium, one derives quantitative genetics. Ronald Fisher made fundamental advances in statistics, such as analysis of variance, via his work on quantitative genetics. Another important branch of population genetics that led to the extensive development of coalescent theory is phylogenetics. Phylogenetics is an area that deals with the reconstruction and analysis of phylogenetic (evolutionary) trees and networks based on inherited characteristics Traditional population genetic models deal with alleles and genotypes, and are frequently stochastic.

Many population genetics models assume that population sizes are constant. Variable population sizes, often in the absence of genetic variation, are treated by the field of population dynamics. Work in this area dates back to the 19th century, and even as far as 1798 when Thomas Malthus formulated the first principle of population dynamics, which later became known as the Malthusian growth model. The Lotka–Volterra predator-prey equations are another famous example. Population dynamics overlap with another active area of research in mathematical biology: mathematical epidemiology, the study of infectious disease affecting populations. Various models of the spread of infections have been proposed and analyzed, and provide important results that may be applied to health policy decisions.

In evolutionary game theory, developed first by John Maynard Smith and George R. Price, selection acts directly on inherited phenotypes, without genetic complications. This approach has been mathematically refined to produce the field of adaptive dynamics.

Mathematical biophysics

The earlier stages of mathematical biology were dominated by mathematical biophysics, described as the application of mathematics in biophysics, often involving specific physical/mathematical models of biosystems and their components or compartments.

The following is a list of mathematical descriptions and their assumptions.

Deterministic processes (dynamical systems)

A fixed mapping between an initial state and a final state. Starting from an initial condition and moving forward in time, a deterministic process always generates the same trajectory, and no two trajectories cross in state space.

Difference equations/Maps – discrete time, continuous state space.
Ordinary differential equations – continuous time, continuous state space, no spatial derivatives. See also: Numerical ordinary differential equations.
Partial differential equations – continuous time, continuous state space, spatial derivatives. See also: Numerical partial differential equations.
Logical deterministic cellular automata – discrete time, discrete state space. See also: Cellular automaton.

Stochastic processes (random dynamical systems)

A random mapping between an initial state and a final state, making the state of the system a random variable with a corresponding probability distribution.

Non-Markovian processes – generalized master equation – continuous time with memory of past events, discrete state space, waiting times of events (or transitions between states) discretely occur.
Jump Markov process – master equation – continuous time with no memory of past events, discrete state space, waiting times between events discretely occur and are exponentially distributed. See also: Monte Carlo method for numerical simulation methods, specifically dynamic Monte Carlo method and Gillespie algorithm.
Continuous Markov process – stochastic differential equations or a Fokker–Planck equation – continuous time, continuous state space, events occur continuously according to a random Wiener process.

Spatial modelling

One classic work in this area is Alan Turing's paper on morphogenesis entitled The Chemical Basis of Morphogenesis, published in 1952 in the Philosophical Transactions of the Royal Society.

Travelling waves in a wound-healing assay
Swarming behaviour
A mechanochemical theory of morphogenesis
Biological pattern formation
Spatial distribution modeling using plot samples
Turing patterns

Mathematical methods

A model of a biological system is converted into a system of equations, although the word 'model' is often used synonymously with the system of corresponding equations. The solution of the equations, by either analytical or numerical means, describes how the biological system behaves either over time or at equilibrium. There are many different types of equations and the type of behavior that can occur is dependent on both the model and the equations used. The model often makes assumptions about the system. The equations may also make assumptions about the nature of what may occur.

Molecular set theory

Molecular set theory (MST) is a mathematical formulation of the wide-sense chemical kinetics of biomolecular reactions in terms of sets of molecules and their chemical transformations represented by set-theoretical mappings between molecular sets. It was introduced by Anthony Bartholomay, and its applications were developed in mathematical biology and especially in mathematical medicine. In a more general sense, MST is the theory of molecular categories defined as categories of molecular sets and their chemical transformations represented as set-theoretical mappings of molecular sets. The theory has also contributed to biostatistics and the formulation of clinical biochemistry problems in mathematical formulations of pathological, biochemical changes of interest to Physiology, Clinical Biochemistry and Medicine.

Organizational biology

Theoretical approaches to biological organization aim to understand the interdependence between the parts of organisms. They emphasize the circularities that these interdependences lead to. Theoretical biologists developed several concepts to formalize this idea.

For example, abstract relational biology (ARB) is concerned with the study of general, relational models of complex biological systems, usually abstracting out specific morphological, or anatomical, structures. Some of the simplest models in ARB are the Metabolic-Replication, or (M,R)--systems introduced by Robert Rosen in 1957-1958 as abstract, relational models of cellular and organismal organization.

Model example: the cell cycle

The eukaryotic cell cycle is very complex and is one of the most studied topics, since its misregulation leads to cancers. It is possibly a good example of a mathematical model as it deals with simple calculus but gives valid results. Two research groups have produced several models of the cell cycle simulating several organisms. They have recently produced a generic eukaryotic cell cycle model that can represent a particular eukaryote depending on the values of the parameters, demonstrating that the idiosyncrasies of the individual cell cycles are due to different protein concentrations and affinities, while the underlying mechanisms are conserved (Csikasz-Nagy et al., 2006).

By means of a system of ordinary differential equations these models show the change in time (dynamical system) of the protein inside a single typical cell; this type of model is called a deterministic process (whereas a model describing a statistical distribution of protein concentrations in a population of cells is called a stochastic process).

To obtain these equations an iterative series of steps must be done: first the several models and observations are combined to form a consensus diagram and the appropriate kinetic laws are chosen to write the differential equations, such as rate kinetics for stoichiometric reactions, Michaelis-Menten kinetics for enzyme substrate reactions and Goldbeter–Koshland kinetics for ultrasensitive transcription factors, afterwards the parameters of the equations (rate constants, enzyme efficiency coefficients and Michaelis constants) must be fitted to match observations; when they cannot be fitted the kinetic equation is revised and when that is not possible the wiring diagram is modified. The parameters are fitted and validated using observations of both wild type and mutants, such as protein half-life and cell size.

To fit the parameters, the differential equations must be studied. This can be done either by simulation or by analysis. In a simulation, given a starting vector (list of the values of the variables), the progression of the system is calculated by solving the equations at each time-frame in small increments.

In analysis, the properties of the equations are used to investigate the behavior of the system depending on the values of the parameters and variables. A system of differential equations can be represented as a vector field, where each vector described the change (in concentration of two or more protein) determining where and how fast the trajectory (simulation) is heading. Vector fields can have several special points: a stable point, called a sink, that attracts in all directions (forcing the concentrations to be at a certain value), an unstable point, either a source or a saddle point, which repels (forcing the concentrations to change away from a certain value), and a limit cycle, a closed trajectory towards which several trajectories spiral towards (making the concentrations oscillate).

A better representation, which handles the large number of variables and parameters, is a bifurcation diagram using bifurcation theory. The presence of these special steady-state points at certain values of a parameter (e.g. mass) is represented by a point and once the parameter passes a certain value, a qualitative change occurs, called a bifurcation, in which the nature of the space changes, with profound consequences for the protein concentrations: the cell cycle has phases (partially corresponding to G1 and G2) in which mass, via a stable point, controls cyclin levels, and phases (S and M phases) in which the concentrations change independently, but once the phase has changed at a bifurcation event (Cell cycle checkpoint), the system cannot go back to the previous levels since at the current mass the vector field is profoundly different and the mass cannot be reversed back through the bifurcation event, making a checkpoint irreversible. In particular the S and M checkpoints are regulated by means of special bifurcations called a Hopf bifurcation and an infinite period bifurcation.

Conscientiousness

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Conscientiousness

Conscientiousness is the personality trait of being careful, or diligent. Conscientiousness implies a desire to do a task well, and to take obligations to others seriously. Conscientious people tend to be efficient and organized as opposed to easy-going and disorderly. They exhibit a tendency to show self-discipline, act dutifully, and aim for achievement; they display planned rather than spontaneous behavior; and they are generally dependable. It is manifested in characteristic behaviors such as being neat, and systematic; also including such elements as carefulness, thoroughness, and deliberation (the tendency to think carefully before acting).

Conscientiousness is one of the five traits of both the Five Factor Model and the HEXACO model of personality and is an aspect of what has traditionally been referred to as having character. Conscientious individuals are generally hard-working, and reliable. When taken to an extreme, they may also be "workaholics", perfectionists, and compulsive in their behavior. People who score low on conscientiousness tend to be laid back, less goal-oriented, and less driven by success; they also are more likely to engage in antisocial and criminal behavior.

Personality models

Conscientiousness is one of the five major dimensions in the Big Five model (also called Five Factor Model) of personality, which also consists of extraversion, neuroticism, openness to experience, and agreeableness (OCEAN acronym). Two of many personality tests that assess these traits are Costa and McCrae's NEO PI-R and Goldberg's NEO-IPIP. According to these models, conscientiousness is considered to be a continuous dimension of personality, rather than a categorical 'type' of person.

In the NEO framework, Conscientiousness is seen as having six facets: Competence, Order, Dutifulness, Achievement Striving, Self-Discipline, and Deliberation. Other models suggest a smaller set of two "aspects": orderliness and industriousness form an intermediate level of organization, with orderliness associated with the desire to keep things organized and tidy and industriousness being more associated with productivity and work ethic.

Other personality traits ((Low) extraversion, (high) agreeableness, (low) openness and (low) neuroticism) are linked to high conscientiousness along with impulse control. Behaviorally, low conscientiousness is associated with an inability to motivate one's self to perform tasks that the individual desires to accomplish.

Conscientiousness also appears in other models of personality, such as Cloninger's Temperament and Character Inventory, in which it is related to both self-directedness and persistence. It also includes the specific traits of rule consciousness and perfectionism in Cattell's 16 PF model. It is negatively associated with impulsive sensation-seeking in Zuckerman's alternative five model. Traits associated with conscientiousness are frequently assessed by self-report integrity tests given by various corporations to prospective employees.

Origin

Terms such as 'hard-working,' 'reliable,' and 'persevering' describe desirable aspects of character. Because it was once believed to be a moral evaluation, conscientiousness was overlooked as a real psychological attribute. The reality of individual differences in conscientiousness has now been clearly established by studies of cross-observer agreement. Peer and expert ratings confirm the self-reports that people make about their degrees of conscientiousness. Furthermore, both self-reports and observer ratings of conscientiousness predict real-life outcomes such as academic success.

During most of the 20th century, psychologists believed that personality traits could be divided into two categories: temperament and character. Temperament traits were thought to be biologically based, whereas character traits were thought to be learned either during childhood or throughout life. With the advent of the FFM (Five-Factor Model), behavior geneticists began systematic studies of the full range of personality traits, and it soon became clear that all five factors are substantially heritable. Identical twins showed very similar personality traits even when they had been separated at birth and raised apart, and this was equally true for both character traits and temperament traits. Parents and communities influence the ways in which conscientiousness is expressed, but they apparently do not influence its level.

Measurement

A person's level of conscientiousness is generally assessed using self-report measures, although peer-reports and third-party observation can also be used. Self-report measures are either lexical or based on statements. Deciding which measure of either type to use in research is determined by an assessment of psychometric properties and the time and space constraints of the study being undertaken.

Lexical

Lexical measures use individual adjectives that reflect conscientiousness traits, such as efficient and systematic, and are very space and time efficient for research purposes. Goldberg (1992) developed a 20-word measure as part of his 100-word Big Five markers. Saucier (1994) developed a briefer 8-word measure as part of his 40-word mini-markers. Thompson (2008) systematically revised these measures to develop the International English Mini-Markers which has superior validity and reliability in populations both within and outside North America. Internal consistency reliability of the International English Mini-Markers for the Conscientiousness measure for native English-speakers is reported as .90, that for non-native English-speakers is .86.

Statement

Statement measures tend to comprise more words than lexical measures, so hence consume more research instrument space and more respondent time to complete. Respondents are asked the extent to which they, for example, often forget to put things back in their proper place, or are careful to avoid making mistakes. Some statement-based measures of conscientiousness have similarly acceptable psychometric properties in North American populations to lexical measures, but their generally emic development makes them less suited to use in other populations. For instance, statements in colloquial North American English like Often forget to put things back in their proper place or Am careful to avoid making mistakes can be hard for non-native English-speakers to understand, suggesting internationally validated measures might be more appropriate for research conducted with non-North Americans.

Behavior

Development

Currently, little is known about conscientiousness in young children because the self-report inventories typically used to assess it are not appropriate for that age group. It is likely, however, that there are individual differences on this factor at an early age. It is known, for example, that some children have attention deficit/hyperactivity disorder (ADHD may not go away with age; however it is still unclear how neurodevelopmental disorders such as ADHD and autism relate to the development of conscientiousness and other personality traits), which is characterized in part by problems with concentration, organization, and persistence; traits which are related to conscientiousness. Longitudinal and cross-sectional studies suggest that conscientiousness is relatively low among adolescents but increases between 18 and 30 years of age. Research has also shown that conscientiousness generally increases with age from 21 to 60, though the rate of increase is slow.

Individual differences are also strongly preserved, meaning that a careful, neat, and scrupulous 30-year-old is likely to become a careful, neat, and scrupulous 80-year-old.

Daily life

People who score high on the trait of conscientiousness tend to be more organized and less cluttered in their homes and offices. For example, their books tend to be neatly shelved in alphabetical order, or categorized by topic, rather than scattered around the room. Their clothes tend to be folded and arranged in drawers or closets instead of lying on the floor. The presence of planners and to-do lists are also signs of conscientiousness. Their homes tend to have better lighting than the homes of people who score low on this trait. Recently, ten behaviors strongly associated with conscientiousness were scientifically categorized (the number at the end of each behavior is a correlation coefficient; a negative number means conscientious people were less likely to manifest the behavior):

Academic and workplace performance

Conscientiousness is importantly related to successful academic performance in students and workplace performance among managers and workers. Low levels of conscientiousness are strongly associated with procrastination. A considerable amount of research indicates that conscientiousness has a moderate to large positive correlation with performance in the workplace, and indeed that after general mental ability is taken into account, the other four of the Big Five personality traits do not aid in predicting career success.

Conscientious employees are generally more reliable, more motivated, and harder working. They also have lower rates of absenteeism and counterproductive work behaviors such as stealing and fighting with other employees. Furthermore, conscientiousness is the only personality trait that correlates with performance across all categories of jobs. However, agreeableness and emotional stability may also be important, particularly in jobs that involve a significant amount of social interaction. Of all manager/leader types, top executives show the lowest level of rule-following, a conscientious trait. Conscientiousness is not always positively related to job performance, sometimes the opposite is true. Being too conscientious could lead to taking too much time to making urgent decisions and to working too attached to the rules and lack innovation.

Subjective well-being

In general, conscientiousness has a positive relationship with subjective well-being, particularly satisfaction with life, so highly conscientious people tend to be happier with their lives than those who score low on this trait. Although conscientiousness is generally seen as a positive trait to possess, recent research has suggested that in some situations it may be harmful for well-being. In a prospective study of 9570 individuals over four years, highly conscientious people suffered more than twice as much if they became unemployed. The authors suggested this may be due to conscientious people making different attributions about why they became unemployed, or through experiencing stronger reactions following failure. This finding is consistent with perspectives which see no trait as inherently positive or negative, but rather the consequences of the trait being dependent on the situation and concomitant goals and motivations.

Problematic life outcomes

Low conscientiousness has been linked to antisocial and criminal behaviors, as well as unemployment, homelessness, and imprisonment. Low conscientiousness and low agreeableness taken together are also associated with substance abuse. People low in conscientiousness have difficulty saving money and have different borrowing practices than conscientious people. High conscientiousness is associated with more careful planning of shopping trips and less impulse buying of unneeded items. Conscientiousness has been found to be positively correlated with business and white-collar crime.

Health and longevity

According to an 80-year old and ongoing study started in 1921 by psychologist Lewis Terman on over 1,500 gifted adolescent Californians, "The strongest predictor of long life was conscientiousness." Specific behaviors associated with low conscientiousness may explain its influence on longevity. Nine different behaviors that are among the leading causes of mortality—alcohol use, disordered eating (including obesity), drug use, lack of exercise, risky sexual behavior, risky driving, tobacco use, suicide, and violence—are all predicted by low conscientiousness. Health behaviors are more strongly correlated with the conventionality rather than the impulse-control aspect of conscientiousness. Apparently, social norms influence many health-relevant behavior, such as healthy diet and exercise, not smoking and moderate drinking, and highly conscientious people adhere the most strongly to these norms. Additionally, conscientiousness is positively related to health behaviors such as regular visits to a doctor, checking smoke alarms, and adherence to medication regimens. Such behavior may better safeguard health and prevent disease.

Relationships

Relationship quality is positively associated with partners' level of conscientiousness, and highly conscientious people are less likely to get divorced. Conscientiousness is associated with lower rates of behavior associated with divorce, such as extramarital affairs, spousal abuse, and alcohol abuse. Conscientious behaviors may have a direct influence on relationship quality, as people low in conscientiousness are less responsible, less responsive to their partners, are more condescending, and less likely to hold back offensive comments. On the other hand, more conscientious people are better at managing conflict and tend to provoke fewer disagreements, perhaps because they elicit less criticism due to their well-controlled and responsible behavior.

Intelligence

Conscientiousness significantly correlated negatively with abstract reasoning (−0.26) and verbal reasoning (−0.23).

Large unselected studies, however, have found null relationships, and the negative relationship sometimes found in selected samples such as universities may result from students whose low ability would reduce their chance of gaining entrance, but who have higher conscientiousness, gaining their GPA via hard work rather than giftedness.

A large study found that fluid intelligence was significantly negatively correlated with the order (−0.15), self-discipline (−0.08), and deliberation (−0.09) subfactors of conscientiousness (all correlations significant with p < 0.001.).

Political attitudes and obedience to authority

Conscientiousness has a weak relationship with conservative political attitudes. Although right-wing authoritarianism is one of the most powerful predictors of prejudice, a large scale meta-analysis found that conscientiousness itself is uncorrelated with general prejudice. Rebellion against control is significantly negatively correlated with conscientiousness.

Conscientiousness is associated with rule compliance, obedience and integrity.

Creativity

The orderliness/dependability subfactors (order, dutifulness, and deliberation) of conscientiousness correlate negatively with creativity while the industriousness/achievement subfactors correlate positively. Another study showed that people who score high on the order subfactor of conscientiousness show less innovative behavior. Group conscientiousness has a negative effect on group performance during creative tasks. Groups with only conscientious members have difficulty solving open-ended problems.

Adaptability

A study from 2006 found that those scoring low on conscientiousness make better decisions after unanticipated changes in the context of a task. Specifically, the subfactors order, dutifulness, and deliberation negatively correlated with decision-making quality, but not competence, achievement striving, and self-discipline.

Religiosity

General religiosity was mainly related to Agreeableness and Conscientiousness of the big five traits.

Societal Health

Research comparing countries on personality traits has largely found that countries with high average levels of conscientiousness tend to be poorer, less democratic, and to have lower life expectancy compared to their less conscientious counterparts. Less conscientious nations had higher rates of atheism and of alcohol consumption. As discussed earlier, at the individual level, conscientiousness is associated with valuing security, conformity and tradition. Adherence to such values might be more adaptive under harsh living circumstances than more comfortable and prosperous ones.

Geography

United States

Average levels of conscientiousness vary by state in the United States. People living in the central part, including the states of Kansas, Nebraska, Oklahoma, and Missouri, tend to have higher scores on average than people living in other regions. People in the southwestern states of New Mexico, Utah, and Arizona also have relatively high average scores on conscientiousness. Among the eastern states, Florida is the only one that scores in the top ten for this personality trait. The four states with the lowest scores on conscientiousness on average were, in descending order, Rhode Island, Hawaii, Maine, and Alaska.

Great Britain

A large scale survey of residents of Great Britain found that average levels of all the Big Five, including conscientiousness, vary across regional districts in England, Wales and Scotland. High levels of conscientiousness were found throughout much of Southern England, scattered areas of the Midlands, and most of the Scottish Highlands. Low levels of conscientiousness were observed in London, Wales, and parts of the North of England. Higher mean levels of regional conscientiousness were positively correlated with voting for the Conservative Party, and negatively correlated with voting for the Labour Party, in the 2005 and 2010 elections, and also correlated with a higher proportion of married residents, with higher life expectancy for men and women, fewer long-term health problems, and with lower rates of mortality from stroke, cancer, and heart disease. Higher regional conscientiousness was also correlated with lower median annual income in 2011.

Niche differentiation

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Niche_differentiation

In ecology, niche differentiation (also known as niche segregation, niche separation and niche partitioning) refers to the process by which competing species use the environment differently in a way that helps them to coexist. The competitive exclusion principle states that if two species with identical niches (ecological roles) compete, then one will inevitably drive the other to extinction. This rule also states that two species cannot occupy the same exact niche in a habitat and coexist together, at least in a stable manner. When two species differentiate their niches, they tend to compete less strongly, and are thus more likely to coexist. Species can differentiate their niches in many ways, such as by consuming different foods, or using different areas of the environment.

As an example of niche partitioning, several anole lizards in the Caribbean islands share common diets—mainly insects. They avoid competition by occupying different physical locations. Although these lizards might occupy different locations, some species can be found inhabiting the same range, with up to 15 in certain areas. For example, some live on the ground while others are arboreal. Species who live in different areas compete less for food and other resources, which minimizes competition between species. However, species who live in similar areas typically compete with each other.

Detection and quantification

The Lotka–Volterra equation states that two competing species can coexist when intra-specific (within species) competition is greater than inter-specific (between species) competition (Armstrong and McGehee 1981). Since niche differentiation concentrates competition within-species, due to a decrease in between-species competition, the Lotka-Volterra model predicts that niche differentiation of any degree will result in coexistence.

In reality, this still leaves the question of how much differentiation is needed for coexistence (Hutchinson 1959). A vague answer to this question is that the more similar two species are, the more finely balanced the suitability of their environment must be in order to allow coexistence. There are limits to the amount of niche differentiation required for coexistence, and this can vary with the type of resource, the nature of the environment, and the amount of variation both within and between the species.

To answer questions about niche differentiation, it is necessary for ecologists to be able to detect, measure, and quantify the niches of different coexisting and competing species. This is often done through a combination of detailed ecological studies, controlled experiments (to determine the strength of competition), and mathematical models (Strong 1982, Leibold 1995). To understand the mechanisms of niche differentiation and competition, much data must be gathered on how the two species interact, how they use their resources, and the type of ecosystem in which they exist, among other factors. In addition, several mathematical models exist to quantify niche breadth, competition, and coexistence (Bastolla et al. 2005). However, regardless of methods used, niches and competition can be distinctly difficult to measure quantitatively, and this makes detection and demonstration of niche differentiation difficult and complex.

Development

Over time, two competing species can either coexist, through niche differentiation or other means, or compete until one species becomes locally extinct. Several theories exist for how niche differentiation arises or evolves given these two possible outcomes.

Current competition (The Ghost of Competition Present)

Niche differentiation can arise from current competition. For instance, species X has a fundamental niche of the entire slope of a hillside, but its realized niche is only the top portion of the slope because species Y, which is a better competitor but cannot survive on the top portion of the slope, has excluded it from the lower portion of the slope. With this scenario, competition will continue indefinitely in the middle of the slope between these two species. Because of this, detection of the presence of niche differentiation (through competition) will be relatively easy. It is also important to remember that there is no evolutionary change of the individual species in this case; rather this is an ecological effect of species Y out-competing species X within the bounds of species Y's fundamental niche.

Via past extinctions (The Ghost of Competition Past)

Another way by which niche differentiation can arise is via the previous elimination of species without realized niches. This asserts that at some point in the past, several species inhabited an area, and all of these species had overlapping fundamental niches. However, through competitive exclusion, the less competitive species were eliminated, leaving only the species that were able to coexist (i.e. the most competitive species whose realized niches did not overlap). Again, this process does not include any evolutionary change of individual species, but it is merely the product of the competitive exclusion principle. Also, because no species is out-competing any other species in the final community, the presence of niche differentiation will be difficult or impossible to detect.

Evolving differences

Finally, niche differentiation can arise as an evolutionary effect of competition. In this case, two competing species will evolve different patterns of resource use so as to avoid competition. Here too, current competition is absent or low, and therefore detection of niche differentiation is difficult or impossible.

Types

Below is a list of ways that species can partition their niche. This list is not exhaustive, but illustrates several classic examples.

Resource partitioning

Resource partitioning is the phenomenon where two or more species divides out resources like food, space, resting sites etc. to coexist. For example, some lizard species appear to coexist because they consume insects of differing sizes. Alternatively, species can coexist on the same resources if each species is limited by different resources, or differently able to capture resources. Different types of phytoplankton can coexist when different species are differently limited by nitrogen, phosphorus, silicon, and light. In the Galapagos Islands, finches with small beaks are more able to consume small seeds, and finches with large beaks are more able to consume large seeds. If a species' density declines, then the food it most depends on will become more abundant (since there are so few individuals to consume it). As a result, the remaining individuals will experience less competition for food.

Although "resource" generally refers to food, species can partition other non-consumable objects, such as parts of the habitat. For example, warblers are thought to coexist because they nest in different parts of trees. Species can also partition habitat in a way that gives them access to different types of resources. As stated in the introduction, anole lizards appear to coexist because each uses different parts of the forests as perch locations. This likely gives them access to different species of insects.

Predator partitioning

Predator partitioning occurs when species are attacked differently by different predators (or natural enemies more generally). For example, trees could differentiate their niche if they are consumed by different species of specialist herbivores, such as herbivorous insects. If a species density declines, so too will the density of its natural enemies, giving it an advantage. Thus, if each species is constrained by different natural enemies, they will be able to coexist. Early work focused on specialist predators; however, more recent studies have shown that predators do not need to be pure specialists, they simply need to affect each prey species differently. The Janzen–Connell hypothesis represents a form of predator partitioning.

Conditional differentiation

Conditional differentiation (sometimes called temporal niche partitioning) occurs when species differ in their competitive abilities based on varying environmental conditions. For example, in the Sonoran Desert, some annual plants are more successful during wet years, while others are more successful during dry years. As a result, each species will have an advantage in some years, but not others. When environmental conditions are most favorable, individuals will tend to compete most strongly with member of the same species. For example, in a dry year, dry-adapted plants will tend to be most limited by other dry-adapted plants. This can help them to coexist through a storage effect.

Competition-predation trade-off

Species can differentiate their niche via a competition-predation trade-off if one species is a better competitor when predators are absent, and the other is better when predators are present. Defenses against predators, such as toxic compounds or hard shells, are often metabolically costly. As a result, species that produce such defenses are often poor competitors when predators are absent. Species can coexist through a competition-predation trade-off if predators are more abundant when the less defended species is common, and less abundant if the well-defended species is common. This effect has been criticized as being weak, because theoretical models suggest that only two species within a community can coexist because of this mechanism.

Coexistence without niche differentiation: exceptions to the rule

Some competing species have been shown to coexist on the same resource with no observable evidence of niche differentiation and in “violation” of the competitive exclusion principle. One instance is in a group of hispine beetle species (Strong 1982). These beetle species, which eat the same food and occupy the same habitat, coexist without any evidence of segregation or exclusion. The beetles show no aggression either intra- or inter-specifically. Coexistence may be possible through a combination of non-limiting food and habitat resources and high rates of predation and parasitism, though this has not been demonstrated.

This example illustrates that the evidence for niche differentiation is by no means universal. Niche differentiation is also not the only means by which coexistence is possible between two competing species (see Shmida and Ellner 1984). However, niche differentiation is a critically important ecological idea which explains species coexistence, thus promoting the high biodiversity often seen in many of the world's biomes.

Research using mathematical modelling is indeed demonstrating that predation can indeed stabilize lumps of very similar species. Willow warbler and chiffchaff and other very similar warblers can serve as an example. The idea is that it is also a good strategy to be very similar to a successful species or have enough dissimilarity. Also trees in the rain forest can serve as an example of all high canopy species basically following the same strategy. Other examples of nearly identical species clusters occupying the same niche were water beetles, prairie birds and algae. The basic idea is that there can be clusters of very similar species all applying the same successful strategy and between them open spaces. Here the species cluster takes the place of a single species in the classical ecological models.

Search This Blog

Thursday, March 11, 2021

Biostatistics

History

Biostatistics and Genetics

Research planning

Research question

Hypothesis definition

Sampling

Experimental design

Data collection

Analysis and data interpretation

Descriptive Tools

Inferential Statistics

Statistical considerations

Power and statistical error

p-value

Multiple testing

Mis-specification and robustness checks

Model selection criteria

Developments and Big Data

Use in high-throughput data

Bioinformatics advances in databases, data mining, and biological interpretation

Use of computationally intensive methods

Applications

Public health

Quantitative genetics

Expression data

Other studies

Tools

Scope and training programs

Specialized journals

Wednesday, March 10, 2021

Mathematical and theoretical biology

History

Early history

Recent growth

Areas of research

Abstract relational biology

Algebraic biology

Complex systems biology

Computer models and automata theory

Computational neuroscience

Evolutionary biology

Mathematical biophysics

Deterministic processes (dynamical systems)

Stochastic processes (random dynamical systems)

Spatial modelling

Mathematical methods

Molecular set theory

Organizational biology

Model example: the cell cycle

Conscientiousness

Personality models

Origin

Measurement

Lexical

Statement

Behavior

Development

Daily life

Academic and workplace performance

Subjective well-being

Problematic life outcomes

Health and longevity

Relationships

Intelligence

Political attitudes and obedience to authority

Creativity

Adaptability

Religiosity

Societal Health

Geography

United States

Great Britain

Niche differentiation

Detection and quantification

Development

Current competition (The Ghost of Competition Present)

Via past extinctions (The Ghost of Competition Past)