Search This Blog

Monday, December 4, 2023

Gaussian process

From Wikipedia, the free encyclopedia

In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.

The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of the Gaussian distribution (normal distribution). Gaussian processes can be seen as an infinite-dimensional generalization of multivariate normal distributions.

Gaussian processes are useful in statistical modelling, benefiting from properties inherited from the normal distribution. For example, if a random process is modelled as a Gaussian process, the distributions of various derived quantities can be obtained explicitly. Such quantities include the average value of the process over a range of times and the error in estimating the average using sample values at a small set of times. While exact models often scale poorly as the amount of data increases, multiple approximation methods have been developed which often retain good accuracy while drastically reducing computation time.

Definition

A time continuous stochastic process is Gaussian if and only if for every finite set of indices in the index set

is a multivariate Gaussian random variable. That is the same as saying every linear combination of has a univariate normal (or Gaussian) distribution.

Using characteristic functions of random variables, the Gaussian property can be formulated as follows: is Gaussian if and only if, for every finite set of indices , there are real-valued , with such that the following equality holds for all

where denotes the imaginary unit such that .

The numbers and can be shown to be the covariances and means of the variables in the process.

Variance

The variance of a Gaussian process is finite at any time , formally

Stationarity

For general stochastic processes strict-sense stationarity implies wide-sense stationarity but not every wide-sense stationary stochastic process is strict-sense stationary. However, for a Gaussian stochastic process the two concepts are equivalent.

A Gaussian stochastic process is strict-sense stationary if, and only if, it is wide-sense stationary.

Example

There is an explicit representation for stationary Gaussian processes. A simple example of this representation is

where and are independent random variables with the standard normal distribution.

Covariance functions

A key fact of Gaussian processes is that they can be completely defined by their second-order statistics. Thus, if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the process' behaviour. Importantly the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loève expansion. Basic aspects that can be defined through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.

Stationarity refers to the process' behaviour regarding the separation of any two points and . If the process is stationary, the covariance function depends only on . For example, the Ornstein–Uhlenbeck process is stationary.

If the process depends only on , the Euclidean distance (not the direction) between and , then the process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be homogeneous; in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer.

Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors can be induced by the covariance function. If we expect that for "near-by" input points and their corresponding output points and to be "near-by" also, then the assumption of continuity is present. If we wish to allow for significant displacement then we might choose a rougher covariance function. Extreme examples of the behaviour is the Ornstein–Uhlenbeck covariance function and the squared exponential where the former is never differentiable and the latter infinitely differentiable.

Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is achieved by mapping the input to a two dimensional vector .

Usual covariance functions

The effect of choosing different kernels on the prior function distribution of the Gaussian process. Left is a squared exponential kernel. Middle is Brownian. Right is quadratic.

There are a number of common covariance functions:

  • Constant :
  • Linear:
  • white Gaussian noise:
  • Squared exponential:
  • Ornstein–Uhlenbeck:
  • Matérn:
  • Periodic:
  • Rational quadratic:

Here . The parameter is the characteristic length-scale of the process (practically, "how close" two points and have to be to influence each other significantly), is the Kronecker delta and the standard deviation of the noise fluctuations. Moreover, is the modified Bessel function of order and is the gamma function evaluated at . Importantly, a complicated covariance function can be defined as a linear combination of other simpler covariance functions in order to incorporate different insights about the data-set at hand.

The inferential results are dependent on the values of the hyperparameters (e.g. and ) defining the model's behaviour. A popular choice for is to provide maximum a posteriori (MAP) estimates of it with some chosen prior. If the prior is very near uniform, this is the same as maximizing the marginal likelihood of the process; the marginalization being done over the observed process values . This approach is also known as maximum likelihood II, evidence maximization, or empirical Bayes.

Continuity

For a Gaussian process, continuity in probability is equivalent to mean-square continuity, and continuity with probability one is equivalent to sample continuity. The latter implies, but is not implied by, continuity in probability. Continuity in probability holds if and only if the mean and autocovariance are continuous functions. In contrast, sample continuity was challenging even for stationary Gaussian processes (as probably noted first by Andrey Kolmogorov), and more challenging for more general processes. As usual, by a sample continuous process one means a process that admits a sample continuous modification

Stationary case

For a stationary Gaussian process some conditions on its spectrum are sufficient for sample continuity, but fail to be necessary. A necessary and sufficient condition, sometimes called Dudley–Fernique theorem, involves the function defined by

(the right-hand side does not depend on due to stationarity). Continuity of in probability is equivalent to continuity of at When convergence of to (as ) is too slow, sample continuity of may fail. Convergence of the following integrals matters:
these two integrals being equal according to integration by substitution The first integrand need not be bounded as thus the integral may converge () or diverge (). Taking for example for large that is, for small one obtains when and when In these two cases the function is increasing on but generally it is not. Moreover, the condition

(*)   there exists such that is monotone on

does not follow from continuity of and the evident relations (for all ) and

Theorem 1 — Let be continuous and satisfy (*). Then the condition is necessary and sufficient for sample continuity of

Some history. Sufficiency was announced by Xavier Fernique in 1964, but the first proof was published by Richard M. Dudley in 1967. Necessity was proved by Michael B. Marcus and Lawrence Shepp in 1970.

There exist sample continuous processes such that they violate condition (*). An example found by Marcus and Shepp is a random lacunary Fourier series

where are independent random variables with standard normal distribution; frequencies are a fast growing sequence; and coefficients satisfy The latter relation implies whence almost surely, which ensures uniform convergence of the Fourier series almost surely, and sample continuity of

Autocorrelation of a random lacunary Fourier series

Its autocovariation function

is nowhere monotone (see the picture), as well as the corresponding function

Brownian motion as the integral of Gaussian processes

A Wiener process (also known as Brownian motion) is the integral of a white noise generalized Gaussian process. It is not stationary, but it has stationary increments.

The Ornstein–Uhlenbeck process is a stationary Gaussian process.

The Brownian bridge is (like the Ornstein–Uhlenbeck process) an example of a Gaussian process whose increments are not independent.

The fractional Brownian motion is a Gaussian process whose covariance function is a generalisation of that of the Wiener process.

Driscoll's zero-one law

Driscoll's zero-one law is a result characterizing the sample functions generated by a Gaussian process.

Let be a mean-zero Gaussian process with non-negative definite covariance function . Let be a Reproducing kernel Hilbert space with positive definite kernel .

Then

where and are the covariance matrices of all possible pairs of points, implies

Moreover,

implies 

This has significant implications when , as

As such, almost all sample paths of a mean-zero Gaussian process with positive definite kernel will lie outside of the Hilbert space .

Linearly constrained Gaussian processes

For many applications of interest some pre-existing knowledge about the system at hand is already given. Consider e.g. the case where the output of the Gaussian process corresponds to a magnetic field; here, the real magnetic field is bound by Maxwell's equations and a way to incorporate this constraint into the Gaussian process formalism would be desirable as this would likely improve the accuracy of the algorithm.

A method on how to incorporate linear constraints into Gaussian processes already exists:

Consider the (vector valued) output function which is known to obey the linear constraint (i.e. is a linear operator)

Then the constraint can be fulfilled by choosing , where is modelled as a Gaussian process, and finding such that
Given and using the fact that Gaussian processes are closed under linear transformations, the Gaussian process for obeying constraint becomes
Hence, linear constraints can be encoded into the mean and covariance function of a Gaussian process.

Applications

An example of Gaussian Process Regression (prediction) compared with other regression models.

A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference. Given any set of N points in the desired domain of your functions, take a multivariate Gaussian whose covariance matrix parameter is the Gram matrix of your N points with some desired kernel, and sample from that Gaussian. For solution of the multi-output prediction problem, Gaussian process regression for vector-valued function was developed. In this method, a 'big' covariance is constructed, which describes the correlations between all the input and output variables taken in N points in the desired domain. This approach was elaborated in detail for the matrix-valued Gaussian processes and generalised to processes with 'heavier tails' like Student-t processes.

Inference of continuous values with a Gaussian process prior is known as Gaussian process regression, or kriging; extending Gaussian process regression to multiple target variables is known as cokriging. Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool.

Gaussian processes are also commonly used to tackle numerical analysis problems such as numerical integration, solving differential equations, or optimisation in the field of probabilistic numerics.

Gaussian processes can also be used in the context of mixture of experts models, for example. The underlying rationale of such a learning framework consists in the assumption that a given mapping cannot be well captured by a single Gaussian process model. Instead, the observation space is divided into subsets, each of which is characterized by a different mapping function; each of these is learned via a different Gaussian process component in the postulated mixture.

In the natural sciences, Gaussian processes have found use as probabilistic models of astronomical time series and as predictors of molecular properties.

Gaussian process prediction, or Kriging

Gaussian Process Regression (prediction) with a squared exponential kernel. Left plot are draws from the prior function distribution. Middle are draws from the posterior. Right is mean prediction with one standard deviation shaded.

When concerned with a general Gaussian process regression problem (Kriging), it is assumed that for a Gaussian process observed at coordinates , the vector of values is just one sample from a multivariate Gaussian distribution of dimension equal to number of observed coordinates . Therefore, under the assumption of a zero-mean distribution, , where is the covariance matrix between all possible pairs for a given set of hyperparameters θ. As such the log marginal likelihood is:

and maximizing this marginal likelihood towards θ provides the complete specification of the Gaussian process f. One can briefly note at this point that the first term corresponds to a penalty term for a model's failure to fit observed values and the second term to a penalty term that increases proportionally to a model's complexity. Having specified θ, making predictions about unobserved values at coordinates x* is then only a matter of drawing samples from the predictive distribution where the posterior mean estimate A is defined as

and the posterior variance estimate B is defined as:
where is the covariance between the new coordinate of estimation x* and all other observed coordinates x for a given hyperparameter vector θ, and are defined as before and is the variance at point x* as dictated by θ. It is important to note that practically the posterior mean estimate of (the "point estimate") is just a linear combination of the observations ; in a similar manner the variance of is actually independent of the observations . A known bottleneck in Gaussian process prediction is that the computational complexity of inference and likelihood evaluation is cubic in the number of points |x|, and as such can become unfeasible for larger data sets. Works on sparse Gaussian processes, that usually are based on the idea of building a representative set for the given process f, try to circumvent this issue. The kriging method can be used in the latent level of a nonlinear mixed-effects model for a spatial functional prediction: this technique is called the latent kriging.

Often, the covariance has the form , where is a scaling parameter. Examples are the Matérn class covariance functions. If this scaling parameter is either known or unknown (i.e. must be marginalized), then the posterior probability, , i.e. the probability for the hyperparameters given a set of data pairs of observations of and , admits an analytical expression.

Bayesian neural networks as Gaussian processes

Bayesian neural networks are a particular type of Bayesian network that results from treating deep learning and artificial neural network models probabilistically, and assigning a prior distribution to their parameters. Computation in artificial neural networks is usually organized into sequential layers of artificial neurons. The number of neurons in a layer is called the layer width. As layer width grows large, many Bayesian neural networks reduce to a Gaussian process with a closed form compositional kernel. This Gaussian process is called the Neural Network Gaussian Process (NNGP). It allows predictions from Bayesian neural networks to be more efficiently evaluated, and provides an analytic tool to understand deep learning models.

Computational issues

In practical applications, Gaussian process models are often evaluated on a grid leading to multivariate normal distributions. Using these models for prediction or parameter estimation using maximum likelihood requires evaluating a multivariate Gaussian density, which involves calculating the determinant and the inverse of the covariance matrix. Both of these operations have cubic computational complexity which means that even for grids of modest sizes, both operations can have a prohibitive computational cost. This drawback led to the development of multiple approximation methods.

Gene duplication

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Gene_duplication

Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. Gene duplications can arise as products of several types of errors in DNA replication and repair machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition event, aneuploidy, polyploidy, and replication slippage.

Mechanisms of duplication

Ectopic recombination

Duplications arise from an event termed unequal crossing-over that occurs during meiosis between misaligned homologous chromosomes. The chance of it happening is a function of the degree of sharing of repetitive elements between two chromosomes. The products of this recombination are a duplication at the site of the exchange and a reciprocal deletion. Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats. Repetitive genetic elements such as transposable elements offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals.

Schematic of a region of a chromosome before and after a duplication event

Replication slippage

Replication slippage is an error in DNA replication that can produce duplications of short genetic sequences. During replication DNA polymerase begins to copy the DNA. At some point during the replication process, the polymerase dissociates from the DNA and replication stalls. When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once. Replication slippage is also often facilitated by repetitive sequences, but requires only a few bases of similarity.

Retrotransposition

Retrotransposons, mainly L1, can occasionally act on cellular mRNA. Transcripts are reverse transcribed to DNA and inserted into random place in the genome, creating retrogenes. Resulting sequence usually lack introns and often contain poly, sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions. Retrogenes can move between different chromosomes to shape chromosomal evolution.

Aneuploidy

Aneuploidy occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes. Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions (miscarriages). Some aneuploid individuals are viable, for example trisomy 21 in humans, which leads to Down syndrome. Aneuploidy often alters gene dosage in ways that are detrimental to the organism; therefore, it is unlikely to spread through populations.

Polyploidy

Polyploidy, or whole genome duplication is a product of nondisjunction during meiosis which results in additional copies of the entire genome. Polyploidy is common in plants, but it has also occurred in animals, with two rounds of whole genome duplication (2R event) in the vertebrate lineage leading to humans. It has also occurred in the hemiascomycete yeasts ~100 mya.

After a whole genome duplication, there is a relatively short period of genome instability, extensive gene loss, elevated levels of nucleotide substitution and regulatory network rewiring. In addition, gene dosage effects play a significant role. Thus, most duplicates are lost within a short period, however, a considerable fraction of duplicates survive. Interestingly, genes involved in regulation are preferentially retained. Furthermore, retention of regulatory genes, most notably the Hox genes, has led to adaptive innovation.

Rapid evolution and functional divergence have been observed at the level of the transcription of duplicated genes, usually by point mutations in short transcription factor binding motifs. Furthermore, rapid evolution of protein phosphorylation motifs, usually embedded within rapidly evolving intrinsically disordered regions is another contributing factor for survival and rapid adaptation/neofunctionalization of duplicate genes. Thus, a link seems to exist between gene regulation (at least at the post-translational level) and genome evolution.

Polyploidy is also a well known source of speciation, as offspring, which have different numbers of chromosomes compared to parent species, are often unable to interbreed with non-polyploid organisms. Whole genome duplications are thought to be less detrimental than aneuploidy as the relative dosage of individual genes should be the same.

As an evolutionary event

Evolutionary fate of duplicate genes

Rate of gene duplication

Comparisons of genomes demonstrate that gene duplications are common in most species investigated. This is indicated by variable copy numbers (copy number variation) in the genome of humans or fruit flies. However, it has been difficult to measure the rate at which such duplications occur. Recent studies yielded a first direct estimate of the genome-wide rate of gene duplication in C. elegans, the first multicellular eukaryote for which such as estimate became available. The gene duplication rate in C. elegans is on the order of 10−7 duplications/gene/generation, that is, in a population of 10 million worms, one will have a gene duplication per generation. This rate is two orders of magnitude greater than the spontaneous rate of point mutation per nucleotide site in this species. Older (indirect) studies reported locus-specific duplication rates in bacteria, Drosophila, and humans ranging from 10−3 to 10−7/gene/generation.

Neofunctionalization

Gene duplications are an essential source of genetic novelty that can lead to evolutionary innovation. Duplication creates genetic redundancy, where the second copy of the gene is often free from selective pressure—that is, mutations of it have no deleterious effects to its host organism. If one copy of a gene experiences a mutation that affects its original function, the second copy can serve as a 'spare part' and continue to function correctly. Thus, duplicate genes accumulate mutations faster than a functional single-copy gene, over generations of organisms, and it is possible for one of the two copies to develop a new and different function. Some examples of such neofunctionalization is the apparent mutation of a duplicated digestive gene in a family of ice fish into an antifreeze gene and duplication leading to a novel snake venom gene and the synthesis of 1 beta-hydroxytestosterone in pigs.

Gene duplication is believed to play a major role in evolution; this stance has been held by members of the scientific community for over 100 years. Susumu Ohno was one of the most famous developers of this theory in his classic book Evolution by gene duplication (1970). Ohno argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor. Major genome duplication events can be quite common. It is believed that the entire yeast genome underwent duplication about 100 million years ago. Plants are the most prolific genome duplicators. For example, wheat is hexaploid (a kind of polyploid), meaning that it has six copies of its genome.

Subfunctionalization

Another possible fate for duplicate genes is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral "subfunctionalization" (a process of constructive neutral evolution) or DDC (duplication-degeneration-complementation) model, in which the functionality of the original gene is distributed among the two copies. Neither gene can be lost, as both now perform important non-redundant functions, but ultimately neither is able to achieve novel functionality.

Subfunctionalization can occur through neutral processes in which mutations accumulate with no detrimental or beneficial effects. However, in some cases subfunctionalization can occur with clear adaptive benefits. If an ancestral gene is pleiotropic and performs two functions, often neither one of these two functions can be changed without affecting the other function. In this way, partitioning the ancestral functions into two separate genes can allow for adaptive specialization of subfunctions, thereby providing an adaptive benefit.

Loss

Often the resulting genomic variation leads to gene dosage dependent neurological disorders such as Rett-like syndrome and Pelizaeus–Merzbacher disease. Such detrimental mutations are likely to be lost from the population and will not be preserved or develop novel functions. However, many duplications are, in fact, not detrimental or beneficial, and these neutral sequences may be lost or may spread through the population through random fluctuations via genetic drift.

Identifying duplications in sequenced genomes

Criteria and single genome scans

The two genes that exist after a gene duplication event are called paralogs and usually code for proteins with a similar function and/or structure. By contrast, orthologous genes present in different species which are each originally derived from the same ancestral sequence. (See Homology of sequences in genetics).

It is important (but often difficult) to differentiate between paralogs and orthologs in biological research. Experiments on human gene function can often be carried out on other species if a homolog to a human gene can be found in the genome of that species, but only if the homolog is orthologous. If they are paralogs and resulted from a gene duplication event, their functions are likely to be too different. One or more copies of duplicated genes that constitute a gene family may be affected by insertion of transposable elements that causes significant variation between them in their sequence and finally may become responsible for divergent evolution. This may also render the chances and the rate of gene conversion between the homologs of gene duplicates due to less or no similarity in their sequences.

Paralogs can be identified in single genomes through a sequence comparison of all annotated gene models to one another. Such a comparison can be performed on translated amino acid sequences (e.g. BLASTp, tBLASTx) to identify ancient duplications or on DNA nucleotide sequences (e.g. BLASTn, megablast) to identify more recent duplications. Most studies to identify gene duplications require reciprocal-best-hits or fuzzy reciprocal-best-hits, where each paralog must be the other's single best match in a sequence comparison.

Most gene duplications exist as low copy repeats (LCRs), rather highly repetitive sequences like transposable elements. They are mostly found in pericentronomic, subtelomeric and interstitial regions of a chromosome. Many LCRs, due to their size (>1Kb), similarity, and orientation, are highly susceptible to duplications and deletions.

Genomic microarrays detect duplications

Technologies such as genomic microarrays, also called array comparative genomic hybridization (array CGH), are used to detect chromosomal abnormalities, such as microduplications, in a high throughput fashion from genomic DNA samples. In particular, DNA microarray technology can simultaneously monitor the expression levels of thousands of genes across many treatments or experimental conditions, greatly facilitating the evolutionary studies of gene regulation after gene duplication or speciation.

Next generation sequencing

Gene duplications can also be identified through the use of next-generation sequencing platforms. The simplest means to identify duplications in genomic resequencing data is through the use of paired-end sequencing reads. Tandem duplications are indicated by sequencing read pairs which map in abnormal orientations. Through a combination of increased sequence coverage and abnormal mapping orientation, it is possible to identify duplications in genomic sequencing data.

Nomenclature

Human karyotype with annotated bands and sub-bands as used for the nomenclature of chromosome abnormalities. It shows dark and white regions as seen on G banding. Each row is vertically aligned at centromere level. It shows 22 homologous autosomal chromosome pairs, both the female (XX) and male (XY) versions of the two sex chromosomes, as well as the mitochondrial genome (at bottom left).

The International System for Human Cytogenomic Nomenclature (ISCN) is an international standard for human chromosome nomenclature, which includes band names, symbols and abbreviated terms used in the description of human chromosome and chromosome abnormalities. Abbreviations include dup for duplications of parts of a chromosome. For example, dup(17p12) causes Charcot–Marie–Tooth disease type 1A.

As amplification

Gene duplication does not necessarily constitute a lasting change in a species' genome. In fact, such changes often don't last past the initial host organism. From the perspective of molecular genetics, gene amplification is one of many ways in which a gene can be overexpressed. Genetic amplification can occur artificially, as with the use of the polymerase chain reaction technique to amplify short strands of DNA in vitro using enzymes, or it can occur naturally, as described above. If it's a natural duplication, it can still take place in a somatic cell, rather than a germline cell (which would be necessary for a lasting evolutionary change).

Role in cancer

Duplications of oncogenes are a common cause of many types of cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Recent comprehensive patient-level classification and quantification of driver events in TCGA cohorts revealed that there are on average 12 driver events per tumor, of which 1.5 are amplifications of oncogenes.

Common oncogene amplifications in human cancers
Cancer type Associated gene
amplifications
Prevalence of
amplification
in cancer type
(percent)
Breast cancer MYC 20%
ERBB2 (HER2) 20%
CCND1 (Cyclin D1) 15–20%
FGFR1 12%
FGFR2 12%
Cervical cancer MYC 25–50%
ERBB2 20%
Colorectal cancer HRAS 30%
KRAS 20%
MYB 15–20%
Esophageal cancer MYC 40%
CCND1 25%
MDM2 13%
Gastric cancer CCNE (Cyclin E) 15%
KRAS 10%
MET 10%
Glioblastoma ERBB1 (EGFR) 33–50%
CDK4 15%
Head and neck cancer CCND1 50%
ERBB1 10%
MYC 7–10%
Hepatocellular cancer CCND1 13%
Neuroblastoma MYCN 20–25%
Ovarian cancer MYC 20–30%
ERBB2 15–30%
AKT2 12%
Sarcoma MDM2 10–30%
CDK4 10%
Small cell lung cancer MYC 15–20%


Whole-genome duplications are also frequent in cancers, detected in 30% to 36% of tumors from the most common cancer types. Their exact role in carcinogenesis is unclear, but they in some cases lead to loss of chromatin segregation leading to chromatin conformation changes that in turn lead to oncogenic epigenetic and transcriptional modifications.

Entropy (information theory)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Entropy_(information_theory) In info...