Sunday, May 13, 2018

Population genetics

From Wikipedia, the free encyclopedia

Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and population structure.[1]

Population genetics was a vital ingredient in the emergence of the modern evolutionary synthesis. Its primary founders were Sewall Wright, J. B. S. Haldane and Ronald Fisher, who also laid the foundations for the related discipline of quantitative genetics. Traditionally a highly mathematical discipline, modern population genetics encompasses theoretical, lab, and field work. Population genetic models are used both for statistical inference from DNA sequence data and for proof/disproof of concept.[2]

What sets population genetics apart today from newer, more phenotypic approaches to modelling evolution, such as evolutionary game theory and adaptive dynamics, is its emphasis on such genetic phenomena as dominance, epistasis, and the degree to which genetic recombination breaks up linkage disequilibrium. This makes it appropriate for comparison to population genomics data.

History



Population genetics began as a reconciliation of Mendelian inheritance and biostatistics models.  Natural selection will only cause evolution if there is enough genetic variation in a population. Before the discovery of Mendelian genetics, one common hypothesis was blending inheritance. But with blending inheritance, genetic variance would be rapidly lost, making evolution by natural or sexual selection implausible. The Hardy–Weinberg principle provides the solution to how variation is maintained in a population with Mendelian inheritance. According to this principle, the frequencies of alleles (variations in a gene) will remain constant in the absence of selection, mutation, migration and genetic drift.[3]

The typical white-bodied form of the peppered moth.
Industrial melanism: the black-bodied form of the peppered moth appeared in polluted areas.
The next key step was the work of the British biologist and statistician Ronald Fisher. In a series of papers starting in 1918 and culminating in his 1930 book The Genetical Theory of Natural Selection, Fisher showed that the continuous variation measured by the biometricians could be produced by the combined action of many discrete genes, and that natural selection could change allele frequencies in a population, resulting in evolution. In a series of papers beginning in 1924, another British geneticist, J.B.S. Haldane, worked out the mathematics of allele frequency change at a single gene locus under a broad range of conditions. Haldane also applied statistical analysis to real-world examples of natural selection, such as peppered moth evolution and industrial melanism, and showed that selection coefficients could be larger than Fisher assumed, leading to more rapid adaptive evolution as a camouflage strategy following increased pollution.[4][5]


J.B.S. Haldane

The American biologist Sewall Wright, who had a background in animal breeding experiments, focused on combinations of interacting genes, and the effects of inbreeding on small, relatively isolated populations that exhibited genetic drift. In 1932 Wright introduced the concept of an adaptive landscape and argued that genetic drift and inbreeding could drive a small, isolated sub-population away from an adaptive peak, allowing natural selection to drive it towards different adaptive peaks.

The work of Fisher, Haldane and Wright founded the discipline of population genetics. This integrated natural selection with Mendelian genetics, which was the critical first step in developing a unified theory of how evolution worked.[4][5] John Maynard Smith was Haldane's pupil, whilst W.D. Hamilton was heavily influenced by the writings of Fisher. The American George R. Price worked with both Hamilton and Maynard Smith. American Richard Lewontin and Japanese Motoo Kimura were heavily influenced by Wright.

Modern synthesis

The mathematics of population genetics were originally developed as the beginning of the modern synthesis. Authors such as Beatty[6] have asserted that population genetics defines the core of the modern synthesis. For the first few decades of the 20th century, most field naturalists continued to believe that Lamarckism and orthogenesis provided the best explanation for the complexity they observed in the living world.[7] During the modern synthesis, these ideas were purged, and only evolutionary causes that could be expressed in the mathematical framework of population genetics were retained.[8] Consensus was reached as to which evolutionary factors might influence evolution, but not as to the relative importance of the various factors.[8]

Theodosius Dobzhansky, a postdoctoral worker in T. H. Morgan's lab, had been influenced by the work on genetic diversity by Russian geneticists such as Sergei Chetverikov. He helped to bridge the divide between the foundations of microevolution developed by the population geneticists and the patterns of macroevolution observed by field biologists, with his 1937 book Genetics and the Origin of Species. Dobzhansky examined the genetic diversity of wild populations and showed that, contrary to the assumptions of the population geneticists, these populations had large amounts of genetic diversity, with marked differences between sub-populations. The book also took the highly mathematical work of the population geneticists and put it into a more accessible form. Many more biologists were influenced by population genetics via Dobzhansky than were able to read the highly mathematical works in the original.[9]

In Great Britain E.B. Ford, the pioneer of ecological genetics, continued throughout the 1930s and 1940s to empirically demonstrate the power of selection due to ecological factors including the ability to maintain genetic diversity through genetic polymorphisms such as human blood types. Ford's work, in collaboration with Fisher, contributed to a shift in emphasis during the course of the modern synthesis towards natural selection as the dominant force.[4][5][10][11]

Neutral theory and origin-fixation dynamics

The original, modern synthesis view of population genetics assumes that mutations provide ample raw material, and focuses only on the change in frequency of alleles within populations.[12] The main processes influencing allele frequencies are natural selection, genetic drift, gene flow and recurrent mutation. Fisher and Wright had some fundamental disagreements about the relative roles of selection and drift.[13]

The availability of molecular data on all genetic differences led to the neutral theory of molecular evolution. In this view, many mutations are deleterious and so never observed, and most of the remainder are neutral, i.e. are not under selection. With the fate of each neutral mutation left to chance (genetic drift), the direction of evolutionary change is driven by which mutations occur, and so cannot be captured by models of change in the frequency of (existing) alleles alone.[12][14]

The origin-fixation view of population genetics generalizes this approach beyond strictly neutral mutations, and sees the rate at which a particular change happens as the product of the mutation rate and the fixation probability.[12]
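As a worked illustration of this product, consider a diploid population of size N with per-copy mutation rate μ and a substitution rate written K (these symbols are introduced here for illustration and are not used in the text above). Under the simplifying assumption that the effective and census sizes are equal, the standard textbook results for the neutral and the beneficial case can be sketched as:

```latex
K \;=\; \underbrace{2N\mu}_{\text{new mutant copies per generation}}
  \times \underbrace{P_{\text{fix}}}_{\text{fixation probability}}

\text{neutral mutation:}\quad P_{\text{fix}} = \frac{1}{2N}
  \;\Longrightarrow\; K = \mu

\text{beneficial mutation, } s \gg 1/N:\quad P_{\text{fix}} \approx 2s
  \;\Longrightarrow\; K \approx 4N\mu s
```

In the neutral case the population size cancels, so the substitution rate equals the mutation rate; in the beneficial case the rate grows with population size.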

Four processes

Selection

Natural selection, which includes sexual selection, is the fact that some traits make it more likely for an organism to survive and reproduce. Population genetics describes natural selection by defining fitness as a propensity or probability of survival and reproduction in a particular environment. The fitness is normally given by the symbol w=1-s where s is the selection coefficient. Natural selection acts on phenotypes, so population genetic models assume relatively simple relationships to predict the phenotype and hence fitness from the allele at one or a small number of loci. In this way, natural selection converts differences in the fitness of individuals with different phenotypes into changes in allele frequency in a population over successive generations.

Before the advent of population genetics, many biologists doubted that small differences in fitness were sufficient to make a large difference to evolution.[9] Population geneticists addressed this concern in part by comparing selection to genetic drift. Selection can overcome genetic drift when s is greater than 1 divided by the effective population size. When this criterion is met, the probability that a new advantageous mutant becomes fixed is approximately equal to 2s.[15][16] The time until fixation of such an allele depends little on genetic drift, and is approximately proportional to log(sN)/s.[17]
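The threshold and the 2s approximation above can be explored numerically. The sketch below uses Kimura's diffusion approximation for the fixation probability of a single new mutant copy; it assumes a diploid population whose census size equals its effective size and additive selection, and the function names are illustrative rather than taken from any library.

```python
import math


def selection_beats_drift(s: float, n_e: float) -> bool:
    """Criterion from the text: selection dominates when s > 1 / N_e."""
    return s > 1.0 / n_e


def fixation_probability(s: float, n_e: float) -> float:
    """Diffusion approximation for one new mutant copy (assumes s >= 0).

    Assumptions (not from the source text): diploid population, census size
    equal to n_e, additive selection, initial frequency 1 / (2 * n_e).
    """
    if s < 0.0:
        raise ValueError("this sketch only covers neutral or beneficial mutations")
    p0 = 1.0 / (2.0 * n_e)
    if s == 0.0:
        return p0  # neutral limit: fixation probability equals initial frequency
    numerator = -math.expm1(-4.0 * n_e * s * p0)   # 1 - exp(-2s)
    denominator = -math.expm1(-4.0 * n_e * s)      # 1 - exp(-4 * N_e * s)
    return numerator / denominator


if __name__ == "__main__":
    for s in (0.0, 0.0001, 0.001, 0.01):
        print(s, selection_beats_drift(s, 10_000), fixation_probability(s, 10_000))
```

When s is well above 1/N_e the denominator is close to 1 and the result is close to 2s; as s approaches zero the result approaches the neutral value 1/(2N_e).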

Dominance

Dominance means that the phenotypic and/or fitness effect of one allele at a locus depends on which allele is present in the second copy for that locus. Consider three genotypes at one locus, with the following fitness values[18]

- Genotype: A1A1, A1A2, A2A2
- Relative fitness: 1, 1-hs, 1-s

s is the selection coefficient and h is the dominance coefficient. The value of h yields the following information:



- h=0: A1 dominant, A2 recessive
- h=1: A2 dominant, A1 recessive
- 0<h<1: incomplete dominance
- h<0: overdominance
- h>1: underdominance
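To see how the dominance coefficient shapes the outcome, the genotype fitnesses above can be plugged into the standard one-locus recursion for allele frequency. This is a minimal sketch assuming random mating, an effectively infinite population, viability selection only, and no mutation or migration; the function names are illustrative.

```python
def next_allele_frequency(p: float, s: float, h: float) -> float:
    """Frequency of A1 after one generation of selection.

    Genotype fitnesses: A1A1 = 1, A1A2 = 1 - h*s, A2A2 = 1 - s.
    """
    q = 1.0 - p
    w11, w12, w22 = 1.0, 1.0 - h * s, 1.0 - s
    # Mean fitness weights each genotype by its Hardy-Weinberg frequency.
    mean_fitness = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    # A1 copies come from all A1A1 parents and half of the heterozygotes.
    return (p * p * w11 + p * q * w12) / mean_fitness


def iterate(p0: float, s: float, h: float, generations: int) -> float:
    """Apply the recursion repeatedly and return the final frequency of A1."""
    p = p0
    for _ in range(generations):
        p = next_allele_frequency(p, s, h)
    return p


if __name__ == "__main__":
    print(iterate(0.01, 0.1, 0.5, 1000))   # incomplete dominance: A1 sweeps
    print(iterate(0.10, 0.1, -1.0, 2000))  # overdominance: stable polymorphism
```

With h between 0 and 1 the favored allele A1 heads to fixation, whereas with h<0 the heterozygote is fittest and the recursion settles at an intermediate frequency.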

Epistasis


The logarithm of fitness as a function of the number of deleterious mutations. Synergistic epistasis is represented by the red line - each subsequent deleterious mutation has a larger proportionate effect on the organism's fitness. Antagonistic epistasis is in blue. The black line shows the non-epistatic case, where fitness is the product of the contributions from each of its loci.

Epistasis means that the phenotypic and/or fitness effect of an allele at one locus depends on which alleles are present at other loci. Selection does not act on a single locus, but on a phenotype that arises through development from a complete genotype.[19] However, many population genetics models of sexual species are "single locus" models, where the fitness of an individual is calculated as the product of the contributions from each of its loci—effectively assuming no epistasis.

In fact, the genotype to fitness landscape is more complex. Population genetics must either model this complexity in detail, or capture it by some simpler average rule. Empirically, beneficial mutations tend to have a smaller fitness benefit when added to a genetic background that already has high fitness: this is known as diminishing returns epistasis.[20] When deleterious mutations also have a smaller fitness effect on high fitness backgrounds, this is known as "synergistic epistasis". However, the effect of deleterious mutations tends on average to be very close to multiplicative, or can even show the opposite pattern, known as "antagonistic epistasis".[21]

Synergistic epistasis is central to some theories of the purging of mutation load[22] and to the evolution of sexual reproduction.

Mutation


Drosophila melanogaster

Mutation is the ultimate source of genetic variation in the form of new alleles. In addition, mutation may influence the direction of evolution when there is mutation bias, i.e. different probabilities for different mutations to occur. For example, recurrent mutation that tends to be in the opposite direction to selection can lead to mutation-selection balance. At the molecular level, if mutation from G to A happens more often than mutation from A to G, then genotypes with A will tend to evolve.[23] Different insertion vs. deletion mutation biases in different taxa can lead to the evolution of different genome sizes.[24][25] Developmental or mutational biases have also been observed in morphological evolution.[26][27] For example, according to the phenotype-first theory of evolution, mutations can eventually cause the genetic assimilation of traits that were previously induced by the environment.[28][29]

Mutation bias effects are superimposed on other processes. If selection would favor either one out of two mutations, but there is no extra advantage to having both, then the mutation that occurs the most frequently is the one that is most likely to become fixed in a population.[30][31]

Mutation can have no effect, alter the product of a gene, or prevent the gene from functioning. Studies in the fly Drosophila melanogaster suggest that if a mutation changes a protein produced by a gene, this will probably be harmful, with about 70 percent of these mutations having damaging effects, and the remainder being either neutral or weakly beneficial.[32] Most loss of function mutations are selected against. But when selection is weak, mutation bias towards loss of function can affect evolution.[33] For example, pigments are no longer useful when animals live in the darkness of caves, and tend to be lost.[34] This kind of loss of function can occur because of mutation bias, and/or because the function had a cost, and once the benefit of the function disappeared, natural selection leads to the loss. Loss of sporulation ability in a bacterium during laboratory evolution appears to have been caused by mutation bias, rather than natural selection against the cost of maintaining sporulation ability.[35] When there is no selection for loss of function, the speed at which loss evolves depends more on the mutation rate than it does on the effective population size,[36] indicating that it is driven more by mutation bias than by genetic drift.

Mutations can involve large sections of DNA becoming duplicated, usually through genetic recombination.[37] This leads to copy-number variation within a population. Duplications are a major source of raw material for evolving new genes.[38] Other types of mutation occasionally create new genes from previously noncoding DNA.[39][40]

Genetic drift

Genetic drift is a change in allele frequencies caused by random sampling.[41] That is, the alleles in the offspring are a random sample of those in the parents.[42] Genetic drift may cause gene variants to disappear completely, and thereby reduce genetic variability. In contrast to natural selection, which makes gene variants more common or less common depending on their reproductive success,[43] the changes due to genetic drift are not driven by environmental or adaptive pressures, and are equally likely to make an allele more common as less common.
The effect of genetic drift is larger for alleles present in few copies than when an allele is present in many copies. The population genetics of genetic drift are described using either branching processes or a diffusion equation describing changes in allele frequency.[44] These approaches are usually applied to the Wright-Fisher and Moran models of population genetics. Assuming genetic drift is the only evolutionary force acting on an allele, after t generations in many replicated populations, starting with allele frequencies of p and q, the variance in allele frequency across those populations is
V_{t}\approx pq\left(1-\exp \left\{-{\frac {t}{2N_{e}}}\right\}\right).[45]
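The variance approximation above can be checked against a small Wright-Fisher simulation. The sketch below uses only the Python standard library; it assumes pure drift in a diploid population of constant size (so the effective size equals the census size), and the function names and parameter values are illustrative.

```python
import math
import random


def drift_variance(p: float, t: int, n_e: float) -> float:
    """Approximate variance in allele frequency after t generations of drift."""
    return p * (1.0 - p) * (1.0 - math.exp(-t / (2.0 * n_e)))


def wright_fisher_final_frequency(p0, n_diploid, generations, rng):
    """Simulate pure drift in one population and return the final frequency."""
    copies = 2 * n_diploid
    p = p0
    for _ in range(generations):
        # Each gene copy of the next generation is drawn independently
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
    return p


def simulated_variance(p0, n_diploid, generations, replicates, seed=0):
    """Sample variance of the final frequency across replicate populations."""
    rng = random.Random(seed)
    finals = [wright_fisher_final_frequency(p0, n_diploid, generations, rng)
              for _ in range(replicates)]
    mean = sum(finals) / replicates
    return sum((x - mean) ** 2 for x in finals) / (replicates - 1)


if __name__ == "__main__":
    print("formula:   ", drift_variance(0.5, 10, 20))
    print("simulation:", simulated_variance(0.5, 20, 10, 2000))
```

The two numbers should agree closely but not exactly, since the formula is an approximation and the simulation has sampling noise.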
Ronald Fisher held the view that genetic drift plays at most a minor role in evolution, and this remained the dominant view for several decades. No population genetics perspective has ever given genetic drift a central role by itself, but some have made genetic drift important in combination with another non-selective force. The shifting balance theory of Sewall Wright held that the combination of population structure and genetic drift was important. Motoo Kimura's neutral theory of molecular evolution claims that most genetic differences within and between populations are caused by the combination of neutral mutations and genetic drift.[46]

The role of genetic drift by means of sampling error in evolution has been criticized by John H. Gillespie[47] and Will Provine,[48] who argue that selection on linked sites is a more important stochastic force, doing the work traditionally ascribed to genetic drift by means of sampling error. The mathematical properties of this process, known as genetic draft, are different from those of genetic drift.[49] Under genetic draft, the direction of the random change in allele frequency is autocorrelated across generations.[41]

Gene flow


Gene flow is the transfer of alleles from one population to another population through immigration of individuals. In this example, one of the birds from population A immigrates to population B, which has fewer of the dominant alleles, and through mating incorporates its alleles into the other population.

The Great Wall of China is an obstacle to gene flow of some terrestrial species.

Because of physical barriers to migration, along with the limited tendency for individuals to move or spread (vagility), and tendency to remain or come back to natal place (philopatry), natural populations rarely all interbreed as may be assumed in theoretical random models (panmixy).[50] There is usually a geographic range within which individuals are more closely related to one another than those randomly selected from the general population. This is described as the extent to which a population is genetically structured.[51] Genetic structuring can be caused by migration due to historical climate change, species range expansion or current availability of habitat. Gene flow is hindered by mountain ranges, oceans and deserts or even man-made structures such as the Great Wall of China, which has hindered the flow of plant genes.[52]

Gene flow is the exchange of genes between populations or species, breaking down the structure. Examples of gene flow within a species include the migration and then breeding of organisms, or the exchange of pollen. Gene transfer between species includes the formation of hybrid organisms and horizontal gene transfer. Population genetic models can be used to identify which populations show significant genetic isolation from one another, and to reconstruct their history.[53]

Subjecting a population to isolation can lead to inbreeding depression. Migration into a population can introduce new genetic variants,[54] potentially contributing to evolutionary rescue. If a significant proportion of individuals or gametes migrate, it can also change allele frequencies, e.g. giving rise to migration load.[55]

In the presence of gene flow, other barriers to hybridization between two diverging populations of an outcrossing species are required for the populations to become new species.

Horizontal gene transfer


Current tree of life showing vertical and horizontal gene transfers.

Horizontal gene transfer is the transfer of genetic material from one organism to another organism that is not its offspring; this is most common among prokaryotes.[56] In medicine, this contributes to the spread of antibiotic resistance, as when one bacterium acquires resistance genes it can rapidly transfer them to other species.[57] Horizontal transfer of genes from bacteria to eukaryotes such as the yeast Saccharomyces cerevisiae and the adzuki bean beetle Callosobruchus chinensis may also have occurred.[58][59] An example of larger-scale transfer is the eukaryotic bdelloid rotifers, which appear to have received a range of genes from bacteria, fungi, and plants.[60] Viruses can also carry DNA between organisms, allowing transfer of genes even across biological domains.[61] Large-scale gene transfer has also occurred between the ancestors of eukaryotic cells and prokaryotes, during the acquisition of chloroplasts and mitochondria.[62]

Linkage

If all genes are in linkage equilibrium, the effect of an allele at one locus can be averaged across the gene pool at other loci. In reality, one allele is frequently found in linkage disequilibrium with genes at other loci, especially with genes located nearby on the same chromosome. Recombination breaks up this linkage disequilibrium too slowly to avoid genetic hitchhiking, where an allele at one locus rises to high frequency because it is linked to an allele under selection at a nearby locus. Linkage also slows down the rate of adaptation, even in sexual populations.[63][64][65] The effect of linkage disequilibrium in slowing down the rate of adaptive evolution arises from a combination of the Hill–Robertson effect (delays in bringing beneficial mutations together) and background selection (delays in separating beneficial mutations from deleterious hitchhikers).
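The rate at which recombination erodes linkage disequilibrium can be made concrete with the standard two-locus coefficient D. The sketch below assumes random mating, no selection, and a recombination fraction r between the two loci; the symbols D and r and the function names are introduced here for illustration and do not appear in the text above.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> float:
    """D = freq(AB haplotype) - freq(A) * freq(B); zero at linkage equilibrium."""
    return p_ab - p_a * p_b


def ld_after(d0: float, r: float, generations: int) -> float:
    """Expected D after the given number of generations of random mating.

    Each generation recombination removes a fraction r of the remaining
    disequilibrium, so D decays geometrically.
    """
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    d0 = linkage_disequilibrium(0.45, 0.5, 0.5)
    for r in (0.5, 0.01, 0.0001):  # unlinked, loosely linked, tightly linked
        print(r, ld_after(d0, r, 100))
```

For tightly linked loci (small r) most of the initial disequilibrium is still present after 100 generations, which is why a selected allele can carry nearby variants along with it.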

Linkage is a problem for population genetic models that treat one gene locus at a time. It can, however, be exploited as a method for detecting the action of natural selection via selective sweeps.

In the extreme case of an asexual population, linkage is complete, and population genetic equations can be derived and solved in terms of a travelling wave of genotype frequencies along a simple fitness landscape.[66] Most microbes, such as bacteria, are asexual. The population genetics of their adaptation have two contrasting regimes. When the product of the beneficial mutation rate and population size is small, asexual populations follow a "successional regime" of origin-fixation dynamics, with adaptation rate strongly dependent on this product. When the product is much larger, asexual populations follow a "concurrent mutations" regime with adaptation rate less dependent on the product, characterized by clonal interference and the appearance of a new beneficial mutation before the last one has fixed.

Applications

Explaining levels of genetic variation

Neutral theory predicts that the level of genetic diversity in a population will be proportional to the product of the population size and the neutral mutation rate. The fact that levels of genetic diversity vary much less than population sizes do is known as the "paradox of variation".[67] While high levels of genetic diversity were one of the original arguments in favor of neutral theory, the paradox of variation has been one of the strongest arguments against neutral theory.
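The proportionality in the first sentence is usually written with the population-scaled mutation rate θ. As a sketch under the standard neutral model for a diploid population (the ploidy and the symbols θ, π, N_e and μ are assumptions introduced here, not taken from the text above):

```latex
\theta = 4 N_e \mu, \qquad \mathbb{E}[\pi] = \theta
```

Here π is the average number of pairwise differences per site, so a tenfold larger effective population size would be expected to show tenfold higher neutral diversity, which is the expectation that observed diversity levels fail to match.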

It is clear that levels of genetic diversity vary greatly within a species as a function of local recombination rate, due to both genetic hitchhiking and background selection. Most current solutions to the paradox of variation invoke some level of selection at linked sites.[68] For example, one analysis suggests that larger populations have more selective sweeps, which remove more neutral genetic diversity.[69] A negative correlation between mutation rate and population size may also contribute.[70]

Life history affects genetic diversity more than population history does, e.g. r-strategists have more genetic diversity.[68]

Detecting selection

Population genetics models are used to infer which genes are undergoing selection. One common approach is to look for regions of high linkage disequilibrium and low genetic variance along the chromosome, to detect recent selective sweeps.

A second common approach is the McDonald–Kreitman test, which compares the amount of variation within a species (polymorphism) to the divergence between species (substitutions) at two types of sites, one assumed to be neutral. Typically, synonymous sites are assumed to be neutral.[71] Genes undergoing positive selection have an excess of divergent sites relative to polymorphic sites. The test can also be used to obtain a genome-wide estimate of the proportion of substitutions that are fixed by positive selection, α.[72][73] According to the neutral theory of molecular evolution, this number should be near zero. High numbers have therefore been interpreted as a genome-wide falsification of neutral theory.[74]
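The comparison behind the test reduces to a 2x2 table of counts. The sketch below computes a commonly used point estimate of α together with the neutrality index; it assumes synonymous sites are neutral, the function names are illustrative, and the counts in the usage example are made up for illustration rather than taken from any data set.

```python
def mk_alpha(d_n: int, d_s: int, p_n: int, p_s: int) -> float:
    """Estimated proportion of nonsynonymous substitutions fixed by positive selection.

    d_n, d_s: nonsynonymous and synonymous fixed differences between species.
    p_n, p_s: nonsynonymous and synonymous polymorphisms within a species.
    alpha = 1 - (d_s * p_n) / (d_n * p_s)
    """
    return 1.0 - (d_s * p_n) / (d_n * p_s)


def neutrality_index(d_n: int, d_s: int, p_n: int, p_s: int) -> float:
    """Ratio (p_n / p_s) / (d_n / d_s); values below 1 suggest positive selection."""
    return (p_n / p_s) / (d_n / d_s)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show an excess of divergence.
    print(mk_alpha(d_n=7, d_s=17, p_n=2, p_s=42))
    print(neutrality_index(d_n=7, d_s=17, p_n=2, p_s=42))
```

Under strict neutrality the two ratios are expected to match, giving α near zero and a neutrality index near 1. In practice a significance test on the 2x2 table would accompany these point estimates.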

Demographic inference

The simplest test for population structure in a sexually reproducing, diploid species is to see whether genotype frequencies follow Hardy–Weinberg proportions as a function of allele frequencies. For example, in the simplest case of a single locus with two alleles denoted A and a at frequencies p and q, random mating predicts freq(AA) = p^2 for the AA homozygotes, freq(aa) = q^2 for the aa homozygotes, and freq(Aa) = 2pq for the heterozygotes. In the absence of population structure, Hardy–Weinberg proportions are reached within 1-2 generations of random mating. More typically, there is an excess of homozygotes, indicative of population structure. The extent of this excess can be quantified as the inbreeding coefficient, F. When individuals can be assigned to different subpopulations, the degree of population structure is usually calculated using F_ST, which is a measure of the proportion of genetic variance that can be explained by population structure.
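The Hardy–Weinberg comparison described above can be sketched in a few lines. The code assumes a single locus with two alleles and genotype counts from one sample; F is computed as the proportional deficit of heterozygotes relative to the Hardy–Weinberg expectation, and the function names are illustrative.

```python
def hardy_weinberg_expected(p: float) -> tuple:
    """Expected frequencies of (AA, Aa, aa) under random mating."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def inbreeding_coefficient(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """F = 1 - observed heterozygosity / expected heterozygosity."""
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # frequency of allele A
    expected_het = hardy_weinberg_expected(p)[1]
    if expected_het == 0.0:
        raise ValueError("locus is monomorphic; F is undefined")
    observed_het = n_Aa / total
    return 1.0 - observed_het / expected_het


if __name__ == "__main__":
    print(inbreeding_coefficient(25, 50, 25))  # matches Hardy-Weinberg
    print(inbreeding_coefficient(40, 20, 40))  # excess of homozygotes
```

A value of F near zero is consistent with random mating, while a positive value reflects the excess of homozygotes described above.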

Coalescent theory relates genetic diversity in a sample to the demographic history of the population from which it was taken. It normally assumes neutrality, so sequences from more neutrally evolving portions of genomes are selected for such analyses. It can be used to infer both the relationships between species (phylogenetics) and the population structure, demographic history (e.g. population bottlenecks or population growth), and introgression within a species.

Another approach to demographic inference relies on the allele frequency spectrum.[75]

Evolution of genetic systems

By assuming that there are loci that control the genetic system itself, population genetic models are created to describe the evolution of dominance and other forms of robustness, the evolution of sexual reproduction and recombination rates, the evolution of mutation rates, the evolution of evolutionary capacitors, the evolution of costly signalling traits, the evolution of ageing, and the evolution of co-operation. For example, most mutations are deleterious, so the optimal mutation rate for a species may be a trade-off between the damage from a high deleterious mutation rate and the metabolic costs of maintaining systems to reduce the mutation rate, such as DNA repair enzymes.[76]

One important aspect of such models is that selection is only strong enough to purge deleterious mutations and hence overpower mutational bias towards degradation if the selection coefficient s is greater than the inverse of the effective population size. This is known as the drift barrier and is related to the nearly neutral theory of molecular evolution. Drift barrier theory predicts that species with large effective population sizes will have highly streamlined, efficient genetic systems, while those with small population sizes will have bloated and complex genomes containing for example introns and transposable elements.[77] However, somewhat paradoxically, species with large population sizes might be so tolerant to the consequences of certain types of errors that they evolve higher error rates, e.g. in transcription and translation, than small populations.[78]

Molecular evolution

From Wikipedia, the free encyclopedia

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.

History

The history of molecular evolution starts in the early 20th century with comparative biochemistry, and the use of "fingerprinting" methods such as immune assays, gel electrophoresis and paper chromatography in the 1950s to explore homologous proteins.[1][2] The field of molecular evolution came into its own in the 1960s and 1970s, following the rise of molecular biology. The advent of protein sequencing allowed molecular biologists to create phylogenies based on sequence comparison, and to use the differences between homologous sequences as a molecular clock to estimate the time since their last common ancestor.[1] In the late 1960s, the neutral theory of molecular evolution provided a theoretical basis for the molecular clock,[3] though both the clock and the neutral theory were controversial, since most evolutionary biologists held strongly to panselectionism, with natural selection as the only important cause of evolutionary change. After the 1970s, nucleic acid sequencing allowed molecular evolution to reach beyond proteins to highly conserved ribosomal RNA sequences, the foundation of a reconceptualization of the early history of life.[1]

Forces in molecular evolution

The content and structure of a genome is the product of the molecular and population genetic forces which act upon that genome. Novel genetic variants will arise through mutation and will spread and be maintained in populations due to genetic drift or natural selection.

Mutation

This hedgehog has no pigmentation due to a mutation.

Mutations are permanent, transmissible changes to the genetic material (DNA or RNA) of a cell or virus. Mutations result from errors in DNA replication during cell division and from exposure to radiation, chemicals, and other environmental stressors, or from viruses and transposable elements. Most mutations that occur are single-nucleotide changes that modify single bases of the DNA sequence, resulting in point mutations. Other types of mutations modify larger segments of DNA and can cause duplications, insertions, deletions, inversions, and translocations.

Most organisms display a strong bias in the types of mutations that occur, with a strong influence on GC-content. Transitions (A ↔ G or C ↔ T) are more common than transversions (purine (adenine or guanine) ↔ pyrimidine (cytosine or thymine, or in RNA, uracil))[4] and are less likely to alter amino acid sequences of proteins.
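The transition/transversion distinction is easy to express in code. The sketch below classifies a single substitution and counts both classes between two aligned sequences; it assumes equal-length, gap-free sequences over the standard bases, and the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # A change within the same chemical class is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_ti_tv(seq1: str, seq2: str) -> tuple:
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if classify_substitution(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(count_ti_tv("ACGT", "GCTA"))
```

There are twice as many possible transversions as transitions, so an observed excess of transitions points to a mutational bias rather than chance.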

Mutations are stochastic and typically occur randomly across genes. Mutation rates for single nucleotide sites for most organisms are very low, roughly 10^−9 to 10^−8 per site per generation, though some viruses have higher mutation rates on the order of 10^−6 per site per generation. Among these mutations, some will be neutral or beneficial and will remain in the genome unless lost via genetic drift, and others will be detrimental and will be eliminated from the genome by natural selection.

Because mutations are extremely rare, they accumulate very slowly across generations. While the number of mutations which appears in any single generation may vary, over very long time periods they will appear to accumulate at a regular pace. Using the mutation rate per generation and the number of nucleotide differences between two sequences, divergence times can be estimated effectively via the molecular clock.
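The molecular clock calculation described above can be sketched as follows. The code assumes neutrally evolving sites, a constant mutation rate, and few enough differences that repeated changes at the same site can be ignored; the factor of 2 reflects mutations accumulating independently along both lineages. The function name and example values are illustrative.

```python
def divergence_time_generations(differences: int, sites: int, mu: float) -> float:
    """Estimate generations since two sequences shared a common ancestor.

    differences: number of nucleotide differences between the sequences.
    sites: number of aligned sites compared.
    mu: mutation rate per site per generation.
    """
    if sites <= 0 or mu <= 0.0:
        raise ValueError("sites and mu must be positive")
    per_site_divergence = differences / sites
    # Both lineages accumulate mutations, so divide by 2 * mu.
    return per_site_divergence / (2.0 * mu)


if __name__ == "__main__":
    # Hypothetical example: 20 differences over 1000 sites, mu = 1e-8.
    print(divergence_time_generations(20, 1000, 1e-8))
```

Converting the result to years requires multiplying by a generation time, which is a further assumption not covered by this sketch.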

Recombination

Recombination involves the breakage and rejoining of two chromosomes (M and F) to produce two re-arranged chromosomes (C1 and C2).

Recombination is a process that results in genetic exchange between chromosomes or chromosomal regions. Recombination counteracts physical linkage between adjacent genes, thereby reducing genetic hitchhiking. The resulting independent inheritance of genes results in more efficient selection, meaning that regions with higher recombination will harbor fewer detrimental mutations, more selectively favored variants, and fewer errors in replication and repair. Recombination can also generate particular types of mutations if chromosomes are misaligned.

Gene conversion

Gene conversion is a type of recombination that is the product of DNA repair where nucleotide damage is corrected using a homologous genomic region as a template. Damaged bases are first excised, the damaged strand is then aligned with an undamaged homolog, and DNA synthesis repairs the excised region using the undamaged strand as a guide. Gene conversion is often responsible for homogenizing sequences of duplicate genes over long time periods, reducing nucleotide divergence.

Genetic drift

Genetic drift is the change of allele frequencies from one generation to the next due to stochastic effects of random sampling in finite populations. Some existing variants have no effect on fitness and may increase or decrease in frequency simply due to chance. "Nearly neutral" variants whose selection coefficient is close to a threshold value of 1 / the effective population size will also be affected by chance as well as by selection and mutation. Many genomic features have been ascribed to accumulation of nearly neutral detrimental mutations as a result of small effective population sizes.[5] With a smaller effective population size, a larger variety of mutations will behave as if they are neutral due to inefficiency of selection.
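Random sampling in a finite population can be illustrated with a small simulation. The sketch below assumes the standard Wright-Fisher model (non-overlapping generations, a diploid population of constant size, a single neutral biallelic locus); the model choice, the function name and the parameter values are illustrative assumptions rather than anything specified in the text.

```python
import random


def wright_fisher(pop_size, initial_freq, generations, seed=None):
    """Simulate neutral drift at one biallelic locus in a diploid population.

    Each generation, 2 * pop_size gene copies are drawn independently, each
    carrying the focal allele with probability equal to its frequency in the
    previous generation (binomial sampling). Returns the list of allele
    frequencies, starting with initial_freq.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = initial_freq
    trajectory = [freq]
    for _generation in range(generations):
        count = sum(1 for _copy in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
        if freq in (0.0, 1.0):  # allele lost or fixed: no further change
            break
    return trajectory


# Compare a small and a larger population starting from the same frequency.
small = wright_fisher(pop_size=10, initial_freq=0.5, generations=200, seed=1)
large = wright_fisher(pop_size=500, initial_freq=0.5, generations=200, seed=1)
print("N=10 :", len(small) - 1, "generations simulated, final freq", small[-1])
print("N=500:", len(large) - 1, "generations simulated, final freq", large[-1])
```

Under this model, frequencies in the small population are expected to fluctuate more from one generation to the next, so alleles tend to be fixed or lost sooner than in the larger population.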

Selection

Selection occurs when organisms with greater fitness, i.e. greater ability to survive or reproduce, are favored in subsequent generations, thereby increasing the frequency of the underlying genetic variants in a population. Selection can be the product of natural selection, artificial selection, or sexual selection. Natural selection is any selective process that occurs due to the fitness of an organism to its environment. In contrast, sexual selection is a product of mate choice and can favor the spread of genetic variants which act counter to natural selection but increase desirability to the opposite sex or increase mating success. Artificial selection, also known as selective breeding, is imposed by an outside entity, typically humans, in order to increase the frequency of desired traits.

The principles of population genetics apply similarly to all types of selection, though in fact each may produce distinct effects due to clustering of genes with different functions in different parts of the genome, or due to different properties of genes in particular functional classes. For instance, sexual selection could be more likely to affect molecular evolution of the sex chromosomes due to clustering of sex specific genes on the X, Y, Z or W.

Selection can operate at the gene level at the expense of organismal fitness, resulting in a selective advantage for selfish genetic elements in spite of a host cost. Examples of such selfish elements include transposable elements, meiotic drivers, killer X chromosomes, selfish mitochondria, and self-propagating introns. (See Intragenomic conflict.)

Genome architecture

Genome size

Genome size is influenced by the amount of repetitive DNA as well as number of genes in an organism. The C-value paradox refers to the lack of correlation between organism 'complexity' and genome size. Explanations for the so-called paradox are two-fold. First, repetitive genetic elements can comprise large portions of the genome for many organisms, thereby inflating DNA content of the haploid genome. Secondly, the number of genes is not necessarily indicative of the number of developmental stages or tissue types in an organism. An organism with few developmental stages or tissue types may have large numbers of genes that influence non-developmental phenotypes, inflating gene content relative to developmental gene families.

Neutral explanations for genome size suggest that when population sizes are small, many mutations become nearly neutral. Hence, in small populations repetitive content and other 'junk' DNA can accumulate without placing the organism at a competitive disadvantage. There is little evidence to suggest that genome size is under strong widespread selection in multicellular eukaryotes. Genome size, independent of gene content, correlates poorly with most physiological traits and many eukaryotes, including mammals, harbor very large amounts of repetitive DNA.

However, birds likely have experienced strong selection for reduced genome size, in response to changing energetic needs for flight. Birds, unlike humans, produce nucleated red blood cells, and larger nuclei lead to lower levels of oxygen transport. Bird metabolism is far higher than that of mammals, due largely to flight, and oxygen needs are high. Hence, most birds have small, compact genomes with few repetitive elements. Indirect evidence suggests that non-avian theropod dinosaur ancestors of modern birds[6] also had reduced genome sizes, consistent with endothermy and high energetic needs for running speed. Many bacteria have also experienced selection for small genome size, as time of replication and energy consumption are so tightly correlated with fitness.

Repetitive elements

Transposable elements are self-replicating, selfish genetic elements which are capable of proliferating within host genomes. Many transposable elements are related to viruses, and share several proteins in common.

Chromosome number and organization

The number of chromosomes in an organism's genome also does not necessarily correlate with the amount of DNA in its genome. The ant Myrmecia pilosula has only a single pair of chromosomes[7] whereas the adder's-tongue fern Ophioglossum reticulatum has up to 1260 chromosomes.[8] Ciliate genomes house each gene in individual chromosomes, resulting in a genome which is not physically linked. Reduced linkage through creation of additional chromosomes should effectively increase the efficiency of selection.

Changes in chromosome number can play a key role in speciation, as differing chromosome numbers can serve as a barrier to reproduction in hybrids. Human chromosome 2 was created by the fusion of two ancestral chromosomes that remain separate in chimpanzees, and it still contains interstitial telomere sequences as well as a vestigial second centromere. Polyploidy, especially allopolyploidy, which occurs often in plants, can also result in reproductive incompatibilities with parental species. Agrodiaetus blue butterflies have diverse chromosome numbers ranging from n=10 to n=134 and additionally have one of the highest rates of speciation identified to date.[9]

Gene content and distribution

Different organisms house different numbers of genes within their genomes as well as different patterns in the distribution of genes throughout the genome. Some organisms, such as most bacteria, Drosophila, and Arabidopsis have particularly compact genomes with little repetitive content or non-coding DNA. Other organisms, like mammals or maize, have large amounts of repetitive DNA, long introns, and substantial spacing between different genes. The content and distribution of genes within the genome can influence the rate at which certain types of mutations occur and can influence the subsequent evolution of different species. Genes with longer introns are more likely to recombine due to increased physical distance over the coding sequence. As such, long introns may facilitate ectopic recombination, and result in higher rates of new gene formation.

Organelles

In addition to the nuclear genome, endosymbiont organelles contain their own genetic material, typically as circular plasmids. Mitochondrial and chloroplast DNA varies across taxa, but membrane-bound proteins, especially electron transport chain constituents, are most often encoded in the organelle. Chloroplasts and mitochondria are maternally inherited in most species, as the organelles must pass through the egg. In a rare departure, mitochondria in some species of mussels are known to be passed from father to son.

Origins of new genes

New genes arise from several different genetic mechanisms including gene duplication, de novo origination, retrotransposition, chimeric gene formation, recruitment of non-coding sequence, and gene truncation.

Gene duplication initially leads to redundancy. However, duplicated gene sequences can mutate to develop new functions or specialize so that the new gene performs a subset of the original ancestral functions. In addition to duplicating whole genes, sometimes only a domain or part of a protein is duplicated so that the resulting gene is an elongated version of the parental gene.

Retrotransposition creates new genes by copying mRNA to DNA and inserting it into the genome. Retrogenes often insert into new genomic locations, and often develop new expression patterns and functions.

Chimeric genes form when duplication, deletion, or incomplete retrotransposition combine portions of two different coding sequences to produce a novel gene sequence. Chimeras often cause regulatory changes and can shuffle protein domains to produce novel adaptive functions.

De novo origin. Novel genes can also arise from previously non-coding DNA.[10] For instance, Levine and colleagues reported the origin of five new genes in the D. melanogaster genome from noncoding DNA.[11][12] Similar de novo origin of genes has also been shown in other organisms such as yeast,[13] rice[14] and humans.[15] De novo genes may evolve from transcripts that are already expressed at low levels.[16] Mutation of a stop codon to a regular codon, or a frameshift, may produce an extended protein that includes a previously non-coding sequence.

De novo evolution of genes can also be simulated in the laboratory. Donnelly et al. have shown that semi-random gene sequences can be selected for specific functions. More specifically, they selected sequences from a library that could complement a gene deletion in E. coli. The deleted gene encodes ferric enterobactin esterase (Fes), which releases iron from an iron chelator, enterobactin. While Fes is a 400 amino acid protein, the newly selected gene was only 100 amino acids in length and unrelated in sequence to Fes.[17]

In vitro molecular evolution experiments

Principles of molecular evolution have also been discovered, elucidated and tested through experiments involving the amplification, variation and selection of rapidly proliferating, genetically varying molecular species outside cells. Since the pioneering work of Sol Spiegelman in 1967 [ref], involving RNA that replicates itself with the aid of an enzyme extracted from the Qβ virus [ref], several groups (such as Kramers [ref] and Biebricher/Luce/Eigen [ref]) studied mini and micro variants of this RNA in the 1970s and 1980s that replicate on the timescale of seconds to a minute, allowing hundreds of generations with large population sizes (e.g. 10^14 sequences) to be followed in a single day of experimentation. The chemical kinetic elucidation of the detailed mechanism of replication [ref, ref] meant that this type of system was the first molecular evolution system that could be fully characterised on the basis of physical chemical kinetics, later allowing the first models of the genotype to phenotype map based on sequence dependent RNA folding and refolding to be produced [ref, ref]. Subject to maintaining the function of the multicomponent Qβ enzyme, chemical conditions could be varied significantly, in order to study the influence of changing environments and selection pressures [ref]. Experiments with in vitro RNA quasispecies included the characterisation of the error threshold for information in molecular evolution [ref], the discovery of de novo evolution [ref] leading to diverse replicating RNA species and the discovery of spatial travelling waves as ideal molecular evolution reactors [ref, ref]. Later experiments employed novel combinations of enzymes to elucidate novel aspects of interacting molecular evolution involving population dependent fitness, including work with artificially designed molecular predator prey and cooperative systems of multiple RNA and DNA [ref, ref].
Special evolution reactors were designed for these studies, starting with serial transfer machines, flow reactors such as cell-stat machines, capillary reactors, and microreactors including line flow reactors and gel slice reactors. These studies were accompanied by theoretical developments and simulations involving RNA folding and replication kinetics that elucidated the importance of the correlation structure between distance in sequence space and fitness changes [ref], including the role of neutral networks and structural ensembles in evolutionary optimisation.

Molecular phylogenetics

Molecular systematics is the product of the traditional fields of systematics and molecular genetics. It uses DNA, RNA, or protein sequences to resolve questions in systematics, i.e. about the correct scientific classification or taxonomy of organisms from the point of view of evolutionary biology.
Molecular systematics has been made possible by the availability of techniques for DNA sequencing, which allow the determination of the exact sequence of nucleotides or bases in either DNA or RNA. At present it is still a long and expensive process to sequence the entire genome of an organism, and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs.

The driving forces of evolution

Depending on the relative importance assigned to the various forces of evolution, three perspectives provide evolutionary explanations for molecular evolution.[18][19]
Selectionist hypotheses argue that selection is the driving force of molecular evolution. While acknowledging that many mutations are neutral, selectionists attribute changes in the frequencies of neutral alleles to linkage disequilibrium with other loci that are under selection, rather than to random genetic drift.[20] Biases in codon usage are usually explained with reference to the ability of even weak selection to shape molecular evolution.[21]

Neutralist hypotheses emphasize the importance of mutation, purifying selection, and random genetic drift.[22] The introduction of the neutral theory by Kimura,[23] quickly followed by King and Jukes' own findings,[24] led to a fierce debate about the relevance of neodarwinism at the molecular level. The neutral theory of molecular evolution proposes that most mutations in DNA are at locations not important to function or fitness. These neutral changes may drift towards fixation within a population. Positive changes will be very rare, and so will not greatly contribute to DNA polymorphisms.[25] Deleterious mutations do not contribute much to DNA diversity because they negatively affect fitness and so are removed from the gene pool before long.[26] This theory provides a framework for the molecular clock.[25] The fate of neutral mutations is governed by genetic drift, and these mutations contribute to both nucleotide polymorphism and fixed differences between species.[27][28]

In the strictest sense, the neutral theory is not accurate.[29] Subtle changes in DNA very often have effects, but sometimes these effects are too small for natural selection to act on.[29] Even synonymous mutations are not necessarily neutral[29] because codons are not all used at the same frequency. The nearly neutral theory expanded the neutralist perspective, suggesting that many mutations are nearly neutral, which means both random drift and natural selection are relevant to their dynamics.[29] The main difference between the neutral theory and the nearly neutral theory is that the latter focuses on weakly selected mutations rather than strictly neutral ones.[26]
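The interplay of drift and weak selection can be made concrete with a fixation probability. The sketch below uses Kimura's diffusion approximation, which is a standard textbook result brought in here for illustration and not a formula from this article; it assumes a diploid population whose effective size equals its census size, a new mutation present as a single copy, and additive selection with coefficient s. The function name and numbers are hypothetical.

```python
import math


def fixation_probability(s, pop_size):
    """Kimura's diffusion approximation for a new mutation's fixation chance.

    Assumes a diploid population of pop_size individuals (effective size
    taken equal to census size), a starting frequency of 1 / (2 * pop_size),
    and additive selection coefficient s. Strongly deleterious mutations in
    very large populations can overflow math.exp; this sketch does not
    guard against that.
    """
    if abs(s) < 1e-12:
        return 1.0 / (2 * pop_size)  # strictly neutral limit
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * pop_size * s))


N = 1000  # hypothetical population size
neutral = fixation_probability(0.0, N)
for s in (-0.01, -1e-5, 0.0, 1e-5, 0.01):
    ratio = fixation_probability(s, N) / neutral
    print(f"s = {s:+.0e}: fixation probability is {ratio:.3g} x the neutral value")
```

With these assumed values, mutations whose selection coefficient is far smaller in magnitude than 1/N fix at close to the neutral rate, while those with larger coefficients are strongly favored or almost never fixed.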

Mutationist hypotheses emphasize random drift and biases in mutation patterns.[30] Sueoka was the first to propose a modern mutationist view. He proposed that the variation in GC content was not the result of positive selection, but a consequence of the GC mutational pressure.[31]

Protein evolution

This chart compares the sequence identity of different lipase proteins throughout the human body. It demonstrates how proteins evolve, keeping some regions conserved while others change dramatically.

Evolution of proteins is studied by comparing the sequences and structures of proteins from many organisms representing distinct evolutionary clades. If the sequences or structures of two proteins are similar, indicating that the proteins diverged from a common origin, these proteins are called homologous proteins. More specifically, homologous proteins that exist in two distinct species are called orthologs, whereas homologous proteins encoded by the genome of a single species are called paralogs.

The phylogenetic relationships of proteins are examined by multiple sequence comparisons. Phylogenetic trees of proteins can be established by comparing sequence identities among proteins. Such phylogenetic trees have established that the sequence similarities among proteins closely reflect the evolutionary relationships among organisms.[32][33]
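Sequence identity between aligned proteins is simple to compute, and a table of pairwise identities is the kind of input a distance-based tree method would start from. The sketch below assumes the sequences are already aligned to equal length with "-" marking gaps; the three short sequences and their names are made up for illustration and are not real proteins.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned positions with identical residues.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_table(aligned):
    """Pairwise percent identity for every pair of named aligned sequences."""
    names = list(aligned)
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x in names
        for y in names
    }


# Hypothetical toy alignment (not real protein sequences).
toy_alignment = {
    "species_A": "MKTAYIAK",
    "species_B": "MKTAYLAK",
    "species_C": "MRT-YLAR",
}
table = identity_table(toy_alignment)
for (x, y), value in sorted(table.items()):
    if x < y:
        print(f"{x} vs {y}: {value:.1f}% identical")
```

Percent identity is only a crude measure of relatedness; tree-building methods usually correct for multiple substitutions at the same site, which this sketch does not attempt.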

Protein evolution describes the changes over time in protein shape, function, and composition. Through quantitative analysis and experimentation, scientists have sought to understand the rate and causes of protein evolution. Using the amino acid sequences of hemoglobin and cytochrome c from multiple species, scientists were able to derive estimates of protein evolution rates. What they found was that the rates were not the same among proteins.[26] Each protein has its own rate, and that rate is constant across phylogenies (i.e., hemoglobin does not evolve at the same rate as cytochrome c, but hemoglobins from humans, mice, etc. do have comparable rates of evolution). Not all regions within a protein mutate at the same rate; functionally important areas mutate more slowly, and substitutions between similar amino acids occur more often than substitutions between dissimilar ones.[26] Overall, the level of polymorphism in proteins seems to be fairly constant. Several species (including humans, fruit flies, and mice) have similar levels of protein polymorphism.[25]

Relation to nucleic acid evolution

Protein evolution is inescapably tied to changes and selection of DNA polymorphisms and mutations because protein sequences change in response to alterations in the DNA sequence. Amino acid sequences and nucleic acid sequences do not mutate at the same rate. Due to the degeneracy of the genetic code, bases can change without affecting the amino acid sequence. For example, there are six codons that code for leucine. Thus, despite the difference in mutation rates, it is essential to incorporate nucleic acid evolution into the discussion of protein evolution. At the end of the 1960s, two groups of scientists, Kimura (1968) and King and Jukes (1969), independently proposed that a majority of the evolutionary changes observed in proteins were neutral.[25][26] Since then, the neutral theory has been expanded upon and debated.[26]
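The degeneracy mentioned above can be checked directly against the standard genetic code. The sketch below builds the standard codon table from its usual compact encoding (bases in the order T, C, A, G, with "*" marking stop codons) and counts how many codons specify each amino acid; the helper name is_synonymous is a hypothetical choice for this illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in
# the order T, C, A, G (first position varying slowest); '*' is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())


def is_synonymous(codon_a, codon_b):
    """True if two codons encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
print("Leucine codons:", leucine_codons)
print("CTT -> CTC synonymous?", is_synonymous("CTT", "CTC"))
print("CTT -> CCT synonymous?", is_synonymous("CTT", "CCT"))
```

A change from CTT to CTC leaves leucine in place, whereas CTT to CCT replaces it with proline, which is why nucleotide and amino acid sequences need not change at the same rate.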

Discordance with morphological evolution

There are sometimes discordances between molecular and morphological evolution, which are reflected in molecular and morphological systematic studies, especially of bacteria, archaea and eukaryotic microbes. These discordances can be categorized as two types: (i) one morphology, multiple lineages (e.g. morphological convergence, cryptic species) and (ii) one lineage, multiple morphologies (e.g. phenotypic plasticity, multiple life-cycle stages). Neutral evolution could possibly explain the incongruences in some cases.[34]

Journals and societies

The Society for Molecular Biology and Evolution publishes the journals "Molecular Biology and Evolution" and "Genome Biology and Evolution" and holds an annual international meeting. Other journals dedicated to molecular evolution include Journal of Molecular Evolution and Molecular Phylogenetics and Evolution. Research in molecular evolution is also published in journals of genetics, molecular biology, genomics, systematics, and evolutionary biology.
