A Medley of Potpourri

Sunday, March 8, 2020

Behavioural genetics

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Behavioural_genetics

Behavioural genetics, also referred to as behaviour genetics, is a field of scientific research that uses genetic methods to investigate the nature and origins of individual differences in behaviour. While the name "behavioural genetics" connotes a focus on genetic influences, the field broadly investigates genetic and environmental influences, using research designs that allow removal of the confounding of genes and environment. Behavioural genetics was founded as a scientific discipline by Francis Galton in the late 19th century, only to be discredited through association with eugenics movements before and during World War II. In the latter half of the 20th century, the field saw renewed prominence with research on inheritance of behaviour and mental illness in humans (typically using twin and family studies), as well as research on genetically informative model organisms through selective breeding and crosses. In the late 20th and early 21st centuries, technological advances in molecular genetics made it possible to measure and modify the genome directly. This led to major advances in model organism research (e.g., knockout mice) and in human studies (e.g., genome-wide association studies), leading to new scientific discoveries.

Findings from behavioural genetic research have broadly impacted modern understanding of the role of genetic and environmental influences on behaviour. These include evidence that nearly all researched behaviours are under a significant degree of genetic influence, and that this influence tends to increase as individuals develop into adulthood. Further, most researched human behaviours are influenced by a very large number of genes, and the individual effects of these genes are very small. Environmental influences also play a strong role, but they tend to make family members more different from one another, not more similar.

History

Selective breeding and the domestication of animals is perhaps the earliest evidence that humans considered the idea that individual differences in behaviour could be due to natural causes. Plato and Aristotle each speculated on the basis and mechanisms of inheritance of behavioural characteristics. Plato, for example, argued in The Republic that selective breeding among the citizenry to encourage the development of some traits and discourage others, what today might be called eugenics, was to be encouraged in the pursuit of an ideal society. Behavioural genetic concepts also existed during the English renaissance, where William Shakespeare perhaps first coined the terms "nature" versus "nurture" in The Tempest, where he wrote in Act IV, Scene I, that Caliban was "A devil, a born devil, on whose nature Nurture can never stick".

Modern-day behavioural genetics began with Sir Francis Galton, a nineteenth-century intellectual and cousin of Charles Darwin. Galton was a polymath who studied many subjects, including the heritability of human abilities and mental characteristics. One of Galton's investigations involved a large pedigree study of social and intellectual achievement in the English upper class. In 1869, 10 years after Darwin's On the Origin of Species, Galton published his results in Hereditary Genius. In this work, Galton found that the rate of "eminence" was highest among close relatives of eminent individuals, and decreased as the degree of relationship to eminent individuals decreased. While Galton could not rule out the role of environmental influences on eminence, a fact which he acknowledged, the study served to initiate an important debate about the relative roles of genes and environment on behavioural characteristics. Through his work, Galton also "introduced multivariate analysis and paved the way towards modern Bayesian statistics" that are used throughout the sciences—launching what has been dubbed the "Statistical Enlightenment".

Galton in his later years

The field of behavioural genetics, as founded by Galton, was ultimately undermined by another of Galton's intellectual contributions, the founding of the eugenics movement in 20th century society. The primary idea behind eugenics was to use selective breeding combined with knowledge about the inheritance of behaviour to improve the human species. The eugenics movement was subsequently discredited by scientific corruption and genocidal actions in Nazi Germany. Behavioural genetics was thereby discredited through its association with eugenics. The field once again gained status as a distinct scientific discipline through the publication of early texts on behavioural genetics, such as Calvin S. Hall's 1951 book chapter on behavioural genetics, in which he introduced the term "psychogenetics". The term enjoyed some limited popularity in the 1960s and 1970s, but it eventually disappeared from usage in favour of "behaviour genetics".

The start of behavior genetics as a well-identified field was marked by the publication in 1960 of the book Behavior Genetics by John L. Fuller and William Robert (Bob) Thompson. It is widely accepted now that many if not most behaviours in animals and humans are under significant genetic influence, although the extent of genetic influence for any particular trait can differ widely. A decade later, in February 1970, the first issue of the journal Behavior Genetics was published and in 1972 the Behavior Genetics Association was formed with Theodosius Dobzhansky elected as the association's first president. The field has since grown and diversified, touching many scientific disciplines.

Methods

The primary goal of behavioural genetics is to investigate the nature and origins of individual differences in behaviour. A wide variety of different methodological approaches are used in behavioral genetic research, only a few of which are outlined below.

Animal studies

Animal behavior genetic studies are considered more reliable than studies on humans, because animal experiments allow more variables to be manipulated in the laboratory. Selection experiments have often been employed in animal research. For example, laboratory house mice have been bred for open-field behaviour, thermoregulatory nesting, and voluntary wheel-running behaviour. Behavioural geneticists using model organisms employ a range of molecular techniques to alter, insert, or delete genes. These techniques include knockouts, floxing, gene knockdown, and genome editing using methods like CRISPR-Cas9. They give behavioural geneticists different levels of control over the model organism's genome, which makes it possible to evaluate the molecular, physiological, or behavioural outcome of genetic changes. Animals commonly used as model organisms in behavioral genetics include mice, zebrafish, and the nematode species C. elegans.

Twin and family studies

Pedigree chart showing an inheritance pattern consistent with autosomal dominant transmission. Behavioural geneticists have used pedigree studies to investigate the genetic and environmental basis of behaviour.

Some research designs used in behavioural genetic research are variations on family designs (also known as pedigree designs), including twin studies and adoption studies. Quantitative genetic modelling of individuals with known genetic relationships (e.g., parent-child, sibling, dizygotic and monozygotic twins) allows one to estimate to what extent genes and environment contribute to phenotypic differences among individuals. The basic intuition of the twin study is that monozygotic twins share 100% of their genome and dizygotic twins share, on average, 50% of their segregating genome. Thus, differences between the two members of a monozygotic twin pair can only be due to differences in their environment, whereas dizygotic twins will differ from one another due to environment as well as genes. Under this simplistic model, if dizygotic twins differ more than monozygotic twins it can only be attributable to genetic influences. An important assumption of the twin model is the equal environment assumption that monozygotic twins have the same shared environmental experiences as dizygotic twins. If, for example, monozygotic twins tend to have more similar experiences than dizygotic twins—and these experiences themselves are not genetically mediated through gene-environment correlation mechanisms—then monozygotic twins will tend to be more similar to one another than dizygotic twins for reasons that have nothing to do with genes.

Twin studies of monozygotic and dizygotic twins use a biometrical formulation to describe the influences on twin similarity and to infer heritability. The formulation rests on the basic observation that the variance in a phenotype is due to two sources, genes and environment. More formally, $Var(P)=g+(g\times \epsilon )+\epsilon$, where $P$ is the phenotype, $g$ is the effect of genes, $\epsilon$ is the effect of the environment, and $(g\times \epsilon )$ is a gene by environment interaction. The $g$ term can be expanded to include additive ($a^{2}$), dominance ($d^{2}$), and epistatic ($i^{2}$) genetic effects. Similarly, the environmental term $\epsilon$ can be expanded to include shared environment ($c^{2}$) and non-shared environment ($e^{2}$), which includes any measurement error. Dropping the gene by environment interaction for simplicity (typical in twin studies) and fully decomposing the $g$ and $\epsilon$ terms, we now have $Var(P)=(a^{2}+d^{2}+i^{2})+(c^{2}+e^{2})$. Twin research then models the similarity in monozygotic twins and dizygotic twins using simplified forms of this decomposition, shown in the table.

Decomposing the genetic and environmental contributions to twin similarity.
Type of relationship	Full decomposition	Falconer's decomposition
Perfect similarity between siblings	$1.0=a^{2}+d^{2}+i^{2}+c^{2}+e^{2}$	$1.0=a^{2}+c^{2}+e^{2}$
Monozygotic twin correlation ($r_{MZ}$)	$r_{MZ}=a^{2}+d^{2}+i^{2}+c^{2}$	$r_{MZ}=a^{2}+c^{2}$
Dizygotic twin correlation ($r_{DZ}$)	$r_{DZ}={\frac {1}{2}}a^{2}+{\frac {1}{4}}d^{2}+(k)i^{2}+c^{2}$	$r_{DZ}={\frac {1}{2}}a^{2}+c^{2}$

Where $k$ is an unknown (probably very small) quantity.

The simplified Falconer formulation can then be used to derive estimates of $a^{2}$, $c^{2}$, and $e^{2}$. Rearranging and substituting the $r_{MZ}$ and $r_{DZ}$ equations one can obtain an estimate of the additive genetic variance, or heritability, $a^{2}=2(r_{MZ}-r_{DZ})$, the non-shared environmental effect $e^{2}=1-r_{MZ}$ and, finally, the shared environmental effect $c^{2}=r_{MZ}-a^{2}$. The Falconer formulation is presented here to illustrate how the twin model works. Modern approaches use maximum likelihood to estimate the genetic and environmental variance components.
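The three Falconer formulas can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of the maximum likelihood models used in practice; the function name and the twin correlations passed to it are hypothetical values chosen for the example.

```python
def falconer_estimates(r_mz, r_dz):
    """Return (a2, c2, e2) computed from MZ and DZ twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)  # additive genetic variance (heritability)
    e2 = 1.0 - r_mz           # non-shared environment, including measurement error
    c2 = r_mz - a2            # shared environment
    return a2, c2, e2


# Hypothetical correlations, for illustration only.
a2, c2, e2 = falconer_estimates(r_mz=0.75, r_dz=0.5)
print(a2, c2, e2)
```

By construction the three components sum to 1.0, matching the Falconer decomposition in the table.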

Measured genetic variants

The Human Genome Project has allowed scientists to directly genotype the sequence of human DNA nucleotides. Once genotyped, genetic variants can be tested for association with a behavioural phenotype, such as mental disorder, cognitive ability, personality, and so on.

Candidate genes. One popular approach has been to test candidate genes for association with behavioural phenotypes, where the candidate gene is selected based on some a priori theory about biological mechanisms involved in the manifestation of a behavioural trait or phenotype. In general, such studies have proven difficult to replicate broadly, and concern has been raised that the false positive rate in this type of research is high.
Genome-wide association studies. In genome-wide association studies, researchers test the relationship of millions of genetic polymorphisms with behavioural phenotypes across the genome. This approach to genetic association studies is largely atheoretical, and typically not guided by a particular biological hypothesis regarding the phenotype. Genetic association findings for behavioural traits and psychiatric disorders have been found to be highly polygenic (involving many small genetic effects).
SNP heritability and co-heritability. Recently, researchers have begun to use similarity between classically unrelated people at their measured single nucleotide polymorphisms (SNPs) to estimate genetic variation or covariation that is tagged by SNPs, using mixed effects models implemented in software such as Genome-wide complex trait analysis (GCTA). To do this, researchers find the average genetic relatedness over all SNPs between all individuals in a (typically large) sample, and use Haseman-Elston regression or restricted maximum likelihood to estimate the genetic variation that is "tagged" by, or predicted by, the SNPs. The proportion of phenotypic variation that is accounted for by the genetic relatedness has been called "SNP heritability". Intuitively, SNP heritability increases to the degree that phenotypic similarity is predicted by genetic similarity at measured SNPs, and is expected to be lower than the true narrow-sense heritability to the degree that measured SNPs fail to tag (typically rare) causal variants. The value of this method is that it is an independent way to estimate heritability that does not require the same assumptions as those in twin and family studies, and that it gives insight into the allelic frequency spectrum of the causal variants underlying trait variation.
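The SNP heritability procedure described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the GCTA implementation: genotypes are coded as 0/1/2 allele counts, each SNP is standardized by its sample allele frequency, and the Haseman-Elston step is the simple cross-product form (regressing products of standardized phenotypes on off-diagonal relatedness). The function names and the simulated data are hypothetical.

```python
import numpy as np


def genetic_relatedness_matrix(genotypes):
    """Average relatedness over SNPs; genotypes is (n_individuals, n_snps) of 0/1/2."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)              # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def haseman_elston(grm, phenotype):
    """Slope of phenotype cross-products on pairwise relatedness (pairs i < j)."""
    y = np.asarray(phenotype, dtype=float)
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    x = grm[i, j]
    products = y[i] * y[j]
    xc = x - x.mean()
    return float(np.sum(xc * (products - products.mean())) / np.sum(xc ** 2))


# Simulated data, for illustration only: 300 unrelated people, 500 SNPs.
rng = np.random.default_rng(0)
n, m = 300, 500
freqs = rng.uniform(0.2, 0.8, size=m)
genotypes = rng.binomial(2, freqs, size=(n, m))
genetic = genotypes @ rng.normal(size=m)
genetic = (genetic - genetic.mean()) / genetic.std()
phenotype = np.sqrt(0.5) * genetic + np.sqrt(0.5) * rng.normal(size=n)

grm = genetic_relatedness_matrix(genotypes)
print("SNP heritability estimate:", haseman_elston(grm, phenotype))
```

With a sample this small the estimate is very noisy, which is one reason such analyses typically rely on large samples.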

Quasi-experimental designs

Some behavioural genetic designs are useful not to understand genetic influences on behaviour, but to control for genetic influences to test environmentally-mediated influences on behaviour. Such behavioural genetic designs may be considered a subset of natural experiments, quasi-experiments that attempt to take advantage of naturally occurring situations that mimic true experiments by providing some control over an independent variable. Natural experiments can be particularly useful when experiments are infeasible, due to practical or ethical limitations.

A general limitation of observational studies is that the relative influences of genes and environment are confounded. A simple demonstration of this fact is that measures of 'environmental' influence are heritable. Thus, observing a correlation between an environmental risk factor and a health outcome is not necessarily evidence for environmental influence on the health outcome. Similarly, in observational studies of parent-child behavioural transmission, for example, it is impossible to know if the transmission is due to genetic or environmental influences, due to the problem of passive gene-environment correlation. The simple observation that the children of parents who use drugs are more likely to use drugs as adults does not indicate why the children are more likely to use drugs when they grow up. It could be because the children are modelling their parents' behaviour. Equally plausible, it could be that the children inherited drug-use-predisposing genes from their parent, which put them at increased risk for drug use as adults regardless of their parents' behaviour. Adoption studies, which parse the relative effects of rearing environment and genetic inheritance, find a small to negligible effect of rearing environment on smoking, alcohol, and marijuana use in adopted children, but a larger effect of rearing environment on harder drug use.

Other behavioural genetic designs include discordant twin studies, children of twins designs, and Mendelian randomization.

General findings

There are many broad conclusions to be drawn from behavioural genetic research about the nature and origins of behaviour. Three major conclusions include: 1) all behavioural traits and disorders are influenced by genes; 2) environmental influences tend to make members of the same family more different, rather than more similar; and 3) the influence of genes tends to increase in relative importance as individuals age.

Genetic influences on behaviour are pervasive

It is clear from multiple lines of evidence that all researched behavioural traits and disorders are influenced by genes; that is, they are heritable. The single largest source of evidence comes from twin studies, where it is routinely observed that monozygotic (identical) twins are more similar to one another than are same-sex dizygotic (fraternal) twins.

The conclusion that genetic influences are pervasive has also been observed in research designs that do not depend on the assumptions of the twin method. Adoption studies show that adoptees are routinely more similar to their biological relatives than their adoptive relatives for a wide variety of traits and disorders. In the Minnesota Study of Twins Reared Apart, monozygotic twins separated shortly after birth were reunited in adulthood. These adopted, reared-apart twins were as similar to one another as were twins reared together on a wide range of measures including general cognitive ability, personality, religious attitudes, and vocational interests, among others. Approaches using genome-wide genotyping have allowed researchers to measure genetic relatedness between individuals and estimate heritability based on millions of genetic variants. Methods exist to test whether the extent of genetic similarity (aka, relatedness) between nominally unrelated individuals (individuals who are not close or even distant relatives) is associated with phenotypic similarity. Such methods do not rely on the same assumptions as twin or adoption studies, and routinely find evidence for heritability of behavioural traits and disorders.

Nature of environmental influence

Just as all researched human behavioural phenotypes are influenced by genes (i.e., are heritable), all such phenotypes are also influenced by the environment. The basic fact that monozygotic twins are genetically identical but are never perfectly concordant for psychiatric disorder or perfectly correlated for behavioural traits, indicates that the environment shapes human behaviour.

The nature of this environmental influence, however, is such that it tends to make individuals in the same family more different from one another, not more similar to one another. That is, estimates of shared environmental effects ($c^{2}$) in human studies are small, negligible, or zero for the vast majority of behavioural traits and psychiatric disorders, whereas estimates of non-shared environmental effects ($e^{2}$) are moderate to large. From twin studies $c^{2}$ is typically estimated at 0 because the correlation ($r_{MZ}$) between monozygotic twins is at least twice the correlation ($r_{DZ}$) for dizygotic twins. When using the Falconer variance decomposition ($1.0=a^{2}+c^{2}+e^{2}$) this difference between monozygotic and dizygotic twin similarity results in an estimated $c^{2}=0$. It is important to note that the Falconer decomposition is simplistic. It removes the possible influence of dominance and epistatic effects which, if present, will tend to make monozygotic twins more similar than dizygotic twins and mask the influence of shared environmental effects. This is a limitation of the twin design for estimating $c^{2}$. However, the general conclusion that shared environmental effects are negligible does not rest on twin studies alone. Adoption research also fails to find large $c^{2}$ components; that is, adoptive parents and their adopted children tend to show much less resemblance to one another than the adopted child and his or her non-rearing biological parent. In studies of adoptive families with at least one biological child and one adopted child, the sibling resemblance also tends to be nearly zero for most traits that have been studied.
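The link between the twin correlations and a zero shared-environment estimate follows from the Falconer formulas given earlier. Substituting the heritability estimate into the formula for the shared environmental effect gives

$c^{2}=r_{MZ}-a^{2}=r_{MZ}-2(r_{MZ}-r_{DZ})=2r_{DZ}-r_{MZ}$

so the estimate is zero or negative whenever $r_{MZ}\geq 2r_{DZ}$. As a hypothetical illustration (the values are not taken from any study), $r_{MZ}=0.50$ and $r_{DZ}=0.20$ give $a^{2}=0.60$ and $c^{2}=-0.10$. Because a variance component cannot be negative, such an estimate would ordinarily be treated as zero.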

Similarity in twins and adoptees indicates a small role for shared environment in personality.

The figure provides an example from personality research, where twin and adoption studies converge on the conclusion of zero to small influences of shared environment on broad personality traits measured by the Multidimensional Personality Questionnaire including positive emotionality, negative emotionality, and constraint.

Given the conclusion that all researched behavioural traits and psychiatric disorders are heritable, biological siblings will always tend to be more similar to one another than will adopted siblings. However, for some traits, especially when measured during adolescence, adopted siblings do show some significant similarity (e.g., correlations of .20) to one another. Traits that have been demonstrated to have significant shared environmental influences include internalizing and externalizing psychopathology, substance use and dependence, and intelligence.

Nature of genetic influence

Genetic effects on human behavioural outcomes can be described in multiple ways. One way to describe the effect is in terms of how much variance in the behaviour can be accounted for by alleles in the genetic variant, otherwise known as the coefficient of determination or $R^{2}$. An intuitive way to think about $R^{2}$ is that it describes the extent to which the genetic variant makes individuals, who harbour different alleles, different from one another on the behavioural outcome. A complementary way to describe effects of individual genetic variants is in how much change one expects on the behavioural outcome given a change in the number of risk alleles an individual harbours, often denoted by the Greek letter $\beta$ (denoting the slope in a regression equation), or, in the case of binary disease outcomes, by the odds ratio $OR$ of disease given allele status. Note the difference: $R^{2}$ describes the population-level effect of alleles within a genetic variant; $\beta$ or $OR$ describe the effect of having a risk allele on the individual who harbours it, relative to an individual who does not harbour a risk allele.

When described on the $R^{2}$ metric, the effects of individual genetic variants on complex human behavioural traits and disorders are vanishingly small, with each variant accounting for $R^{2}<0.3\%$ of variation in the phenotype. This fact has been discovered primarily through genome-wide association studies of complex behavioural phenotypes, including results on substance use, personality, fertility, schizophrenia, depression, and endophenotypes including brain structure and function. There are a small handful of replicated and robustly studied exceptions to this rule, including the effect of APOE on Alzheimer's disease, CHRNA5 on smoking behaviour, and ALDH2 (in individuals of East Asian ancestry) on alcohol use.

On the other hand, when assessing effects according to the $\beta$ metric, there are a large number of genetic variants that have very large effects on complex behavioural phenotypes. The risk alleles within such variants are exceedingly rare, such that their large behavioural effects impact only a small number of individuals. Thus, when assessed at a population level using the $R^{2}$ metric, they account for only a small amount of the differences in risk between individuals in the population. Examples include variants within APP that result in familial forms of severe early onset Alzheimer's disease but affect only relatively few individuals. Compare this to risk alleles within APOE, which pose much smaller risk compared to APP, but are far more common and therefore affect a much greater proportion of the population.
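The contrast between the two metrics can be made concrete with a small calculation. The sketch below assumes an additive model, Hardy-Weinberg genotype proportions, and a phenotype scaled to unit variance, none of which is stated in the text above; under those assumptions a variant with risk-allele frequency p and per-allele effect β explains a fraction 2p(1−p)β² of phenotypic variance. The frequencies and effect sizes are hypothetical and do not describe APP, APOE, or any real variant.

```python
def variance_explained(allele_freq, beta):
    """R^2 of one biallelic variant, assuming additivity, Hardy-Weinberg
    proportions, and a phenotype standardized to unit variance."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2


# Hypothetical variants, for illustration only.
rare_large = variance_explained(allele_freq=0.0001, beta=2.0)   # rare, large effect
common_small = variance_explained(allele_freq=0.3, beta=0.05)   # common, small effect

print(f"rare, large-effect variant:   R^2 = {rare_large:.6f}")
print(f"common, small-effect variant: R^2 = {common_small:.6f}")
```

In this toy comparison the rare variant has a per-allele effect forty times larger, yet it explains less population-level variance than the common variant because so few individuals carry it.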

Finally, there are classical behavioural disorders that are genetically simple in their etiology, such as Huntington's disease. Huntington's is caused by a single autosomal dominant variant in the HTT gene, which is the only variant that accounts for any differences among individuals in their risk for developing the disease, assuming they live long enough. In the case of genetically simple and rare diseases such as Huntington's, the variant $R^{2}$ and the $OR$ are simultaneously large.

Additional general findings

In response to general concerns about the replicability of psychological research, behavioral geneticists Robert Plomin, John C. DeFries, Valerie Knopik, and Jenae Neiderhiser published a review of the ten most well-replicated findings from behavioral genetics research. The ten findings were:

"All psychological traits show significant and substantial genetic influence."
"No traits are 100% heritable."
"Heritability is caused by many genes of small effect."
"Phenotypic correlations between psychological traits show significant and substantial genetic mediation."
"The heritability of intelligence increases throughout development."
"Age-to-age stability is mainly due to genetics."
"Most measures of the 'environment' show significant genetic influence."
"Most associations between environmental measures and psychological traits are significantly mediated genetically."
"Most environmental effects are not shared by children growing up in the same family."
"Abnormal is normal."

Criticisms and controversies

Behavioural genetic research and findings have at times been controversial. Some of this controversy has arisen because behavioural genetic findings can challenge societal beliefs about the nature of human behaviour and abilities. Major areas of controversy have included genetic research on topics such as racial differences, intelligence, violence, and human sexuality. Other controversies have arisen due to misunderstandings of behavioural genetic research, whether by the lay public or the researchers themselves. For example, the notion of heritability is easily misunderstood to imply causality, or that some behavior or condition is determined by one's genetic endowment. When behavioral genetics researchers say that a behavior is X% heritable, that does not mean that genetics causes, determines, or fixes up to X% of the behavior. Instead, heritability is a statement about population level correlations.

Historically, perhaps the most controversial subject has been race and genetics, where fringe research groups have claimed that observed racial differences on a behavioral trait are a product of racial differences in allele frequencies. Such claims are made most frequently about differences between White and Black racial groups. These are complicated issues that are extremely difficult to resolve due to the confounding of racial group and environmental experience, such as discrimination and oppression. Indeed, race is a social construct that is not very useful for genetic research. Instead, geneticists use concepts such as ancestry, which is more rigorously defined. For example, a so-called "Black" race may include all individuals of relatively recent African descent ("recent" because all humans are descended from African ancestors). However, there is more genetic diversity in Africa than in the rest of the world combined, so speaking of a "Black" race is without a precise genetic meaning.

Qualitative research has fostered arguments that behavioural genetics is an ungovernable field without scientific norms or consensus, which fosters controversy. The argument continues that this state of affairs has led to controversies including race, intelligence, instances where variation within a single gene was found to very strongly influence a controversial phenotype (e.g., the "gay gene" controversy), and others. This argument further states that because of the persistence of controversy in behavior genetics and the failure of disputes to be resolved, behavior genetics does not conform to the standards of good science.

The scientific assumptions on which parts of behavioral genetic research are based have also been criticized as flawed. Genome wide association studies are often implemented with simplifying statistical assumptions, such as additivity, which may be statistically robust but unrealistic for some behaviors. Critics further contend that, in humans, behavior genetics represents a misguided form of genetic reductionism based on inaccurate interpretations of statistical analyses. Studies comparing monozygotic (MZ) and dizygotic (DZ) twins assume that environmental influences will be the same in both types of twins, but this assumption may also be unrealistic. In reality MZ twins are treated more alike than DZ twins, which itself may be an example of evocative gene-environment correlation, suggesting that one's genes influence their treatment by others. It is also not possible in twin studies to completely eliminate effects of the shared womb environment, although studies comparing twins who experience monochorionic and dichorionic environments in utero do exist, and indicate limited impact. Studies of twins separated in early life include children who were separated not at birth but part way through childhood. The effect of early rearing environment can therefore be evaluated to some extent in such a study, by comparing twin similarity for those twins separated early and those separated later.

Pan-genome

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Pan-genome

In the fields of molecular biology and genetics, a pan-genome (or supragenome) is the entire set of genes for all strains within a clade. The pan-genome includes: the core genome containing genes present in all strains within the clade, the accessory genome containing 'dispensable' genes present in a subset of the strains, and strain-specific genes. The study of the pan-genome is called pangenomics.
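The three components of a pan-genome can be illustrated with plain Python sets. This is a toy sketch that assumes genes have already been clustered into families with shared identifiers, which is the hard part in real analyses; the function name, strain names, and gene identifiers are hypothetical.

```python
from collections import Counter


def partition_pan_genome(strains):
    """Split a pan-genome into core, accessory, and strain-specific genes.

    strains maps a strain name to the set of gene identifiers it carries.
    """
    n = len(strains)
    counts = Counter(gene for genes in strains.values() for gene in genes)
    core = {g for g, k in counts.items() if k == n}                     # in every strain
    strain_specific = {g for g, k in counts.items() if k == 1 and n > 1}  # in exactly one
    accessory = set(counts) - core - strain_specific                    # in some, not all
    return core, accessory, strain_specific


# Hypothetical strains and genes, for illustration only.
strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}
core, accessory, strain_specific = partition_pan_genome(strains)
print(sorted(core), sorted(accessory), sorted(strain_specific))
```

The pan-genome itself is the union of the three sets.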

Some species have open (or extensive) pan-genomes, while others have closed pan-genomes. For species with a closed pan-genome, very few genes are added per sequenced genome (after sequencing many strains), and the size of the full pan-genome can be theoretically predicted. Species with an open pan-genome have enough genes added per additional sequenced genome that predicting the size of the full pan-genome is impossible. Population size and niche versatility have been suggested as the most influential factors in determining pan-genome size.

Pan-genomes were originally constructed for species of bacteria and archaea, but more recently eukaryotic pan-genomes have been developed, particularly for plant species. Plant studies have shown that pan-genome dynamics are linked to transposable elements. The significance of the pan-genome arises in an evolutionary context, especially with relevance to metagenomics, but is also used in a broader genomics context.

History

Etymology

The term ‘pan-genome’ was defined with its current meaning by Tettelin et al. in 2005; it derives 'pan' from the Greek word παν, meaning 'whole' or 'everything', while genome is a commonly used term to describe an organism's complete genetic material. Tettelin et al. applied the term specifically to bacteria, whose pan-genome "includes a core genome containing genes present in all strains and a dispensable genome composed of genes absent from one or more strains and genes that are unique to each strain."

Original concept

The S. pneumoniae pan-genome. (a) Number of new genes as a function of the number of sequenced genomes. The predicted number of new genes drops sharply to zero when the number of genomes exceeds 50. (b) Number of core genes as a function of the number of sequenced genomes. The number of core genes converges to 1,647 for number of genomes n→∞. From Donati et al.

The original pan-genome concept was developed by Tettelin et al. when they sequenced six strains of Streptococcus agalactiae, whose genomes could be described as a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Extrapolation suggested that the gene reservoir in the S. agalactiae pan-genome is vast and that new unique genes would continue to be identified even after sequencing hundreds of genomes.

Examples

A similar pattern was found in Streptococcus pneumoniae when 44 strains were sequenced (see figure). With each new genome sequenced fewer new genes were discovered. In fact, the predicted number of new genes dropped to zero when the number of genomes exceeds 50 (note, however, that this is not a pattern found in all species). This would mean that S. pneumoniae has a 'closed pan-genome'. The main source of new genes in S. pneumoniae was Streptococcus mitis from which genes were transferred horizontally. The pan-genome size of S. pneumoniae increased logarithmically with the number of strains and linearly with the number of polymorphic sites of the sampled genomes, suggesting that acquired genes accumulate proportionately to the age of clones.

Another example can be seen in a comparison of the sizes of the core genome and the pan-genome of Prochlorococcus. The core genome is necessarily much smaller than the pan-genome, which is drawn on by the different ecotypes of Prochlorococcus. A 2015 study of Prevotella bacteria isolated from humans compared the gene repertoires of its species derived from different body sites. It also reported an open pan-genome showing vast diversity in the gene pool.
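The "number of new genes per additional genome" curve described for S. pneumoniae can be approximated with a simple resampling sketch. It assumes each genome is given as a set of gene-family identifiers and averages over random orderings of the genomes; the function name, parameters and toy data are hypothetical, and this is not the extrapolation method used in the studies cited above.

```python
import random

def new_genes_curve(genomes, n_orders=100, seed=0):
    """Average number of previously unseen genes contributed by the k-th genome.

    `genomes` is a sequence of sets of gene-family identifiers.
    The result has one value per position k = 1..len(genomes),
    averaged over `n_orders` random orderings of the genomes.
    """
    rng = random.Random(seed)
    genomes = list(genomes)
    totals = [0] * len(genomes)
    for _ in range(n_orders):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for k, genes in enumerate(order):
            totals[k] += len(genes - seen)  # genes not seen in earlier genomes
            seen |= genes
    return [t / n_orders for t in totals]

# Hypothetical toy data: a curve that falls towards zero is what a
# closed pan-genome would look like; a curve that stays above zero
# as more genomes are added suggests an open one.
toy_genomes = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2", "g6"},
]
print(new_genes_curve(toy_genomes))
```

For any ordering, the values sum to the pan-genome size of the sampled genomes, so the curve is a way of showing how quickly that total is reached as genomes are added.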

Software tools

As interest in pan-genomes increased, a number of software tools were developed to help analyze this kind of data. In 2015, a group reviewed the different kinds of analyses and tools available to researchers. Software for pan-genome analysis falls into seven kinds: clustering homologous genes; identifying SNPs; plotting pangenomic profiles; building phylogenetic relationships of orthologous genes/families of strains/isolates; function-based searching; annotation and/or curation; and visualization.

The two most cited software tools at the end of 2014 were Panseq and the pan-genomes analysis pipeline (PGAP). Other options include BPGA – A Pan-Genome Analysis Pipeline for prokaryotic genomes, GET_HOMOLOGUES, Roary and PanDelos.

A review focused on plant pan-genomes was published in 2015. The first software designed for plant pan-genomes was GET_HOMOLOGUES-EST.

Omics

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Omics

Diagram illustrating genomics

The English-language neologism omics informally refers to a field of study in biology ending in -omics, such as genomics, proteomics or metabolomics. Omics aims at the collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism or organisms.

The related suffix -ome is used to address the objects of study of such fields, such as the genome, proteome or metabolome respectively. The suffix -ome as used in molecular biology refers to a totality of some sort; it is an example of a "neo-suffix" formed by abstraction from various Greek terms in -ωμα, a sequence that does not form an identifiable suffix in Greek.

Functional genomics aims at identifying the functions of as many genes as possible of a given organism. It combines different -omics techniques such as transcriptomics and proteomics with saturated mutant collections.

Origin

"Omicum": Building of the Estonian Biocentre which houses the Estonian Genome Centre and Institute of Molecular and Cell Biology at the University of Tartu in Tartu, Estonia.

The Oxford English Dictionary (OED) distinguishes three different fields of application for the -ome suffix:

in medicine, forming nouns with the sense "swelling, tumour"
in botany or zoology, forming nouns in the sense "a part of an animal or plant with a specified structure"
in cellular and molecular biology, forming nouns with the sense "all constituents considered collectively"

The -ome suffix originated as a variant of -oma, and became productive in the last quarter of the 19th century. It originally appeared in terms like sclerome or rhizome. All of these terms derive from Greek words in -ωμα, a sequence that is not a single suffix, but analyzable as -ω-μα, the -ω- belonging to the word stem (usually a verb) and the -μα being a genuine Greek suffix forming abstract nouns.

The OED suggests that its third definition originated as a back-formation from mitome. Early attestations include biome (1916) and genome (first coined as German Genom in 1920).

The association with chromosome in molecular biology is by false etymology. The word chromosome derives from the Greek stems χρωμ(ατ)- "colour" and σωμ(ατ)- "body". While σωμα "body" genuinely contains the -μα suffix, the preceding -ω- is not a stem-forming suffix but part of the word's root. Because genome refers to the complete genetic makeup of an organism, a neo-suffix -ome suggested itself as referring to "wholeness" or "completion".

Bioinformaticians and molecular biologists figured amongst the first scientists to apply the "-ome" suffix widely. Early advocates included bioinformaticians in Cambridge, UK, where there were many early bioinformatics labs such as the MRC centre, Sanger centre, and EBI (European Bioinformatics Institute). For example, the MRC centre carried out the first genome and proteome projects.

Kinds of omics studies

Genomics

Genomics: Study of the genomes of organisms.
- Cognitive genomics: Study of the changes in cognitive processes associated with genetic profiles.
- Comparative genomics: Study of the relationship of genome structure and function across different biological species or strains.
- Functional genomics: Describes gene and protein functions and interactions (often uses transcriptomics).
- Metagenomics: Study of metagenomes, i.e., genetic material recovered directly from environmental samples.
- Neurogenomics: Study of genetic influences on the development and function of the nervous system.
- Pangenomics: Study of the entire collection of genes or genomes found within a given species.
- Personal genomics: Branch of genomics concerned with the sequencing and analysis of the genome of an individual. Once the genotypes are known, the individual's genotype can be compared with the published literature to determine the likelihood of trait expression and disease risk. It supports personalized medicine.

Epigenomics

The epigenome is the supporting structure of the genome, including protein and RNA binders, alternative DNA structures, and chemical modifications on DNA.

Epigenomics: Modern technologies include chromosome conformation by Hi-C, various ChIP-seq and other sequencing methods combined with proteomic fractionations, and sequencing methods that find chemical modification of cytosines, like bisulfite sequencing.
Nucleomics: Study of the complete set of genomic components which form "the cell nucleus as a complex, dynamic biological system, referred to as the nucleome". The 4D Nucleome Consortium officially joined the IHEC (International Human Epigenome Consortium) in 2017.

Lipidomics

The lipidome is the entire complement of cellular lipids, including the modifications made to a particular set of lipids, produced by an organism or system.

Lipidomics: Large-scale study of pathways and networks of lipids. Mass spectrometry techniques are used.

Proteomics

The proteome is the entire complement of proteins, including the modifications made to a particular set of proteins, produced by an organism or system.

Proteomics: Large-scale study of proteins, particularly their structures and functions. Mass spectrometry techniques are used.
- Immunoproteomics: Study of large sets of proteins (proteomics) involved in the immune response.
- Nutriproteomics: Identifying the molecular targets of nutritive and non-nutritive components of the diet. It uses proteomics mass spectrometry data for protein expression studies.
- Proteogenomics: An emerging field of biological research at the intersection of proteomics and genomics, in which proteomics data are used for gene annotation.
- Structural genomics: Study of 3-dimensional structure of every protein encoded by a given genome using a combination of experimental and modeling approaches.

Glycomics

Glycomics is the comprehensive study of the glycome, i.e., sugars and carbohydrates.

Foodomics

Foodomics was defined in 2009 as "a discipline that studies the Food and Nutrition domains through the application and integration of advanced -omics technologies to improve consumer's well-being, health, and knowledge".

Transcriptomics

The transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA, produced in one or a population of cells.

Transcriptomics: Study of transcriptomes, their structures and functions.

Metabolism

Metabolomics: Scientific study of chemical processes involving metabolites. It is a "systematic study of the unique chemical fingerprints that specific cellular processes leave behind", that is, the study of their small-molecule metabolite profiles.
Metabonomics: The quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification.

Nutrition, pharmacology, and toxicology

Nutritional genomics: A science studying the relationship between human genome, nutrition and health.
- Nutrigenetics: Study of the effect of genetic variations on the interaction between diet and health, with implications for susceptible subgroups.
- Nutrigenomics: Study of the effects of foods and food constituents on gene expression. It examines the effect of nutrients on the genome, proteome, and metabolome.
Pharmacogenomics: Investigates the effect of the sum of variations within the human genome on drugs.
Pharmacomicrobiomics: Investigates the effect of variations within the human microbiome on drugs and vice versa.
Toxicogenomics: A field of science that deals with the collection, interpretation, and storage of information about gene and protein activity within a particular cell or tissue of an organism in response to toxic substances.

Culture

Inspired by foundational questions in evolutionary biology, a Harvard team around Jean-Baptiste Michel and Erez Lieberman Aiden created the American neologism culturomics for the application of big data collection and analysis to cultural studies.

Miscellaneous

Mitointeractome
Psychogenomics: Process of applying the powerful tools of genomics and proteomics to achieve a better understanding of the biological substrates of normal behavior and of diseases of the brain that manifest themselves as behavioral abnormalities. Applying psychogenomics to the study of drug addiction, the ultimate goal is to develop more effective treatments for these disorders as well as objective diagnostic tools, preventive measures, and eventually cures.
Stem cell genomics: Application of genomics to stem cell biology. The aim is to establish stem cells as a leading model system for understanding human biology and disease states and ultimately to accelerate progress toward clinical translation.
Connectomics: The study of the connectome, the totality of the neural connections in the brain.
Microbiomics: The study of the genomes of the communities of microorganisms that live in the digestive tracts of animals.
Cellomics: The quantitative analysis and study of cells using bioimaging methods and bioinformatics.
Tomomics: A combination of tomography and omics methods to understand tissue or cell biochemistry at high spatial resolution, typically using imaging mass spectrometry data.
Ethomics: The high-throughput machine measurement of animal behaviour.
Videomics (or vide-omics): A video analysis paradigm inspired by genomics principles, where a continuous image sequence (or video) can be interpreted as the capture of a single image evolving through time through mutations revealing ‘a scene’.
Multiomics: Integration of different omics in a single study or analysis pipeline.

Unrelated words in -omics

The word “comic” does not use the "omics" suffix; it derives from Greek “κωμ(ο)-” (merriment) + “-ικ(ο)-” (an adjectival suffix), rather than presenting a truncation of “σωμ(ατ)-”.

Similarly, the word “economy” is assembled from Greek “οικ(ο)-” (household) + “νομ(ο)-” (law or custom), and “economic(s)” from “οικ(ο)-” + “νομ(ο)-” + “-ικ(ο)-”. The suffix -omics is sometimes used to create names for schools of economics, such as Reaganomics.

Current usage

Many “omes” beyond the original “genome” have become useful and have been widely adopted by research scientists. “Proteomics” has become well-established as a term for studying proteins at a large scale. "Omes" can provide an easy shorthand to encapsulate a field; for example, an interactomics study is clearly recognisable as relating to large-scale analyses of gene-gene, protein-protein, or protein-ligand interactions. Researchers are rapidly taking up omes and omics, as shown by the explosion of the use of these terms in PubMed since the mid '90s.

Prime number