
Thursday, December 10, 2020

Ancient pathogen genomics

From Wikipedia, the free encyclopedia

Ancient pathogen genomics is a scientific field related to the study of pathogen genomes recovered from ancient human, plant or animal remains. Ancient pathogens are microorganisms, often belonging to lineages that are now extinct, that caused epidemics and deaths worldwide in past centuries. Their DNA, referred to as ancient DNA (aDNA), is isolated from the remains (bones and teeth) of victims of the epidemics caused by these pathogens.

The analysis of the genomic features of ancient pathogen genomes allows researchers to understand the evolution of modern microbial strains that could potentially cause new pandemics or outbreaks. aDNA is analysed with bioinformatic tools and molecular biology techniques to compare ancient pathogens with their modern descendants; the comparison also provides phylogenetic information about these strains.

Reconstructing ancient pathogen genomes through NGS technologies

Pathogen DNA in ancient remains can be detected with laboratory or computational methods. In both cases, the procedure starts with the extraction of DNA from ancient specimens. Laboratory methods are based on the construction of NGS libraries followed by capture-based screening. Computational tools are used either to map the NGS reads against a single- or multi-genome reference (targeted approach) or to perform metagenomic profiling or taxonomic assignment of shotgun NGS reads (broad approach).

Isolating ancient DNA

Limited preservation and thus low abundance, a highly fragmented and damaged state, and the presence of modern DNA contamination and environmental DNA background make the retrieval of ancient DNA (aDNA) a challenging procedure.

To recover aDNA efficiently, DNA is generally isolated from tissues that contain high quantities of aDNA, such as bone and teeth, which are abundant in the archaeological record. The preservation of pathogen DNA across different anatomical elements varies widely according to the type of pathogen and its tissue tropism, its route of entry into the body and the resulting disease. Pathogens that cause chronic infections typically produce diagnostic bone changes, whereas acute blood-borne infections typically do not. Therefore, for infections that killed the host in the acute phase, the preferred sampling material is the inner (pulp) chamber of the teeth, since this tissue is highly vascularized during life.

aDNA is characterised by damage that accumulates over time: evaluating the DNA 'damage pattern' with computational tools helps authenticate ancient pathogen DNA, since the same pattern is not found in modern contaminants.

The most common chemical damage affecting DNA post-mortem is the hydrolytic deamination of cytosines, which converts them into uracils that are then read as thymines. As a result, ancient DNA contains an elevated proportion of cytosine-to-thymine (C→T) transitions, particularly at the ends of the molecules. Besides cytosine deamination (which yields thymine directly when the cytosine was methylated), other common DNA modifications are abasic sites and single-strand breaks.
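To make the damage signal concrete, the sketch below counts, for each position from the 5' end of the reads, how often a reference C is read as T. It is a hypothetical toy, not the method used by mapDamage2 or any other tool: it assumes gap-free, equal-length read/reference pairs already in the same orientation, and the function name `ct_frequency_by_position` is invented for illustration. In authentic aDNA one would expect this frequency to be highest at the first position and to decay towards the interior of the reads.

```python
from typing import List, Tuple


def ct_frequency_by_position(
    alignments: List[Tuple[str, str]], max_pos: int = 10
) -> List[float]:
    """Fraction of reference C sites read as T, for positions 0..max_pos-1
    counted from the 5' end of each read.

    Hypothetical simplified input: (read, reference) pairs that are gap-free,
    equal in length and in the same orientation.
    """
    c_sites = [0] * max_pos  # reference C sites observed at each position
    c_to_t = [0] * max_pos   # of those, how many the read shows as T
    for read, ref in alignments:
        if len(read) != len(ref):
            raise ValueError("read and reference must have the same length")
        for i in range(min(max_pos, len(ref))):
            if ref[i] == "C":
                c_sites[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    # Positions with no reference C are reported as 0.0
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_sites)]


if __name__ == "__main__":
    # Made-up (read, reference) pairs, not real data
    pairs = [("TACGT", "CACGT"), ("TTCAG", "CTCAG"), ("CAGGA", "CAGGA")]
    print(ct_frequency_by_position(pairs, max_pos=3))
```

With these made-up pairs, two of the three reference Cs at the first position are read as T, while the inner C sites are unchanged, which is the terminal excess that damage-based authentication looks for.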

aDNA is extensively fragmented (most fragments are shorter than 100 base pairs), and this property can be used as a quantitative measure of authenticity, since modern contaminant molecules are expected to be longer. To recover these short fragments efficiently, improved silica-based extraction protocols with a modified volume and composition of the DNA-binding buffer were introduced.
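The length-based authenticity check can be sketched as a simple summary of the read-length distribution. This is an illustrative assumption rather than a published procedure: the helper `length_summary` is hypothetical, and the 100 bp cutoff simply mirrors the figure quoted above. A sample dominated by short reads is consistent with aDNA, whereas a tail of long reads may point to modern contamination.

```python
from statistics import median
from typing import Dict, Iterable


def length_summary(read_lengths: Iterable[int], short_cutoff: int = 100) -> Dict[str, float]:
    """Summarise a read-length distribution: read count, median length and
    the fraction of reads shorter than `short_cutoff` base pairs."""
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("no read lengths given")
    n_short = sum(1 for n in lengths if n < short_cutoff)
    return {
        "n_reads": float(len(lengths)),
        "median_length": float(median(lengths)),
        "fraction_short": n_short / len(lengths),
    }


if __name__ == "__main__":
    # Made-up lengths (bp) for illustration only
    print(length_summary([45, 52, 60, 70, 180]))
```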

Construction of DNA libraries

To be sequenced with second-generation sequencing methods, template molecules have to be modified by ligating adaptors to them. Both library construction and the subsequent PCR amplification are prone to errors: in particular, adaptor ligation biases can occur, and the efficiency with which different PCR enzymes amplify the construct varies.

There are three common types of aDNA libraries. The double-stranded DNA library uses double-stranded DNA templates and first requires a step to repair the ends of the aDNA fragments. The fragments are then ligated to double-stranded adaptors and the resulting nicks are filled in. This method has some limitations, such as a fraction of constructs that do not carry the two different adaptors and the possible formation of adaptor dimers.

To overcome the latter problem, a method for constructing A-tailed libraries was developed. In this method, aDNA is end-repaired and an adenine residue is then added to the 3' ends of the strands, which facilitates ligation of the template to adaptors carrying a thymine overhang. The use of these T-tailed adaptors also prevents the formation of adaptor dimers. The adaptor typically used is double-stranded and Y-shaped: its two strands are complementary at the T-tailed end and non-complementary at the other end. This type of adaptor produces an aDNA template flanked by a different adaptor sequence at each end, which is useful for unidirectional sequencing.
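Why A-tailing and T-tailed adaptors suppress adaptor dimers can be illustrated with a deliberately simplified model of end compatibility. The sketch is hypothetical: it reduces each fragment end to a single-base 3' overhang (or a blunt end), treats two ends as ligatable only when both are blunt or their overhangs are complementary, and ignores real ligase chemistry and efficiency.

```python
# Toy model: each fragment end is described by its single-base 3' overhang,
# or "" for a blunt end. Real ligation chemistry is not modelled.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def can_ligate(end_a: str, end_b: str) -> bool:
    """Return True if two ends can be joined under this toy model:
    two blunt ends, or two complementary single-base overhangs."""
    if end_a == "" and end_b == "":
        return True  # blunt-to-blunt ligation is possible
    if len(end_a) != 1 or len(end_b) != 1:
        return False  # blunt-to-overhang or longer overhangs: treated as not ligatable here
    return COMPLEMENT.get(end_a) == end_b


if __name__ == "__main__":
    print("insert (A) + adaptor (T):", can_ligate("A", "T"))
    print("adaptor (T) + adaptor (T):", can_ligate("T", "T"))
    print("blunt adaptor + blunt adaptor:", can_ligate("", ""))
```

Under this model a T-tailed adaptor cannot join another T-tailed adaptor, whereas blunt-ended adaptors, as in the double-stranded library described above, can join each other. The same rule also implies that two A-tailed inserts cannot join each other.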

Another strategy is based on single-stranded DNA libraries. In this method, DNA is first heat-denatured into single strands, which are then ligated to a single-stranded biotinylated adaptor. Each strand is then used as a template by a DNA polymerase that synthesises the complementary strand. Subsequently, a second adaptor is ligated to the 3' end of the complementary strand, and the full construct is amplified by PCR and then sequenced. Purification is performed using streptavidin-coated paramagnetic beads, which minimise DNA loss during this phase of the procedure.

Enriching libraries for aDNA

Different methods (called enrichment methods) have been developed to improve access to endogenous DNA in ancient remains. They fall mainly into three types: those applied during library construction, which preferentially incorporate aDNA fragments with high levels of damage; those applied after library construction, which separate the exogenous and endogenous fractions by annealing to predefined sets of probes (in solution or on microarrays); and those based on the targeted digestion of environmental microbial DNA with restriction enzymes or on primer extension capture (PEC).

Selective uracil enrichment

During library construction, the ssDNA fragments are bound to streptavidin-coated beads through a biotinylated adaptor. In the polymerase extension step, the DNA strand complementary to the original template is synthesised. In this kind of enrichment, the constructs are then phosphorylated at the 5' end to enable ligation of a non-phosphorylated adaptor (ligation between the 3' end of the adaptor and the 5' end of the newly synthesised strand). The DNA is then treated with uracil DNA glycosylase (UDG) and endonuclease VIII (the USER mix): UDG removes the uracils left by post-mortem cytosine deamination, creating abasic sites, and endonuclease VIII cuts the strand at these abasic sites. The cleavage generates new 3' termini, which are dephosphorylated to give 3'-OH ends that serve as starting points for a new round of extension. The damaged strand is thereby extended from the damaged site towards the bound bead, and the newly synthesised DNA displaces the original fragment. As a result, the newly formed dsDNA molecules no longer contain the bead-bound adaptor, so the supernatant holds a dsDNA library of the strands that originally carried deaminated cytosines, available for further amplification and sequencing. The undamaged template fraction remains attached to the paramagnetic beads.

Extension-free target enrichment in solution

This approach uses in-solution target-probe hybridization, after library construction, to screen for a single microorganism. It is a species-specific assay that requires heat denaturation of the DNA libraries and the construction of a probe library, either by long-range PCR when fresh DNA from closely related species is available, or by custom design and synthesis of oligonucleotides. The method is useful when the target microorganism is known, for example when there is a hypothesis about the causative agent of an epidemic or when the studied individuals show skeletal lesions.

Solid-phase target enrichment

Another enrichment strategy applied after library construction is the direct use of microarrays. These are used for broad laboratory-based pathogen screening that searches for many pathogenic microorganisms simultaneously. This approach is well suited to pathogens that leave no physical skeletal evidence and whose presence cannot easily be hypothesized a priori. The probes are designed to represent conserved or unique regions from a range of pathogenic viruses, parasites or bacteria.

Since microarray probes are derived from modern strains of the pathogens, the limitations of this method are the poor detection of the most divergent genomic regions and the omission of regions affected by major genomic rearrangements or of unknown additional plasmids.

Whole-genome enrichment

Whole-genome in-solution capture (WISC) allows the characterization of the entire genome sequence of ancient individuals. The technique uses a genome-wide biotinylated RNA probe library generated by in vitro transcription of fresh modern DNA extracts from species closely related to the target aDNA sample. The heat-denatured aDNA library is then annealed to the RNA probes. To improve stringency and reduce enrichment of highly repetitive regions, low-complexity DNA and adaptor-blocking RNA oligonucleotides are added. The library fraction of interest is then recovered by elution from streptavidin-coated paramagnetic beads, to which the RNA probes are bound.

Computational analysis

NGS data from aDNA are analysed with the same computational approaches used for modern DNA, with some peculiarities. A widely used tool for aligning aDNA reads against reference genomes is the PALEOMIX package, which can quantify DNA damage levels through mapDamage2 and perform phylogenomic and metagenomic analyses. Alignments will typically show a substantial fraction of mismatched nucleotides that result not from sequencing errors or polymorphisms but from damaged bases. For this reason, the acceptance threshold for the read-to-reference edit distance should be chosen according to the phylogenetic distance to the reference genome. Probabilistic aligners that take the aDNA damage pattern into account have been developed to improve alignments.
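A length-scaled edit-distance filter of this kind can be sketched as follows. This is an illustrative assumption, not the PALEOMIX implementation: `accept_read` and its `max_fraction` parameter are hypothetical, and in practice the threshold is set through the aligner's own options. A more divergent reference, or heavily damaged reads, would call for a larger `max_fraction`.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1),
    computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def accept_read(read: str, ref_segment: str, max_fraction: float) -> bool:
    """Hypothetical filter: keep a read if its edit distance to the reference
    segment is at most max_fraction of the read length."""
    return edit_distance(read, ref_segment) <= max_fraction * len(read)


if __name__ == "__main__":
    read, ref = "TACGTACGTA", "CACGTACGTA"  # one C->T difference at the 5' end
    print(accept_read(read, ref, max_fraction=0.04))  # strict threshold
    print(accept_read(read, ref, max_fraction=0.10))  # relaxed threshold
```

Here a single damage-like mismatch is enough to reject a 10 bp read under the strict threshold, which is why damage and divergence must both be considered when choosing the cutoff.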

MALT

Studies of ancient pathogen DNA have largely been restricted to skeletal collections that show infection-related changes, or to pathogens linked to a known epidemiological context. Screening approaches instead aim to identify pathogens without prior knowledge of their presence. Methods include broad-spectrum molecular approaches that detect pathogens by fluorescence hybridization-based microarray technology, identification through DNA enrichment of specific microbial regions, and computational screening of non-enriched sequence data against human microbiome data sets. These approaches offer improvements but remain biased in the bacterial taxa used for species-level assignments.

MEGAN alignment tool (MALT) is a program for fast alignment and taxonomic assignment, used here to identify ancient DNA. Like BLAST, MALT computes local alignments between reads and reference sequences; it can also compute semi-global alignments, in which reads are aligned end-to-end. In the studies described here, the references were complete bacterial genomes from the National Center for Biotechnology Information (NCBI) RefSeq database. MALT consists of two programs: malt-build, which constructs an index for a given database of reference sequences, and malt-run, which aligns a set of query sequences against that indexed database. For each alignment, the program computes the bit-score and the expected value (E-value) and keeps or discards the alignment according to user-specified thresholds for the bit-score, the E-value or the percent identity. The bit-score indicates the size of sequence database in which the current match could be found just by chance: the higher the bit-score, the better the sequence similarity. The E-value is the number of hits of similar quality (score) expected to be found just by chance: the smaller the E-value, the better the match.
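The relationship between the two statistics can be made concrete with the standard BLAST-style formula E = m * n * 2^(-S'), where S' is the bit-score, m the query length and n the total database length. The sketch assumes that MALT's statistics follow this BLAST convention and omits edge-effect corrections; the helper names and the default thresholds in `keep_alignment` are hypothetical and are not MALT's options or defaults.

```python
def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """BLAST-style E-value: E = m * n * 2**(-S'), where m is the query length,
    n the total database length and S' the bit-score (no edge corrections)."""
    return query_len * db_len * 2.0 ** (-bit_score)


def keep_alignment(
    bit_score: float,
    evalue: float,
    percent_identity: float,
    min_bit_score: float = 50.0,  # hypothetical threshold
    max_evalue: float = 1e-3,     # hypothetical threshold
    min_identity: float = 90.0,   # hypothetical threshold
) -> bool:
    """Keep an alignment only if it passes all three user-chosen thresholds."""
    return (
        bit_score >= min_bit_score
        and evalue <= max_evalue
        and percent_identity >= min_identity
    )


if __name__ == "__main__":
    # A search space of m * n = 2**10 gives about one chance hit at S' = 10 bits.
    print(evalue_from_bitscore(10, 1, 1024))
    # A 100-bp read against a made-up 5 Mb database with a 60-bit hit:
    e = evalue_from_bitscore(60, 100, 5_000_000)
    print(e, keep_alignment(60, e, 97.0))
```

The first call illustrates the bit-score definition given above: a match of S' bits is expected to occur about once by chance in a search space of roughly 2^S' positions, so each extra bit halves the E-value.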

MALT allows non-enriched sequence data to be screened for unknown candidate bacterial pathogens involved in past disease outbreaks, and allows the environmental bacterial background to be excluded. Its main advantage is genome-level screening without selecting a particular target organism, which avoids biases common to other screening approaches. Authenticating candidate taxonomic assignments requires genome-wide alignments, but the target DNA is often present in low amounts, so reads covering only a few marker regions may not be sufficient for identification. The approach detects only bacterial and viral DNA, so other infectious agents that may be present in a population cannot be identified. The method is useful for identifying pathogens responsible for ancient and modern disease, especially when candidate organisms are not known a priori.

Applications

Ancient pathogen genomics as a tool against future epidemics

One interesting application of the different sequencing techniques available nowadays is the investigation of historical disease outbreaks to provide an answer to important and long-standing questions in epidemiology, pathogen evolution and also human history.

Much effort is therefore spent on gathering information about the aetiology of infectious diseases of historical importance, such as plague and the cocoliztli epidemic, on describing the geographic spread of viruses and on defining the pathogenic mechanisms of these infectious agents, which are themselves active elements of the evolutionary process. The historical epidemics caused by Y. pestis and S. enterica may seem remote today, but scientists remain interested in tracing the long-term genetic adaptation of these bacteria and in accurately quantifying their rates of evolutionary change, because knowledge of the past can inform strategies against future epidemics.

Bacteria and viruses are among the most variable elements in nature and are highly prone to mutation, and it is impossible to control all the external factors that can influence the development of a pathogen. The goal is therefore not to defeat a possible new outbreak of plague or of any other infectious agent of the past, but to define a strategy, a "guideline", for being better prepared when a new dangerous pathogen emerges. The contribution of the environment to infections remains to be defined: factors such as human migration, climate change, urban overcrowding and animal domestication are among the major causes of the emergence and spread of disease. These factors are difficult to predict, which is one reason why researchers look to the past for information that can be useful today and tomorrow. While they continue to develop strategies against emerging threats using diagnostic, molecular and other advanced tools, they also study how ancient pathogens evolved and adapted through historical events. The more is known about the genomic basis of virulence in historical diseases, the better the emergence and re-emergence of infectious diseases can be understood today and in the future.

Ancient infections and human evolution

Analyses of the phylogenetic relationships between the human host and viral pathogens suggest that many diseases have coevolved with humans for millennia, since the very start of human history in Africa.

In particular, long-term interaction with pathogens is considered a potentially very strong selective pressure, since not all individuals survived the infectious agents they encountered: natural selection by pathogens is implicated in the evolution of species. This interaction has already been used to track human population movements and to reconstruct human migrations within and out of Africa.

A relatively new application of this feature is using aDNA to better understand human evolution. Many tropical infections probably played a significant role in the human evolutionary process. The relationship between humans and viruses can be seen as a "fight" that has continued for millennia and that neither side has yet won: whenever viruses changed their features to become infective for the other "fighters", humans had to find a strategy to increase their fitness and survive the changes.

In this continuous challenge, alongside infectious diseases and other illnesses afflicting modern human society, cancer remains one of the most enigmatic ailments. Scientists are investigating whether neoplastic diseases are restricted to post-industrial human society or whether their origins lie further back in time, perhaps in prehistory. The difficulty is that cancer, when lethal and fast, leaves very few traces in the skeletons of individuals who died shortly after onset, and no signs at all in the case of extraskeletal tumours. Moreover, knowledge of the aetiology of cancer is incomplete, and infection by microorganisms plays a part in it: past migrations could have carried viruses with them, and hence possible reservoirs of tropical disease as well as predisposition to cancer. For this reason, molecular analytical techniques are applied to archaeological remains not only to study hominin evolution but also to improve understanding of the epidemiology and aetiology of tumours. Information derived from aDNA can be used to anchor pathogen mutations in time and to reconstruct evolutionary processes from the microorganisms present, which can help develop new vaccines or identify possible future pathogenic threats.

Past pandemics are much more than just ancient history

What happened in the past is not only history: something that came into contact with humankind hundreds of years ago can still drive human genetic diversity and natural selection, and can still affect global human health. Since epidemics are among the most frequent phenomena to have affected, and potentially devastated, human populations, it is important to detect, prevent and control potential infectious agents. Archaeologists, geneticists and medical scientists are therefore interested in exploring how pathogens can influence human health and longevity, whether by threatening or by improving it.

Evolution and phylogenesis of Yersinia pestis

Yersinia pestis is a Gram-negative bacterium belonging to the family Enterobacteriaceae. Its closest relatives are Yersinia pseudotuberculosis and Yersinia enterocolitica, which are environmental species.

Y. pestis bacillus.

All three species possess the plasmid pCD1, which encodes a type III secretion system. Their chromosomal protein-coding genes share 97% nucleotide identity. The species differ in their virulence potential and transmission mechanisms.

Y. pestis is not a human-adapted bacterium. Its main reservoirs are rodents (such as marmots, mice, great gerbils, voles and prairie dogs), and it is transmitted to humans by fleas. One of the most studied vectors of this pathogen is Xenopsylla cheopis.

After the bite of an infected flea, the bacteria enter the host and travel to the nearest lymph node, where they replicate, causing the large swellings called buboes. The bacteria can also spread into the bloodstream (causing septicaemia) and to the lungs (causing pneumonia). The pneumonic form can be transmitted directly from human to human.

Y. pestis is thought to have become so dangerous through the acquisition of ymt (Yersinia murine toxin). This gene is located on the pMT1 plasmid; it allowed the bacterium to survive in the flea vector and facilitated colonization of the arthropod midgut, giving rise to the large-scale historical pandemics.

Early evolution and divergence from Yersinia pseudotuberculosis

Y. pestis differs from the other two species in its pathogenicity and transmission mechanism. These differences are due to two plasmids, pPCP1, which confers invasive properties in humans, and pMT1, which is involved in flea colonisation, together with some loss-of-function changes in bacterial chromosomal genes.

Samples dated to the Late Neolithic and Bronze Age revealed an early genetic divergence between Y. pseudotuberculosis and the ancestors of Y. pestis. These early strains lacked the features that confer virulence on Y. pestis: they lacked ymt, the gene needed to colonize the vector; they had active forms of genes that repress biofilm formation (which are inactivated in Y. pestis); and they had an active flagellin gene, an inducer of the immune response, which is a pseudogene in Y. pestis.

Comparison of a draft genome and the two plasmids (pCD1 and pMT1), reconstructed from Black Death victims (1348-1349) buried in the East Smithfield burial ground, with modern Y. pestis genomes revealed very high genetic conservation: only 97 single-nucleotide differences over 660 years.

Y. pestis microevolution

The strain from London individual 6330 carries mutations absent from other isolates of the same period (1348-1350): this may reflect either multiple strains circulating in Europe at the same time or the microevolution of a single strain during the pandemic.

Three major outbreaks of plague

There have been three plague pandemics caused by Y. pestis:

  1. The first, known as the Plague of Justinian, first occurred in Egypt in 541-543 and then spread to Constantinople and neighbouring regions, with outbreaks in Europe until 750 CE. Phylogenetic analysis showed that the genomes sequenced from this pandemic belong to a lineage that is extinct today and is closely related to strains from modern-day China, which suggests a possible East Asian origin of the first pandemic.
  2. The second pandemic is known as the Black Death or the Great Pestilence. It struck Europe in 1346-1352 and was followed by many resurgences of plague until the 18th century. Two different strains of Y. pestis may have entered the continent in separate pulses during this pandemic.
  3. The third pandemic started in China in 1860. It spread rapidly to other countries through railways and steamships.

The strains associated with the Justinian Plague lie on a novel branch that is phylogenetically distinct from the strains of the second and third pandemics. The earliest Y. pestis strain of the second pandemic survived and gave rise to the modern Branch 1 strains associated with the third pandemic.

The strains of the first, second and third pandemics share a common ancestor.

Linkage between 2nd and 3rd pandemics

In a recent study, Y. pestis genomes were reconstructed from samples recovered in Barcelona (individuals who died between 1300 and 1420), Ellwangen (between 1486 and 1627) and Bolgar City (between 1298 and 1388). The dates of death were determined by radiocarbon dating; the Bolgar City date was confirmed by the presence of a coin produced only after 1362. Of 223 samples from 178 individuals, only one per site contained enough DNA and was selected for whole-genome sequencing of the bacillus, using a genome-capture assay based on the Y. pseudotuberculosis genome and the pMT1 and pCD1 plasmids of Y. pestis.

Placing these genomes in a Y. pestis phylogenetic tree built from previously known ancient genomes revealed greater genetic diversity outside China than previously thought. All three new genomes fall in Branch 1 and carry two SNPs associated with the Black Death (all Y. pestis genomes dated to the Black Death fall in Branch 1). The Barcelona strain is identical to the London strain; the two individuals from whom these genomes were obtained died of plague a few months apart (spring and autumn 1348), indicating a single wave of plague with low genetic diversity in Europe. The Ellwangen strain falls in a sub-branch of Branch 1 and is ancestral to a previously sequenced strain (L'Observance). It descends from the strain circulating in London and Barcelona during the Black Death but carries additional mutations. It is therefore considered a lineage that diverged from Branch 1 before the 16th century (the Ellwangen outbreak) and has no known modern descendants.

In comparison with isolates from the Black Death, the Bolgar City strain carries:

  • p3 and p4, shared with London individual 6330;
  • p6, shared with all modern Branch 1 strains;
  • p7, unique to this strain.

The Bolgar City strain carries SNPs associated with the Black Death and may be evidence of an eastward movement of plague. These findings support one of the models proposed to explain the link between the second and third pandemics: a single entry of plague into Europe (causing the Black Death) which, after a radiation event, travelled eastward to become established in the territory of the former Soviet Union and in Asia, from where it spread in the 18th century to give rise to the third pandemic.

Another hypothesis is that the lineage of the third pandemic arose from pre-existing genetic variability among Y. pestis strains in China. This hypothesis is supported by the correlation between successive waves of the pandemic in Europe and climatic fluctuations that would have allowed plague to spread across the continent. However, this model cannot explain the genetic diversity of the Black Death (at least four different lineages, which would have required the introduction of four different strains from Asia).

Similarly, two models have been proposed to explain the multiple plague outbreaks in Europe that followed the Black Death:

  • Repeated introduction of plague from Asia. This scenario is compatible with the second hypothesis discussed above, which posits genetic variability of Y. pestis in China.
  • The presence in Europe of a (now extinct) reservoir that caused continuous outbreaks until the 18th century.

Both models may be valid, and it is not yet possible to favour one over the other. However, the Ellwangen genome sequenced in this study may support the second model, because the geographical position of the city makes an introduction of plague from the east unlikely.

Modern Y. pestis strains

Sequencing of Y. pestis genomes revealed a diversification event preceding the Black Death that gave rise to many of the strains circulating today.

Salmonella enterica genomes analysis

During the 16th century an epidemic in Mexico caused high mortality among the Indigenous populations of the Americas, contributing to the demographic collapse of many of these populations. The epidemic was called "cocoliztli" by the Aztecs; its symptoms included high fever and bleeding.

This pestilence is considered one of the worst epidemics in the history of Mexico, and its cause remained a mystery for nearly 500 years.

A group of scientists from Harvard and the Max Planck Institute published a study in the journal Nature Ecology & Evolution suggesting Salmonella enterica as a strong candidate for the cause of the 16th-century epidemic in Mexico. Many studies suggest that this bacterium was introduced into Indigenous populations by Europeans.

The scientists analyzed aDNA extracted from the teeth of 24 skeletons buried in a cemetery in Teposcolula-Yucundaa and found traces of Salmonella enterica aDNA in 10 of them. To test whether the bacterium had been introduced into Mexico by Europeans, they also analyzed five individuals buried before the arrival of Europeans and found no evidence of Salmonella enterica in this pre-contact group.

Analysis of Salmonella enterica genomes

The scientists extracted aDNA from the teeth of 24 Indigenous individuals buried in the contact-era epidemic cemetery and of 5 individuals buried in the pre-contact cemetery, following an aDNA extraction protocol. In parallel, the researchers also examined a soil sample from the cemeteries to characterise the environmental microorganisms that could have penetrated the samples.

After extraction, the DNA was sequenced on an Illumina sequencer. The researchers then analysed the metagenomic sequence data with MALT, which aligns the sequences against a reference database without requiring a specific target. They ran MALT twice: first using the complete bacterial genomes available in the NCBI (National Center for Biotechnology Information) RefSeq database as references, and then using the full NCBI Nucleotide database to screen for viral DNA.

The screening detected Salmonella enterica DNA in 10 of the 24 contact-era samples, and three tooth samples had a high number of reads assigned to S. enterica. The predominant S. enterica serovar in the samples was S. Paratyphi C, which causes enteric fever in humans. No evidence of S. enterica was found in the pre-contact samples, supporting the hypothesis that the bacterium was not local.

A further analysis looked for the classical aDNA damage pattern in the three strongly positive tooth samples by mapping the data sets to the S. Paratyphi C reference genome. The damage pattern was present, supporting S. enterica as the cause of cocoliztli.

To confirm this, the researchers carried out further experiments and computational analyses. They performed whole-genome array capture and in-solution hybridization capture using probes representing modern S. enterica genomic diversity, with S. Paratyphi C as the reference. Capture succeeded for the ten positive samples, while the other samples were negative for S. enterica aDNA.

 

Ancient DNA

Cross-linked DNA extracted from the 4,000-year-old liver of the ancient Egyptian priest Nekht-Ankh.

Ancient DNA (aDNA) is DNA isolated from ancient specimens. Due to degradation processes (including cross-linking, deamination and fragmentation) ancient DNA is more degraded in comparison with contemporary genetic material. Even under the best preservation conditions, there is an upper boundary of 0.4–1.5 million years for a sample to contain sufficient DNA for sequencing technologies. 

Genetic material has been recovered from paleo/archaeological and historical skeletal material, mummified tissues, archival collections of non-frozen medical specimens, preserved plant remains, ice and permafrost cores, marine and lake sediments, and excavation dirt.

History of ancient DNA studies

1980s

The first study of what would come to be called aDNA was conducted in 1984, when Russ Higuchi and colleagues at the University of California, Berkeley reported that traces of DNA from a museum specimen of the Quagga not only remained in the specimen over 150 years after the death of the individual, but could be extracted and sequenced. Over the next two years, through investigations into natural and artificially mummified specimens, Svante Pääbo confirmed that this phenomenon was not limited to relatively recent museum specimens but could apparently be replicated in a range of mummified human samples that dated as far back as several thousand years.

The laborious processes required at that time to sequence such DNA (through bacterial cloning) acted as an effective brake on the development of the field of ancient DNA (aDNA). However, the field began to progress rapidly once the polymerase chain reaction (PCR) was developed in the late 1980s. Double-primer PCR amplification of aDNA (jumping PCR) can produce highly skewed and non-authentic sequence artifacts. A multiple-primer, nested PCR strategy was used to overcome these shortcomings.

1990s

The post-PCR era heralded a wave of publications as numerous research groups claimed success in isolating aDNA. A series of remarkable findings soon followed. These claimed that authentic DNA could be extracted from specimens millions of years old, which falls into the realm of what Lindahl (1993b) has labelled antediluvian DNA. Most such claims were based on the retrieval of DNA from organisms preserved in amber. DNA from insects such as stingless bees, termites and wood gnats was said to have been extracted from Dominican amber dating to the Oligocene epoch, as were plant and bacterial sequences. Even older Lebanese amber-encased weevils, dating to the Cretaceous period, reportedly also yielded authentic DNA. Claims of DNA retrieval were not limited to amber.

Reports of several sediment-preserved plant remains dating to the Miocene were published. Then, in 1994, Woodward et al. reported what was at the time called the most exciting result to date: mitochondrial cytochrome b sequences apparently extracted from dinosaur bones more than 80 million years old. In 1995, two further studies reported dinosaur DNA sequences extracted from a Cretaceous egg, and it seemed that the field would revolutionize knowledge of the Earth's evolutionary past. Even these extraordinary ages were topped by the claimed retrieval of 250-million-year-old halobacterial sequences from halite.

The field gradually developed a better understanding of the kinetics of DNA preservation, the risks of sample contamination and other complicating factors. Together with repeated failures to replicate many of the findings, this led to all of the decade's claims of multi-million-year-old aDNA being dismissed as inauthentic.

2000s

Single-primer extension amplification was introduced in 2007 to address postmortem DNA modification damage. Since 2009, much cheaper research techniques have revolutionized the field of aDNA studies and led to new insights into human migrations. High-throughput next-generation sequencing (NGS) has been essential for reconstructing the genomes of ancient or extinct organisms. A single-stranded DNA (ssDNA) library preparation method has also sparked great interest among aDNA researchers.

In addition to these technical innovations, the start of the decade saw the field begin to develop better standards and criteria for evaluating DNA results, as well as a better understanding of the potential pitfalls.

Problems and errors

Degradation processes

Degradation processes, including cross-linking, deamination and fragmentation, leave ancient DNA of lower quality than modern genetic material. The damage characteristics of aDNA and its ability to survive through time restrict the possible analyses and place an upper limit on the age of successful samples. In theory, DNA degradation correlates with time, but differences in environmental conditions complicate this relationship. Samples subjected to different conditions are therefore unlikely to fit a single, predictable age-degradation relationship. Environmental effects may matter even after excavation, because DNA decay rates may increase, particularly under fluctuating storage conditions. Even under the best preservation conditions, there is an upper boundary of 0.4–1.5 million years for a sample to contain enough DNA for contemporary sequencing technologies.

Research into the decay of mitochondrial and nuclear DNA in moa bones modelled mitochondrial DNA degradation. The model predicts an average fragment length of 1 base pair after 6,830,000 years at −5 °C. Accelerated aging experiments have also measured the decay kinetics, further showing that storage temperature and humidity strongly influence DNA decay. Nuclear DNA degrades at least twice as fast as mtDNA. Early studies that reported recovering much older DNA, for example from Cretaceous dinosaur remains, may therefore have been the result of contaminated samples.
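The moa figure can be turned into a rough back-of-the-envelope model. The sketch below makes a simplifying assumption that is not stated in the text: the fraction of broken backbone bonds grows linearly with time (damage = k × t), and the mean fragment length is 1/damage. It calibrates k from the figure of 1 bp after 6,830,000 years, so it applies only to mtDNA at −5 °C. The constant and function names are illustrative.

```python
# Calibration point taken from the moa study described above:
# mean mtDNA fragment length of 1 bp after 6,830,000 years at -5 °C.
CALIBRATION_YEARS = 6_830_000
CALIBRATION_LENGTH_BP = 1.0


def damage_rate(years: float, mean_length_bp: float) -> float:
    """Per-bond damage rate k implied by one (age, mean length) data point,
    assuming damage = k * years and mean fragment length = 1 / damage."""
    return 1.0 / (mean_length_bp * years)


def mean_fragment_length(years: float, k: float) -> float:
    """Predicted mean fragment length (bp) after `years` under the same linear model."""
    damage = k * years
    return float("inf") if damage == 0 else 1.0 / damage


k_mt = damage_rate(CALIBRATION_YEARS, CALIBRATION_LENGTH_BP)

for age in (10_000, 100_000, 1_000_000):
    print(f"{age:>9,} years: ~{mean_fragment_length(age, k_mt):.0f} bp")
```

Under these assumptions, samples 10,000, 100,000 and 1,000,000 years old would average roughly 683, 68 and 7 bp. Nuclear DNA degrades at least twice as fast, so it would need a correspondingly larger k. Temperature and humidity effects are not modelled at all.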

Age limit

A critical review of the ancient DNA literature shows that few studies after about 2002 have succeeded in amplifying DNA from remains older than several hundred thousand years. A greater appreciation of the risks of environmental contamination, together with studies of the chemical stability of DNA, has raised concerns about previously reported results. The alleged dinosaur DNA was later revealed to be human Y-chromosome DNA. The DNA reported from encapsulated halobacteria has been criticized because it closely resembles modern bacterial DNA, which hints at contamination. A 2007 study also suggests that these bacterial DNA samples may not have survived from ancient times. Instead, they may be the product of long-term, low-level metabolic activity.

aDNA may contain a large number of postmortem mutations, and their number increases with time. Some regions of the polynucleotide are more susceptible to this degradation, so erroneous sequence data can slip past the statistical filters used to check data validity. Because of such sequencing errors, population sizes inferred from aDNA should be interpreted with great caution. Substitutions caused by the deamination of cytosine residues are vastly over-represented in ancient DNA sequences. Miscoding of C to T and G to A accounts for the majority of errors.
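The over-representation of C→T and G→A changes is usually checked by counting mismatches against a reference, position by position along the read. In double-stranded libraries, authentic aDNA typically shows elevated C→T rates near the 5′ end of reads and elevated G→A rates near the 3′ end. The sketch below is a simplified illustration, not a specific published tool. It assumes ungapped, equal-length (reference, read) pairs on the forward strand. The function name `damage_profile` is hypothetical.

```python
from collections import Counter


def damage_profile(pairs, window=5):
    """Return (c_to_t, g_to_a) mismatch frequencies for the first `window` positions.

    c_to_t[d] is the C->T rate at distance d from the 5' end of the read;
    g_to_a[d] is the G->A rate at distance d from the 3' end.
    `pairs` holds (reference, read) strings that are already aligned,
    ungapped and of equal length (a simplifying assumption).
    """
    c_seen, c_to_t = Counter(), Counter()
    g_seen, g_to_a = Counter(), Counter()
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("expected equal-length, ungapped alignments")
        n = len(ref)
        for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            from_5p, from_3p = i, n - 1 - i
            if r == "C" and from_5p < window:
                c_seen[from_5p] += 1
                c_to_t[from_5p] += q == "T"  # True counts as 1
            if r == "G" and from_3p < window:
                g_seen[from_3p] += 1
                g_to_a[from_3p] += q == "A"
    c_rates = [c_to_t[d] / c_seen[d] if c_seen[d] else 0.0 for d in range(window)]
    g_rates = [g_to_a[d] / g_seen[d] if g_seen[d] else 0.0 for d in range(window)]
    return c_rates, g_rates


# Toy example: the first read carries a terminal C->T and a terminal G->A change.
pairs = [("CAGTG", "TAGTA"), ("CCGAG", "CCGAG")]
c_rates, g_rates = damage_profile(pairs, window=3)
print(c_rates, g_rates)  # [0.5, 0.0, 0.0] [0.5, 0.0, 0.0]
```

Real data would come from read alignments and would need gaps, reverse-strand reads and base qualities handled. The sketch only shows the counting logic.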

Contamination

Another problem with ancient DNA samples is contamination by modern human DNA and by microbial DNA (most of which is also ancient). New methods have emerged in recent years to prevent contamination of aDNA samples. These include carrying out extractions under extremely sterile conditions and using special adapters that distinguish the sample's endogenous molecules from those introduced during analysis. Bioinformatic methods can also be applied to the resulting sequences, using known reads to estimate contamination rates.
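As an illustration of the bioinformatic approach, here is one simple idea, sketched under our own assumptions rather than as any particular published method. First, find "diagnostic" positions where the sample's consensus allele differs from the allele of likely contaminants, such as a panel of modern human sequences. Then count how many reads carry each allele. The function name and input layout below are hypothetical.

```python
import math


def contamination_estimate(sites: list[tuple[str, str, str]]) -> float:
    """Fraction of informative reads that carry the contaminant allele.

    Each site is (endogenous_allele, contaminant_allele, read_bases), where
    read_bases holds the base each overlapping read shows at that position.
    Bases matching neither allele (sequencing errors, damage, N) are ignored.
    """
    endogenous = contaminant = 0
    for endo_allele, cont_allele, read_bases in sites:
        endo, cont = endo_allele.upper(), cont_allele.upper()
        for base in read_bases.upper():
            if base == endo:
                endogenous += 1
            elif base == cont:
                contaminant += 1
    informative = endogenous + contaminant
    return contaminant / informative if informative else math.nan


sites = [("A", "G", "AAAG"), ("T", "C", "TTTTCN")]
print(f"{contamination_estimate(sites):.3f}")  # 0.222
```

Deamination turns C into T, and G into A on the complementary strand. Sites where the two alleles differ only by such a change should therefore be excluded, or damaged reads filtered out first. Otherwise damage will masquerade as contamination.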

Non-human aDNA

Despite the problems associated with 'antediluvian' DNA, a wide and ever-increasing range of aDNA sequences has now been published from many animal and plant taxa. Tissues examined include artificially or naturally mummified animal remains, bone, paleofaeces, alcohol-preserved specimens, rodent middens and dried plant remains. More recently, animal and plant DNA has also been extracted directly from soil samples.

In June 2013, a group of researchers from the Centre for Geogenetics, Natural History Museum of Denmark at the University of Copenhagen, including Eske Willerslev, Marcus Thomas Pius Gilbert and Ludovic Orlando, announced that they had sequenced the DNA of a 560,000–780,000-year-old horse. The DNA came from a leg bone found buried in permafrost in Canada's Yukon territory.

In 2013, a German team reconstructed the mitochondrial genome of an Ursus deningeri more than 300,000 years old. This proved that authentic ancient DNA can be preserved for hundreds of thousands of years outside permafrost.

In 2016, a team led by Kirkpatrick measured chloroplast DNA in marine sediment cores and found diatom DNA dating back 1.4 million years. This DNA had a half-life of up to 15,000 years, significantly longer than previously reported. The team also found that the DNA decayed at a constant half-life rate only until about 100,000 years. After that point, decay followed a slower power law.
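The two decay regimes can be written as a piecewise curve. In the sketch below, the 15,000-year half-life and the 100,000-year breakpoint come from the text. The power-law exponent is a hypothetical placeholder, because no value is given. The tail is simply scaled so that it joins the exponential curve at the breakpoint.

```python
HALF_LIFE_YEARS = 15_000      # upper half-life reported for the diatom DNA
BREAKPOINT_YEARS = 100_000    # age at which decay reportedly slows
POWER_LAW_EXPONENT = 1.0      # hypothetical placeholder; the text gives no value


def exponential_fraction(years: float) -> float:
    """Fraction remaining under simple half-life (exponential) decay."""
    return 0.5 ** (years / HALF_LIFE_YEARS)


def fraction_remaining(years: float) -> float:
    """Fraction of the original DNA remaining under a piecewise decay model:
    exponential up to the breakpoint, then a power law scaled to join it."""
    if years < 0:
        raise ValueError("age must be non-negative")
    if years <= BREAKPOINT_YEARS:
        return exponential_fraction(years)
    at_breakpoint = exponential_fraction(BREAKPOINT_YEARS)
    return at_breakpoint * (years / BREAKPOINT_YEARS) ** (-POWER_LAW_EXPONENT)


for age in (15_000, 100_000, 200_000):
    print(age, fraction_remaining(age), exponential_fraction(age))
```

Past the breakpoint, the piecewise model keeps far more DNA than pure exponential decay would. That gap is the practical meaning of a "slower, power-law decay".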

Human aDNA

Human remains attract considerable anthropological, archaeological and public interest, so they have received much attention from the DNA community. They also raise more serious contamination problems, because the specimens belong to the same species as the researchers collecting and evaluating them.

Sources

Because mummies preserve morphology well, many studies from the 1990s and 2000s used mummified tissue as a source of ancient human DNA. Examples include naturally preserved specimens, such as those preserved in ice (for example Ötzi the Iceman) or by rapid desiccation (for example high-altitude mummies from the Andes). Artificially preserved tissue has also been used, such as the chemically treated mummies of ancient Egypt. However, mummified remains are a limited resource. Most human aDNA studies have focused on two sources that are much more common in the archaeological record: bone and teeth. The bone most often used for DNA extraction is the petrous bone, whose dense structure provides good conditions for DNA preservation. Several other sources have also yielded DNA, including paleofaeces and hair. Contamination remains a major problem when working on ancient human material.

Ancient pathogen DNA has been successfully retrieved from human samples more than 5,000 years old and from samples of other species as old as 17,000 years. Besides the usual sources (mummified tissue, bones and teeth), such studies have also examined a range of other tissue samples, including calcified pleura, tissue embedded in paraffin, and formalin-fixed tissue. Efficient computational tools have been developed for pathogen and microorganism aDNA analyses at small (QIIME) and large (FALCON) scales.

Results

Taking precautions against such contamination, a 2012 study analyzed bone samples from a Neanderthal group in the El Sidrón cave and gained new insights into potential kinship and genetic diversity from the aDNA. In November 2015, scientists reported finding a 110,000-year-old tooth containing DNA from the Denisovan hominin, an extinct species of human in the genus Homo.

This research has added new complexity to our understanding of the peopling of Eurasia. It has also revealed new information about links between the ancestors of Central Asians and the indigenous peoples of the Americas. In Africa, older DNA degrades quickly because of the warmer tropical climate. Even so, ancient DNA samples as old as 8,100 years were reported in September 2017.

Ancient DNA has also helped researchers estimate when modern human populations diverged. Schlebusch and colleagues sequenced the genomes of three Stone Age hunter-gatherers (2,000 years old) and four Iron Age farmers (300 to 500 years old) from Africa. This allowed them to push back the date of the earliest divergence between human populations to 350,000–260,000 years ago.

Archaeogenetics


Archaeogenetics is the study of ancient DNA using various molecular genetic methods and DNA resources. This form of genetic analysis can be applied to human, animal and plant specimens. Ancient DNA can be extracted from various fossilized specimens, including bones, eggshells, and artificially preserved human and animal tissues. In plants, ancient DNA can be extracted from seeds, tissue and, in some cases, feces. Archaeogenetics provides genetic evidence of ancient population migrations, domestication events, and plant and animal evolution. Researchers can also compare ancient DNA with the DNA of related modern populations. Such comparisons give a more complete analysis when the ancient DNA is compromised.

Archaeogenetics takes its name from the Greek word arkhaios, meaning "ancient", and the term genetics, meaning "the study of heredity". The term archaeogenetics was coined by the archaeologist Colin Renfrew.

Early work

Ludwik Hirszfeld (1884–1954)

Ludwik Hirszfeld was a Polish microbiologist and serologist who was President of the Blood Group Section of the Second International Congress of Blood Transfusion. In 1910, he and Emil von Dungern demonstrated the inheritance of blood groups, and he contributed greatly to the field throughout his life. He studied ABO blood groups. In a 1919 study, Hirszfeld recorded the ABO blood groups and hair color of people at the Macedonian front and found that hair color and blood type were not correlated. He also observed that the frequency of blood group A decreased from western Europe to India, while blood group B showed the opposite trend. He hypothesized that this gradient arose from two populations, one mainly of group A and one mainly of group B. In his view, both groups had arisen by mutation from group O and later mixed through migration or intermingling. Much of his work investigated links between blood types and sex, disease, climate, age, social class and race. His work led him to find that peptic ulcer was more common in people with blood group O, and that mothers with blood type AB had a high male-to-female birth ratio.

Arthur Mourant (1904–1994)

Arthur Mourant was a British hematologist and chemist. He received many awards, most notably Fellowship of the Royal Society. His work included organizing the existing data on blood group gene frequencies, and largely contributing to the genetic map of the world through his investigation of blood groups in many populations. Mourant discovered the new blood group antigens of the Lewis, Henshaw, Kell, and Rhesus systems, and analyzed the association of blood groups and various other diseases. He also focused on the biological significance of polymorphisms. His work provided the foundation for archaeogenetics because it facilitated the separation of genetic evidence for biological relationships between people. This genetic evidence was previously used for that purpose. It also provided material that could be used to appraise the theories of population genetics.

William Boyd (1903–1983)

William Boyd was an American immunochemist and biochemist who became famous for his research on the genetics of race in the 1950s. During the 1940s, Boyd and Karl O. Renkonen independently discovered that lectins react differently to various blood types. They found that crude extracts of the lima bean and tufted vetch agglutinated red blood cells from blood type A but not from types B or O. This ultimately led to the discovery of thousands of plants that contain these proteins. To examine racial differences and the distribution and migration patterns of various racial groups, Boyd systematically collected and classified blood samples from around the world. This work led him to conclude that blood groups are inherited and are not influenced by the environment. In his book Genetics and the Races of Man (1950), Boyd divided the world population into 13 distinct races. He based this on their different blood type profiles and on his idea that human races are populations with differing alleles. The study of blood groups remains one of the most abundant sources of information on heritable traits linked to race.

Methods

Fossil DNA preservation

Fossil retrieval starts with selecting an excavation site. Potential excavation sites are usually identified from the mineralogy of the location and the visual detection of bones in the area. Technologies such as field-portable X-ray fluorescence and dense stereo reconstruction offer further ways to locate excavation zones. Knives, brushes and pointed trowels are used to remove fossils from the earth.

To avoid contaminating the ancient DNA, specimens are handled with gloves and stored at −20 °C immediately after being unearthed. Analyzing the fossil sample in a lab that has not been used for other DNA analyses can also help prevent contamination. Bones are milled to a powder and treated with a solution before the polymerase chain reaction (PCR) process. Samples for DNA amplification need not be fossil bones. In certain situations, preserved skin, whether salt-preserved or air-dried, can also be used.

DNA preservation is difficult because bone degrades during fossilisation and DNA is chemically modified, usually by bacteria and fungi in the soil. The best time to extract DNA from a fossil is when it is freshly out of the ground, because it then contains six times as much DNA as stored bones. The temperature of the excavation site also affects the amount of DNA that can be obtained: fossils from warmer regions have a lower success rate for DNA amplification. A drastic change in a fossil's environment also affects DNA preservation. Excavation changes the fossil's environment abruptly, so it may cause physicochemical changes in the DNA molecule. Other factors also affect DNA preservation, including the treatment of the unearthed fossil (e.g. washing, brushing and sun drying), pH, irradiation, the chemical composition of bone and soil, and hydrology. Preservation passes through three diagenetic phases. The first phase is bacterial putrefaction, which is estimated to cause a 15-fold degradation of DNA. In the second phase, bone degrades chemically, mostly by depurination. The third phase occurs after the fossil is excavated and stored, and it is when bone DNA degrades most rapidly.

Methods of DNA extraction

Once a specimen is collected from an archaeological site, DNA can be extracted through a series of processes. One of the more common methods utilizes silica and takes advantage of polymerase chain reactions in order to collect ancient DNA from bone samples.

Several challenges make it difficult to extract ancient DNA from fossils and prepare it for analysis. DNA strands are continually being broken. While the organism is alive, these breaks are repaired. Once it has died, however, the DNA deteriorates without repair, so samples typically contain DNA strands around 100 base pairs long. Contamination is another significant challenge at several steps of the process. Other DNA, such as bacterial DNA, is often present in the original sample. Avoiding contamination requires many precautions, such as separate ventilation systems and workspaces for ancient DNA extraction. Fresh fossils are the best samples to use, as careless washing can lead to mold growth. Fossil DNA extracts also occasionally contain a compound that inhibits DNA replication. Finally, it is hard to reach a consensus on which methods best mitigate these challenges, because each specimen is unique and results are difficult to repeat.

Silica-based DNA extraction is a purification method used to extract DNA from archaeological bone samples and yield DNA that can be amplified by polymerase chain reaction (PCR). It works by using silica to bind DNA and separate it from other components of the fossil that inhibit PCR amplification. However, silica itself is also a strong PCR inhibitor, so care must be taken to remove it from the DNA after extraction. The general process for extracting DNA with the silica-based method is as follows:

  1. Bone specimen is cleaned and the outer layer is scraped off
  2. Sample is collected, preferably from a compact section
  3. Sample is ground to fine powder and added to an extraction solution to release DNA
  4. Silica solution is added and centrifuged to facilitate DNA binding
  5. Binding solution is removed and a buffer is added to the solution to release the DNA from the silica

One of the main advantages of silica-based DNA extraction is that it is relatively quick and efficient, requiring only a basic laboratory setup and chemicals. It is also independent of sample size, because the process can be scaled up or down. Another benefit is that it can be carried out at room temperature. However, the method has some drawbacks. Chiefly, silica-based DNA extraction can only be applied to bone and teeth samples and cannot be used on soft tissue. It works well with a variety of fossils, but it may be less effective on fossils that are not fresh (e.g. fossils treated for museums). Finally, contamination poses a risk for all DNA amplification, and this method may give misleading results if applied to contaminated material.

The polymerase chain reaction amplifies segments of DNA and is often used on extracted ancient DNA. It has three main steps: denaturation, annealing and extension. Denaturation separates the DNA into two single strands at high temperature. In annealing, short DNA primers attach to the single strands and provide a starting point for Taq polymerase. During extension, Taq polymerase adds matching bases to each primer, turning the two single strands into two complete double strands. This cycle is repeated many times, and usually more times for ancient DNA. One problem is that PCR on ancient DNA requires overlapping primer pairs because the sequences are so short. "Jumping PCR" can also occur, causing recombination during PCR that makes the DNA harder to analyze in inhomogeneous samples.
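The idealized exponential model of PCR shows why ancient DNA needs more cycles. The model is N = N₀(1 + E)ⁿ, where N₀ is the starting number of template copies, E the per-cycle efficiency (1 means perfect doubling) and n the number of cycles. The sketch below assumes this idealized model, which ignores the plateau reached in late cycles. The function names are illustrative.

```python
import math


def copies_after(cycles: int, initial_copies: float, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` of PCR; efficiency 1.0 means perfect doubling."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies: float, initial_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(math.log(target_copies / initial_copies) / math.log(1.0 + efficiency))


# Fewer starting templates (as with degraded aDNA) -> more cycles for the same yield.
print(cycles_needed(1e9, 1_000), cycles_needed(1e9, 10))  # 20 27
```

With perfect efficiency, reaching 10⁹ copies takes 20 cycles from 1,000 templates but 27 cycles from 10. This is why scarce ancient templates push the cycle count up. Each extra cycle also amplifies any contaminating molecules.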

Methods of DNA analysis

DNA extracted from fossil remains is mainly sequenced by massively parallel sequencing. This approach amplifies and sequences all the DNA segments in a sample at the same time, even when the DNA is highly fragmented and at low concentration. A generic sequence is attached to every single strand so that generic primers can bind to it, and all of the DNA present is therefore amplified. Massively parallel sequencing is generally more costly and time-intensive than PCR. For ancient DNA, however, amplification is so difficult that it ends up cheaper and more efficient. Margulies et al. developed one method of massively parallel sequencing that uses bead-based emulsion PCR and pyrosequencing. It has proved powerful for aDNA analyses because it avoids potential loss of sample, substrate competition for templates, and error propagation in replication.

The most common way to analyze an aDNA sequence is to compare it with known sequences from other sources, and this can be done in different ways for different purposes.

Fossil remains can be identified by comparing their DNA sequence with those of known species, using software such as BLASTN. This archaeogenetic approach is especially helpful when the morphology of the fossil is ambiguous. Species identification can also be done by finding specific genetic markers in an aDNA sequence. For example, the indigenous population of the Americas is characterized by specific mitochondrial RFLPs and deletions defined by Wallace et al.
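BLASTN itself uses seeded local alignment. The sketch below is a much simpler stand-in that shows the idea of matching an unknown sequence against known species. It scores each reference by the fraction of the query's k-mers (substrings of length k) that the reference also contains, and it returns the best match. The reference names, the k = 8 default and the function names are hypothetical, and reverse complements are ignored.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, references: dict[str, str], k: int = 8) -> tuple[str, float]:
    """Return the reference sharing the largest fraction of the query's k-mers,
    together with that fraction (0 to 1)."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        raise ValueError("query is shorter than k")
    scores = {
        name: len(query_kmers & kmers(ref_seq, k)) / len(query_kmers)
        for name, ref_seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


references = {
    "species_A": "ACGTACGTTGCAACGTAGCT",  # hypothetical reference sequences
    "species_B": "TTGACCAGTTGACCAGGATC",
}
print(best_match("acgtacgttgca", references, k=5))  # ('species_A', 1.0)
```

A shared-k-mer score is fast but coarse. Short, damaged aDNA reads are exactly where alignment-based tools and careful reference choice matter most.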

Comparative aDNA studies can also reveal the evolutionary relationship between two species. The number of base differences between the DNA of an ancient species and that of a closely related extant species can be used to estimate when the two diverged from their last common ancestor. This method has been used to reconstruct the phylogeny of some extinct species, such as Australian marsupial wolves and American ground sloths. Mitochondrial DNA in animals and chloroplast DNA in plants are usually used for this purpose. Each cell holds hundreds of copies of them, so they are easier to recover from ancient fossils.
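A simple molecular clock estimates divergence time as t = d / (2r). Here d is the genetic distance per site between the two sequences and r is the substitution rate per site per year. The factor 2 is there because both lineages accumulate changes after the split. The sketch below assumes this model and can apply the Jukes–Cantor correction d = −¾ ln(1 − 4p/3) to the raw proportion p of differing sites. The rate used is hypothetical. The sketch also ignores the age of the ancient sample, whose lineage stopped accumulating substitutions when the organism died.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Correct a p-distance for multiple substitutions at the same site (JC69 model)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p-distance must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Years since the common ancestor; both lineages evolve, hence the factor 2."""
    return distance / (2.0 * rate_per_site_per_year)


HYPOTHETICAL_RATE = 1e-8  # substitutions per site per year, for illustration only
print(round(divergence_time(0.02, HYPOTHETICAL_RATE)))  # 1000000
```

For example, a distance of 0.02 with a rate of 10⁻⁸ substitutions per site per year gives t = 0.02 / (2 × 10⁻⁸) = 1,000,000 years. Real estimates depend heavily on how the rate is calibrated.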

Another way to investigate the relationship between two species is DNA hybridization. Single-stranded DNA segments from both species are allowed to form complementary base pairs with each other. More closely related species have more similar genetic makeups and therefore give a stronger hybridization signal. Scholz et al. carried out Southern blot hybridization on Neanderthal aDNA extracted from the fossil remains W-NW and Krapina. The results showed weak hybridization between ancient human and Neanderthal DNA, and strong hybridization between ancient and modern human DNA. Human–chimpanzee and Neanderthal–chimpanzee hybridization were similarly weak. This suggests that humans and Neanderthals are not as closely related as two individuals of the same species are. They are, however, more closely related to each other than to chimpanzees.

There have also been attempts to use aDNA to obtain phenotypic information about ancient species. This is usually done by mapping the aDNA sequence onto the karyotype of a well-studied, closely related species that shares many phenotypic traits. For example, Green et al. compared the aDNA sequence from the Neanderthal Vi-80 fossil with modern human X- and Y-chromosome sequences. They found a similarity of 2.18 and 1.62 bases per 10,000, respectively, suggesting that the Vi-80 sample came from a male individual. Similar studies include the discovery, in ancient Nubian cotton, of a mutation associated with dwarfism in Arabidopsis, and an investigation of the bitter-taste perception locus in Neanderthals.
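Modern sequencing data allow a related but different approach to sexing, which is not the method Green et al. used: counting how many reads map to each sex chromosome. A female sample should yield essentially no Y-chromosome reads, apart from mismapping, while a male sample should yield a clearly non-zero fraction. In the sketch below, the cutoffs are hypothetical placeholders rather than published thresholds. They would need to be calibrated against samples of known sex.

```python
def y_read_fraction(x_reads: int, y_reads: int) -> float:
    """Fraction of reads on the sex chromosomes that map to Y."""
    total = x_reads + y_reads
    if total == 0:
        raise ValueError("no reads mapped to X or Y")
    return y_reads / total


def call_sex(x_reads: int, y_reads: int,
             female_max: float = 0.02, male_min: float = 0.07) -> str:
    """Return 'XX', 'XY' or 'undetermined'; both cutoffs are hypothetical placeholders."""
    fraction = y_read_fraction(x_reads, y_reads)
    if fraction <= female_max:
        return "XX"
    if fraction >= male_min:
        return "XY"
    return "undetermined"


print(call_sex(1_000, 1), call_sex(90, 10), call_sex(95, 5))  # XX XY undetermined
```

Keeping an "undetermined" band between the two cutoffs avoids forcing a call on low-coverage or contaminated samples.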

Applications

Human archaeology

Africa

Modern humans are thought to have evolved in Africa at least 200 kya (thousand years ago), with some evidence suggesting a date of over 300 kya. Examination of mitochondrial DNA (mtDNA), Y-chromosome DNA and X-chromosome DNA indicates that the earliest population to leave Africa consisted of approximately 1,500 males and females. Various studies have suggested that populations were geographically "structured" to some degree before the expansion out of Africa, as shown by the antiquity of shared mtDNA lineages. One study of 121 populations from across the continent found 14 genetic and linguistic "clusters," suggesting an ancient geographic structure to African populations. In general, genotypic and phenotypic analyses have shown African populations to have been "large and subdivided throughout much of their evolutionary history."

Genetic analysis has supported archaeological hypotheses of large-scale migrations of Bantu speakers into Southern Africa approximately 5 kya. Microsatellite DNA, single nucleotide polymorphisms (SNPs), and insertion/deletion polymorphisms (INDELS) have shown that Nilo-Saharan speaking populations originate from Sudan. Furthermore, there is genetic evidence that Chad-speaking descendants of Nilo-Saharan speakers migrated from Sudan to Lake Chad about 8 kya. Genetic evidence has also shown that non-African populations made significant contributions to the African gene pool. For example, the Saharan African Beja people have high levels of Middle-Eastern as well as East African Cushitic DNA.

Europe

Analysis of mtDNA shows that Eurasia was occupied in a single migratory event between 60 and 70 kya. Genetic evidence shows that occupation of the Near East and Europe happened no earlier than 50 kya. Studying haplogroup U has shown separate dispersals from the Near East both into Europe and into North Africa.

Much of the work done in archaeogenetics focuses on the Neolithic transition in Europe. Cavalli-Sforza's analysis of genetic-geographic patterns led him to conclude that there was a massive influx of Near Eastern populations into Europe at the start of the Neolithic. This view led him "to strongly emphasize the expanding early farmers at the expense of the indigenous Mesolithic foraging populations." mtDNA analysis in the 1990s, however, contradicted this view. M.B. Richards estimated that 10–22% of extant European mtDNAs had come from Near Eastern populations during the Neolithic. Most mtDNAs were "already established" among existing Mesolithic and Paleolithic groups. Most "control-region lineages" of modern European mtDNA can be traced to a founder event, the reoccupation of northern Europe towards the end of the Last Glacial Maximum (LGM). One study of extant European mtDNAs suggests that this reoccupation occurred after the end of the LGM, although another suggests it occurred before. Analysis of haplogroups V, H and U5 supports a "pioneer colonization" model of European occupation, in which foraging populations were incorporated into arriving Neolithic populations. Furthermore, analysis of ancient DNA, not just extant DNA, is shedding light on some issues. For instance, comparison of Neolithic and Mesolithic DNA has indicated that the development of dairying preceded widespread lactose tolerance.

South Asia

South Asia served as the major early corridor for the dispersal of modern humans out of Africa. Based on studies of mtDNA lineage M, some have suggested that the first occupants of India were Austro-Asiatic speakers who arrived about 45–60 kya. The Indian gene pool has contributions from the earliest settlers. It also has contributions from West Asian and Central Asian populations that arrived in migrations no earlier than 8 kya. mtDNA lineages show less variation than Y-chromosome lineages, which indicates that these migrations involved mainly males. Two subbranches of the U mtDNA lineage, U2i and U2e, arose in Central Asia. Their discovery has "modulated" views of a large migration from Central Asia into India, because the two branches diverged 50 kya. Furthermore, U2e is common in Europe but not in India, and the reverse holds for U2i, implying that U2i is native to India.

East Asia

Analysis of mtDNA and NRY (non-recombining region of Y chromosome) sequences have indicated that the first major dispersal out of Africa went through Saudi Arabia and the Indian coast 50–100 kya, and a second major dispersal occurred 15–50 kya north of the Himalayas.

Much work has been done to determine the extent of north-to-south and south-to-north migrations within Eastern Asia. Archaeologists have compared the genetic diversity of northeastern groups with that of southeastern groups and concluded that many northeast Asian groups came from the southeast. The Pan-Asian SNP (single nucleotide polymorphism) study found "a strong and highly significant correlation between haplotype diversity and latitude." Combined with demographic analysis, this supports a primarily south-to-north occupation of East Asia. Archaeogenetics has also been used to study hunter-gatherer populations in the region, such as the Ainu of Japan and Negrito groups in the Philippines. For example, the Pan-Asian SNP study found that the Negrito populations of Malaysia and those of the Philippines were more closely related to local non-Negrito populations than to each other. This suggests that Negrito and non-Negrito populations are linked by a single entry event into East Asia. Other Negrito groups do share affinities, however, including with Aboriginal Australians. One possible explanation is recent admixture of some Negrito groups with their local populations.
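The study's exact diversity statistic is not given here. This sketch assumes a common measure, Nei's unbiased haplotype diversity, H = n/(n − 1) × (1 − Σ pᵢ²). Here n is the number of sampled chromosomes and pᵢ is the frequency of haplotype i. H ranges from 0, when every sample shares one haplotype, to 1, when every sample is unique. Correlating H with the latitude of each sampled population is then a standard regression. The haplotype labels below are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples from two populations: one diverse, one dominated by a single haplotype.
diverse = ["H1", "H2", "H3", "H4"]
uniform = ["H1", "H1", "H1", "H2"]
print(f"{haplotype_diversity(diverse):.3f} {haplotype_diversity(uniform):.3f}")  # 1.000 0.500
```

The n/(n − 1) factor corrects the downward bias of the plain 1 − Σ pᵢ² estimate in small samples.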

Americas

Archaeogenetics has been used to better understand the peopling of the Americas from Asia. Native American mtDNA haplogroups have been estimated to date to between 15 and 20 kya, although these estimates vary somewhat. Genetic data have been used to propose various theories of how the Americas were colonized. The most widely held theory suggests "three waves" of migration through the Bering Strait after the LGM, but genetic data have given rise to alternative hypotheses. For example, one hypothesis proposes a migration from Siberia to South America 20–15 kya and a second migration after glacial recession. Y-chromosome data have led some to hold that there was a single migration starting from the Altai Mountains of Siberia between 17.2 and 10.1 kya, after the LGM. Analysis of both mtDNA and Y-chromosome DNA reveals evidence of "small, founding populations." Studying haplogroups has led some scientists to conclude that a southern migration into the Americas from one small population was impossible. A separate analysis, however, found such a model feasible if the migration happened along the coasts.

Australia and New Guinea

Finally, archaeogenetics has been used to study the occupation of Australia and New Guinea. The Aboriginal peoples of Australia and New Guinea are phenotypically very similar, but mtDNA has shown that this is due to convergence from living in similar conditions. Non-coding regions of mtDNA have shown “no similarities” between the aboriginal populations of Australia and New Guinea. Furthermore, no major NRY (non-recombining Y-chromosome) lineages are shared between the two populations. The high frequency of a single NRY lineage unique to Australia, coupled with “low diversity of lineage-associated Y-chromosomal short tandem repeat (Y-STR) haplotypes,” provides evidence for a “recent founder or bottleneck” event in Australia. However, there is relatively large variation in mtDNA, which implies that the bottleneck affected males primarily. Together, NRY and mtDNA studies show that the two groups split more than 50 kya, casting doubt on recent common ancestry between them.

Plants and animals

Archaeogenetics has been used to understand the development of domestication of plants and animals.

Domestication of plants

The combination of genetics and archaeological findings has been used to trace the earliest signs of plant domestication around the world. However, because the nuclear, mitochondrial, and chloroplast genomes used to trace domestication's moment of origin have evolved at different rates, their use to trace genealogy has been somewhat problematic. Nuclear DNA in particular is preferred over mitochondrial and chloroplast DNA because of its faster mutation rate and its greater intraspecific variation, owing to a higher consistency of polymorphic genetic markers. Findings in crop ‘domestication genes’ (traits that were specifically selected for or against) include:

  • tb1 (teosinte branched1) – affecting the apical dominance in maize
  • tga1 (teosinte glume architecture1) – making maize kernels more convenient for human use
  • te1 (Terminal ear1) – affecting the weight of kernels
  • fw2.2 – affecting fruit weight in tomatoes
  • BoCal – affecting the inflorescence of broccoli and cauliflower

Through the study of archaeogenetics in plant domestication, signs of the first global economy can also be uncovered. When crops that were highly selected in one region are found in another region where they would not originally have been introduced, their distribution serves as evidence of a trading network for the production and consumption of readily available resources.

Domestication of animals

Archaeogenetics has been used to study the domestication of animals. By analyzing genetic diversity in domesticated animal populations, researchers can search for genetic markers that give valuable insight into possible traits of progenitor species. These traits are then used to help distinguish wild from domesticated specimens among archaeological remains. Genetic studies can also lead to the identification of the ancestors of domesticated animals. The information gained from genetic studies of current populations helps guide archaeologists' search for evidence documenting these ancestors.

Archaeogenetics has been used to trace the domestication of pigs throughout the Old World. These studies also reveal details about early farmers. Archaeogenetic methods have also been used to further understand the domestication of dogs. Genetic studies have shown that all dogs are descended from the gray wolf; however, it is currently unknown when, where, and how many times dogs were domesticated. Some genetic studies have indicated multiple domestications while others have not. Archaeological findings help clarify this complicated past by providing solid evidence about the progression of dog domestication. As early humans domesticated dogs, the archaeological remains of buried dogs became increasingly abundant. Not only does this provide more opportunities for archaeologists to study the remains, it also provides clues about early human culture.

Genetic history of Europe

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Genetic_history_of_Europe

Admixture plots of modern West Eurasian populations based on seven components: South/West European, North/East European, Caucasus, Middle Eastern, South Asian, East Asian, and North African/Sub-Saharan African.
The European genetic structure (based on 273,464 SNPs). Three levels of structure as revealed by PC analysis are shown: A) inter-continental; B) intra-continental; and C) inside a single country (Estonia), where median values of PC1 and PC2 are shown. D) European map illustrating the origin and size of the population samples. CEU – Utah residents with ancestry from Northern and Western Europe, CHB – Han Chinese from Beijing, JPT – Japanese from Tokyo, and YRI – Yoruba from Ibadan, Nigeria.

The genetic history of Europe since the Upper Paleolithic is inseparable from that of wider Western Eurasia. By about 50,000–40,000 years ago (50–40 ka) a basal West Eurasian lineage had emerged (alongside a separate East Asian lineage) out of the undifferentiated "non-African" lineage of 70–50 ka. Both basal East and West Eurasians acquired Neanderthal admixture in Europe and Asia.

European early modern humans (EEMH) lineages between 40 and 26 ka (Aurignacian) were still part of a large Western Eurasian "meta-population", related to Central and Western Asian populations.

Divergence into genetically distinct sub-populations within Western Eurasia is a result of increased selection pressure and founder effects during the Last Glacial Maximum (LGM, Gravettian). By the end of the LGM, after 20 ka, a Western European lineage, dubbed West European Hunter-Gatherer (WHG), emerges from the Solutrean refugium during the European Mesolithic. These Mesolithic hunter-gatherer cultures are substantially replaced in the Neolithic Revolution by the arrival of Early European Farmers (EEF) lineages derived from Mesolithic populations of West Asia (Anatolia and the Caucasus). In the European Bronze Age, there were again substantial population replacements in parts of Europe by the intrusion of Ancient North Eurasian (ANE) lineages from the Pontic–Caspian steppes. These Bronze Age population replacements are associated archaeologically with the Beaker culture and linguistically with the Indo-European expansion.

As a result of the population movements during the Mesolithic to Bronze Age, modern European populations are distinguished by differences in WHG, EEF and ANE ancestry. Admixture rates varied geographically; in the late Neolithic, WHG ancestry in farmers was around 10% in Hungary, around 25% in Germany and as high as 50% in Iberia. The contribution of EEF is more significant in Mediterranean Europe and declines towards northern and northeastern Europe, where WHG ancestry is stronger; the Sardinians are considered to be the closest European group to the EEF population. ANE ancestry is found throughout Europe, with maxima of about 20% in Baltic peoples and Finns.

Ethnogenesis of the modern ethnic groups of Europe in the historical period is associated with numerous admixture events, primarily those associated with the Roman, Germanic, Norse, Slavic, Arab and Turkish expansions.

Research into the genetic history of Europe became possible in the second half of the 20th century, but did not yield high-resolution results for decades. Preliminary results became possible in the 1990s, but they remained mostly limited to studies of mitochondrial and Y-chromosomal lineages. Autosomal DNA became more easily accessible in the 2000s, and since the mid-2010s, results of previously unattainable resolution, many of them based on full-genome analysis of ancient DNA, have been published at an accelerated pace.

Prehistory

Distribution of the Neanderthals, and main sites

The prehistory of the European peoples can be traced by the examination of archaeological sites, linguistic studies and by the examination of the DNA of the people who live in Europe or from ancient DNA. The research continues and so theories rise and fall. Although it is possible to track migrations of people across Europe using founder analysis of DNA, most information on these movements comes from archaeology.

The settlement of Europe did not, however, occur in discrete migrations, as this summary might suggest. Rather, the settlement process was complex and "likely to have occurred in multiple waves from the east and to have been subsequently obscured by millennia of recurrent gene flow".

Due to natural selection, the percentage of Neanderthal DNA in ancient Europeans gradually decreased over time. From 45,000 BP to 7,000 BP, the percentage dropped from around 3–6% to 2%. Neanderthal-derived alleles were removed more frequently around genes than in other parts of the genome.

Palaeolithic

Neanderthals inhabited much of Europe and western Asia from as far back as 130,000 years ago. They existed in Europe as late as 30,000 years ago. They were eventually replaced by anatomically modern humans (AMH; sometimes known as Cro-Magnons), who began to appear in Europe circa 40,000 years ago. Given that the two hominid species likely coexisted in Europe, anthropologists have long wondered whether the two interacted. The question was resolved only in 2010, when it was established that Eurasian populations exhibit Neanderthal admixture, estimated at 1.5–2.1% on average. The question now became whether this admixture had taken place in Europe, or rather in the Levant, prior to AMH migration into Europe.

There has also been speculation about the inheritance of specific genes from Neanderthals. For example, the MAPT locus at 17q21.3 is split into two deep genetic lineages, H1 and H2. Since the H2 lineage seems restricted to European populations, several authors had argued for inheritance from Neanderthals beginning in 2005. However, the preliminary results from sequencing the full Neanderthal genome at that time (2009) failed to uncover evidence of interbreeding between Neanderthals and modern humans. By 2010, findings by Svante Pääbo (Max Planck Institute for Evolutionary Anthropology at Leipzig, Germany), Richard E. Green (University of California, Santa Cruz), and David Reich (Harvard Medical School), comparing the genetic material from the bones of three Neanderthals with that from five modern humans, did show a relationship between Neanderthals and modern people outside Africa.

Upper Paleolithic

Replacement of Neanderthals by early modern humans

It is thought that modern humans began to inhabit Europe during the Upper Paleolithic about 40,000 years ago. Evidence for this includes the spread of the Aurignacian culture.

From a purely patrilineal, Y-chromosome perspective, it is possible that the old haplogroups C1a2, F and/or E are those with the oldest presence in Europe. They have been found in some very old human remains in Europe. However, other haplogroups are far more common among living European males.

Haplogroup I (M170), which is now relatively common and widespread within Europe, may represent a Palaeolithic marker; its age has been estimated at ~22,000 BP. While it is now concentrated in Europe, it probably arose in a male from the Middle East or Caucasus, or their near descendants, c. 25,000–20,000 years BP, when it diverged from its immediate ancestor, haplogroup IJ. At about this time, an Upper Palaeolithic culture known as the Gravettian also appeared.

Earlier research into Y-DNA had instead focused on haplogroup R1 (M173), the most populous lineage among living European males; R1 was also believed to have emerged ~40,000 BP in Central Asia. However, it is now estimated that R1 emerged substantially more recently: a 2008 study dated the most recent common ancestor of haplogroup IJ to 38,500 BP and that of haplogroup R1 to 18,000 BP. This suggested that haplogroup IJ colonists formed the first wave and haplogroup R1 arrived much later.

Thus the genetic data suggest that, at least from the perspective of patrilineal ancestry, separate groups of modern humans took two routes into Europe: one from the Middle East via the Balkans and another from Central Asia via the Eurasian Steppe, north of the Black Sea.

Martin Richards et al. found that 15–40% of extant mtDNA lineages trace back to the Palaeolithic migrations (depending on whether one allows for multiple founder events). MtDNA haplogroup U5, dated to ~40–50 kya, arrived during the first early Upper Palaeolithic colonisation. Individually, it accounts for 5–15% of total mtDNA lineages. Middle Upper Palaeolithic movements are marked by the haplogroups HV, I and U4. HV split into Pre-V (around 26,000 years old) and the larger branch H, both of which spread over Europe, possibly via Gravettian contacts.

Haplogroup H accounts for about half the gene lines in Europe, with many subgroups. The above mtDNA lineages or their precursors are most likely to have arrived in Europe via the Middle East. This contrasts with Y-DNA evidence, whereby more than 50% of male lineages are characterised by the R1 superfamily, which is of possible Central Asian origin. Ornella Semino postulates that these differences "may be due in part to the apparent more recent molecular age of Y chromosomes relative to other loci, suggesting more rapid replacement of previous Y chromosomes. Gender-based differential migratory demographic behaviors will also influence the observed patterns of mtDNA and Y variation".

Last Glacial Maximum

European LGM refuges, 20 kya (legend: Solutrean and Proto-Solutrean cultures; Epi-Gravettian culture)

The Last Glacial Maximum ("LGM") started c. 30,000 BCE, at the end of MIS 3, leading to a depopulation of Northern Europe. According to the classical model, people took refuge in climatic sanctuaries (or refugia) as follows:

  • Northern Iberia and Southwest France, together making up the "Franco-Cantabrian" refugium
  • The Balkans
  • Ukraine and more generally the northern coast of the Black Sea
  • Italy

This event decreased the overall genetic diversity in Europe, a "result of drift, consistent with an inferred population bottleneck during the Last Glacial Maximum". As the glaciers receded from about 16,000–13,000 years ago, Europe began to be slowly repopulated by people from refugia, leaving genetic signatures.

Some Y haplogroup I clades appear to have diverged from their parental haplogroups sometime during or shortly after the LGM. Haplogroup I2 is prevalent in the western Balkans, as well as the rest of southeastern and central-eastern Europe in more moderate frequencies. Its frequency drops rapidly in central Europe, suggesting that the survivors bearing I2 lineages expanded predominantly through south-eastern and central-eastern Europe.

Cinnioglu sees evidence for the existence of an Anatolian refuge, which also harboured haplogroup R1b1b2. Today, R1b dominates the Y-chromosome landscape of western Europe, including the British Isles, suggesting that there could have been large changes in population composition due to migrations after the LGM.

Semino, Passarino and Pericic place the origins of haplogroup R1a within the Ukrainian ice-age refuge. Its current distribution in eastern Europe and parts of Scandinavia partly reflects a re-peopling of Europe from the southern Russian/Ukrainian steppes after the Last Glacial Maximum.

From an mtDNA perspective, Richards et al. found that the majority of mtDNA diversity in Europe is accounted for by post-glacial re-expansions during the late Upper Palaeolithic/Mesolithic. "The regional analyses lend some support to the suggestion that much of western and central Europe was repopulated largely from the southwest when the climate improved. The lineages involved include much of the most common haplogroup, H, as well as much of K, T, W, and X." The study could not determine whether there were new migrations of mtDNA lineages from the Near East during this period; a significant input was deemed unlikely.

An alternative model with more refugia was discussed by Bilton et al.

From a study of 51 individuals, researchers were able to identify five separate genetic clusters of ancient Europeans spanning the period before, during and after the LGM: the Věstonice Cluster (34,000–26,000 years ago), associated with the Gravettian culture; the Mal'ta Cluster (24,000–17,000 years ago), associated with the Mal'ta-Buret' culture; the El Mirón Cluster (19,000–14,000 years ago), associated with the Magdalenian culture; the Villabruna Cluster (14,000–7,000 years ago); and the Satsurblia Cluster (13,000–10,000 years ago).

From around 37,000 years ago, all ancient Europeans began to share some ancestry with modern Europeans. This founding population is represented by GoyetQ116-1, a 35,000-year-old specimen from Belgium. This lineage disappears from the record and is not found again until 19,000 BP in Spain at El Mirón, which shows strong affinities to GoyetQ116-1. During this interval, the distinct Věstonice Cluster is predominant in Europe, even at Goyet. The re-expansion of the El Mirón Cluster coincided with warming temperatures following the retreat of the glaciers after the Last Glacial Maximum. From 37,000 to 14,000 years ago, the population of Europe consisted of an isolated population descended from a founding population that did not interbreed significantly with other populations.

Mesolithic

Mesolithic (post-LGM) populations had diverged significantly due to their relative isolation over several millennia, the harsh selection pressures during the LGM, and the founder effects caused by the rapid expansion from LGM refugia at the beginning of the Mesolithic. By the end of the LGM, around 19 to 11 ka, the familiar varieties of Eurasian phenotypes had emerged. However, the lineage of Mesolithic hunter-gatherers of Western Europe (WHG) does not survive as a majority contribution in any modern population. They were most likely blue-eyed, and retained the dark skin pigmentation of pre-LGM EEMH. The HERC2 and OCA2 variants for blue eyes, derived from the WHG lineage, were also found in the Yamnaya people.

Around 14,000 years ago, the Villabruna Cluster shifted away from GoyetQ116-1 affinity and started to show more affinity with the Near East, a shift which coincided with the warming temperatures of the Bølling-Allerød interstadial. This genetic shift shows that Near East populations had probably already begun moving into Europe during the end of the Upper Paleolithic, about 6,000 years earlier than previously thought, before the introduction of farming. A few specimens from the Villabruna Cluster also show genetic affinities for East Asians that are derived from gene flow. The HERC2 variant for blue eyes first appears around 13,000 to 14,000 years ago in Italy and the Caucasus. The light skin pigmentation characteristic of modern Europeans is estimated to have spread across Europe in a "selective sweep" during the Mesolithic (19 to 11 ka). The associated TYRP1, SLC24A5 and SLC45A2 alleles emerge around 19 ka, still during the LGM, most likely in the Caucasus.

Neolithic

Simplified model for the demographic history of Europeans during the Neolithic period in the introduction of agriculture
 
Ancient European Neolithic farmers were genetically closest to modern Near-Eastern/ Anatolian populations. Genetic matrilineal distances between European Neolithic Linear Pottery Culture populations (5,500–4,900 calibrated BC) and modern Western Eurasian populations.

A major cline in genetic variation that has long been recognised in Europe seems to show important dispersals from the direction of the Middle East. This has often been linked to the spread of farming technology during the Neolithic, which has been argued to be one of the most important periods in determining modern European genetic diversity.

The Neolithic started with the introduction of farming, beginning in SE Europe approximately 10,000–3000 BCE, and extending into NW Europe between 4500–1700 BCE. During this era, the Neolithic revolution led to drastic economic as well as socio-cultural changes in Europe and this is also thought to have had a big effect on Europe's genetic diversity, especially concerning genetic lineages entering Europe from the Middle East into the Balkans. There were several phases of this period:

  • In a late European Mesolithic prelude to the Neolithic, it appears that Near Eastern peoples from areas that already had farming, and who also had sea-faring technology, had a transient presence in Greece (for example at Franchthi Cave).
  • There is consensus that agricultural technology and the main breeds of animals and plants which are farmed entered Europe from somewhere in the area of the Fertile Crescent, specifically the Levant region from the Sinai to Southern Anatolia. (Less certainly, this agricultural revolution is sometimes argued to have in turn been partly triggered by movements of people and technology coming across the Sinai from Africa.)
  • A later stage of the Neolithic, the so-called Pottery Neolithic, saw an introduction of pottery into the Levant, Balkans and Southern Italy (it had been present in the area of modern Sudan for some time before it is found in the Eastern Mediterranean, but it is thought to have developed independently), and this may have also been a period of cultural transfer from the Levant into the Balkans.

An important issue regarding the genetic impact of Neolithic technologies in Europe is the manner by which they were transferred into Europe. Farming may have been introduced by a significant migration of farmers from the Near East (Cavalli-Sforza's biological demic diffusion model), by "cultural diffusion", or by a combination of the two, and population geneticists have tried to clarify whether any genetic signatures of Near Eastern origin correspond to the expansion routes postulated by the archaeological evidence.

Martin Richards estimated that only 11% of European mtDNA is due to immigration in this period, suggesting that farming spread primarily through adoption by indigenous Mesolithic populations rather than through immigration from the Near East. Gene flow from SE to NW Europe seems to have continued in the Neolithic, with the percentage declining significantly towards the British Isles.

Classical genetics also suggested that the largest admixture to the European Paleolithic/Mesolithic stock was due to the Neolithic revolution of the 7th to 5th millennia BCE. Three main mtDNA gene groups have been identified as contributing Neolithic entrants into Europe: J, T1 and U3 (in that order of importance). Together with others, they account for up to around 20% of the gene pool.

In 2000, Semino's study on Y-DNA revealed the presence of haplotypes belonging to the large clade E1b1b1 (E-M35). These were predominantly found in the southern Balkans, southern Italy and parts of Iberia. Semino interpreted this pattern, along with J haplogroup subclades, as the Y-DNA component of Cavalli-Sforza's Neolithic demic diffusion of farmers from the Near East. Rosser et al. instead saw it as a (direct) 'North African component' in European genealogy, although they did not propose a timing and mechanism to account for it. Underhill and Kivisild (2007) also described E1b1b as representing a late-Pleistocene migration from Africa to Europe over the Sinai Peninsula in Egypt, evidence for which does not show up in mitochondrial DNA.

The modern distribution of Y-DNA haplogroups in each European country

Concerning the timing, distribution and diversity of V13, however, Battaglia et al. (2008) proposed an earlier movement whereby the E-M78* lineage ancestral to all modern E-V13 men moved rapidly out of a Southern Egyptian homeland and arrived in Europe with only Mesolithic technologies. They then suggest that the E-V13 sub-clade of E-M78 only expanded subsequently, as native Balkan 'foragers-cum-farmers' adopted Neolithic technologies from the Near East. They propose that the first major dispersal of E-V13 from the Balkans may have been in the direction of the Adriatic Sea with the Neolithic Impressed Ware culture, often referred to as Impressa or Cardial. Peričic et al. (2005) instead propose that the main route of E-V13 spread was along the Vardar-Morava-Danube river 'highway' system.

In contrast to Battaglia, Cruciani et al. (2007) tentatively suggested (i) a different point where the V13 mutation happened on its way from Egypt to the Balkans via the Middle East, and (ii) a later dispersal time. The authors proposed that the V13 mutation first appeared in western Asia, where it is found in low but significant frequencies, whence it entered the Balkans sometime after 11 kya. It later experienced a rapid dispersal in Europe, which they dated to c. 5300 years ago, coinciding with the Balkan Bronze Age. Like Peričic et al., they consider that "the dispersion of the E-V13 and J-M12 haplogroups seems to have mainly followed the river waterways connecting the southern Balkans to north-central Europe".

More recently, Lacan et al. (2011) announced that a 7,000-year-old skeleton from a Neolithic context in a Spanish funerary cave was that of an E-V13 man. (The other specimens tested from the same site were in haplogroup G2a, which has been found in Neolithic contexts throughout Europe.) Using 7 STR markers, this specimen was identified as being similar to modern individuals tested in Albania, Bosnia, Greece, Corsica, and Provence. The authors therefore proposed that, whether or not today's distribution of E-V13 is the result of more recent events, E-V13 was already in Europe within the Neolithic, carried by early farmers from the Eastern Mediterranean to the Western Mediterranean, much earlier than the Bronze Age. This supports the proposals of Battaglia et al. rather than those of Cruciani et al., at least concerning the earliest European dispersals, but E-V13 may have dispersed more than once. It has also been proposed that the modern distribution of E-V13 in Europe is at least partly caused by movements of people in the Roman era, even more recently than the Bronze Age.

The migration of Neolithic farmers into Europe brought along several new adaptations. The variant for light skin colour was introduced to Europe by the Neolithic farmers. After the arrival of the Neolithic farmers, a SLC22A4 mutation was selected for, a mutation which probably arose to deal with ergothioneine deficiency but increases the risk of ulcerative colitis, coeliac disease, and irritable bowel syndrome.

Bronze Age

The Bronze Age saw the development of long-distance trading networks, particularly along the Atlantic Coast and in the Danube valley. There was migration from Norway to Orkney and Shetland in this period (and to a lesser extent to mainland Scotland and Ireland). There was also migration from Germany to eastern England. Martin Richards estimated that there was about 4% mtDNA immigration to Europe in the Bronze Age.

Scheme of Indo-European migrations from ca. 4000 to 1000 BC according to the Kurgan hypothesis

Another theory about the origin of the Indo-European languages centres on a hypothetical Proto-Indo-European people who, according to the Kurgan hypothesis, can be traced to the region north of the Black and Caspian Seas at about 4500 BCE. They domesticated the horse and possibly invented the wooden disk wheel, and are considered to have spread their culture and genes across Europe. The Y haplogroup R1a is a proposed marker of these "Kurgan" genes, as is Y haplogroup R1b, although these haplogroups as a whole may be much older than the language family.

In the far north, carriers of Y haplogroup N arrived in Europe from Siberia, eventually expanding as far as Finland, though the specific timing of their arrival is uncertain. The most common North European subclade, N1c1, is estimated to be around 8,000 years old. There is evidence of human settlement in Finland dating back to 8500 BCE, linked with the Kunda culture and its putative ancestor, the Swiderian culture, but the latter is thought to have a European origin. The geographical spread of haplogroup N in Europe is well aligned with the Pit–Comb Ware culture, whose emergence is commonly dated to c. 4200 BCE, and with the distribution of Uralic languages. Mitochondrial DNA studies of the Sami people, particularly of haplogroup U5, are consistent with multiple migrations to Scandinavia from the Volga-Ural region, starting 6,000 to 7,000 years before present.

The relative roles of European and Asian colonists in the prehistory of Finland are a point of some contention, and some scholars insist that Finns are "predominantly Eastern European and made up of people who trekked north from the Ukrainian refuge during the Ice Age". Farther east, the issue is less contentious. Haplogroup N carriers account for a significant part of all non-Slavic ethnic groups in northern Russia, including 37% of Karelians, 35% of Komi people (65% according to another study), 67% of Mari people, as many as 98% of Nenets people, 94% of Nganasans, and 86% to 94% of Yakuts.

The Yamnaya component contains partial ancestry from an Ancient North Eurasian component first identified in Mal'ta. According to Iosif Lazaridis, "the Ancient North Eurasian ancestry is proportionally the smallest component everywhere in Europe, never more than 20 percent, but we find it in nearly every European group we’ve studied." This genetic component does not come directly from the Mal'ta lineage itself, but a related lineage that separated from the Mal'ta lineage.

Up to half of the Yamnaya component may have come from a Caucasus hunter-gatherer strand. On November 16, 2015, in a study published in the journal Nature Communications, geneticists announced that they had found a new fourth ancestral "tribe" or "strand" which had contributed to the modern European gene pool. They analysed genomes from two hunter-gatherers from Georgia which were 13,300 and 9,700 years old, and found that these Caucasus hunter-gatherers were probably the source of the farmer-like DNA in the Yamnaya. According to co-author Dr Andrea Manica of the University of Cambridge: "The question of where the Yamnaya come from has been something of a mystery up to now ... we can now answer that as we've found that their genetic make-up is a mix of Eastern European Hunter-Gatherers and a population from this pocket of Caucasus hunter-gatherers who weathered much of the last Ice Age in apparent isolation."

According to Lazaridis et al. (2016), a population related to the people of Chalcolithic Iran contributed roughly half of the ancestry of Yamnaya populations of the Pontic–Caspian steppe. These Iranian Chalcolithic people were a mixture of "the Neolithic people of western Iran, the Levant, and Caucasus Hunter Gatherers."

The genetic variants for lactase persistence and greater height came with the Yamnaya people. The derived allele of the KITLG gene (SNP rs12821256) that is associated with – and likely causal for – blond hair in Europeans is found in populations with Eastern but not Western Hunter-Gatherer ancestry, suggesting that its origin is in the Ancient North Eurasian (ANE) population and that it may have been spread in Europe by individuals with steppe ancestry. Consistent with this, the earliest known individual with the derived allele is an ANE individual from the Late Upper Paleolithic Afontova Gora archaeological complex in central Siberia.

Recent history

Overview map of recent (1st to 17th centuries AD) admixture events in Europe

During the period of the Roman Empire, historical sources show that there were many movements of people around Europe, both within and outside the Empire. Historic sources sometimes cite instances of genocide inflicted by the Romans upon rebellious provincial tribes. If this did in fact occur, it would have been limited given that modern populations show considerable genetic continuity in their respective regions. The process of 'Romanisation' appears to have been accomplished by the colonisation of provinces by a few Latin speaking administrators, military personnel, settled veterans, and private citizens (merchants, traders) who emanated from the Empire's various regions (and not merely from Roman Italy). They served as a nucleus for the acculturation of local notables.

Given the colonists' small numbers and varied origins, Romanisation does not appear to have left distinct genetic signatures in Europe. Indeed, Romance-speaking populations in the Balkans, such as Romanians, Aromanians and Moldovans, have been found to genetically resemble neighbouring Greek- and South Slavic-speaking peoples rather than modern Italians, suggesting that they are genetically mainly native to this area, as reflected in haplogroups I2a2 (M423) and E1b1b1 (V13).

Steven Bird has speculated that E1b1b1a was spread during the Roman era through Thracian and Dacian populations from the Balkans into the rest of Europe.

For the late Roman Migration Period (the "Völkerwanderung", which involved Germanic and other peoples), some associations have been suggested, at least for Britain: Y haplogroup I1a with Anglo-Saxon immigration into eastern England, and R1a with Norse immigration into northern Scotland.

Genetics of modern European populations

Patrilineal studies

There are four main Y-chromosome DNA haplogroups that account for most of Europe's patrilineal descent.

  • Haplogroup I is found in the form of various sub-clades throughout Europe and occurs at its highest frequencies in the Nordic countries as I1 (Norway, Denmark, Sweden, Finland) and in the Balkan Peninsula as I2a (Bosnia and Herzegovina 65%, Croatia and Serbia). I1 is also frequent in Germany, Great Britain and the Netherlands, while I2a is also frequent in Sardinia, Romania/Moldova, Bulgaria and Ukraine. This clade reaches by far its highest frequencies in Europe and may have been present there since before the Last Glacial Maximum (LGM).
  • Haplogroup E1b1b (formerly known as E3b) represents the last major direct migration from Africa into Europe. It is believed to have first appeared in the Horn of Africa approximately 26,000 years ago and to have dispersed to North Africa and the Near East during the late Paleolithic and Mesolithic periods. E1b1b lineages are closely linked to the diffusion of Afroasiatic languages. Although present throughout Europe, it peaks in the western Balkans among Albanians and their neighbours. It is also common in Italy and the Iberian Peninsula. Haplogroup E1b1b1, mainly in the form of its E1b1b1a2 (E-V13) sub-clade, reaches frequencies above 47% around Kosovo. This clade is thought to have arrived in Europe from western Asia either in the later Mesolithic or in the Neolithic. The North African subclade E-M81 is also present in Sicily and Andalusia.
  • Haplogroup R1b is common all over Europe, with R1b1a1a2 especially common in Western Europe. Nearly all R1b in Europe belongs to the R1b1a2 (2011 name) (R-M269) sub-clade, specifically within the R-L23 sub-sub-clade, whereas R1b found in Central Asia, western Asia and Africa tends to belong to other clades. It has also been pointed out that outlier types are present in Europe and are particularly notable in some areas such as Sardinia and Armenia. R1b frequencies are highest in western Europe and decline steadily with distance from the Atlantic: 80–90% among the Welsh, Basques, Irish, Scots and Bretons; around 70–80% in other areas of Spain, Britain and France; and around 40–60% in most other parts of western Europe, such as eastern Germany and north-central Italy. Outside this area it drops to around 30% or less, as in southern Italy, Sweden, Poland, the Balkans and Cyprus. R1b remains the most common clade as one moves east into Germany, while farther east, in Poland, R1a is more common (see below). In southeastern Europe, R1b falls behind R1a in and around Hungary and Serbia but is more common both to the south and to the north of this region. R1b in Western Europe is dominated by at least two sub-clades: R-U106, which is distributed from the east side of the Rhine into northern and central Europe (with a strong presence in England), and R-P312, which is most common west of the Rhine, including the British Isles. Some have posited that this haplogroup's presence in Europe dates back to the LGM, while others link it to the spread of the Centum branch of the Indo-European languages.
The oldest human remains found to carry R1b so far are those of an individual from an Epigravettian context in Italy (Villabruna), who lived c. 12,000 BCE and reportedly belonged to R1b1a (L754), and the 7,000-year-old remains of a hunter-gatherer of the Samara culture of the Volga River area, who carried R1b1* (R-L278*).
  • Haplogroup R1a, almost entirely in the R1a1a sub-clade, is prevalent in much of Eastern and Central Europe (also in South and Central Asia). For example, there is a sharp increase in R1a1 and decrease in R1b1b2 as one goes east from Germany to Poland. It also has a substantial presence in Scandinavia (particularly Norway). In the Baltic countries R1a frequencies decrease from Lithuania (45%) to Estonia (around 30%).

Putting aside small enclaves, there are also several haplogroups apart from the above four that are less prominent or most common only in certain areas of Europe.

  • Haplogroup G, which has been associated with the early Neolithic farmers of Europe, is found at low frequencies in most parts of Europe. It reaches peaks above 70% around Georgia and among the Madjars (who live in Asia, near the eastern edge of Europe), up to 10% in Sardinia, 12% in Corsica and Uppsala (Sweden), 11% in the Balkans and Portugal, 10% in Spain and 9% in European Russia. This clade is also found in the Near East.
  • Haplogroup N is common only in northeastern Europe; in the form of its N1c1 sub-clade it reaches frequencies of approximately 60% among Finns and approximately 40% among Estonians, Latvians and Lithuanians.
  • Haplogroup J2, in various sub-clades (J2a, J2b), is found at levels of around 15–30% in parts of the Balkans and Italy and is common all over Europe, especially in the Mediterranean basin.

Matrilineal studies

There have been a number of studies of mitochondrial DNA (mtDNA) haplogroups in Europe. In contrast to Y-DNA haplogroups, mtDNA haplogroups do not show as much geographical patterning but are more evenly distributed. Apart from the outlying Saami, all Europeans are characterised by the predominance of haplogroups H, U and T. The lack of observable geographic structuring of mtDNA may be due to socio-cultural factors, namely polygyny and patrilocality (under patrilocality, women move to their husbands' communities, which spreads maternal lineages more widely than paternal ones).

Genetic studies suggest some maternal gene flow to eastern Europe from eastern Asia or southern Siberia 13,000–6,600 years BP. Analysis of Neolithic skeletons in the Great Hungarian Plain found a high frequency of eastern Asian mtDNA haplogroups, some of which survive in modern eastern European populations. Maternal gene flow to Europe from sub-Saharan Africa began as early as 11,000 years BP, although the majority of lineages, approximately 65%, are estimated to have arrived more recently, including during the Romanisation period, the Arab conquests of southern Europe, and the Atlantic slave trade.

European population sub-structure

Genetically, Europe is relatively homogeneous, but distinct sub-population patterns of various types of genetic markers have been found, particularly along a southeast-northwest cline. For example, Cavalli-Sforza's principal component analyses revealed five major clinal patterns throughout Europe, and similar patterns have continued to be found in more recent studies.

  1. A cline of genes with highest frequencies in the Middle East, decreasing to lowest levels in the northwest. Cavalli-Sforza originally described this as faithfully reflecting the spread of agriculture in Neolithic times, and genes showing this pattern have generally been interpreted in the same way.
  2. A cline of genes with highest frequencies among Finnish and Sami in the extreme north east, and spreading to lowest frequencies in the south west.
  3. A cline of genes with highest frequencies in the area of the lower Don and Volga rivers in southern Russia, and spreading to lowest frequencies in Spain, Southern Italy, Greece and the areas inhabited by Saami speakers in the extreme north of Scandinavia. Cavalli-Sforza associated this with the spread of Indo-European languages, which he links in turn to a "secondary expansion" after the spread of agriculture, associated with animal grazing.
  4. A cline of genes with highest frequencies in the Balkans and Southern Italy, spreading to lowest levels in Britain and the Basque country. Cavalli-Sforza associates this with "the Greek expansion, which reached its peak in historical times around 1000 and 500 BCE but which certainly began earlier".
  5. A cline of genes with highest frequencies in the Basque Country and lower levels beyond Iberia and southern France. In perhaps the best-known of Cavalli-Sforza's conclusions, this weakest of the five patterns was described as isolated remnants of the pre-Neolithic population of Europe, "who at least partially withstood the expansion of the cultivators". It corresponds roughly to the geographical distribution of Rhesus-negative blood types. In particular, the conclusion that the Basques are a genetic isolate has been widely discussed but remains controversial.
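Clines like these are read off principal components of allele-frequency data. As a rough illustration only, and not a reconstruction of Cavalli-Sforza's actual procedure or data, the sketch below assumes a small matrix of hypothetical allele frequencies (rows are populations, columns are loci), centres each locus, and finds the first principal component by power iteration on the locus covariance matrix. Each population's score on that component places it along a single gradient, which is how a cline appears in a PCA.

```python
import math
import random


def first_principal_component(matrix, iterations=500, seed=0):
    """Return (loadings, scores) for the first principal component.

    matrix: rows are populations, columns are loci (allele frequencies).
    The sign of a principal component is arbitrary, so a cline may come out
    reversed between implementations.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Centre each locus on its mean frequency across populations.
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    # Sample covariance between every pair of loci.
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n_rows - 1) for b in range(n_cols)]
        for a in range(n_cols)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(n_cols)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("no variance along the current direction")
        vec = [x / norm for x in nxt]
    # Score of each population = projection of its centred row onto the component.
    scores = [sum(r[j] * vec[j] for j in range(n_cols)) for r in centred]
    return vec, scores


# Hypothetical allele frequencies for five populations (A to E) sampled along
# a geographic transect; the first two loci vary clinally, the third does not.
toy_freqs = [
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.5],
    [0.7, 0.4, 0.5],
    [0.6, 0.5, 0.5],
    [0.5, 0.6, 0.5],
]
loadings, scores = first_principal_component(toy_freqs)
for name, score in zip("ABCDE", scores):
    print(name, round(score, 3))
```

Power iteration is used here only to keep the example dependency-free; a real analysis would normally rely on a linear-algebra library (for example an SVD routine) and on genotypes at many thousands of loci.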

He also created a phylogenetic tree to analyse the internal relationships among Europeans. He found four major 'outliers': Basques, Sami, Sardinians and Icelanders, a result he attributed to their relative isolation (note: the Icelanders and the Sardinians speak Indo-European languages, while the other two groups do not). Greeks and Yugoslavs represented a second group of less extreme outliers. The remaining populations clustered into several groups: "Celtic", "Germanic", "south-western Europeans", "Scandinavians" and "eastern Europeans".
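The text does not say which tree-building method Cavalli-Sforza used. As a stand-in, the sketch below uses UPGMA (average-linkage clustering), one of the simplest ways to turn pairwise genetic distances into a tree. The population labels and distances are hypothetical, chosen only to show how an isolated population, like the 'outliers' described above, joins the tree last and at a much greater height than the others.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster populations with UPGMA (average linkage).

    labels: population names.
    dist: maps frozenset({x, y}) to the distance between populations x and y.
    Returns (tree, heights): the tree as nested 2-tuples, and the height
    (half the joining distance) of each successive merge.
    """
    size = {label: 1 for label in labels}  # cluster -> number of leaves
    d = {frozenset(pair): dist[frozenset(pair)] for pair in combinations(labels, 2)}
    heights = []
    while len(size) > 1:
        closest = min(d, key=d.get)  # pair of clusters with the smallest distance
        a, b = tuple(closest)
        heights.append(d[closest] / 2)
        merged = (a, b)
        # Distance from the new cluster = size-weighted mean of its parts' distances.
        for k in size:
            if k not in (a, b):
                d[frozenset((merged, k))] = (
                    size[a] * d[frozenset((a, k))] + size[b] * d[frozenset((b, k))]
                ) / (size[a] + size[b])
        # Drop every distance that still refers to a or b individually.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        size[merged] = size.pop(a) + size.pop(b)
    (tree,) = size
    return tree, heights


# Hypothetical distances: three similar populations and one isolated outlier.
pairs = {
    ("P1", "P2"): 0.002,
    ("P1", "P3"): 0.004,
    ("P2", "P3"): 0.004,
    ("P1", "Outlier"): 0.010,
    ("P2", "Outlier"): 0.010,
    ("P3", "Outlier"): 0.010,
}
toy_dist = {frozenset(k): v for k, v in pairs.items()}
tree, heights = upgma(["P1", "P2", "P3", "Outlier"], toy_dist)
print(tree)  # the outlier is one of the two branches at the root
print(heights)
```

UPGMA assumes a roughly constant rate of divergence across lineages; neighbour-joining is a common alternative when that assumption is doubtful.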

A study published in May 2009 of 19 populations from Europe using 270,000 SNPs highlighted the genetic diversity of European populations along the northwest-to-southeast gradient and distinguished several distinct regions within Europe.

In this study, barrier analysis revealed "genetic barriers" between Finland, Italy and other countries, and barriers could also be demonstrated within Finland (between Helsinki and Kuusamo) and within Italy (between the northern and southern parts, Fst=0.0050). Fst (fixation index) correlated substantially with geographic distance, ranging from ≤0.0010 for neighbouring populations to 0.0200–0.0230 between Southern Italy and Finland. For comparison, pairwise Fst values for non-European samples were: Europeans–Africans (Yoruba) 0.1530; Europeans–Chinese 0.1100; Africans (Yoruba)–Chinese 0.1900.
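A correlation between pairwise Fst and geographic distance is commonly assessed with a Mantel test: correlate the two sets of pairwise distances, then compare the result with correlations obtained after randomly permuting the population labels of one matrix. The study's exact procedure is not described here, so this is an assumed approach, and the transect positions and Fst values below are made up for illustration.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=1):
    """One-sided Mantel test between two symmetric distance matrices.

    Returns (r, p): the observed correlation over all population pairs and the
    permutation p-value for a positive association.
    """
    n = len(geo)
    pairs = list(combinations(range(n), 2))
    geo_vals = [geo[i][j] for i, j in pairs]
    observed = pearson(geo_vals, [gen[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    as_extreme = 0
    for _ in range(permutations):
        # Shuffle the population labels of one matrix, keeping its internal structure.
        rng.shuffle(order)
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(geo_vals, permuted) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical populations evenly spaced along a 2,000 km transect, with Fst
# rising with distance (illustrative numbers, not data from the study).
positions_km = [0, 400, 800, 1200, 1600, 2000]
geo = [[abs(a - b) for b in positions_km] for a in positions_km]
gen = [
    [0.0 if a == b else 0.0005 + 0.00001 * abs(a - b) for b in positions_km]
    for a in positions_km
]
r, p = mantel(geo, gen)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting whole populations rather than individual matrix cells is what distinguishes a Mantel test from a naive correlation test, because pairwise distances that share a population are not independent.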

A study by Chao Tian in August 2009 extended the analysis of European population genetic structure to include additional southern European groups and Arab populations (Palestinians, Druzes...) from the Near-East. This study determined autosomal Fst between 18 population groups and concluded that, in general, genetic distances corresponded to geographical relationships with smaller values between population groups with origins in neighbouring countries/regions (for example, Greeks/Tuscans: Fst=0.0010, Greeks/Palestinians: Fst=0.0057) compared with those from very different regions in Europe (for example Greeks/Swedish: Fst=0.0087, Greeks/Russians: Fst=0.0108).

Autosomal DNA

Seldin (2006) used over 5,000 autosomal SNPs. It showed "a consistent and reproducible distinction between ‘northern’ and ‘southern’ European population groups". Most individual participants with southern European ancestry (Italians, Greeks, Portuguese, Spaniards), and Ashkenazi Jews have >85% membership in the southern population; and most northern, western, central, and eastern Europeans (Swedes, English, Irish, Germans, and Ukrainians) have >90% in the northern population group. However, many of the participants in this study were actually American citizens who self-identified with different European ethnicities based on self-reported familial pedigree.

A similar study in 2007 using samples predominantly from Europe found that the most important genetic differentiation in Europe occurs on a line from the north to the south-east (northern Europe to the Balkans), with another east-west axis of differentiation across Europe. Its findings were consistent with earlier results based on mtDNA and Y-chromosomal DNA that support the theory that modern Iberians (Spanish and Portuguese) hold the most ancient European genetic ancestry, as well as separating Basques and Sami from other European populations.

It suggested that the English and Irish cluster with other Northern and Eastern Europeans such as Germans and Poles, while some Basque and Italian individuals also clustered with Northern Europeans. Despite these stratifications, it noted that "there is low apparent diversity in Europe with the entire continent-wide samples only marginally more dispersed than single population samples elsewhere in the world".

In 2008, two international research teams published analyses of large-scale genotyping of large samples of Europeans, using over 300,000 autosomal SNPs. With the exception of the usual isolates such as Basques, Finns and Sardinians, the European population lacked the sharp discontinuities (clustering) reported by previous studies (see Seldin et al. 2006 and Bauchet et al. 2007), although there was a discernible south-to-north gradient. Overall, they found only a low level of genetic differentiation between subpopulations, and the differences that did exist were characterised by a strong continent-wide correlation between geographic and genetic distance. In addition, they found that diversity was greatest in southern Europe, owing to a larger effective population size and/or population expansion from southern to northern Europe. The researchers took this observation to imply that, genetically, Europeans are not distributed into discrete populations.

A study on north-eastern populations, published in March 2013, found that Komi peoples formed a pole of genetic diversity that is distinct from other populations.

Autosomal genetic distances (Fst) based on SNPs (2009)

The genetic distance between populations is often measured by the fixation index (Fst), based on genetic polymorphism data such as single-nucleotide polymorphisms (SNPs) or microsatellites. Fst is a special case of F-statistics, a concept developed in the 1920s by Sewall Wright. Fst is simply the correlation of randomly chosen alleles within the same sub-population relative to that found in the entire population. It is often expressed as the proportion of genetic diversity due to allele frequency differences among populations.

The values range from 0 to 1. A value of zero implies that the two populations are panmictic, that is, interbreeding freely. A value of one would imply that the two populations are completely separate. The greater the Fst value, the greater the genetic distance. Essentially, these low Fst values suggest that the majority of genetic variation (~85%) lies between individuals within the same population group, whereas belonging to a different population group within the same 'race'/continent, or to a different racial/continental group, adds a much smaller degree of variation (3–8% and 6–11%, respectively).
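For a biallelic SNP, the proportion of diversity due to allele frequency differences among populations can be written as Fst = (H_T - H_S) / H_T, where H_S is the average expected heterozygosity within sub-populations and H_T is the expected heterozygosity of the pooled population (Nei's G_ST formulation). The sketch below implements that formula, summing numerator and denominator over loci. It is an illustrative assumption, not necessarily the estimator used by the studies cited here (SNP studies often use Weir and Cockerham's estimator, which corrects for sample size), and the allele frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(loci):
    """Multi-locus Fst in Nei's G_ST form, sum(H_T - H_S) / sum(H_T).

    loci: one list per SNP, holding the frequency of the same allele in each
    sub-population (sub-populations weighted equally, no sample-size correction).
    """
    numerator = denominator = 0.0
    for freqs in loci:
        # H_S: average heterozygosity within the sub-populations.
        h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)
        # H_T: heterozygosity expected if the sub-populations were pooled.
        h_t = heterozygosity(sum(freqs) / len(freqs))
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0 else 0.0


print(fst([[0.5, 0.5]]))  # → 0.0 (identical frequencies: no differentiation)
print(fst([[0.0, 1.0]]))  # → 1.0 (different alleles fixed: complete separation)

# Hypothetical frequencies of one allele at three SNPs in two populations.
toy_loci = [[0.40, 0.60], [0.30, 0.35], [0.10, 0.12]]
f = fst(toy_loci)
print(f"Fst = {f:.4f}; within-population share of diversity = {1 - f:.1%}")
```

In this form 1 - Fst equals sum(H_S) / sum(H_T), the share of diversity found within populations. That is loosely analogous to the ~85% figure above, which itself comes from analyses across many populations and loci rather than from a calculation like this one.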

Pleiades in folklore and literature

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Ple...