Thursday, April 28, 2022

Ethnic groups in Europe

From Wikipedia, the free encyclopedia

Europeans are the focus of European ethnology, the field of anthropology related to the various indigenous groups that reside in the states of Europe. Groups may be defined by common genetic ancestry, common language, or both. Pan and Pfeil (2004) count 87 distinct "peoples of Europe", of which 33 form the majority population in at least one sovereign state, while the remaining 54 constitute ethnic minorities. The total number of national minority populations in Europe is estimated at 105 million people, or 14% of 770 million Europeans. The Russians are the most populous among Europeans, with a population over 134 million. There are no universally accepted and precise definitions of the terms "ethnic group" and "nationality". In the context of European ethnography in particular, the terms ethnic group, people, nationality and ethno-linguistic group are used mostly synonymously, although the preferred term may vary with the situation specific to the individual countries of Europe.

Overview

About 20–25 million residents (3%) are members of diasporas of non-European origin. The population of the European Union, with some 450 million residents, accounts for two thirds of the current European population.

Both Spain and the United Kingdom are special cases, in that the designation of nationality, Spanish and British, may controversially take on ethnic aspects, subsuming various regional ethnic groups (see nationalisms and regionalisms of Spain and native populations of the United Kingdom). Switzerland is a similar case, but the linguistic subgroups of the Swiss are discussed in terms of both ethnicity and language affiliations.

Linguistic classifications

Distribution of major languages of Europe

Of the total population of Europe of some 740 million (as of 2010), close to 90% (or some 650 million) fall within three large branches of Indo-European languages, these being;

Three stand-alone Indo-European languages do not fall within larger sub-groups and are not closely related to those larger language families;

In addition, there are also smaller sub-groups within the Indo-European languages of Europe, including;

Besides the Indo-European languages, there are other language families on the European continent which are considered unrelated to Indo-European:

History

Prehistoric populations

Simplified model for the demographic history of Europeans during the Neolithic period and the introduction of agriculture.

The Basques have been found to descend directly from the population of the late Neolithic or early Bronze Age. By contrast, Indo-European groups of Europe (the Centum, Balto-Slavic, and Albanian groups) migrated throughout most of Europe from the Pontic steppe. They are assumed to have developed in situ through admixture of earlier Mesolithic and Neolithic populations with Bronze Age proto-Indo-Europeans. The Finnic peoples are assumed to be descended from Proto-Uralic populations further to the east, nearer to the Ural Mountains, that had migrated to their historical homelands in Europe by about 3,000 years ago. A more recent study in 2019 found that Proto-Uralic may have originated further east in Siberia among an East Asian-related population. The authors link the arrival of Uralic languages to the arrival of Siberian-like ancestry in the Baltic region.

Reconstructed languages of Iron Age Europe include Proto-Celtic, Proto-Italic and Proto-Germanic, all of these Indo-European languages of the centum group, and Proto-Slavic and Proto-Baltic, of the satem group. A group of Tyrrhenian languages appears to have included Etruscan, Rhaetian, Lemnian, and perhaps Camunic. A pre-Roman stage of Proto-Basque can only be reconstructed with great uncertainty.

Regarding the European Bronze Age, the only relatively likely reconstruction is that of Proto-Greek (ca. 2000 BC). A Proto-Italo-Celtic ancestor of both Italic and Celtic (assumed for the Bell Beaker period), and a Proto-Balto-Slavic language (assumed for roughly the Corded Ware horizon) have been postulated with less confidence. Old European hydronymy has been taken as indicating an early (Bronze Age) Indo-European predecessor of the later centum languages.

According to geneticist David Reich, based on ancient human genomes that his laboratory sequenced in 2016, Europeans descend from a mixture of four distinct ancestral components.

Historical populations

Map of the Roman Empire and barbarian tribes in 125 AD.

Iron Age (pre-Great Migrations) populations of Europe known from Greco-Roman historiography, notably Herodotus, Pliny, Ptolemy and Tacitus:

Historical immigration

Map showing the distribution of Slavic tribes between the 7th–9th centuries AD.

Ethno-linguistic groups that arrived from outside Europe during historical times are:

History of European ethnography

Europa Regina, a representation of Europe printed by Sebastian Münster (1570).
 
Ethnographic map of Europe, The Times Atlas (1896).

The earliest accounts of European ethnography date from Classical Antiquity. Herodotus described the Scythians and Thraco-Illyrians. Dicaearchus gave a description of Greece itself, besides accounts of western and northern Europe. His work survives only fragmentarily, but was received by Polybius and others.

Roman Empire period authors include Diodorus Siculus, Strabo and Tacitus. Julius Caesar gives an account of the Celtic tribes of Gaul, while Tacitus describes the Germanic tribes of Magna Germania. A number of authors like Diodorus Siculus, Pausanias and Sallust depict the ancient Sardinian and Corsican peoples.

The 4th-century Tabula Peutingeriana records the names of numerous peoples and tribes. Ethnographers of Late Antiquity such as Agathias of Myrina, Ammianus Marcellinus, Jordanes and Theophylact Simocatta give early accounts of the Slavs, the Franks, the Alamanni and the Goths.

Book IX of Isidore's Etymologiae (7th century) treats de linguis, gentibus, regnis, militia, civibus (concerning languages, peoples, realms, war and citizens). Ahmad ibn Fadlan in the 10th century gives an account of the Bolghar and the Rus' peoples. William of Rubruck, while most notable for his account of the Mongols, also describes the Tatars and the Alans in the account of his journey to Asia. Saxo Grammaticus and Adam of Bremen give accounts of pre-Christian Scandinavia. The Chronicon Slavorum (12th century) gives an account of the northwestern Slavic tribes.

Gottfried Hensel in his 1741 Synopsis Universae Philologiae published one of the earliest ethno-linguistic maps of Europe, showing the beginning of the pater noster in the various European languages and scripts. In the 19th century, ethnicity was discussed in terms of scientific racism, and the ethnic groups of Europe were grouped into a number of "races" (Mediterranean, Alpine and Nordic), all part of a larger "Caucasian" group.

The beginnings of ethnic geography as an academic subdiscipline lie in the period following World War I, in the context of nationalism. In the 1930s it was exploited for the purposes of fascist and Nazi propaganda, so that it was only in the 1960s that ethnic geography began to thrive as a bona fide academic subdiscipline.

The origins of modern ethnography are often traced to the work of Bronisław Malinowski, who emphasized the importance of fieldwork. The emergence of population genetics further undermined the categorisation of Europeans into clearly defined racial groups. A 2007 study on the genetic history of Europe found that the most important genetic differentiation in Europe occurs on a line from the north to the south-east (northern Europe to the Balkans), with another east–west axis of differentiation across Europe, separating the indigenous Basques, Sardinians and Sami from other European populations. Despite these stratifications it noted the unusually high degree of European homogeneity: "there is low apparent diversity in Europe with the entire continent-wide samples only marginally more dispersed than single population samples elsewhere in the world."

Minorities

Gagauz people in Moldova
 
Sámi family in Lapland of Finland, 1936.

The total number of national minority populations in Europe is estimated at 105 million people, or 14% of Europeans.

The member states of the Council of Europe in 1995 signed the Framework Convention for the Protection of National Minorities. The broad aims of the Convention are to ensure that the signatory states respect the rights of national minorities, undertaking to combat discrimination, promote equality, preserve and develop the culture and identity of national minorities, guarantee certain freedoms in relation to access to the media, minority languages and education and encourage the participation of national minorities in public life. The Framework Convention for the Protection of National Minorities defines a national minority implicitly to include minorities possessing a territorial identity and a distinct cultural heritage. By 2008, 39 member states had signed and ratified the Convention, with the notable exception of France.

Indigenous minorities

Various European ethnic groups have lived in Europe for millennia; however, the UN recognizes very few indigenous populations of Europe, and these are confined to the far north and far east of the continent.

Notable indigenous minority populations in Europe that are recognized by the UN include the Uralic Nenets, Samoyed, and Komi peoples of northern Russia; Circassians of southern Russia and the North Caucasus; Crimean Tatars, Krymchaks and Crimean Karaites of Crimea in Ukraine; Sámi peoples of northern Norway, Sweden, and Finland and northwestern Russia (in an area also referred to as Sápmi); Basques of Basque Country, Spain and southern France; and the Sorbian people of Germany and Poland.

Non-indigenous minorities

Expulsions of Jews in Europe from 1100 to 1600

Many non-European ethnic groups and nationalities have migrated to Europe over the centuries. Some arrived centuries ago. However, the vast majority arrived more recently, mostly in the 20th and 21st centuries. Often, they come from former colonies of the British, Dutch, French, Portuguese and Spanish empires.

European identity

Historical

Personifications of Sclavinia, Germania, Gallia, and Roma, bringing offerings to Otto III; from a gospel book dated 990.

Medieval notions of a relation of the peoples of Europe are expressed in terms of genealogy of mythical founders of the individual groups. The Europeans were considered the descendants of Japheth from early times, corresponding to the division of the known world into three continents, the descendants of Shem peopling Asia and those of Ham peopling Africa. Identification of Europeans as "Japhetites" is also reflected in early suggestions for terming the Indo-European languages "Japhetic".

In this tradition, the Historia Brittonum (9th century) introduces a genealogy of the peoples of the Migration Period based on the sixth-century Frankish Table of Nations, as follows:

The first man that dwelt in Europe was Alanus, with his three sons, Hisicion, Armenon, and Neugio. Hisicion had four sons, Francus, Romanus, Alamanus, and Bruttus. Armenon had five sons, Gothus, Valagothus, Cibidus, Burgundus, and Longobardus. Neugio had three sons, Vandalus, Saxo, and Boganus.
From Hisicion arose four nations—the Franks, the Latins, the Germans, and Britons; from Armenon, the Gothi, Valagothi, Cibidi, Burgundi, and Longobardi; from Neugio, the Bogari, Vandali, Saxones, and Tarincgi. The whole of Europe was subdivided into these tribes.[60]

The text then goes on to list the genealogy of Alanus, connecting him to Japheth via eighteen generations.

European culture

European culture is largely rooted in what is often referred to as its "common cultural heritage". Due to the great number of perspectives which can be taken on the subject, it is impossible to form a single, all-embracing conception of European culture. Nonetheless, there are core elements which are generally agreed upon as forming the cultural foundation of modern Europe. One list of these elements given by K. Bochmann includes:

Berting says that these points fit with "Europe's most positive realisations". The concept of European culture is generally linked to the classical definition of the Western world. In this definition, Western culture is the set of literary, scientific, political, artistic and philosophical principles which set it apart from other civilizations. Much of this set of traditions and knowledge is collected in the Western canon. The term has come to apply to countries whose history has been strongly marked by European immigration or settlement during the 18th and 19th centuries, such as those of the Americas and Australasia, and is not restricted to Europe.

Religion

Eurobarometer Poll 2005 chart results

Since the High Middle Ages, most of Europe has been dominated by Christianity. There are three major denominations: Roman Catholic, Protestant and Eastern Orthodox, with Protestantism restricted mostly to Northern Europe, and Orthodoxy to East and South Slavic regions, Romania, Moldova, Greece, and Georgia. The Armenian Apostolic Church, which belongs to the Oriental Orthodox branch of Christianity and is the world's oldest national church, is also present in Europe. Catholicism, while typically centered in Western Europe, also has a very significant following in Central Europe (especially among the Germanic, Western Slavic and Hungarian peoples/regions) as well as in Ireland (with some in Great Britain).

Christianity has been the dominant religion shaping European culture for at least the last 1700 years. Modern philosophical thought has been strongly influenced by Christian philosophers such as St Thomas Aquinas and Erasmus. Throughout most of its history, European culture has been nearly equivalent to Christian culture, which was the predominant force in Western civilization, guiding the course of philosophy, art, and science. The notions of "Europe" and the "Western World" have been intimately connected with the concept of "Christianity and Christendom"; many even credit Christianity with being the link that created a unified European identity.

Christianity is still the largest religion in Europe; according to a 2011 survey, 76.2% of Europeans considered themselves Christians. Also according to a study on Religiosity in the European Union in 2012, by Eurobarometer, Christianity is the largest religion in the European Union, accounting for 72% of the EU's population. As of 2010 Catholics were the largest Christian group in Europe, accounting for more than 48% of European Christians. The second-largest Christian group in Europe were the Orthodox, who made up 32% of European Christians. About 19% of European Christians were part of the Protestant tradition. Russia is the largest Christian country in Europe by population, followed by Germany and Italy.

Islam has some tradition in the Balkans and the Caucasus due to conquest and colonization from the Ottoman Empire in the 16th to 19th centuries, as well as earlier though discontinued long-term presence in much of Iberia as well as Sicily. Muslims account for the majority of the populations in Albania, Azerbaijan, Kosovo, Northern Cyprus (controlled by Turks), and Bosnia and Herzegovina. Significant minorities are present in the rest of Europe. Russia also has one of the largest Muslim communities in Europe, including the Tatars of the Middle Volga and multiple groups in the Caucasus, including Chechens, Avars, Ingush and others. With 20th-century migrations, Muslims in Western Europe have become a noticeable minority. According to the Pew Forum, the total number of Muslims in Europe in 2010 was about 44 million (6%), while the total number of Muslims in the European Union in 2007 was about 16 million (3.2%).

Judaism has a long history in Europe, but is a small minority religion, with France (1%) the only European country with a Jewish population in excess of 0.5%. The Jewish population of Europe is composed primarily of two groups, the Ashkenazi and the Sephardi. Ancestors of Ashkenazi Jews likely migrated to Central Europe at least as early as the 8th century, while Sephardi Jews established themselves in Spain and Portugal at least one thousand years before that. Jews originated in the Levant where they resided for thousands of years until the 2nd century AD, when they spread around the Mediterranean and into Europe, although small communities were known to exist in Greece as well as the Balkans since at least the 1st century BC. Jewish history was notably affected by the Holocaust and emigration (including Aliyah, as well as emigration to America) in the 20th century. The Jewish population of Europe in 2010 was estimated to be approximately 1.4 million (0.2% of European population) or 10% of the world's Jewish population. In the 21st century, France has the largest Jewish population in Europe, followed by the United Kingdom, Germany, Russia and Ukraine.

In modern times, significant secularization has taken place since the 20th century, notably in secularist France, Estonia and the Czech Republic. Currently, the distribution of theism in Europe is very heterogeneous, with more than 95% in Poland, and less than 20% in the Czech Republic and Estonia. The 2005 Eurobarometer poll found that 52% of EU citizens believe in God. According to a Pew Research Center survey in 2012, the religiously unaffiliated (atheists and agnostics) made up about 18.2% of the European population in 2010. According to the same survey, the religiously unaffiliated make up the majority of the population in only two European countries: the Czech Republic (76%) and Estonia (60%).

Pan-European identity

"Pan-European identity" or "Europatriotism" is an emerging sense of personal identification with Europe, or the European Union, as a result of the gradual process of European integration taking place over the last quarter of the 20th century, and especially in the period after the end of the Cold War, since the 1990s. The foundation of the OSCE following the 1990 Paris Charter has facilitated this process on a political level during the 1990s and 2000s.

From the later 20th century, 'Europe' has come to be widely used as a synonym for the European Union even though there are millions of people living on the European continent in non-EU member states. The prefix pan implies that the identity applies throughout Europe, and especially in an EU context, and 'pan-European' is often contrasted with national identity.

Genome-wide association study

From Wikipedia, the free encyclopedia

In genomics, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

Manhattan plot of a GWAS
An illustration of a Manhattan plot depicting several strongly associated risk loci. Each dot represents a SNP, with the X-axis showing genomic location and Y-axis showing association level. This example is taken from a GWA study investigating kidney stone disease, so the peaks indicate genetic variants that are found more often in individuals with kidney stones.

When applied to human data, GWA studies compare the DNA of participants having varying phenotypes for a particular trait or disease. These participants may be people with a disease (cases) and similar people without the disease (controls), or they may be people with different phenotypes for a particular trait, for example blood pressure. This approach is known as phenotype-first, in which the participants are classified first by their clinical manifestation(s), as opposed to genotype-first. Each person gives a sample of DNA, from which millions of genetic variants are read using SNP arrays. If one type of the variant (one allele) is more frequent in people with the disease, the variant is said to be associated with the disease. The associated SNPs are then considered to mark a region of the human genome that may influence the risk of disease.

GWA studies investigate the entire genome, in contrast to methods that specifically test a small number of pre-specified genetic regions. Hence, GWAS is a non-candidate-driven approach, in contrast to gene-specific candidate-driven studies. GWA studies identify SNPs and other variants in DNA associated with a disease, but they cannot on their own specify which genes are causal.

The first successful GWAS, published in 2002, studied myocardial infarction. This study design was then implemented in the landmark 2005 GWA study investigating patients with age-related macular degeneration, which found two SNPs with significantly altered allele frequency compared to healthy controls. As of 2017, over 3,000 human GWA studies had examined over 1,800 diseases and traits, and thousands of SNP associations had been found. Except in the case of rare genetic diseases, these associations are very weak, but while they may not explain much of the risk, they provide insight into genes and pathways that can be important.

Background

GWA studies typically identify common variants with small effect sizes (lower right).

Any two human genomes differ in millions of different ways. There are small variations in the individual nucleotides of the genomes (SNPs) as well as many larger variations, such as deletions, insertions and copy number variations. Any of these may cause alterations in an individual's traits, or phenotype, which can be anything from disease risk to physical properties such as height. Around the year 2000, prior to the introduction of GWA studies, the primary method of investigation was through inheritance studies of genetic linkage in families. This approach had proven highly useful for single-gene disorders. However, for common and complex diseases the results of genetic linkage studies proved hard to reproduce. A suggested alternative to linkage studies was the genetic association study. This study type asks whether the allele of a genetic variant is found more often than expected in individuals with the phenotype of interest (e.g. with the disease being studied). Early calculations on statistical power indicated that this approach could be better than linkage studies at detecting weak genetic effects.

In addition to the conceptual framework, several additional factors enabled GWA studies. One was the advent of biobanks, which are repositories of human genetic material that greatly reduced the cost and difficulty of collecting sufficient numbers of biological specimens for study. Another was the International HapMap Project, which, from 2003, identified a majority of the common SNPs interrogated in a GWA study. The haploblock structure identified by the HapMap project also allowed a focus on the subset of SNPs that would describe most of the variation. The development of methods to genotype all these SNPs using genotyping arrays was also an important prerequisite.

Methods

Example calculation illustrating the methodology of a case-control GWA study. The allele count of each measured SNP is evaluated—in this case with a chi-squared test—to identify variants associated with the trait in question. The numbers in this example are taken from a 2007 study of coronary artery disease (CAD) that showed that the individuals with the G-allele of SNP1 (rs1333049) were overrepresented amongst CAD-patients.
 
Illustration of a simulated genotype by phenotype regression for a single SNP. Each dot represents an individual. A GWAS of a continuous trait essentially consists of repeating this analysis at each SNP.

The most common approach of GWA studies is the case-control setup, which compares two large groups of individuals, one healthy control group and one case group affected by a disease. All individuals in each group are genotyped for the majority of common known SNPs. The exact number of SNPs depends on the genotyping technology, but is typically one million or more. For each of these SNPs it is then investigated whether the allele frequency is significantly altered between the case and the control group. In such setups, the fundamental unit for reporting effect sizes is the odds ratio. The odds ratio is the ratio of two odds, which in the context of GWA studies are the odds of being a case for individuals having a specific allele and the odds of being a case for individuals who do not have that same allele.

Example: suppose that there are two alleles, T and C. The number of individuals in the case group having allele T is represented by 'A' and the number of individuals in the control group having allele T is represented by 'B'. Similarly, the number of individuals in the case group having allele C is represented by 'X' and the number of individuals in the control group having allele C is represented by 'Y'. In this case the odds ratio for allele T is A:B (meaning 'A to B', in standard odds terminology) divided by X:Y, which in mathematical notation is simply (A/B)/(X/Y).
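The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the formula (A/B)/(X/Y) as stated above; the function name and the counts are hypothetical and are not taken from any study.

```python
def odds_ratio(a, b, x, y):
    """Odds ratio for allele T, using the notation of the example above.

    a: individuals in the case group with allele T
    b: individuals in the control group with allele T
    x: individuals in the case group with allele C
    y: individuals in the control group with allele C
    """
    if min(a, b, x, y) <= 0:
        raise ValueError("all four counts must be positive")
    return (a / b) / (x / y)


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(odds_ratio(a=300, b=200, x=700, y=800))
```

With these made-up counts the result is above 1, which corresponds to allele T being relatively more common among cases than among controls.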

When the allele frequency in the case group is much higher than in the control group, the odds ratio is higher than 1, and vice versa for lower allele frequency. Additionally, a P-value for the significance of the odds ratio is typically calculated using a simple chi-squared test. Finding odds ratios that are significantly different from 1 is the objective of the GWA study because this shows that a SNP is associated with disease. Because so many variants are tested, it is standard practice to require the P-value to be lower than 5×10⁻⁸ to consider a variant significant.
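As a rough sketch of the test described here, the following Python computes a Pearson chi-squared statistic for a 2×2 table of allele counts and converts it to a P-value. It assumes one degree of freedom and no continuity correction, which is one common choice but not necessarily what any given GWAS package does. The function names and counts are hypothetical.

```python
import math


def chi_squared_2x2(a, b, x, y):
    """Pearson chi-squared statistic (no continuity correction) for the table
    [[a, b], [x, y]]: rows are alleles T and C, columns are cases and controls."""
    n = a + b + x + y
    row_t, row_c = a + b, x + y
    col_case, col_ctrl = a + x, b + y
    if 0 in (row_t, row_c, col_case, col_ctrl):
        raise ValueError("every row and column must have a non-zero total")
    return n * (a * y - b * x) ** 2 / (row_t * row_c * col_case * col_ctrl)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


GENOME_WIDE_THRESHOLD = 5e-8  # threshold quoted in the text

# Hypothetical allele counts, chosen only for illustration.
chi2 = chi_squared_2x2(a=300, b=200, x=700, y=800)
p = p_value_1df(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}, significant: {p < GENOME_WIDE_THRESHOLD}")
```

The made-up counts give a P-value that would pass an ordinary 0.05 cut-off by a wide margin yet still falls short of the genome-wide threshold, which shows how demanding that threshold is.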

Variations on the case-control approach. A common alternative to case-control GWA studies is the analysis of quantitative phenotypic data, e.g. height or biomarker concentrations or even gene expression. Likewise, alternative statistics designed for dominance or recessive penetrance patterns can be used. Calculations are typically done using bioinformatics software such as SNPTEST and PLINK, which also include support for many of these alternative statistics. GWAS focuses on the effect of individual SNPs. However, it is also possible that complex interactions among two or more SNPs, epistasis, might contribute to complex diseases. Due to the potentially exponential number of interactions, detecting statistically significant interactions in GWAS data is both computationally and statistically challenging. This task has been tackled in existing publications that use algorithms inspired from data mining. Moreover, the researchers try to integrate GWA data with other biological data such as protein-protein interaction network to extract more informative results.
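For a quantitative phenotype, the per-SNP analysis can be sketched as a simple linear regression of the trait on genotype. The sketch below assumes an additive coding (0, 1 or 2 copies of one allele), which is a common convention but an assumption here, and uses made-up data. It fits only the slope and intercept and does not compute a P-value; tools such as SNPTEST and PLINK do considerably more.

```python
def simple_linear_regression(genotypes, phenotypes):
    """Ordinary least squares fit of phenotype = intercept + slope * genotype."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype has no variation (monomorphic SNP)")
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    slope = sxy / sxx  # estimated change in the trait per extra allele copy
    intercept = mean_p - slope * mean_g
    return slope, intercept


# Hypothetical data: allele copies per individual and a continuous trait value.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [170.0, 172.0, 173.0, 175.0, 176.0, 178.0]
slope, intercept = simple_linear_regression(genotypes, phenotypes)
print(f"slope = {slope}, intercept = {intercept}")
```

A GWAS of a continuous trait repeats a fit of this kind at every SNP, usually with covariates added to the model.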

A key step in the majority of GWA studies is the imputation of genotypes at SNPs not on the genotype chip used in the study. This process greatly increases the number of SNPs that can be tested for association, increases the power of the study, and facilitates meta-analysis of GWAS across distinct cohorts. Genotype imputation is carried out by statistical methods that combine the GWAS data together with a reference panel of haplotypes. These methods take advantage of sharing of haplotypes between individuals over short stretches of sequence to impute alleles. Existing software packages for genotype imputation include IMPUTE2, Minimac, Beagle, and MaCH.

In addition to the calculation of association, it is common to take into account any variables that could potentially confound the results. Sex and age are common examples of confounding variables. Moreover, it is also known that many genetic variations are associated with the geographical and historical populations in which the mutations first arose. Because of this association, studies must take account of the geographic and ethnic background of participants by controlling for what is called population stratification. If they fail to do so, these studies can produce false positive results.
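A toy calculation can show how ignoring population stratification produces a false positive. The allele counts below are invented for illustration: within each of two hypothetical populations the allele is unrelated to disease (odds ratio of 1), but the populations differ in both allele frequency and the proportion of cases, so pooling them creates an apparent association.

```python
def allelic_odds_ratio(case_t, ctrl_t, case_c, ctrl_c):
    """Odds ratio for allele T, written in cross-product form."""
    return (case_t * ctrl_c) / (ctrl_t * case_c)


# Hypothetical allele counts. Population 1 has many cases and a common T allele;
# population 2 has few cases and a rare T allele.
populations = {
    "population_1": {"case_t": 160, "ctrl_t": 40, "case_c": 40, "ctrl_c": 10},
    "population_2": {"case_t": 10, "ctrl_t": 40, "case_c": 40, "ctrl_c": 160},
}

# Odds ratio computed separately inside each population.
per_population = {
    name: allelic_odds_ratio(**counts) for name, counts in populations.items()
}

# Odds ratio computed after naively pooling both populations.
pooled_counts = {
    key: sum(counts[key] for counts in populations.values())
    for key in ("case_t", "ctrl_t", "case_c", "ctrl_c")
}
pooled = allelic_odds_ratio(**pooled_counts)

print(per_population)  # 1.0 in each population
print(pooled)          # well above 1 once the populations are mixed
```

Real analyses control for this with methods not shown here; the sketch only demonstrates why the adjustment is needed.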

After odds ratios and P-values have been calculated for all SNPs, a common approach is to create a Manhattan plot. In the context of GWA studies, this plot shows the negative logarithm of the P-value as a function of genomic location. Thus the SNPs with the most significant association stand out on the plot, usually as stacks of points because of haploblock structure. Importantly, the P-value threshold for significance is corrected for multiple testing issues. The exact threshold varies by study, but the conventional genome-wide significance threshold is 5×10⁻⁸, which is strict enough to remain meaningful when hundreds of thousands to millions of SNPs are tested. GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort.
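The values behind a Manhattan plot can be sketched as follows. The code assumes the common rationale for the conventional threshold, a Bonferroni-style correction of 0.05 for roughly one million independent tests; that rationale is not stated in the text above. The SNP identifiers, positions and P-values are made up, and actual plotting is left out.

```python
import math

ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000  # assumption, not stated in the text
THRESHOLD = ALPHA / ASSUMED_INDEPENDENT_TESTS  # approximately 5e-8


def manhattan_points(results, threshold=THRESHOLD):
    """Turn (snp_id, chromosome, position, p_value) rows into plot-ready points.

    P-values must be greater than zero, since log10(0) is undefined.
    """
    points = []
    for snp_id, chromosome, position, p_value in results:
        points.append({
            "snp": snp_id,
            "chromosome": chromosome,
            "position": position,
            "neg_log10_p": -math.log10(p_value),  # Y-axis value
            "significant": p_value < threshold,
        })
    # Order along the X-axis: by chromosome, then by position.
    return sorted(points, key=lambda pt: (pt["chromosome"], pt["position"]))


# Hypothetical association results; identifiers and values are made up.
results = [
    ("snp_b", 2, 150_000, 3.0e-4),
    ("snp_a", 1, 900_000, 1.0e-9),
    ("snp_c", 2, 90_000, 2.0e-7),
]
for point in manhattan_points(results):
    print(point)
```

Taking the negative logarithm means that smaller P-values plot higher, so the threshold appears as a horizontal line that only the strongest associations cross.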

Results

Regional association plot, showing individual SNPs in the LDL receptor region and their association to LDL-cholesterol levels. This type of plot is similar to the Manhattan plot in the lead section, but for a more limited section of the genome. The haploblock structure is visualized with colour scale and the association level is given by the left Y-axis. The dot representing the rs73015013 SNP (in the top-middle) has a high Y-axis location because this SNP explains some of the variation in LDL-cholesterol.
 
Relationship between the minor allele frequency and the effect size of genome wide significant variants in a GWAS of height.

Attempts have been made at creating comprehensive catalogues of SNPs that have been identified from GWA studies. As of 2009, SNPs associated with diseases are numbered in the thousands.

One of the first GWA studies, conducted in 2005, compared 96 patients with age-related macular degeneration (ARMD) with 50 healthy controls. It identified two SNPs with significantly altered allele frequency between the two groups. These SNPs were located in the gene encoding complement factor H, which was an unexpected finding in the research of ARMD. The findings from these first GWA studies have subsequently prompted further functional research towards therapeutic manipulation of the complement system in ARMD.

Another landmark publication in the history of GWA studies was the Wellcome Trust Case Control Consortium (WTCCC) study, the largest GWA study ever conducted at the time of its publication in 2007. The WTCCC included 14,000 cases of seven common diseases (~2,000 individuals for each of coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder, and hypertension) and 3,000 shared controls. This study was successful in uncovering many new disease genes underlying these diseases.

Since these first landmark GWA studies, there have been two general trends. One has been towards larger and larger sample sizes. By 2018, several genome-wide association studies had reached a total sample size of over 1 million participants, including 1.1 million in a genome-wide study of educational attainment and a study of insomnia containing 1.3 million individuals. The reason is the drive towards reliably detecting risk-SNPs that have smaller odds ratios and lower allele frequency. Another trend has been towards the use of more narrowly defined phenotypes, such as blood lipids, proinsulin or similar biomarkers. These are called intermediate phenotypes, and their analyses may be of value to functional research into biomarkers.

A variation of GWAS uses participants that are first-degree relatives of people with a disease. This type of study has been named genome-wide association study by proxy (GWAX).

A central point of debate on GWA studies has been that most of the SNP variations found by GWA studies are associated with only a small increased risk of the disease, and have only a small predictive value. The median odds ratio is 1.33 per risk-SNP, with only a few showing odds ratios above 3.0. These magnitudes are considered small because they do not explain much of the heritable variation. This heritable variation is estimated from heritability studies based on monozygotic twins. For example, it is known that 80-90% of variance in height can be explained by hereditary differences, but GWA studies only account for a minority of this variance.

Clinical applications and examples

A challenge for future GWA studies is to apply the findings in a way that accelerates drug and diagnostics development, including better integration of genetic studies into the drug-development process and a focus on the role of genetic variation in maintaining health as a blueprint for designing new drugs and diagnostics. Several studies have looked into the use of risk-SNP markers as a means of directly improving the accuracy of prognosis. Some have found that the accuracy of prognosis improves, while others report only minor benefits from this use. Generally, a problem with this direct approach is the small magnitude of the effects observed. A small effect ultimately translates into a poor separation of cases and controls and thus only a small improvement in prognostic accuracy. An alternative application is therefore the potential for GWA studies to elucidate pathophysiology.

Hepatitis C treatment

One such success is the identification of the genetic variant associated with response to anti-hepatitis C virus treatment. For genotype 1 hepatitis C treated with pegylated interferon-alpha-2a or pegylated interferon-alpha-2b combined with ribavirin, a GWA study has shown that SNPs near the human IL28B gene, encoding interferon lambda 3, are associated with significant differences in response to the treatment. A later report demonstrated that the same genetic variants are also associated with the natural clearance of the genotype 1 hepatitis C virus. These major findings facilitated the development of personalized medicine and allowed physicians to customize medical decisions based on the patient's genotype.

eQTL, LDL and cardiovascular disease

The goal of elucidating pathophysiology has also led to increased interest in the association between risk-SNPs and the expression of nearby genes, the so-called expression quantitative trait loci (eQTL) studies. The reason is that GWA studies identify risk-SNPs, but not risk-genes, and specification of genes is one step closer towards actionable drug targets. As a result, major GWA studies by 2011 typically included extensive eQTL analysis. One of the strongest eQTL effects observed for a GWA-identified risk SNP is the SORT1 locus. Functional follow-up studies of this locus using small interfering RNA and gene knock-out mice have shed light on the metabolism of low-density lipoproteins, which has important clinical implications for cardiovascular disease.

Atrial fibrillation

A meta-analysis completed in 2018 discovered 70 new loci associated with atrial fibrillation. It identified different variants associated with genes coding for transcription factors, such as TBX3 and TBX5, NKX2-5 or PITX2, which are involved in the regulation of cardiac conduction, in ion channel modulation and in cardiac development. It also identified new genes involved in tachycardia (CASQ2) or associated with altered communication between cardiac muscle cells (PKP2).

Schizophrenia

While some research using a High-Precision Protein Interaction Prediction (HiPPIP) computational model has discovered 504 new protein-protein interactions (PPIs) associated with genes linked to schizophrenia, the evidence supporting the genetic basis of schizophrenia is controversial and may suffer from some of the limitations of this method of study.

Agricultural applications

Plant growth stages and yield components

GWA studies act as an important tool in plant breeding. With large genotyping and phenotyping data sets, GWAS are powerful for analyzing the complex inheritance modes of traits that are important yield components, such as number of grains per spike, weight of each grain and plant structure. In a study of spring wheat, GWAS revealed a strong correlation of grain production with booting data, biomass and number of grains per spike. GWA studies have also been successful in studying the genetic architecture of complex traits in rice.

Plant pathogens

The emergence of plant pathogens has posed serious threats to plant health and biodiversity. Given this, identifying wild types that have natural resistance to certain pathogens could be of vital importance. It is also necessary to predict which alleles are associated with the resistance. GWA studies are a powerful tool for detecting relationships between certain variants and resistance to a plant pathogen, which is beneficial for developing new pathogen-resistant cultivars.

Chicken

The first GWA study in chickens was done by Abasht and Lamont in 2007. This new tool was used to study the fatness trait in an F2 population found previously. The SNPs they found are on 10 chromosomes (1, 2, 3, 4, 7, 8, 10, 12, 15 and 27).

Limitations

GWA studies have several issues and limitations that can be taken care of through proper quality control and study setup. Lack of well-defined case and control groups, insufficient sample size, and inadequate control for multiple testing and for population stratification are common problems. The statistical issue of multiple testing is particularly important: it has been noted that "the GWA approach can be problematic because the massive number of statistical tests performed presents an unprecedented potential for false-positive results". Ignoring these correctable issues has been cited as contributing to a general sense of problems with the GWA methodology. In addition to easily correctable problems such as these, some more subtle but important issues have surfaced. A high-profile GWA study that investigated individuals with very long life spans to identify SNPs associated with longevity is an example of this. The publication came under scrutiny because of a discrepancy between the type of genotyping array in the case and control group, which caused several SNPs to be falsely highlighted as associated with longevity. The study was subsequently retracted, but a modified manuscript was later published.
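The multiple-testing problem can be illustrated with a short sketch. It uses the Bonferroni correction, one common and deliberately conservative way to control false positives, and it is not necessarily the correction used by any study discussed here. The figure of one million independent tests and the helper names are assumptions made for illustration.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every tested SNP is truly
    unassociated and each test is run at significance level alpha."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error
    rate at or below alpha across n_tests tests."""
    return alpha / n_tests


n_tests = 1_000_000  # assumed number of independent tests
naive = expected_false_positives(0.05, n_tests)
threshold = bonferroni_threshold(0.05, n_tests)
print(f"uncorrected: about {naive:.0f} false positives expected")
print(f"Bonferroni per-test threshold: {threshold:.1e}")
```

Under these assumptions, testing at an uncorrected 0.05 level would be expected to yield tens of thousands of spurious associations, whereas the corrected threshold of 5e-8 matches the conventional genome-wide significance level.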

In addition to these preventable issues, GWA studies have attracted more fundamental criticism, mainly because of their assumption that common genetic variation plays a large role in explaining the heritable variation of common disease. Indeed, it has been estimated that for most conditions the SNP heritability attributable to common SNPs is <0.05. This aspect of GWA studies has attracted the criticism that, although it could not have been known prospectively, GWA studies were ultimately not worth the expenditure. GWA studies also face the criticism that the broad variation of individual responses or compensatory mechanisms to a disease state cancels out and masks potential genes or causal variants associated with the disease. Additionally, GWA studies identify candidate risk variants for the population in which the analysis is performed, and because most GWA studies stem from European databases, the identified risk variants often do not translate to non-European populations. Alternative strategies that have been suggested involve linkage analysis. More recently, the rapidly decreasing price of complete genome sequencing has also provided a realistic alternative to genotyping array-based GWA studies. It is debatable whether the use of this new technique should still be referred to as a GWA study, but high-throughput sequencing does have the potential to side-step some of the shortcomings of non-sequencing GWA.

Fine-mapping

Genotyping arrays designed for GWAS rely on linkage disequilibrium to provide coverage of the entire genome by genotyping a subset of variants. Because of this, the reported associated variants are unlikely to be the actual causal variants. Associated regions can contain hundreds of variants spanning large regions and encompassing many different genes, making the biological interpretation of GWAS loci more difficult. Fine-mapping is a process to refine these lists of associated variants to a credible set most likely to include the causal variant.

Fine-mapping requires all variants in the associated region to have been genotyped or imputed (dense coverage), very stringent quality control resulting in high-quality genotypes, and sample sizes large enough to separate highly correlated signals. There are several different methods to perform fine-mapping, and all methods produce a posterior probability that a variant in that locus is causal. Because the requirements are often difficult to satisfy, there are still few examples of these methods being applied more generally.
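As a rough illustration of how per-variant posterior probabilities can be turned into a credible set, the sketch below ranks variants by posterior probability and adds them until a chosen coverage is reached. The 95% coverage level, the assumption of a single causal variant per locus (so that the posteriors sum to 1), the function name and the variant labels are all assumptions for illustration, not the procedure of any particular fine-mapping method.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of variants, taken in order of decreasing
    posterior probability, whose cumulative probability reaches `coverage`.

    `posteriors` maps a variant label to its posterior probability of
    being the causal variant at the locus.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for five variants at one locus.
example = {"rsA": 0.60, "rsB": 0.25, "rsC": 0.11, "rsD": 0.03, "rsE": 0.01}
print(credible_set(example))
```

In this made-up locus the credible set narrows five associated variants down to three candidates for functional follow-up. When variants are highly correlated, the posterior probability is spread more evenly and the set becomes larger.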

Samaritans

From Wikipedia, the free encyclopedia https://en.wikipedia.org/w...