
Thursday, February 19, 2015

Human genome



From Wikipedia, the free encyclopedia

Genomic information
Graphical representation of the idealized human diploid karyotype, showing the organization of the genome into chromosomes. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Chromosomes are shown aligned at their centromeres. The mitochondrial DNA is not shown.
NCBI genome ID 51
Ploidy diploid
Genome size 3,234.83 Mb (Mega-basepairs)
Number of chromosomes 23 pairs

The human genome is the complete set of genetic information for humans (Homo sapiens sapiens). This information is encoded as DNA sequences within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes (contained in egg and sperm cells) consist of three billion DNA base pairs, while diploid genomes (found in somatic cells) have twice the DNA content. While there are significant differences among the genomes of human individuals (on the order of 0.1%)[citation needed], these are considerably smaller than the differences between humans and their closest living relatives, the chimpanzees (approximately 4%[1]) and bonobos.

The Human Genome Project produced the first complete sequences of individual human genomes. As of 2012, thousands of human genomes have been completely sequenced, and many more have been mapped at lower levels of resolution. The resulting data are used worldwide in biomedical science, anthropology, forensics and other branches of science. There is a widely held expectation that genomic studies will lead to advances in the diagnosis and treatment of diseases, and to new insights in many fields of biology, including human evolution.

Although the sequence of the human genome has been (almost) completely determined by DNA sequencing, it is not yet fully understood. Most (though probably not all) genes have been identified by a combination of high throughput experimental and bioinformatics approaches, yet much work still needs to be done to further elucidate the biological functions of their protein and RNA products. Recent results suggest that most of the vast quantities of noncoding DNA within the genome have associated biochemical activities, including regulation of gene expression, organization of chromosome architecture, and signals controlling epigenetic inheritance.

There are an estimated 20,000–25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene finding methods have improved, and could continue to drop further.[2][3] Protein-coding sequences account for only a very small fraction of the genome (approximately 1.5%); the rest is associated with non-coding RNA molecules, regulatory DNA sequences, LINEs, SINEs, introns, and sequences for which as yet no function has been elucidated.[4]

Molecular organization and gene content

The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired autosomal chromosomes, plus the X chromosome (one in males, two in females) and, in males only, one Y chromosome, all being large linear DNA molecules contained within the cell nucleus. It also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table. (Data source: Ensembl genome browser release 68, July 2012)
Chromosome Length (mm) Base pairs Variations Confirmed proteins Putative proteins Pseudogenes miRNA rRNA snRNA snoRNA Misc ncRNA Links Centromere position (Mbp) Cumulative (%)
1 85 249,250,621 4,401,091 2,012 31 1,130 134 66 221 145 106 EBI 125.0 7.9
2 83 243,199,373 4,607,702 1,203 50 948 115 40 161 117 93 EBI 93.3 16.2
3 67 198,022,430 3,894,345 1,040 25 719 99 29 138 87 77 EBI 91.0 23.0
4 65 191,154,276 3,673,892 718 39 698 92 24 120 56 71 EBI 50.4 29.6
5 62 180,915,260 3,436,667 849 24 676 83 25 106 61 68 EBI 48.4 35.8
6 58 171,115,067 3,360,890 1,002 39 731 81 26 111 73 67 EBI 61.0 41.6
7 54 159,138,663 3,045,992 866 34 803 90 24 90 76 70 EBI 59.9 47.1
8 50 146,364,022 2,890,692 659 39 568 80 28 86 52 42 EBI 45.6 52.0
9 48 141,213,431 2,581,827 785 15 714 69 19 66 51 55 EBI 49.0 56.3
10 46 135,534,747 2,609,802 745 18 500 64 32 87 56 56 EBI 40.2 60.9
11 46 135,006,516 2,607,254 1,258 48 775 63 24 74 76 53 EBI 53.7 65.4
12 45 133,851,895 2,482,194 1,003 47 582 72 27 106 62 69 EBI 35.8 70.0
13 39 115,169,878 1,814,242 318 8 323 42 16 45 34 36 EBI 17.9 73.4
14 36 107,349,540 1,712,799 601 50 472 92 10 65 97 46 EBI 17.6 76.4
15 35 102,531,392 1,577,346 562 43 473 78 13 63 136 39 EBI 19.0 79.3
16 31 90,354,753 1,747,136 805 65 429 52 32 53 58 34 EBI 36.6 82.0
17 28 81,195,210 1,491,841 1,158 44 300 61 15 80 71 46 EBI 24.0 84.8
18 27 78,077,248 1,448,602 268 20 59 32 13 51 36 25 EBI 17.2 87.4
19 20 59,128,983 1,171,356 1,399 26 181 110 13 29 31 15 EBI 26.5 89.3
20 21 63,025,520 1,206,753 533 13 213 57 15 46 37 34 EBI 27.5 91.4
21 16 48,129,895 787,784 225 8 150 16 5 21 19 8 EBI 13.2 92.6
22 17 51,304,566 745,778 431 21 308 31 5 23 23 23 EBI 14.7 93.8
X 53 155,270,560 2,174,952 815 23 780 128 22 85 64 52 EBI 60.6 99.1
Y 20 59,373,566 286,812 45 8 327 15 7 17 3 2 EBI 12.5 100.0
mtDNA 0.0054 16,569 929 13 0 0 0 2 0 0 22 EBI N/A 100.0

Table 1 (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between base pairs in the DNA double helix. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing, or modifications to protein structure that occur after translation.
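As an illustration of that length estimate, the following sketch (not part of the original article; it assumes only the 0.34 nm spacing quoted above and the base-pair count for chromosome 1 from Table 1) reproduces the ~85 mm figure:

```python
# Rough sketch (not part of the original table): estimate a chromosome's
# stretched-out DNA length from its base-pair count, using the 0.34 nm
# per base pair figure quoted above.

def chromosome_length_mm(base_pairs: int) -> float:
    """Approximate linear length of the DNA molecule in millimetres."""
    length_nm = base_pairs * 0.34   # nanometres of B-form double helix
    return length_nm * 1e-6         # 1 mm = 1,000,000 nm

# Chromosome 1 from Table 1: 249,250,621 bp -> about 85 mm, matching the table.
print(round(chromosome_length_mm(249_250_621)))  # 85
```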

The number of variations is a summary of unique DNA sequence changes that have been identified within the sequences analyzed by Ensembl as of July 2012; that number is expected to increase as further personal genomes are sequenced and examined. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequence in the EBI genome browser. The table also describes the prevalence of genes encoding structural RNAs in the genome.

miRNA, or microRNA, functions as a post-transcriptional regulator of gene expression. Ribosomal RNA, or rRNA, makes up the RNA portion of the ribosome and is critical in the synthesis of proteins. Small nuclear RNA, or snRNA, is found in the nucleus of the cell. Its primary function is in the processing of pre-mRNA molecules and also in the regulation of transcription factors. snoRNA, or small nucleolar RNA, primarily functions in guiding chemical modifications to other RNA molecules.

Completeness of the human genome sequence

Although the human genome has been completely sequenced for all practical purposes, there are still hundreds of gaps in the sequence. A recent study noted more than 160 euchromatic gaps, of which 50 have since been closed.[5] However, there are still numerous gaps in the heterochromatic parts of the genome, which are much harder to sequence due to numerous repeats and other intractable sequence features.

Coding vs. noncoding DNA

The content of the human genome is commonly divided into coding and noncoding DNA sequences. Coding DNA is defined as those sequences that can be transcribed into mRNA and translated into proteins during the human life cycle; these sequences occupy only a small fraction of the genome (less than 2%). Noncoding DNA is made up of all of those sequences (ca. 98% of the genome) that are not used to encode proteins.
Some noncoding DNA contains genes for RNA molecules with important biological functions (noncoding RNA, for example ribosomal RNA and transfer RNA). The exploration of the function and evolutionary origin of noncoding DNA is an important goal of contemporary genome research, including the ENCODE (Encyclopedia of DNA Elements) project, which aims to survey the entire human genome, using a variety of experimental tools whose results are indicative of molecular activity.

Because non-coding DNA greatly outnumbers coding DNA, the concept of the sequenced genome has become a more focused analytical concept than the classical concept of the DNA-coding gene.[6][7]

Coding sequences (protein-coding genes)


Human genes categorized by function of the transcribed proteins, given both as number of encoding genes and percentage of all genes.[8]

Protein-coding sequences represent the most widely studied and best understood component of the human genome. These sequences ultimately lead to the production of all human proteins, although several biological processes (e.g. DNA rearrangements and alternative pre-mRNA splicing) can lead to the production of many more unique proteins than the number of protein-coding genes.

The complete modular protein-coding capacity of the genome is contained within the exome, which consists of the DNA sequences of exons that can be translated into proteins. Because of its biological importance, and the fact that it constitutes less than 2% of the genome, sequencing of the exome was the first major milestone of the Human Genome Project.

Number of protein-coding genes. About 20,000 human proteins have been annotated in databases such as Uniprot.[9] Historically, estimates for the number of protein genes have varied widely, ranging up to 2,000,000 in the late 1960s,[10] but several researchers pointed out in the early 1970s that the estimated mutational load from deleterious mutations placed an upper limit of approximately 40,000 for the total number of functional loci (this includes protein-coding and functional non-coding genes).[11]

The number of human protein-coding genes is not significantly larger than that of many less complex organisms, such as the roundworm and the fruit fly. This difference may result from the extensive use of alternative pre-mRNA splicing in humans, which provides the ability to build a very large number of modular proteins through the selective incorporation of exons.

Protein-coding capacity per chromosome. Protein-coding genes are distributed unevenly across the chromosomes, ranging from a few dozen to more than 2000, with an especially high gene density within chromosomes 19, 11, and 1 (Table 1). Each chromosome contains various gene-rich and gene-poor regions, which may be correlated with chromosome bands and GC-content[citation needed]. The significance of these nonrandom patterns of gene density is not well understood.[12]

Size of protein-coding genes. The size of protein-coding genes within the human genome shows enormous variability (Table 2). For example, the gene for histone H1a (HIST1H1A) is relatively small and simple, lacking introns and encoding an mRNA of 781 nt and a 215 amino acid protein (648 nt open reading frame). Dystrophin (DMD) is the largest protein-coding gene in the human reference genome, spanning a total of 2.2 Mb, while Titin (TTN) has the longest coding sequence (80,780 bp), the largest number of exons (364), and the longest single exon (17,106 bp). Over the whole genome, the median size of an exon is 122 bp (mean = 145 bp), the median number of exons is 7 (mean = 8.8), and the median coding sequence encodes 367 amino acids (mean = 447 amino acids; Table 21 in[4]).
Protein Chrom Gene Length (bp) Exons Exon length (bp) Intron length (bp) Alt splicing
Breast cancer type 2 susceptibility protein 13 BRCA2 83,736 27 11,386 72,350 yes
Cystic fibrosis transmembrane conductance regulator 7 CFTR 202,881 27 4,440 198,441 yes
Cytochrome b MT MTCYB 1,140 1 1,140 0 no
Dystrophin X DMD 2,220,381 79 10,500 2,209,881 yes
Glyceraldehyde-3-phosphate dehydrogenase 12 GAPDH 4,444 9 1,425 3,019 yes
Hemoglobin beta subunit 11 HBB 1,605 3 626 979 no
Histone H1A 6 HIST1H1A 781 1 781 0 no
Titin 2 TTN 281,434 364 104,301 177,133 yes
Table 2. Examples of human protein-coding genes. Chrom, chromosome. Alt splicing, alternative pre-mRNA splicing. (Data source: Ensembl genome browser release 68, July 2012)
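Table 2's columns are internally consistent, and a short script makes the exon/intron proportions explicit. The sketch below is illustrative only; the numbers are copied from Table 2 (all in base pairs), and it simply checks that exon plus intron length equals the gene length and reports the intron:exon ratio for each gene:

```python
# Sketch using the figures from Table 2 (all lengths in base pairs):
# check that exon + intron lengths add up to the gene length, and report
# the intron:exon ratio for each example gene.

genes = {
    # gene:      (gene_length, exon_length, intron_length)
    "BRCA2":     (83_736, 11_386, 72_350),
    "CFTR":      (202_881, 4_440, 198_441),
    "MTCYB":     (1_140, 1_140, 0),
    "DMD":       (2_220_381, 10_500, 2_209_881),
    "GAPDH":     (4_444, 1_425, 3_019),
    "HBB":       (1_605, 626, 979),
    "HIST1H1A":  (781, 781, 0),
    "TTN":       (281_434, 104_301, 177_133),
}

for name, (total, exons, introns) in genes.items():
    assert exons + introns == total, f"{name}: lengths do not add up"
    ratio = introns / exons
    print(f"{name:10s} intron:exon ratio = {ratio:7.1f}")

# Ratios range from 0 for the intronless genes (MTCYB, HIST1H1A) to about
# 210 for dystrophin (DMD), illustrating the enormous variability in gene size.
```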

Noncoding DNA (ncDNA)

Noncoding DNA is defined as all of the DNA sequences within a genome that are not found within protein-coding exons, and so are never represented within the amino acid sequence of expressed proteins. By this definition, more than 98% of the human genome is composed of ncDNA.
Numerous classes of noncoding DNA have been identified, including genes for noncoding RNA (e.g. tRNA and rRNA), pseudogenes, introns, untranslated regions of mRNA, regulatory DNA sequences, repetitive DNA sequences, and sequences related to mobile genetic elements.

Numerous sequences that are included within genes are also defined as noncoding DNA. These include genes for noncoding RNA (e.g. tRNA, rRNA), and untranslated components of protein-coding genes (e.g. introns, and 5' and 3' untranslated regions of mRNA).

Protein-coding sequences (specifically, coding exons) constitute less than 1.5% of the human genome.[4] In addition, about 26% of the human genome is introns.[13] Aside from genes (exons and introns) and known regulatory sequences (8–20%), the human genome contains regions of noncoding DNA. The exact amount of noncoding DNA that plays a role in cell physiology has been hotly debated. Recent analysis by the ENCODE project indicates that 80% of the entire human genome is either transcribed, binds to regulatory proteins, or is associated with some other biochemical activity.[3]
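To put those percentages in perspective, the following rough sketch (not from the article; it assumes the ~3.2 Gbp haploid genome size quoted earlier and the fractions cited in this section) converts them into approximate base-pair counts:

```python
# Rough sketch: convert the fractions quoted in this section into approximate
# base-pair counts, assuming a haploid genome of ~3.2 billion bp (figure from
# earlier in the article).

GENOME_BP = 3.2e9

fractions = {
    "protein-coding exons (<1.5%)":               0.015,
    "introns (~26%)":                             0.26,
    "known regulatory sequences (8-20%, middle)": 0.14,
    "biochemically active per ENCODE (~80%)":     0.80,
}

for label, frac in fractions.items():
    print(f"{label:45s} ~{frac * GENOME_BP / 1e6:,.0f} Mbp")

# e.g. coding exons ~48 Mbp, introns ~832 Mbp, ENCODE-active ~2,560 Mbp.
```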

It remains controversial, however, whether all of this biochemical activity contributes to cell physiology, or whether a substantial portion of it is the result of transcriptional and biochemical noise, which must be actively filtered out by the organism.[14] Apart from protein-coding sequences, introns, and regulatory regions, much of the remaining noncoding DNA falls into the classes described in the sections below. Many DNA sequences that do not play a role in gene expression have important biological functions. Comparative genomics studies indicate that about 5% of the genome contains sequences of noncoding DNA that are highly conserved, sometimes on time-scales representing hundreds of millions of years, implying that these noncoding regions are under strong evolutionary pressure and purifying selection.[15]

Many of these sequences regulate the structure of chromosomes by limiting the regions of heterochromatin formation and regulating structural features of the chromosomes, such as the telomeres and centromeres. Other noncoding regions serve as origins of DNA replication. Finally several regions are transcribed into functional noncoding RNA that regulate the expression of protein-coding genes (for example[16] ), mRNA translation and stability (see miRNA), chromatin structure (including histone modifications, for example[17] ), DNA methylation (for example[18] ), DNA recombination (for example[19] ), and cross-regulate other noncoding RNAs (for example[20] ). It is also likely that many transcribed noncoding regions do not serve any role and that this transcription is the product of non-specific RNA Polymerase activity.[14]

Pseudogenes

Pseudogenes are inactive copies of protein-coding genes, often generated by gene duplication, that have become nonfunctional through the accumulation of inactivating mutations. Table 1 shows that the number of pseudogenes in the human genome is on the order of 13,000,[21] and in some chromosomes is nearly the same as the number of functional protein-coding genes. Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution.
For example, the olfactory receptor gene family is one of the best-documented examples of pseudogenes in the human genome. More than 60 percent of the genes in this family are non-functional pseudogenes in humans. By comparison, only 20 percent of genes in the mouse olfactory receptor gene family are pseudogenes. Research suggests that this is a species-specific characteristic, as the most closely related primates all have proportionally fewer pseudogenes. This genetic discovery helps to explain the less acute sense of smell in humans relative to other mammals.[22]

Genes for noncoding RNA (ncRNA)

Noncoding RNA molecules play many essential roles in cells, especially in the many reactions of protein synthesis and RNA processing. The human genome contains genes encoding 18,400 ncRNAs, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.[3][23]
One historical misconception regarding the ncRNAs is that they lack critical genetic information or function. Rather, these ncRNAs are often critical elements in gene regulation and expression. Noncoding RNA also contributes to epigenetics, transcription, RNA splicing, and the translational machinery. The role of RNA in genetic regulation and disease offers a new potential level of unexplored genomic complexity.[24]

Introns and untranslated regions of mRNA

In addition to the ncRNA molecules that are encoded by discrete genes, the initial transcripts of protein coding genes usually contain extensive noncoding sequences, in the form of introns, 5'-untranslated regions (5'-UTR), and 3'-untranslated regions (3'-UTR). Within most protein-coding genes of the human genome, the length of intron sequences is 10- to 100-times the length of exon sequences (Table 2).

Regulatory DNA sequences

The human genome has many different regulatory sequences which are crucial to controlling gene expression. Conservative estimates indicate that these sequences make up 8% of the genome,[25] but extrapolations from the ENCODE project suggest that 20%[26] to 40%[27] of the genome is gene regulatory sequence. Some types of non-coding DNA are genetic "switches" that do not encode proteins but regulate when and where genes are expressed (called enhancers).[28]

Regulatory sequences have been known since the late 1960s.[29] The first identification of regulatory sequences in the human genome relied on recombinant DNA technology.[30] Later, with the advent of genomic sequencing, these sequences could be identified through evolutionary conservation. The evolutionary branch between the primates and mouse, for example, occurred 70–90 million years ago.[31] Computational comparisons of genome sequences can therefore identify conserved non-coding sequences, whose conservation indicates importance in roles such as gene regulation.[32]

Other genomes have been sequenced with the same intention of aiding conservation-guided methods, for example the pufferfish genome.[33] However, regulatory sequences disappear and re-evolve during evolution at a high rate.[34][35][36]

As of 2012, the efforts have shifted toward finding interactions between DNA and regulatory proteins by the technique ChIP-Seq, or gaps where the DNA is not packaged by histones (DNase hypersensitive sites), both of which tell where there are active regulatory sequences in the investigated cell type.[25]

Repetitive DNA sequences

Repetitive DNA sequences comprise approximately 50% of the human genome.[37]

About 8% of the human genome consists of tandem DNA arrays or tandem repeats, low complexity repeat sequences that have multiple adjacent copies (e.g. "CAGCAGCAG...").[citation needed] The tandem sequences may be of variable lengths, from two nucleotides to tens of nucleotides. These sequences are highly variable, even among closely related individuals, and so are used for genealogical DNA testing and forensic DNA analysis.

Repeated sequences of fewer than ten nucleotides (e.g. the dinucleotide repeat (AC)n) are termed microsatellite sequences. Among the microsatellite sequences, trinucleotide repeats are of particular importance, as they sometimes occur within the coding regions of genes and may lead to genetic disorders. For example, Huntington's disease results from an expansion of the trinucleotide repeat (CAG)n within the Huntingtin gene on human chromosome 4. Telomeres (the ends of linear chromosomes) end with a microsatellite hexanucleotide repeat of the sequence (TTAGGG)n.
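For illustration only, here is a minimal sketch of how one might measure the longest run of a trinucleotide repeat such as (CAG)n in a DNA string; the example sequence is invented, and real repeat genotyping from sequencing data is considerably more involved:

```python
import re

def longest_repeat_run(sequence: str, unit: str = "CAG") -> int:
    """Largest number of consecutive copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{unit})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

# Invented toy sequence with a run of 19 CAG units. For reference, normal HTT
# alleles carry roughly 10-35 CAG repeats; expanded disease alleles carry more.
toy = "ATG" + "CAG" * 19 + "CAACAGCCGCCA"
print(longest_repeat_run(toy))  # 19
```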

Tandem repeats of longer sequences (arrays of repeated sequences 10–60 nucleotides long) are termed minisatellites.

Mobile genetic elements (transposons) and their relics

Transposable genetic elements, DNA sequences that can replicate and insert copies of themselves at other locations within a host genome, are an abundant component in the human genome. The most abundant transposon lineage, Alu, has about 50,000 active copies,[38] while another lineage, LINE-1, has about 100 active copies per genome (the number varies between people).[39] Together with non-functional relics of old transposons, they account for over half of total human DNA.[40]
Sometimes called "jumping genes", transposons have played a major role in sculpting the human genome. Some of these sequences represent endogenous retroviruses, DNA copies of viral sequences that have become permanently integrated into the genome and are now passed on to succeeding generations.

Mobile elements within the human genome can be classified into LTR retrotransposons (8.3% of total genome), SINEs (13.1% of total genome) including Alu elements, LINEs (20.4% of total genome), SVAs and Class II DNA transposons (2.9% of total genome).

Genomic variation in humans

Human Reference Genome

With the exception of identical twins, all humans show significant variation in genomic DNA sequences. The Human Reference Genome (HRG) is used as a standard sequence reference.
There are several important points concerning the Human Reference Genome:
  • The HRG is a haploid sequence. Each chromosome is represented once.
  • The HRG is a composite sequence, and does not correspond to any actual human individual.
  • The HRG is periodically updated to correct errors and ambiguities.
  • The HRG in no way represents an "ideal" or "perfect" human individual. It is simply a standardized representation or model that is used for comparative purposes.

Measuring human genetic variation

Most studies of human genetic variation have focused on single-nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur, on average, about once in every 1,000 base pairs in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "we are all, regardless of race, genetically 99.9% the same",[41] although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation.[42] A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
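The "99.9% the same" figure follows almost directly from that SNP density. The back-of-envelope sketch below assumes the ~3.2 Gbp genome size quoted earlier and deliberately ignores copy number variation:

```python
# Back-of-envelope sketch: an average SNP density of ~1 per 1,000 bp across a
# ~3.2 Gbp genome implies a few million single-base differences between two
# people, i.e. ~99.9% identity at the single-nucleotide level.

GENOME_BP = 3.2e9
SNP_DENSITY = 1 / 1000  # average SNPs per base pair, as quoted above

expected_snps = GENOME_BP * SNP_DENSITY
identity = 1 - SNP_DENSITY

print(f"~{expected_snps / 1e6:.1f} million single-nucleotide differences")  # ~3.2 million
print(f"~{identity:.1%} identity at the single-base level")                 # 99.9%

# Note: this ignores copy number variation, which involves a much larger
# fraction of the genome, as the text points out.
```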

The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.

Most gross genomic mutations in gamete germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner Syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause and effect relationship between aneuploidy and cancer has not been established.

Mapping human genomic variation

Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome.[43][44]

An example of a variation map is the HapMap being developed by the International HapMap Project. The HapMap is a haplotype map of the human genome, "which will describe the common patterns of human DNA sequence variation."[45] It catalogs the patterns of small-scale variations in the genome that involve single DNA letters, or bases.

Researchers published the first sequence-based map of large-scale structural variation across the human genome in the journal Nature in May 2008.[46][47] Large-scale structural variations are differences in the genome among people that range from a few thousand to a few million DNA bases; some are gains or losses of stretches of genome sequence and others appear as re-arrangements of stretches of sequence. These variations include differences in the number of copies individuals have of a particular gene, deletions, translocations and inversions.

Personal genomes

A personal genome sequence is a (nearly) complete sequence of the chemical base pairs that make up the DNA of a single person. Because medical treatments have different effects on different people because of genetic variations such as single-nucleotide polymorphisms (SNPs), the analysis of personal genomes may lead to personalized medical treatment based on individual genotypes.[citation needed]

The first personal genome sequence to be determined was that of Craig Venter in 2007. Personal genomes had not been sequenced in the public Human Genome Project to protect the identity of volunteers who provided DNA samples. That sequence was derived from the DNA of several volunteers from a diverse population.[48] However, early in the Venter-led Celera Genomics genome sequencing effort the decision was made to switch from sequencing a composite sample to using DNA from a single individual, later revealed to have been Venter himself. Thus the Celera human genome sequence released in 2000 was largely that of one man. Subsequent replacement of the early composite-derived data and determination of the diploid sequence, representing both sets of chromosomes, rather than a haploid sequence originally reported, allowed the release of the first personal genome.[49] In April 2008, that of James Watson was also completed. Since then hundreds of personal genome sequences have been released,[50] including those of Desmond Tutu,[51][52] and of a Paleo-Eskimo.[53] In November 2013, a Spanish family made their personal genomics data obtained by direct-to-consumer genetic testing with 23andMe publicly available under a Creative Commons public domain license. This is believed to be the first such public genomics dataset for a whole family.[54]

The sequencing of individual genomes further unveiled levels of genetic complexity that had not been appreciated before. Personal genomics helped reveal the significant level of diversity in the human genome attributed not only to SNPs but structural variations as well. However, the application of such knowledge to the treatment of disease and in the medical field is only in its very beginnings.[55] Exome sequencing has become increasingly popular as a tool to aid in diagnosis of genetic disease because the exome contributes only 1% of the genomic sequence but accounts for roughly 85% of mutations that contribute significantly to disease.[56]

Human genetic disorders

Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, etc.). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene, and is the most common recessive disorder in Caucasian populations, with over 1,300 different mutations known.[57]
Disease-causing mutations in specific genes are usually severe in terms of gene function and are fortunately rare, so genetic disorders are likewise individually rare. However, since there are many genes that can vary to cause genetic disorders, in aggregate they constitute a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified; currently there are approximately 2,200 such disorders annotated in the OMIM database.[57]

Studies of genetic disorders are often performed by means of family-based studies. In some instances, population-based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French Canada, Utah, Sardinia, etc. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical/medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability that a condition will be inherited, and how to avoid or ameliorate it in their offspring.

As noted above, there are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, i.e. has little or no detectable effect on the physiology of the individual (although there may be fractional differences in fitness defined over evolutionary time frames). Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.

With the advent of the Human Genome and International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine, schizophrenia, etc. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these are usually not considered to be genetic disorders per se as their causes are complex, involving many different genetic and environmental factors. Thus there may be disagreement in particular cases whether a specific medical condition should be termed a genetic disorder. The categorized table below provides the prevalence as well as the genes or chromosomes associated with some human genetic disorders.
Disorder Prevalence Chromosome or gene involved
Chromosomal conditions
Down syndrome 1:600 Chromosome 21
Klinefelter syndrome 1:500–1000 males Additional X chromosome
Turner syndrome 1:2000 females Loss of X chromosome
Sickle cell anemia 1 in 50 births in parts of Africa; rarer elsewhere[58] β-globin (on chromosome 11)
Cancers
Breast/Ovarian cancer (susceptibility) ~5% of cases of these cancer types BRCA1, BRCA2
FAP (familial adenomatous polyposis) 1:3500 APC
Lynch syndrome 5–10% of all cases of bowel cancer MLH1, MSH2, MSH6, PMS2
Neurological conditions
Huntington disease 1:20000 Huntingtin
Alzheimer disease ‐ early onset 1:2500 PS1, PS2, APP
Other conditions
Cystic fibrosis 1:2500 CFTR
Duchenne muscular dystrophy 1:3500 boys Dystrophin

Evolution

Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of extant lineages approximately 200 million years ago, containing the vast majority of genes.[59][60] The published chimpanzee genome differs from that of the human genome by 1.23% in direct sequence comparisons.[61] Around 20% of this figure is accounted for by variation within each species, leaving only ~1.06% consistent sequence divergence between humans and chimps at shared genes.[62] This nucleotide by nucleotide difference is dwarfed, however, by the portion of each genome that is not shared, including around 6% of functional genes that are unique to either humans or chimps.[63]
In other words, the considerable observable differences between humans and chimps may be due as much or more to genome-level variation in the number, function and expression of genes rather than DNA sequence changes in shared genes. Indeed, even within humans, there has been found to be a previously unappreciated amount of copy number variation (CNV), which can make up as much as 5–15% of the human genome. In other words, between two humans there could be a difference of around 500,000,000 base pairs of DNA, some being active genes, others inactivated, or active at different levels. The full significance of this finding remains to be seen. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13[64] (later renamed to chromosomes 2A and 2B, respectively).
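The roughly 500 million base pair figure quoted above for copy number variation can be reproduced directly from the 5–15% range. The sketch below assumes, as elsewhere in the article, a ~3.2 Gbp haploid genome:

```python
# Sketch: convert the quoted copy number variation (CNV) range of 5-15% into
# base pairs, assuming a ~3.2 Gbp haploid genome as elsewhere in the article.

GENOME_BP = 3.2e9

for frac in (0.05, 0.15):
    print(f"{frac:.0%} of the genome ~ {frac * GENOME_BP / 1e6:,.0f} Mbp")

# 5% ~ 160 Mbp and 15% ~ 480 Mbp, i.e. on the order of the ~500 million bp
# difference mentioned in the text.
```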

Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell.[65]

Mitochondrial DNA

The human mitochondrial DNA is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent. (see Mitochondrial Eve)

Due to the lack of a system for checking copying errors, mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. This roughly 20-fold higher mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or of Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through a purely maternal lineage.[66] Because of the restrictive all-or-none manner of mtDNA inheritance, this result (no trace of Neanderthal mtDNA) would be expected unless there were a large percentage of Neanderthal ancestry, or strong positive selection for that mtDNA. For example, going back 5 generations, only 1 of your 32 ancestors contributed to your mtDNA, so if one of these 32 was a pure Neanderthal you would expect that ~3% of your autosomal DNA would be of Neanderthal origin, yet you would have a ~97% chance of having no trace of Neanderthal mtDNA.
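The 1-in-32 example generalizes to any number of generations. The sketch below implements the same expected-value argument used above; it ignores recombination variance and pedigree collapse, so it is only a rough illustration:

```python
# Sketch of the expected-value argument above: g generations back you have
# 2**g ancestors, exactly one of whom lies on the strict maternal (mtDNA)
# line, while each ancestor contributes on average 1/2**g of your autosomal
# DNA. Ignores recombination variance and pedigree collapse.

def ancestry_numbers(generations: int):
    ancestors = 2 ** generations
    autosomal_fraction = 1 / ancestors    # expected contribution of one ancestor
    p_no_mtdna_trace = 1 - 1 / ancestors  # chance a given ancestor is not your matriline
    return ancestors, autosomal_fraction, p_no_mtdna_trace

a, f, p = ancestry_numbers(5)
print(a, f"{f:.1%}", f"{p:.1%}")  # 32 ancestors, ~3.1% autosomal DNA, ~96.9% no mtDNA trace
```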

Epigenome

Epigenetics describes a variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, and which are important in regulating gene expression, genome replication and other cellular processes. Epigenetic markers strengthen and weaken transcription of certain genes but do not affect the actual sequence of DNA nucleotides. DNA methylation is a major form of epigenetic control over gene expression and one of the most highly studied topics in epigenetics. During development, the human DNA methylation profile experiences dramatic changes. In early germ line cells, the genome has very low methylation levels. These low levels generally describe active genes. As development progresses, parental imprinting tags lead to increased methylation activity.[67][68]
Epigenetic patterns can be identified between tissues within an individual as well as between individuals themselves. Identical genes that have differences only in their epigenetic state are called epialleles. Epialleles can be placed into three categories: those directly determined by an individual’s genotype, those influenced by genotype, and those entirely independent of genotype. The epigenome is also influenced significantly by environmental factors. Diet, toxins, and hormones impact the epigenetic state. Studies in dietary manipulation have demonstrated that methyl-deficient diets are associated with hypomethylation of the epigenome. Such studies establish epigenetics as an important interface between the environment and the genome.[69]

Loop quantum cosmology



From Wikipedia, the free encyclopedia

Loop quantum cosmology (LQC) is a finite, symmetry-reduced model of loop quantum gravity (LQG) that predicts a "quantum bridge" between contracting and expanding cosmological branches.
The distinguishing feature of LQC is the prominent role played by the quantum geometry effects of loop quantum gravity (LQG). In particular, quantum geometry creates a brand new repulsive force which is totally negligible at low space-time curvature but rises very rapidly in the Planck regime, overwhelming the classical gravitational attraction and thereby resolving singularities of general relativity. Once singularities are resolved, the conceptual paradigm of cosmology changes and one has to revisit many of the standard issues—e.g., the "horizon problem"—from a new perspective.
Since LQG is based on a specific quantum theory of Riemannian geometry,[1] geometric observables display a fundamental discreteness that play a key role in quantum dynamics: While predictions of LQC are very close to those of quantum geometrodynamics (QGD) away from the Planck regime, there is a dramatic difference once densities and curvatures enter the Planck scale. In LQC the big bang is replaced by a quantum bounce.

Study of LQC has led to many successes, including the emergence of a possible mechanism for cosmic inflation, resolution of gravitational singularities, as well as the development of effective semi-classical Hamiltonians.

This subfield was originally started in 1999 by Martin Bojowald, and further developed in particular by Abhay Ashtekar and Jerzy Lewandowski. As of late 2012, LQC represented a very active field in physics, with about three hundred papers on the subject published in the literature. There has also recently been work by Carlo Rovelli et al. on relating LQC to spinfoam cosmology.

However, the results obtained in LQC are subject to the usual restriction that a truncated classical theory, then quantized, might not display the true behaviour of the full theory due to artificial suppression of degrees of freedom that might have large quantum fluctuations in the full theory. It has been argued that singularity avoidance in LQC is achieved by mechanisms only available in these restrictive models, and that singularity avoidance in the full theory can still be obtained, but by a more subtle feature of LQG.[2][3]

Does dark matter cause mass extinctions and geologic upheavals?

Original link:  http://phys.org/news/2015-02-dark-mass-extinctions-geologic-upheavals.html

[Image: A massive cluster of yellowish galaxies, seemingly caught in a red and blue spider web of eerily distorted background galaxies, makes for a spellbinding picture from the new Advanced Camera for Surveys aboard NASA's Hubble Space Telescope.]
Research by New York University Biology Professor Michael Rampino concludes that Earth's infrequent but predictable path around and through our Galaxy's disc may have a direct and significant effect on geological and biological phenomena occurring on Earth. In a new paper in Monthly Notices of the Royal Astronomical Society, he concludes that movement through dark matter may perturb the orbits of comets and lead to additional heating in the Earth's core, both of which could be connected with mass extinction events.

The Galactic disc is the region of the Milky Way Galaxy where our solar system resides. It is crowded with stars and clouds of gas and dust, and also a concentration of elusive dark matter—small subatomic particles that can be detected only by their gravitational effects.

Previous studies have shown that Earth rotates around the disc-shaped Galaxy once every 250 million years. But the Earth's path around the Galaxy is wavy, with the Sun and planets weaving through the crowded disc approximately every 30 million years. Analyzing the pattern of the Earth's passes through the Galactic disc, Rampino notes that these disc passages seem to correlate with times of comet impacts and mass extinctions of life. The famous comet strike 66 million years ago that led to the extinction of the dinosaurs is just one example.

What causes this correlation between Earth's passes through the Galactic disc, and the impacts and extinctions that seem to follow?

While traveling through the disc, the dark matter concentrated there disturbs the pathways of comets typically orbiting far from the Earth in the outer Solar System, Rampino observes. This means that comets that would normally travel at great distances from the Earth instead take unusual paths, causing some of them to collide with the planet.

But even more remarkably, with each dip through the disc, dark matter can apparently accumulate within the Earth's core. Eventually, these dark matter particles annihilate each other, producing considerable heat. The heat created by the annihilation of dark matter in Earth's core could trigger events such as volcanic eruptions, mountain building, magnetic field reversals, and changes in sea level, which also show peaks every 30 million years. Rampino therefore suggests that astrophysical phenomena derived from the Earth's winding path through the Galactic disc, and the consequent accumulation of dark matter in the planet's interior, can result in dramatic changes in Earth's geological and biological activity.

His model of dark matter interactions with the Earth as it cycles through the Galaxy could have a broad impact on our understanding of the geological and biological development of Earth, as well as other planets within the Galaxy.

"We are fortunate enough to live on a planet that is ideal for the development of complex life," Rampino says. "But the history of the Earth is punctuated by large scale extinction events, some of which we struggle to explain. It may be that dark matter - the nature of which is still unclear but which makes up around a quarter of the universe - holds the answer. As well as being important on the largest scales, dark matter may have a direct influence on life on Earth."

In the future, he suggests, geologists might incorporate these astrophysical findings in order to better understand events that are now thought to result purely from causes inherent to the Earth. This model, Rampino adds, likewise provides new knowledge of the possible distribution and behaviour of dark matter within the Galaxy.

How The Nature of Information Could Resolve One of The Great Paradoxes Of Cosmology

Stephen Hawking described it as the most spectacular failure of any physical theory in history. Can a new theory of information rescue cosmologists?

Original link:  https://medium.com/the-physics-arxiv-blog/how-the-nature-of-information-could-resolve-one-of-the-great-paradoxes-of-cosmology-8c16fc714756

One of the biggest puzzles in science is the cosmological constant paradox. This arises when physicists attempt to calculate the energy density of the universe from first principles. Using quantum mechanics, the number they come up with is 10^94 g/cm^3.

And yet the observed energy density, calculated from the density of mass in the cosmos and the way the universe is expanding, is about 10^-27 g/cm^3. In other words, our best theory of the universe misses the mark by 120 orders of magnitude.

That’s left cosmologists somewhat red-faced. Indeed, Stephen Hawking has famously described this as the most spectacular failure of any physical theory in history. This huge discrepancy is all the more puzzling because quantum mechanics makes such accurate predictions in other circumstances. Just why it goes so badly wrong here is unknown.

Today, Chris Fields, an independent researcher formerly with New Mexico State University in Las Cruces, puts forward a simple explanation. His idea is that the discrepancy arises because large objects, such as planets and stars, behave classically rather than demonstrating quantum properties. And he’s provided some simple calculations to make his case.

One of the key properties of quantum objects is that they can exist in a superposition of states until they are observed. When that happens, these many possibilities “collapse” and become one specific outcome, a process known as quantum decoherence.

For example, a photon can be in a superposition of states that allow it to be in several places at the same time. However, as soon as the photon is observed the superposition decoheres and the photon appears in one place.

This process of decoherence must apply to everything that has a specific position, says Fields. Even to large objects such as stars, whose position is known with respect to the cosmic microwave background, the echo of the big bang which fills the universe.

In fact, Fields argues that it is the interaction between the cosmic microwave background and all large objects in the universe that causes them to decohere giving them specific positions which astronomers observe.

But there is an important consequence from having a specific position — there must be some information associated with this location in 3D space. If a location is unknown, then the amount of information must be small. But if it is known with precision, the information content is much higher.

And given that there are some 10^25 stars in the universe, that’s a lot of information. Fields calculates that encoding the location of each star to within 10 cubic kilometres requires some 10^93 bits.

That immediately leads to an entirely new way of determining the energy density of the cosmos. Back in the 1960s, the physicist Rolf Landauer suggested that every bit of information had an energy associated with it, an idea that has gained considerable traction since then.

So Fields uses Landauer's principle to calculate the energy associated with the locations of all the stars in the universe. This turns out to be about 10^-30 g/cm^3, very similar to the observed energy density of the universe.
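That ~10^-30 g/cm^3 figure can be reproduced with a back-of-envelope calculation. The sketch below is not Fields' actual derivation; it assumes a Landauer cost of kT·ln 2 per bit at the CMB temperature and a comoving observable-universe radius of about 4.4×10^26 m, neither of which is stated in the article:

```python
import math

# Back-of-envelope reproduction of the ~1e-30 g/cm^3 figure quoted above.
# Assumptions (not from the article): Landauer cost k*T*ln(2) per bit at the
# CMB temperature, and a comoving observable-universe radius of ~4.4e26 m.

k_B = 1.380649e-23   # Boltzmann constant, J/K
T_CMB = 2.725        # CMB temperature, K
c = 2.998e8          # speed of light, m/s
N_BITS = 1e93        # bits needed to encode stellar positions (figure from the article)
R = 4.4e26           # comoving radius of the observable universe, m (assumed)

energy_J = N_BITS * k_B * T_CMB * math.log(2)
mass_g = energy_J / c**2 * 1000                 # convert kg to g
volume_cm3 = (4 / 3) * math.pi * R**3 * 1e6     # convert m^3 to cm^3

print(f"{mass_g / volume_cm3:.1e} g/cm^3")      # ~8e-31, i.e. roughly 1e-30
```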

But here’s the thing. That calculation requires the position of each star to be encoded only to within 10 km^3. Fields also asks how much information is required to encode the position of stars to the much higher resolution associated with the Planck length. “Encoding 10^25 stellar positions at [the Planck length] would incur a free-energy cost ∼ 10^117 larger than that found here,” he says.

That difference is remarkably similar to the 120 orders of magnitude discrepancy between the observed energy density and that calculated using quantum mechanics. Indeed, Fields says that the discrepancy arises because the positions of the stars can be accounted for using quantum mechanics. “It seems reasonable to suggest that the discrepancy between these numbers may be due to the assumption that encoding classical information at [the Planck scale] can be considered physically meaningful.”

That’s a fascinating result that raises important questions about the nature of reality. First, there is the hint in Fields’ ideas that information provides the ghostly bedrock on which the laws of physics are based. That’s an idea that has gained traction among other physicists too.

Then there is the role of energy. One important question is where this energy might have come from in the first place. The process of decoherence seems to create it from nothing.

Cosmologists generally overlook violations of the principle of conservation of energy. After all, the big bang itself is the biggest offender. So don’t expect much hand wringing over this. But Fields’ approach also implies that a purely quantum universe would have an energy density of zero, since nothing would have localised position. That’s bizarre.

Beyond this is the even deeper question of how the universe came to be classical at all, given that cosmologists would have us believe that the big bang was a quantum process. Fields suggests that it is the interaction between the cosmic microwave background and the rest of the universe that causes the quantum nature of the universe to decohere and become classical.

Perhaps. What is all too clear is that there are fundamental and fascinating problems in cosmology — and the role that information plays in reality.

Ref: arxiv.org/abs/1502.03424 : Is Dark Energy An Artifact Of Decoherence?

Brane cosmology



From Wikipedia, the free encyclopedia

Brane and bulk

The central idea is that the visible, four-dimensional universe is restricted to a brane inside a higher-dimensional space, called the "bulk" (also known as "hyperspace"). If the additional dimensions are compact, then the observed universe contains the extra dimensions, and then no reference to the bulk is appropriate. In the bulk model, at least some of the extra dimensions are extensive (possibly infinite), and other branes may be moving through this bulk. Interactions with the bulk, and possibly with other branes, can influence our brane and thus introduce effects not seen in more standard cosmological models.

Why gravity is weak and the cosmological constant is small

Some versions of brane cosmology, based on the large extra dimension idea, can explain the weakness of gravity relative to the other fundamental forces of nature, thus solving the so-called hierarchy problem. In the brane picture, the other three forces (electromagnetism and the weak and strong nuclear forces) are localized on the brane, but gravity has no such constraint and propagates through the full spacetime, called the bulk. Much of the gravitational attractive power "leaks" into the bulk. As a consequence, the force of gravity should appear significantly stronger on small (subatomic or at least sub-millimetre) scales, where less gravitational force has "leaked". Various experiments are currently under way to test this.[1] Extensions of the large extra dimension idea with supersymmetry in the bulk appear promising in addressing the so-called cosmological constant problem.[2][3][4]
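The "leakage" picture can be made slightly more quantitative with the standard large-extra-dimensions scaling argument (a textbook sketch, not taken from the models cited here): with n compact extra dimensions of size R and a higher-dimensional Planck mass M_*, Gauss's law gives

```latex
% Gravitational potential with n compact extra dimensions of size R
% (standard large-extra-dimensions scaling; M_* is the higher-dimensional Planck mass)
V(r) \sim \frac{m_1 m_2}{M_*^{\,n+2}}\,\frac{1}{r^{\,n+1}} \quad (r \ll R),
\qquad
V(r) \sim \frac{m_1 m_2}{M_*^{\,n+2} R^{\,n}}\,\frac{1}{r} \quad (r \gg R),
```

so the effective four-dimensional Planck mass satisfies M_Pl^2 ~ M_*^{n+2} R^n. Gravity looks weak at everyday distances only because its field lines spread into the extra dimensions, and the steeper 1/r^{n+1} behaviour at distances below R is what sub-millimetre gravity experiments search for.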

Models of brane cosmology

One of the earliest documented attempts to apply brane cosmology as part of a conceptual theory is dated to 1983.[5]

The authors discussed the possibility that the Universe has (3+N)+1 dimensions, but ordinary particles are confined in a potential well which is narrow along N spatial directions and flat along three others, and proposed a particular five-dimensional model.

In 1998/99 Merab Gogberashvili published on arXiv a number of articles in which he showed that if the Universe is considered as a thin shell (a mathematical synonym for "brane") expanding in 5-dimensional space, then it is possible to obtain one scale for particle theory corresponding to the 5-dimensional cosmological constant and Universe thickness, and thus to solve the hierarchy problem.[6][7][8] It was also shown that the four-dimensionality of the Universe is the result of a stability requirement, since the extra component of the Einstein field equations giving the confined solution for matter fields coincides with one of the conditions of stability.

In 1999 the closely related Randall–Sundrum scenarios (RS1 and RS2; see 5 dimensional warped geometry theory for a nontechnical explanation of RS1) were proposed. These particular models of brane cosmology have attracted a considerable amount of attention.

Later, the pre-big bang, ekpyrotic and cyclic proposals appeared. The ekpyrotic theory hypothesizes that the origin of the observable universe occurred when two parallel branes collided.[9]

Empirical tests

As of now, no experimental or observational evidence of large extra dimensions, as required by the Randall–Sundrum models, has been reported. An analysis of results from the Large Hadron Collider in December 2010 severely constrains theories with large extra dimensions.[10]

Graviton


From Wikipedia, the free encyclopedia

Graviton
Composition Elementary particle
Statistics Bose–Einstein statistics
Interactions Gravitation
Status Theoretical
Symbol G[1]
Antiparticle Self
Theorized 1930s[2]
The name is attributed to Dmitrii Blokhintsev and F. M. Gal'perin in 1934[3]
Discovered Hypothetical
Mass 0
Mean lifetime Stable
Electric charge 0 e
Spin 2

In physics, the graviton is a hypothetical elementary particle that mediates the force of gravitation in the framework of quantum field theory. If it exists, the graviton is expected to be massless (because the gravitational force appears to have unlimited range) and must be a spin-2 boson. The spin follows from the fact that the source of gravitation is the stress–energy tensor, a second-rank tensor (compared to electromagnetism's spin-1 photon, the source of which is the four-current, a first-rank tensor). Additionally, it can be shown that any massless spin-2 field would give rise to a force indistinguishable from gravitation, because a massless spin-2 field must couple to (interact with) the stress–energy tensor in the same way that the gravitational field does. Seeing as the graviton is hypothetical, its discovery would unite quantum theory with gravity.[4] This result suggests that, if a massless spin-2 particle is discovered, it must be the graviton, so that the only experimental verification needed for the graviton may simply be the discovery of a massless spin-2 particle.[5]

Theory

The three other known forces of nature are mediated by elementary particles: electromagnetism by the photon, the strong interaction by the gluons, and the weak interaction by the W and Z bosons. The hypothesis is that the gravitational interaction is likewise mediated by an as yet undiscovered elementary particle, dubbed the graviton. In the classical limit, the theory would reduce to general relativity and conform to Newton's law of gravitation in the weak-field limit.[6][7][8]

Gravitons and renormalization

When describing graviton interactions, the classical theory (i.e., the tree diagrams) and semiclassical corrections (one-loop diagrams) behave normally, but Feynman diagrams with two (or more) loops lead to ultraviolet divergences; that is, infinite results that cannot be removed because quantized general relativity is not renormalizable, unlike quantum electrodynamics.[dubious] That is, the usual ways physicists calculate the probability that a particle will emit or absorb a graviton give nonsensical answers and the theory loses its predictive power. These problems, together with some conceptual puzzles, led many physicists[who?] to believe that a theory more complete than quantized general relativity must describe the behavior near the Planck scale.[citation needed]

Comparison with other forces

Unlike the force carriers of the other forces, gravitation plays a special role in general relativity in defining the spacetime in which events take place. In some descriptions, matter modifies the 'shape' of spacetime itself, and gravity is a result of this shape, an idea which at first glance may appear hard to match with the idea of a force acting between particles.[9] Because the diffeomorphism invariance of the theory does not allow any particular space-time background to be singled out as the "true" space-time background, general relativity is said to be background independent. In contrast, the Standard Model is not background independent, with Minkowski space enjoying a special status as the fixed background space-time.[10] A theory of quantum gravity is needed in order to reconcile these differences.[11] Whether this theory should be background independent is an open question. The answer to this question will determine our understanding of what specific role gravitation plays in the fate of the universe.[12]

Gravitons in speculative theories

String theory predicts the existence of gravitons and their well-defined interactions. A graviton in perturbative string theory is a closed string in a very particular low-energy vibrational state. The scattering of gravitons in string theory can also be computed from the correlation functions in conformal field theory, as dictated by the AdS/CFT correspondence, or from matrix theory.[citation needed]

An interesting feature of gravitons in string theory is that, as closed strings without endpoints, they would not be bound to branes and could move freely between them. If we live on a brane (as hypothesized by brane theories) this "leakage" of gravitons from the brane into higher-dimensional space could explain why gravitation is such a weak force, and gravitons from other branes adjacent to our own could provide a potential explanation for dark matter. See brane cosmology.[citation needed]

Experimental observation

Unambiguous detection of individual gravitons, though not prohibited by any fundamental law, is impossible with any physically reasonable detector.[13] The reason is the extremely low cross section for the interaction of gravitons with matter. For example, a detector with the mass of Jupiter and 100% efficiency, placed in close orbit around a neutron star, would only be expected to observe one graviton every 10 years, even under the most favorable conditions. It would be impossible to discriminate these events from the background of neutrinos, since the dimensions of the required neutrino shield would ensure collapse into a black hole.[13]

However, experiments to detect gravitational waves, which may be viewed as coherent states of many gravitons, are underway (e.g., LIGO and VIRGO). Although these experiments cannot detect individual gravitons, they might provide information about certain properties of the graviton.[14] For example, if gravitational waves were observed to propagate slower than c (the speed of light in a vacuum), that would imply that the graviton has mass (however, gravitational waves must propagate slower than "c" in a region with non-zero mass density if they are to be detectable).[15] Astronomical observations of the kinematics of galaxies, especially the galaxy rotation problem and modified Newtonian dynamics, might point toward gravitons having non-zero mass.[16]
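The link between propagation speed and graviton mass is just the standard relativistic dispersion relation (a textbook relation, not specific to the cited references): for a graviton of mass m and energy E,

```latex
% Group velocity of a massive graviton from the standard relativistic dispersion relation
E^2 = p^2 c^2 + m^2 c^4
\quad\Longrightarrow\quad
\frac{v_g}{c} = \frac{p c}{E} = \sqrt{1 - \left(\frac{m c^2}{E}\right)^2},
```

so a measured fractional speed deficit for gravitational waves of known energy (frequency) translates directly into an upper bound on the graviton mass.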

Difficulties and outstanding issues

Most theories containing gravitons suffer from severe problems. Attempts to extend the Standard Model or other quantum field theories by adding gravitons run into serious theoretical difficulties at high energies (processes involving energies close to or above the Planck scale) because of infinities arising due to quantum effects (in technical terms, gravitation is nonrenormalizable). Since classical general relativity and quantum mechanics seem to be incompatible at such energies, from a theoretical point of view, this situation is not tenable. One possible solution is to replace particles with strings. String theories are quantum theories of gravity in the sense that they reduce to classical general relativity plus field theory at low energies, but are fully quantum mechanical, contain a graviton, and are believed to be mathematically consistent.[17]
