From Wikipedia, the free encyclopedia

Genomic information
Graphical representation of the idealized human diploid karyotype, showing the organization of the genome into chromosomes. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Chromosomes are shown aligned at their centromeres. The mitochondrial DNA is not shown.
NCBI genome ID 51
Ploidy diploid
Genome size 3,234.83 Mb (Mega-basepairs)
Number of chromosomes 23 pairs

The human genome is the complete set of genetic information for humans (Homo sapiens sapiens). This information is encoded as DNA sequences within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes (contained in egg and sperm cells) consist of three billion DNA base pairs, while diploid genomes (found in somatic cells) have twice the DNA content. While there are significant differences among the genomes of human individuals (on the order of 0.1%)[citation needed], these are considerably smaller than the differences between humans and their closest living relatives, the chimpanzees (approximately 4%[1]) and bonobos.
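To give a sense of scale for the percentages above, the following minimal Python sketch converts them into approximate base-pair counts. It assumes the rounded figure of three billion base pairs for the haploid genome and treats each percentage as a simple fraction of base pairs that differ, which is a simplification; the function name `differing_base_pairs` is illustrative only.

```python
# Approximate haploid genome size quoted in the text (rounded figure).
HAPLOID_BP = 3_000_000_000

# Diploid (somatic) cells carry twice the haploid DNA content.
DIPLOID_BP = 2 * HAPLOID_BP


def differing_base_pairs(genome_bp: int, percent: float) -> int:
    """Return the approximate number of base pairs covered by a percentage.

    Assumption: the percentage is read as a plain fraction of base pairs.
    """
    return round(genome_bp * percent / 100)


# Figures from the text: about 0.1% between human individuals,
# about 4% between humans and chimpanzees.
between_humans = differing_base_pairs(HAPLOID_BP, 0.1)
human_vs_chimp = differing_base_pairs(HAPLOID_BP, 4)

print(f"Diploid genome: {DIPLOID_BP:,} bp")
print(f"Human vs. human: about {between_humans:,} bp")
print(f"Human vs. chimpanzee: about {human_vs_chimp:,} bp")
```

Under these assumptions, a 0.1% difference still corresponds to a few million base pairs per haploid genome.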

The Human Genome Project produced the first complete sequences of individual human genomes. As of 2012, thousands of human genomes have been completely sequenced, and many more have been mapped at lower levels of resolution. The resulting data are used worldwide in biomedical science, anthropology, forensics and other branches of science. There is a widely held expectation that genomic studies will lead to advances in the diagnosis and treatment of diseases, and to new insights in many fields of biology, including human evolution.

Although the sequence of the human genome has been (almost) completely determined by DNA sequencing, it is not yet fully understood. Most (though probably not all) genes have been identified by a combination of high-throughput experimental and bioinformatics approaches, yet much work remains to elucidate the biological functions of their protein and RNA products. Recent results suggest that most of the vast quantities of noncoding DNA within the genome have associated biochemical activities, including regulation of gene expression, organization of chromosome architecture, and signals controlling epigenetic inheritance.

There are an estimated 20,000-25,000 human protein-coding genes. The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene-finding methods have improved, and it could continue to drop further.[2][3] Protein-coding sequences account for only a very small fraction of the genome (approximately 1.5%), and the rest is associated with non-coding RNA molecules, regulatory DNA sequences, LINEs, SINEs, introns, and sequences for which as yet no function has been elucidated.[4]
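As a rough back-of-the-envelope illustration of these figures, the sketch below estimates how many base pairs the protein-coding fraction represents and what that implies per gene on average. It assumes the genome size of 3,234.83 Mb given in the summary box at the top of the article and the approximate 1.5% coding fraction; the per-gene averages are only indicative, since real genes vary widely in length.

```python
# Genome size from the article's summary box: 3,234.83 Mb.
GENOME_SIZE_BP = 3_234_830_000

# Approximate protein-coding fraction quoted in the text (1.5%).
CODING_FRACTION = 0.015

# Estimated range for the number of protein-coding genes.
GENE_COUNT_LOW = 20_000
GENE_COUNT_HIGH = 25_000

# Total protein-coding sequence implied by the two figures above.
coding_bp = GENOME_SIZE_BP * CODING_FRACTION

# Average coding sequence per gene at each end of the gene-count range.
avg_bp_if_few_genes = coding_bp / GENE_COUNT_LOW
avg_bp_if_many_genes = coding_bp / GENE_COUNT_HIGH

print(f"Protein-coding sequence: about {coding_bp / 1e6:.1f} Mb")
print(f"Average per gene: about {avg_bp_if_many_genes:,.0f} to "
      f"{avg_bp_if_few_genes:,.0f} bp")
```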

Molecular organization and gene content

The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, the X chromosome (one in males, two in females) and, in males only, one Y chromosome, all of which are large linear DNA molecules contained within the cell nucleus. It also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table. (Data source: Ensembl genome browser release 68, July 2012)
Chromosome | Length (mm) | Base pairs | Variations | Confirmed proteins | Putative proteins | Pseudogenes | miRNA | rRNA | snRNA | snoRNA | Misc ncRNA | Links | Centromere position (Mbp) | Cumulative (%)
1 | 85 | 249,250,621 | 4,401,091 | 2,012 | 31 | 1,130 | 134 | 66 | 221 | 145 | 106 | EBI | 125.0 | 7.9
2 | 83 | 243,199,373 | 4,607,702 | 1,203 | 50 | 948 | 115 | 40 | 161 | 117 | 93 | EBI | 93.3 | 16.2
3 | 67 | 198,022,430 | 3,894,345 | 1,040 | 25 | 719 | 99 | 29 | 138 | 87 | 77 | EBI | 91.0 | 23.0
4 | 65 | 191,154,276 | 3,673,892 | 718 | 39 | 698 | 92 | 24 | 120 | 56 | 71 | EBI | 50.4 | 29.6
5 | 62 | 180,915,260 | 3,436,667 | 849 | 24 | 676 | 83 | 25 | 106 | 61 | 68 | EBI | 48.4 | 35.8
6 | 58 | 171,115,067 | 3,360,890 | 1,002 | 39 | 731 | 81 | 26 | 111 | 73 | 67 | EBI | 61.0 | 41.6
7 | 54 | 159,138,663 | 3,045,992 | 866 | 34 | 803 | 90 | 24 | 90 | 76 | 70 | EBI | 59.9 | 47.1
8 | 50 | 146,364,022 | 2,890,692 | 659 | 39 | 568 | 80 | 28 | 86 | 52 | 42 | EBI | 45.6 | 52.0
9 | 48 | 141,213,431 | 2,581,827 | 785 | 15 | 714 | 69 | 19 | 66 | 51 | 55 | EBI | 49.0 | 56.3
10 | 46 | 135,534,747 | 2,609,802 | 745 | 18 | 500 | 64 | 32 | 87 | 56 | 56 | EBI | 40.2 | 60.9
11 | 46 | 135,006,516 | 2,607,254 | 1,258 | 48 | 775 | 63 | 24 | 74 | 76 | 53 | EBI | 53.7 | 65.4
12 | 45 | 133,851,895 | 2,482,194 | 1,003 | 47 | 582 | 72 | 27 | 106 | 62 | 69 | EBI | 35.8 | 70.0
13 | 39 | 115,169,878 | 1,814,242 | 318 | 8 | 323 | 42 | 16 | 45 | 34 | 36 | EBI | 17.9 | 73.4
14 | 36 | 107,349,540 | 1,712,799 | 601 | 50 | 472 | 92 | 10 | 65 | 97 | 46 | EBI | 17.6 | 76.4
15 | 35 | 102,531,392 | 1,577,346 | 562 | 43 | 473 | 78 | 13 | 63 | 136 | 39 | EBI | 19.0 | 79.3
16 | 31 | 90,354,753 | 1,747,136 | 805 | 65 | 429 | 52 | 32 | 53 | 58 | 34 | EBI | 36.6 | 82.0
17 | 28 | 81,195,210 | 1,491,841 | 1,158 | 44 | 300 | 61 | 15 | 80 | 71 | 46 | EBI | 24.0 | 84.8
18 | 27 | 78,077,248 | 1,448,602 | 268 | 20 | 59 | 32 | 13 | 51 | 36 | 25 | EBI | 17.2 | 87.4
19 | 20 | 59,128,983 | 1,171,356 | 1,399 | 26 | 181 | 110 | 13 | 29 | 31 | 15 | EBI | 26.5 | 89.3
20 | 21 | 63,025,520 | 1,206,753 | 533 | 13 | 213 | 57 | 15 | 46 | 37 | 34 | EBI | 27.5 | 91.4
21 | 16 | 48,129,895 | 787,784 | 225 | 8 | 150 | 16 | 5 | 21 | 19 | 8 | EBI | 13.2 | 92.6
22 | 17 | 51,304,566 | 745,778 | 431 | 21 | 308 | 31 | 5 | 23 | 23 | 23 | EBI | 14.7 | 93.8
X | 53 | 155,270,560 | 2,174,952 | 815 | 23 | 780 | 128 | 22 | 85 | 64 | 52 | EBI | 60.6 | 99.1
Y | 20 | 59,373,566 | 286,812 | 45 | 8 | 327 | 15 | 7 | 17 | 3 | 2 | EBI | 12.5 | 100.0
mtDNA | 0.0054 | 16,569 | 929 | 13 | 0 | 0 | 0 | 2 | 0 | 0 | 22 | EBI | N/A | 100.0

Table 1 (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between base pairs in the DNA double helix. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing, or modifications to protein structure that occur after translation.
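The length estimate described above can be reproduced with a short calculation. The following Python sketch is illustrative only: it applies the 0.34 nm spacing per base pair to base-pair counts copied from Table 1 for a few chromosomes and converts the result to millimeters. The function name `chromosome_length_mm` is a hypothetical helper, not part of Ensembl or any other tool.

```python
# Distance between adjacent base pairs in the double helix, per the text.
RISE_PER_BASE_PAIR_NM = 0.34

# Unit conversion: one millimeter is one million nanometers.
NM_PER_MM = 1_000_000

# Base-pair counts copied from Table 1 for a few chromosomes.
BASE_PAIRS = {
    "1": 249_250_621,
    "2": 243_199_373,
    "X": 155_270_560,
    "Y": 59_373_566,
}


def chromosome_length_mm(base_pairs: int) -> float:
    """Estimate the stretched-out length of a DNA molecule in millimeters."""
    return base_pairs * RISE_PER_BASE_PAIR_NM / NM_PER_MM


for name, base_pairs in BASE_PAIRS.items():
    length = chromosome_length_mm(base_pairs)
    print(f"chromosome {name}: {length:.1f} mm")
```

Rounded to the nearest millimeter, these values agree with the Length (mm) column for the chromosomes shown.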

The number of variations is a count of the unique DNA sequence changes identified within the sequences analyzed by Ensembl as of July 2012; that number is expected to increase as further personal genomes are sequenced and examined. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequence in the EBI genome browser. The table also describes the prevalence of genes encoding structural RNAs in the genome.

MicroRNA (miRNA) functions as a post-transcriptional regulator of gene expression. Ribosomal RNA (rRNA) makes up the RNA portion of the ribosome and is critical in the synthesis of proteins. Small nuclear RNA (snRNA) is found in the nucleus of the cell; its primary function is in the processing of pre-mRNA molecules and also in the regulation of transcription factors. Small nucleolar RNA (snoRNA) primarily functions in guiding chemical modifications to other RNA molecules.

Completeness of the human genome sequence

Although the human genome has been completely sequenced for all practical purposes, there are still hundreds of gaps in the sequence. A recent study noted more than 160 euchromatic gaps, of which 50 were closed.[5] However, there are still numerous gaps in the heterochromatic parts of the genome, which are much harder to sequence due to numerous repeats and other intractable sequence features.

Coding vs. noncoding DNA

The content of the human genome is commonly divided into coding and noncoding DNA sequences. Coding DNA is defined as those sequences that can be transcribed into mRNA and translated into proteins during the human life cycle; these sequences occupy only a small fraction of the genome (<2%). Noncoding DNA