A Medley of Potpourri

Genomic information
	Schematic representation of the human diploid karyotype, showing the organization of the genome into chromosomes, as well as annotated bands and sub-bands as seen on G banding. This drawing shows both the female (XX) and male (XY) versions of the 23rd chromosome pair. Chromosomal changes during the cell cycle are displayed at top center. The mitochondrial genome is shown to scale at bottom left.;
NCBI genome ID	51
Ploidy	diploid
Genome size	3,117,275,501 base pairs (bp)
Number of chromosomes	23 pairs

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.

Haploid human genomes, which are contained in germ cells (the egg and sperm gamete cells created in the meiosis phase of sexual reproduction before fertilization) consist of 3,054,815,472 DNA base pairs (if X chromosome is used), while female diploid genomes (found in somatic cells) have twice the DNA content.

While there are significant differences among the genomes of human individuals (on the order of 0.1% due to single-nucleotide variants and 0.6% when considering indels), these are considerably smaller than the differences between humans and their closest living relatives, the bonobos and chimpanzees (~1.1% fixed single-nucleotide variants and 4% when including indels). Size in basepairs can vary too; the telomere length decreases after every round of DNA replication.

Although the sequence of the human genome has been completely determined by DNA sequencing in 2022 (including methylation), it is not yet fully understood. Most, but not all, genes have been identified by a combination of high throughput experimental and bioinformatics approaches, yet much work still needs to be done to further elucidate the biological functions of their protein and RNA products (in particular, annotation of the complete CHM13v2.0 sequence is still ongoing). And yet, overlapping genes are quite common, in some cases allowing two protein coding genes from each strand to reuse base pairs twice (for example, genes DCDC2 and KAAG1). Recent results suggest that most of the vast quantities of noncoding DNA within the genome have associated biochemical activities, including regulation of gene expression, organization of chromosome architecture, and signals controlling epigenetic inheritance. There are also a significant number of retroviruses in human DNA, at least 3 of which have been proven to possess an important function (i.e., HIV-like HERV-K, HERV-W, and HERV-FRD play a role in placenta formation by inducing cell-cell fusion).

In 2003, scientists reported the sequencing of 85% of the entire human genome, but as of 2020 at least 8% was still missing. In 2021, scientists reported sequencing the complete female genome (i.e., without the Y chromosome). This sequence identified 19,969 protein-coding sequences, accounting for approximately 1.5% of the genome, and 63,494 genes in total, most of them being non-coding RNA genes. The genome consists of regulatory DNA sequences, LINEs, SINEs, introns, and sequences for which as yet no function has been determined. The human Y chromosome, consisting of 62,460,029 base pairs from a different cell line and found in all males, was sequenced completely in January 2022.

In 2023, a draft human pangenome reference was published. It is based on 47 genomes from persons of varied ethnicity. Plans are underway for an improved reference capturing still more biodiversity from a still wider sample.

Chromo- some	Length	Base pairs	Varia- tions	Protein- coding genes	Pseudo- genes	Total long ncRNA	Total small ncRNA	miRNA	rRNA	snRNA	snoRNA	Misc ncRNA	Links	Centromere position (Mbp)	Cumu- lative (%)
1	8.5 cm	248,387,328	12,151,146	2058	1220	1200	496	134	66	221	145	192	EBI	125	7.9
2	8.3 cm	242,696,752	12,945,965	1309	1023	1037	375	115	40	161	117	176	EBI	93.3	16.2
3	6.7 cm	201,105,948	10,638,715	1078	763	711	298	99	29	138	87	134	EBI	91	23
4	6.5 cm	193,574,945	10,165,685	752	727	657	228	92	24	120	56	104	EBI	50.4	29.6
5	6.2 cm	182,045,439	9,519,995	876	721	844	235	83	25	106	61	119	EBI	48.4	35.8
6	5.8 cm	172,126,628	9,130,476	1048	801	639	234	81	26	111	73	105	EBI	61	41.6
7	5.4 cm	160,567,428	8,613,298	989	885	605	208	90	24	90	76	143	EBI	59.9	47.1
8	5.0 cm	146,259,331	8,221,520	677	613	735	214	80	28	86	52	82	EBI	45.6	52
9	4.8 cm	150,617,247	6,590,811	786	661	491	190	69	19	66	51	96	EBI	49	56.3
10	4.6 cm	134,758,134	7,223,944	733	568	579	204	64	32	87	56	89	EBI	40.2	60.9
11	4.6 cm	135,127,769	7,535,370	1298	821	710	233	63	24	74	76	97	EBI	53.7	65.4
12	4.5 cm	133,324,548	7,228,129	1034	617	848	227	72	27	106	62	115	EBI	35.8	70
13	3.9 cm	113,566,686	5,082,574	327	372	397	104	42	16	45	34	75	EBI	17.9	73.4
14	3.6 cm	101,161,492	4,865,950	830	523	533	239	92	10	65	97	79	EBI	17.6	76.4
15	3.5 cm	99,753,195	4,515,076	613	510	639	250	78	13	63	136	93	EBI	19	79.3
16	3.1 cm	96,330,374	5,101,702	873	465	799	187	52	32	53	58	51	EBI	36.6	82
17	2.8 cm	84,276,897	4,614,972	1197	531	834	235	61	15	80	71	99	EBI	24	84.8
18	2.7 cm	80,542,538	4,035,966	270	247	453	109	32	13	51	36	41	EBI	17.2	87.4
19	2.0 cm	61,707,364	3,858,269	1472	512	628	179	110	13	29	31	61	EBI	26.5	89.3
20	2.1 cm	66,210,255	3,439,621	544	249	384	131	57	15	46	37	68	EBI	27.5	91.4
21	1.6 cm	45,090,682	2,049,697	234	185	305	71	16	5	21	19	24	EBI	13.2	92.6
22	1.7 cm	51,324,926	2,135,311	488	324	357	78	31	5	23	23	62	EBI	14.7	93.8
X	5.3 cm	154,259,566	5,753,881	842	874	271	258	128	22	85	64	100	EBI	60.6	99.1
Y	2.0 cm	62,460,029	211,643	71	388	71	30	15	7	17	3	8	EBI	10.4	100
mtDNA	5.4 μm	16,569	929	13	0	0	24	0	2	0	0	0	EBI	N/A	100

hapl 1-23 + X	104 cm	3,054,815,472		20328	14212	14656	4983	1741	523	1927	1518	2205
hapl 1-23 + Y	101 cm	2,963,015,935		19557	13726	14456	4755	1628	508	1859	1457	2113
dipl + mt`♀`	208.23 cm	6,109,647,513		40669	28424	29312	9990	3482	1048	3854	3036	4410
dipl + mt`♂`	205.00 cm	6,017,847,976		39898	27938	29112	9762	3369	1033	3786	2975	4318

	Gencode	Ensembl	Refseq	CHESS
protein-coding genes	19,901	20,376	20,345	21,306
lncRNA genes	15,779	14,720	17,712	18,484
antisense RNA	5501		28	2694
miscellaneous RNA	2213	2222	13,899	4347
Pseudogenes	14,723	1740	15,952
total transcripts	203,835	203,903	154,484	328,827

Protein	Chrom	Gene	Length	Exons	Exon length	Intron length	Alt splicing
Breast cancer type 2 susceptibility protein	13	BRCA2	83,736	27	11,386	72,350	yes
Cystic fibrosis transmembrane conductance regulator	7	CFTR	202,881	27	4,440	198,441	yes
Cytochrome b	MT	MTCYB	1,140	1	1,140	0	no
Dystrophin	X	DMD	2,220,381	79	10,500	2,209,881	yes
Glyceraldehyde-3-phosphate dehydrogenase	12	GAPDH	4,444	9	1,425	3,019	yes
Hemoglobin beta subunit	11	HBB	1,605	3	626	979	no
Histone H1A	6	HIST1H1A	781	1	781	0	no
Titin	2	TTN	281,434	364	104,301	177,133	yes

Disorder	Prevalence	Chromosome or gene involved
Chromosomal conditions
Down syndrome	1:600	Chromosome 21
Klinefelter syndrome	1:500–1000 males	Additional X chromosome
Turner syndrome	1:2000 females	Loss of X chromosome
Sickle cell anemia	1 in 50 births in parts of Africa; rarer elsewhere	β-globin (on chromosome 11)
Bloom syndrome	1:48000 Ashkenazi Jews	BLM
Cancers
Breast/Ovarian cancer (susceptibility)	~5% of cases of these cancer types	BRCA1, BRCA2
FAP (hereditary nonpolyposis coli)	1:3500	APC
Lynch syndrome	5–10% of all cases of bowel cancer	MLH1, MSH2, MSH6, PMS2
Fanconi anemia	1:130000 births	FANCC
Neurological conditions
Huntington disease	1:20000	Huntingtin
Alzheimer disease - early onset	1:2500	PS1, PS2, APP
Tay-Sachs	1:3600 births in Ashkenazi Jews	HEXA gene (on chromosome 15)
Canavan disease	2.5% Eastern European Jewish ancestry	ASPA gene (on chromosome 17)
Familial dysautonomia	600 known cases worldwide since discovery	IKBKAP gene (on chromosome 9)
Fragile X syndrome	1.4:10000 in males, 0.9:10000 in females	FMR1 gene (on X chromosome)
Mucolipidosis type IV	1:90 to 1:100 in Ashkenazi Jews	MCOLN1
Other conditions
Cystic fibrosis	1:2500	CFTR
Duchenne muscular dystrophy	1:3500 boys	Dystrophin
Becker muscular dystrophy	1.5–6:100000 males	DMD
Beta thalassemia	1:100000	HBB
Congenital adrenal hyperplasia	1:280 in Native Americans and Yupik Eskimos 1:15000 in American Caucasians	CYP21A2
Glycogen storage disease type I	1:100000 births in America	G6PC
Maple syrup urine disease	1:180000 in the U.S. 1:176 in Mennonite/Amish communities 1:250000 in Austria	BCKDHA, BCKDHB, DBT, DLD
Niemann–Pick disease, SMPD1-associated	1,200 cases worldwide	SMPD1
Usher syndrome	1:23000 in the U.S. 1:28000 in Norway 1:12500 in Germany	CDH23, CLRN1, DFNB31, GPR98, MYO7A, PCDH15, USH1C, USH1G, USH2A

compound	relative contribution	amount emitted (Tg/y)
isoprene	62.2%	594±34
terpenes	10.9%	95±3
pinene isomers	5.6%	48.7±0.8
sesquiterpenes	2.4%	20±1
methanol	6.4%	130±4

Search This Blog

Monday, December 4, 2023

Human genome

Sequencing

Achieving completeness

Molecular organization and gene content

Information content

Coding vs. noncoding DNA

Coding sequences (protein-coding genes)

Noncoding DNA (ncDNA)

Pseudogenes

Genes for noncoding RNA (ncRNA)

Introns and untranslated regions of mRNA

Regulatory DNA sequences

Repetitive DNA sequences

Mobile genetic elements (transposons) and their relics

Genomic variation in humans

Human reference genome

Measuring human genetic variation

Mapping human genomic variation

Structural variation

SNP frequency across the human genome

Personal genomes

Human knockouts

Human genetic disorders

Evolution

Mitochondrial DNA

Epigenome

Statistical model

Introduction

Formal definition

An example

General remarks

Dimension of a model

Nested models

Comparing models

Volatile organic compound

Definitions

Canada

European Union

China

India

United States

Biologically generated VOCs

Anthropogenic sources

Indoor VOCs

Indoor air quality measurements

Regulation of indoor VOC emissions

Health risks

Ingestion

Dermal absorption

Limit values for VOC emissions

VOCs in healthcare settings

Analytical methods

Sampling

Principle and measurement methods

Chemical fingerprinting and breath analysis

Metrology for VOC measurements

Schrödinger's cat