A Medley of Potpourri

Friday, September 21, 2018

Eukaryote

From Wikipedia, the free encyclopedia

Eukaryotes. Temporal range: Orosirian – Present, 1850–0 Ma

Eukaryotes and some examples of their diversity – clockwise from top left: Red mason bee, Boletus edulis, Common chimpanzee, Isotricha intestinalis, Persian buttercup, and Volvox carteri
Scientific classification
Domain:	Eukaryota (Chatton, 1925) Whittaker & Margulis, 1978
Supergroups and kingdoms
Archaeplastida
  Kingdom Plantae – Plants
Hacrobia
SAR (Stramenopiles + Alveolata + Rhizaria)
Excavata
Amoebozoa
Opisthokonta
  Kingdom Animalia – Animals
  Kingdom Fungi

Eukaryotic organisms that cannot be classified under the kingdoms Plantae, Animalia or Fungi are sometimes grouped in the kingdom Protista.

Eukaryotes (/juːˈkærioʊt, -ət/) are organisms whose cells have a nucleus enclosed within membranes, unlike prokaryotes (Bacteria and Archaea). Eukaryotes belong to the domain Eukaryota or Eukarya. Their name comes from the Greek εὖ (eu, "well" or "true") and κάρυον (karyon, "nut" or "kernel"). Eukaryotic cells also contain other membrane-bound organelles such as mitochondria and the Golgi apparatus, and in addition, some cells of plants and algae contain chloroplasts. Unlike unicellular archaea and bacteria, eukaryotes may also be multicellular and include organisms consisting of many cell types forming different kinds of tissue. Animals and plants are the most familiar eukaryotes.

Eukaryotes can reproduce both asexually through mitosis and sexually through meiosis and gamete fusion. In mitosis, one cell divides to produce two genetically identical cells. In meiosis, DNA replication is followed by two rounds of cell division to produce four haploid daughter cells. These act as sex cells (gametes). Each gamete has just one set of chromosomes, each a unique mix of the corresponding pair of parental chromosomes resulting from genetic recombination during meiosis.
The domain Eukaryota appears to be monophyletic, and makes up one of the domains of life in the three-domain system. The two other domains, Bacteria and Archaea, are prokaryotes and have none of the above features. Eukaryotes represent a tiny minority of all living things. However, due to their generally much larger size, their collective worldwide biomass is estimated to be about equal to that of prokaryotes. Eukaryotes evolved approximately 1.6–2.1 billion years ago, during the Proterozoic eon.

History

Konstantin Mereschkowski proposed a symbiotic origin for cells with nuclei.

The concept of the eukaryote has been attributed to the French biologist Edouard Chatton (1883–1947). The terms prokaryote and eukaryote were more definitively reintroduced by the Canadian microbiologist Roger Stanier and the Dutch-American microbiologist C. B. van Niel in 1962. In his 1938 work Titres et Travaux Scientifiques, Chatton had proposed the two terms, calling the bacteria prokaryotes and organisms with nuclei in their cells eukaryotes. However, he mentioned this in only one paragraph, and the idea was effectively ignored until Chatton's statement was rediscovered by Stanier and van Niel.

In 1905 and 1910, the Russian biologist Konstantin Mereschkowski (1855–1921) argued that plastids were reduced cyanobacteria in a symbiosis with a non-photosynthetic (heterotrophic) host that was itself formed by symbiosis between an amoeba-like host and a bacterium-like cell that formed the nucleus. Plants had thus inherited photosynthesis from cyanobacteria.

In 1967, Lynn Margulis provided microbiological evidence for endosymbiosis as the origin of chloroplasts and mitochondria in eukaryotic cells in her paper, On the origin of mitosing cells. In the 1970s, Carl Woese explored microbial phylogenetics, studying variations in 16S ribosomal RNA. This helped to uncover the origin of the eukaryotes and the symbiogenesis of two important eukaryote organelles, mitochondria and chloroplasts. In 1977, Woese and George Fox introduced a "third form of life", which they called the Archaebacteria; in 1990, Woese, Otto Kandler and Mark L. Wheelis renamed this the Archaea.

In 1979, G. W. Gould and G. J. Dring suggested that the eukaryotic cell's nucleus came from the ability of Gram-positive bacteria to form endospores. In 1987 and later papers, Thomas Cavalier-Smith proposed instead that the membranes of the nucleus and endoplasmic reticulum first formed by infolding a prokaryote's plasma membrane. In the 1990s, several other biologists proposed endosymbiotic origins for the nucleus, effectively reviving Mereschkowski's theory.

Cell features

Eukaryotic cells are typically much larger than those of prokaryotes, with a volume around 10,000 times greater. They have a variety of internal membrane-bound structures, called organelles, and a cytoskeleton composed of microtubules, microfilaments, and intermediate filaments, which play an important role in defining the cell's organization and shape. Eukaryotic DNA is divided into several linear bundles called chromosomes, which are separated by a microtubular spindle during nuclear division.

Internal membrane

The endomembrane system and its components

Eukaryote cells include a variety of membrane-bound structures, collectively referred to as the endomembrane system. Simple compartments, called vesicles and vacuoles, can form by budding off other membranes. Many cells ingest food and other materials through a process of endocytosis, where the outer membrane invaginates and then pinches off to form a vesicle. It is probable that most other membrane-bound organelles are ultimately derived from such vesicles. Alternatively, some products produced by the cell can leave in a vesicle through exocytosis.

The nucleus is surrounded by a double membrane (commonly referred to as a nuclear membrane or nuclear envelope), with pores that allow material to move in and out. Various tube- and sheet-like extensions of the nuclear membrane form the endoplasmic reticulum, which is involved in protein transport and maturation. It includes the rough endoplasmic reticulum where ribosomes are attached to synthesize proteins, which enter the interior space or lumen. Subsequently, they generally enter vesicles, which bud off from the smooth endoplasmic reticulum. In most eukaryotes, these protein-carrying vesicles are released and further modified in stacks of flattened vesicles (cisternae), the Golgi apparatus.

Vesicles may be specialized for various purposes. For instance, lysosomes contain digestive enzymes that break down most biomolecules in the cytoplasm. Peroxisomes are used to break down peroxide, which is otherwise toxic. Many protozoans have contractile vacuoles, which collect and expel excess water, and extrusomes, which expel material used to deflect predators or capture prey. In higher plants, most of a cell's volume is taken up by a central vacuole, which mostly contains water and primarily maintains its osmotic pressure.

Mitochondria and plastids

Simplified structure of a mitochondrion

Mitochondria are organelles found in all but one eukaryote. Mitochondria provide energy to the eukaryote cell by converting sugars into ATP. They have two surrounding membranes, each a phospholipid bilayer; the inner membrane is folded into invaginations called cristae, where aerobic respiration takes place.

The outer mitochondrial membrane is freely permeable and allows almost anything to enter the intermembrane space, while the inner mitochondrial membrane is semipermeable and admits only selected molecules into the mitochondrial matrix.

Mitochondria contain their own DNA, which has close structural similarities to bacterial DNA, and which encodes rRNA and tRNA genes that produce RNA which is closer in structure to bacterial RNA than to eukaryote RNA. They are now generally held to have developed from endosymbiotic prokaryotes, probably proteobacteria.

Some eukaryotes, such as the metamonads Giardia and Trichomonas and the amoebozoan Pelomyxa, appear to lack mitochondria, but all have been found to contain mitochondrion-derived organelles, such as hydrogenosomes and mitosomes, and thus have lost their mitochondria secondarily. They obtain energy by enzymatic action on nutrients absorbed from the environment. The metamonad Monocercomonoides has also acquired, by lateral gene transfer, a cytosolic sulfur mobilisation system which provides the clusters of iron and sulfur required for protein synthesis. The normal mitochondrial iron-sulfur cluster pathway has been lost secondarily.

Plants and various groups of algae also have plastids. Plastids also have their own DNA and are developed from endosymbionts, in this case cyanobacteria. They usually take the form of chloroplasts which, like cyanobacteria, contain chlorophyll and produce organic compounds (such as glucose) through photosynthesis. Others are involved in storing food. Although plastids probably had a single origin, not all plastid-containing groups are closely related. Instead, some eukaryotes have obtained them from others through secondary endosymbiosis or ingestion. The capture and sequestering of photosynthetic cells and chloroplasts occurs in many types of modern eukaryotic organisms and is known as kleptoplasty.

Endosymbiotic origins have also been proposed for the nucleus, and for eukaryotic flagella.

Cytoskeletal structures

Longitudinal section through the flagellum of Chlamydomonas reinhardtii

Many eukaryotes have long slender motile cytoplasmic projections, called flagella, or similar structures called cilia. Flagella and cilia are sometimes referred to as undulipodia, and are variously involved in movement, feeding, and sensation. They are composed mainly of tubulin. These are entirely distinct from prokaryotic flagella. They are supported by a bundle of microtubules arising from a centriole, characteristically arranged as nine doublets surrounding two singlets. Flagella also may have hairs, or mastigonemes, and scales connecting membranes and internal rods. Their interior is continuous with the cell's cytoplasm.

Microfilamental structures composed of actin and actin-binding proteins (e.g., α-actinin, fimbrin, filamin) are also present in submembranous cortical layers and bundles. Motor proteins of microtubules (e.g., dynein or kinesin) and of actin (e.g., myosins) give the network its dynamic character.

Centrioles are often present even in cells and groups that do not have flagella, but conifers and flowering plants have neither. They generally occur in groups that give rise to various microtubular roots. These form a primary component of the cytoskeletal structure, and are often assembled over the course of several cell divisions, with one flagellum retained from the parent and the other derived from it. Centrioles produce the spindle during nuclear division.

Cytoskeletal structures are significant in determining cell shape and are essential components of migratory responses such as chemotaxis and chemokinesis. Some protists have various other microtubule-supported organelles. These include the radiolaria and heliozoa, which produce axopodia used in flotation or to capture prey, and the haptophytes, which have a peculiar flagellum-like organelle called the haptonema.

Cell wall

The cells of plants and algae, fungi and most chromalveolates have a cell wall, a layer outside the cell membrane, providing the cell with structural support, protection, and a filtering mechanism. The cell wall also prevents over-expansion when water enters the cell.

The major polysaccharides making up the primary cell wall of land plants are cellulose, hemicellulose, and pectin. The cellulose microfibrils are linked via hemicellulosic tethers to form the cellulose-hemicellulose network, which is embedded in the pectin matrix. The most common hemicellulose in the primary cell wall is xyloglucan.

Differences among eukaryotic cells

There are many different types of eukaryotic cells, though animals and plants are the most familiar eukaryotes, and thus provide an excellent starting point for understanding eukaryotic structure. Fungi and many protists have some substantial differences, however.

Animal cell

Structure of a typical animal cell

Structure of a typical plant cell

All animals are eukaryotic. Animal cells are distinct from those of other eukaryotes, most notably plants, as they lack cell walls and chloroplasts and have smaller vacuoles. Due to the lack of a cell wall, animal cells can transform into a variety of shapes. A phagocytic cell can even engulf other structures.

Plant cell

Plant cells are quite different from the cells of the other eukaryotic organisms. Their distinctive features are:

A large central vacuole (enclosed by a membrane, the tonoplast), which maintains the cell's turgor and controls movement of molecules between the cytosol and sap
A primary cell wall containing cellulose, hemicellulose and pectin, deposited by the protoplast on the outside of the cell membrane; this contrasts with the cell walls of fungi, which contain chitin, and the cell envelopes of prokaryotes, in which peptidoglycans are the main structural molecules
The plasmodesmata, pores in the cell wall that link adjacent cells and allow plant cells to communicate with adjacent cells. Animals have a different but functionally analogous system of gap junctions between adjacent cells.
Plastids, especially chloroplasts that contain chlorophyll, the pigment that gives plants their green color and allows them to perform photosynthesis
Bryophytes and seedless vascular plants have flagella and centrioles only in the sperm cells. Sperm of cycads and Ginkgo are large, complex cells that swim with hundreds to thousands of flagella.
Conifers (Pinophyta) and flowering plants (Angiospermae) lack the flagella and centrioles that are present in animal cells.

Fungal cell

Fungal Hyphae Cells
1- Hyphal wall 2- Septum 3- Mitochondrion 4- Vacuole 5- Ergosterol crystal 6- Ribosome 7- Nucleus 8- Endoplasmic reticulum 9- Lipid body 10- Plasma membrane 11- Spitzenkörper 12- Golgi apparatus

The cells of fungi are most similar to animal cells, with the following exceptions:

A cell wall that contains chitin
Less definition between cells; the hyphae of higher fungi have porous partitions called septa, which allow the passage of cytoplasm, organelles, and, sometimes, nuclei. Primitive fungi have few or no septa, so each organism is essentially a giant multinucleate supercell; these fungi are described as coenocytic.
Only the most primitive fungi, chytrids, have flagella.

Other eukaryotic cells

Some groups of eukaryotes have unique organelles, such as the cyanelles (unusual chloroplasts) of the glaucophytes, the haptonema of the haptophytes, or the ejectosomes of the cryptomonads. Other structures, such as pseudopodia, are found in various eukaryote groups in different forms, such as the lobose amoebozoans or the reticulose foraminiferans.

Reproduction

This diagram illustrates the twofold cost of sex. If each individual were to contribute to the same number of offspring (two), (a) the sexual population remains the same size each generation, whereas (b) the asexual population doubles in size each generation.
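The arithmetic behind the twofold cost can be sketched in a few lines of Python. This is a minimal illustration, not a population-genetics model: the function name, the starting size of two individuals and the assumption that every sexual offspring has exactly two parents are illustrative choices that do not come from the article.

```python
def population_sizes(generations, start=2, offspring_per_individual=2):
    """Return (sexual, asexual) population sizes, one entry per generation.

    Assumption: each individual contributes to `offspring_per_individual`
    offspring. A sexual offspring needs two parents, so contributions are
    shared between pairs; an asexual offspring needs only one parent.
    """
    sexual, asexual = [start], [start]
    for _ in range(generations):
        # Two parental contributions make one sexual offspring.
        sexual.append(sexual[-1] * offspring_per_individual // 2)
        # One parental contribution makes one asexual offspring.
        asexual.append(asexual[-1] * offspring_per_individual)
    return sexual, asexual


sexual, asexual = population_sizes(4)
print(sexual)   # [2, 2, 2, 2, 2]
print(asexual)  # [2, 4, 8, 16, 32]
```

Under these assumptions the asexual lineage outgrows the sexual one by a factor of two in every generation, which is the "cost" the caption refers to.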

Cell division generally takes place asexually by mitosis, a process that allows each daughter nucleus to receive one copy of each chromosome. Most eukaryotes also have a life cycle that involves sexual reproduction, alternating between a haploid phase, where only one copy of each chromosome is present in each cell, and a diploid phase, wherein two copies of each chromosome are present in each cell. The diploid phase is formed by fusion of two haploid gametes to form a zygote, which may divide by mitosis or undergo chromosome reduction by meiosis. There is considerable variation in this pattern. Animals have no multicellular haploid phase, but each plant generation can consist of haploid and diploid multicellular phases.
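The chromosome reduction described above can be illustrated for a single pair of homologous chromosomes. The sketch below is a deliberately simplified toy: chromosomes are strings of allele letters, there is exactly one crossover at a caller-chosen position, and the function name and example alleles are hypothetical.

```python
def meiosis(maternal, paternal, crossover):
    """Return four haploid chromatids derived from one homologous pair.

    maternal, paternal: strings of alleles along one chromosome.
    crossover: index at which one maternal and one paternal chromatid
    exchange their tails (a single, simplified recombination event).
    """
    if len(maternal) != len(paternal):
        raise ValueError("homologous chromosomes must have equal length")
    # DNA replication: each homologue becomes two sister chromatids.
    m1, m2 = maternal, maternal
    p1, p2 = paternal, paternal
    # Recombination between one maternal and one paternal chromatid.
    m2, p1 = m2[:crossover] + p1[crossover:], p1[:crossover] + m2[crossover:]
    # Two rounds of division distribute the four chromatids to four gametes.
    return [m1, m2, p1, p2]


gametes = meiosis("ABCD", "abcd", crossover=2)
print(gametes)  # ['ABCD', 'ABcd', 'abCD', 'abcd']
```

Each of the four results carries one copy of the chromosome (the haploid state), and two of them are new mixes of the parental alleles.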

Eukaryotes have a smaller surface area to volume ratio than prokaryotes, and thus have lower metabolic rates and longer generation times.
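A rough calculation shows why a larger cell has a smaller surface-area-to-volume ratio. The sketch below treats cells as spheres and assumes a prokaryote radius of 0.5 micrometres; both are simplifying assumptions made for illustration. The 10,000-fold volume difference is the figure quoted in the Cell features section.

```python
import math


def surface_to_volume(radius):
    """Surface-area-to-volume ratio of a sphere; simplifies to 3 / radius."""
    area = 4 * math.pi * radius ** 2
    volume = 4 / 3 * math.pi * radius ** 3
    return area / volume


prokaryote_radius = 0.5  # micrometres (assumed, for illustration only)
# A 10,000-fold larger volume means a radius larger by the cube root of 10,000.
eukaryote_radius = prokaryote_radius * 10_000 ** (1 / 3)

print(round(surface_to_volume(prokaryote_radius), 2))  # 6.0
print(round(surface_to_volume(eukaryote_radius), 2))   # 0.28
```

For spheres the ratio falls in proportion to the radius, so a cell with 10,000 times the volume has roughly one twenty-second of the relative surface available for exchange with its surroundings.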

The evolution of sexual reproduction may be a primordial and fundamental characteristic of eukaryotes. Based on a phylogenetic analysis, Dacks and Roger proposed that facultative sex was present in the common ancestor of all eukaryotes. A core set of genes that function in meiosis is present in both Trichomonas vaginalis and Giardia intestinalis, two organisms previously thought to be asexual. Since these two species are descendants of lineages that diverged early from the eukaryotic evolutionary tree, it was inferred that core meiotic genes, and hence sex, were likely present in a common ancestor of all eukaryotes. Eukaryotic species once thought to be asexual, such as parasitic protozoa of the genus Leishmania, have been shown to have a sexual cycle. Also, evidence now indicates that amoebae, previously regarded as asexual, are anciently sexual and that the majority of present-day asexual groups likely arose recently and independently.

Classification

Phylogenetic and symbiogenetic tree of living organisms, showing a view of the origins of eukaryotes & prokaryotes

One hypothesis of eukaryotic relationships. The Opisthokonta group includes both animals (Metazoa) and fungi. Plants (Plantae) are placed in Archaeplastida.

A pie chart of described eukaryote species (except for Excavata), together with a tree showing possible relationships between the groups

In antiquity, the two lineages of animals and plants were recognized. They were given the taxonomic rank of Kingdom by Linnaeus. Though he included the fungi with plants with some reservations, it was later realized that they are quite distinct and warrant a separate kingdom, the composition of which was not entirely clear until the 1980s. The various single-cell eukaryotes were originally placed with plants or animals when they became known. In 1830, the German biologist Georg A. Goldfuss coined the word protozoa to refer to organisms such as ciliates, and this group was expanded until it encompassed all single-celled eukaryotes, and given their own kingdom, the Protista, by Ernst Haeckel in 1866. The eukaryotes thus came to be composed of four kingdoms:

Kingdom Protista
Kingdom Plantae
Kingdom Fungi
Kingdom Animalia

The protists were understood to be "primitive forms", and thus an evolutionary grade, united by their primitive unicellular nature. The disentanglement of the deep splits in the tree of life only really started with DNA sequencing, which led Carl Woese to put forward a system with domains rather than kingdoms as the top-level rank, uniting all the eukaryote kingdoms under the eukaryote domain. At the same time, work on the protist tree intensified, and is still actively going on today. Several alternative classifications have been put forward, though there is no consensus in the field.

Eukaryotes are a clade usually assessed to be sister to Heimdallarchaeota in the Asgard grouping in the Archaea. The basal groupings are the Opimoda, Diphoda, the Discoba, and the Loukozoa. The Eukaryote root is usually assessed to be near or even in Discoba.

A classification produced in 2005 for the International Society of Protistologists, which reflected the consensus of the time, divided the eukaryotes into six supposedly monophyletic 'supergroups'. However, in the same year (2005), doubts were expressed as to whether some of these supergroups were monophyletic, particularly the Chromalveolata, and a review in 2006 noted the lack of evidence for several of the supposed six supergroups. A revised classification in 2012 recognizes five supergroups.

Archaeplastida (or Primoplantae)	Land plants, green algae, red algae, and glaucophytes
SAR supergroup	Stramenopiles (brown algae, diatoms, etc.), Alveolata, and Rhizaria (Foraminifera, Radiolaria, and various other amoeboid protozoa).
Excavata	Various flagellate protozoa
Amoebozoa	Most lobose amoeboids and slime molds
Opisthokonta	Animals, fungi, choanoflagellates, etc.

There are also smaller groups of eukaryotes whose position is uncertain or seems to fall outside the major groups, in particular Haptophyta, Cryptophyta, Centrohelida, Telonemia, Picozoa, Apusomonadida, Ancyromonadida, Breviatea, and the genus Collodictyon. Overall, it seems that, although progress has been made, there are still very significant uncertainties in the evolutionary history and classification of eukaryotes. As Roger & Simpson said in 2009, "with the current pace of change in our understanding of the eukaryote tree of life, we should proceed with caution."

In an article published in Nature Microbiology in April 2016, the authors "reinforced once again that the life we see around us – plants, animals, humans and other so-called eukaryotes – represent a tiny percentage of the world's biodiversity." They classified eukaryotes "based on the inheritance of their information systems as opposed to lipid or other cellular structures." Jillian F. Banfield of the University of California, Berkeley and fellow scientists used a supercomputer to generate a diagram of a new tree of life based on DNA from more than 3,000 species, including 2,072 known species and 1,011 newly reported microbial organisms, whose DNA they had gathered from diverse environments. As DNA sequencing became easier, Banfield and the team were able to do metagenomic sequencing: "sequencing whole communities of organisms at once and picking out the individual groups based on their genes alone."

Phylogeny

The rRNA trees constructed during the 1980s and 1990s left most eukaryotes in an unresolved "crown" group (not technically a true crown), which was usually divided by the form of the mitochondrial cristae; see crown eukaryotes. The few groups that lack mitochondria branched separately, and so the absence was believed to be primitive; but this is now considered an artifact of long-branch attraction, and they are known to have lost them secondarily.

As of 2011, there is widespread agreement that the Rhizaria belong with the Stramenopiles and the Alveolata, in a clade dubbed the SAR supergroup, so that Rhizaria is not one of the main eukaryote groups; also that the Amoebozoa and Opisthokonta are each monophyletic and form a clade, often called the unikonts. Beyond this, there does not appear to be a consensus.

It has been estimated that there may be 75 distinct lineages of eukaryotes. Most of these lineages are protists.

The known eukaryote genome sizes vary from 8.2 megabases (Mb) in Babesia bovis to 112,000–220,050 Mb in the dinoflagellate Prorocentrum micans, showing that the genome of the ancestral eukaryote has undergone considerable variation during its evolution. The last common ancestor of all eukaryotes is believed to have been a phagotrophic protist with a nucleus, at least one centriole and cilium, facultatively aerobic mitochondria, sex (meiosis and syngamy), a dormant cyst with a cell wall of chitin and/or cellulose and peroxisomes. Later endosymbiosis led to the spread of plastids in some lineages.

Five supergroups

A global tree of eukaryotes from a consensus of phylogenetic evidence (in particular, phylogenomics), rare genomic signatures, and morphological characteristics is presented in Adl et al. 2012 and Burki 2014/2016 with the Cryptophyta and Picozoa having emerged within the Archaeplastida. A similar inclusion of Glaucophyta, Cryptista (and also, unusually, Haptista) has also been made.

In some analyses, the Hacrobia group (Haptophyta + Cryptophyta) is placed next to Archaeplastida, but in other ones it is nested inside the Archaeplastida. However, several recent studies have concluded that Haptophyta and Cryptophyta do not form a monophyletic group. The former could be a sister group to the SAR group, the latter could cluster with the Archaeplastida (plants in the broad sense).

The division of the eukaryotes into two primary clades, bikonts (Archaeplastida + SAR + Excavata) and unikonts (Amoebozoa + Opisthokonta), derived from an ancestral biflagellar organism and an ancestral uniflagellar organism, respectively, had been suggested earlier. A 2012 study produced a somewhat similar division, although noting that the terms "unikonts" and "bikonts" were not used in the original sense.

A highly converged and congruent set of trees appears in Derelle et al. (2015), Ren et al. (2016), Yang et al. (2017) and Cavalier-Smith (2015) including the supplementary information, resulting in a more conservative and consolidated tree. It is combined with some results from Cavalier-Smith for the basal Opimoda. The main remaining controversies are the root, and the exact positioning of the Rhodophyta and the bikonts Rhizaria, Haptista, Cryptista, Picozoa and Telonemia, many of which may be endosymbiotic eukaryote-eukaryote hybrids. Archaeplastida probably acquired chloroplasts by endosymbiosis of an ancestor related to a currently extant cyanobacterium, Gloeomargarita lithophora.

Cavalier-Smith's tree

Thomas Cavalier-Smith (2010, 2013, 2014, 2017, and 2018) places the eukaryotic tree's root between Excavata (with a ventral feeding groove supported by a microtubular root) and the grooveless Euglenozoa, and recognizes a monophyletic Chromista, correlated with a single endosymbiotic event of capturing a red alga. He et al. specifically support rooting the eukaryotic tree between a monophyletic Discoba (Discicristata + Jakobida) and an Amorphea-Diaphoretickes clade.

Origin of eukaryotes

The three-domains tree and the Eocyte hypothesis

Phylogenetic tree showing a possible relationship between the eukaryotes and other forms of life; eukaryotes are colored red, archaea green and bacteria blue.

Eocyte tree.

Fossils

The origin of the eukaryotic cell is a milestone in the evolution of life, since eukaryotes include all complex cells and almost all multicellular organisms. The timing of this series of events is hard to determine; Knoll (2006) suggests they developed approximately 1.6–2.1 billion years ago. Some acritarchs are known from at least 1.65 billion years ago, and the possible alga Grypania has been found as far back as 2.1 billion years ago. The Geosiphon-like fossil fungus Diskagma has been found in paleosols 2.2 billion years old.

Organized living structures have been found in the black shales of the Palaeoproterozoic Francevillian B Formation in Gabon, dated at 2.1 billion years old. Eukaryotic life could have evolved at that time. Fossils that are clearly related to modern groups start appearing an estimated 1.2 billion years ago, in the form of a red alga, though recent work suggests the existence of fossilized filamentous algae in the Vindhya basin dating back perhaps to 1.6 to 1.7 billion years ago.

Biomarkers suggest that at least stem eukaryotes arose even earlier. The presence of steranes in Australian shales indicates that eukaryotes were present in these rocks dated at 2.7 billion years old, although it has been suggested that the steranes could originate from sample contamination.

Whatever their origins, eukaryotes may not have become ecologically dominant until much later; a massive uptick in the zinc composition of marine sediments 800 million years ago has been attributed to the rise of substantial populations of eukaryotes, which preferentially consume and incorporate zinc relative to prokaryotes.

Relationship to Archaea

The nuclear DNA and genetic machinery of eukaryotes are more similar to those of Archaea than to those of Bacteria, leading to a controversial suggestion that eukaryotes should be grouped with Archaea in the clade Neomura. In other respects, such as membrane composition, eukaryotes are similar to Bacteria. Three main explanations for this have been proposed:

Eukaryotes resulted from the complete fusion of two or more cells, wherein the cytoplasm formed from a eubacterium, and the nucleus from an archaeon, from a virus, or from a pre-cell.
Eukaryotes developed from Archaea, and acquired their eubacterial characteristics through the endosymbiosis of a proto-mitochondrion of eubacterial origin.
Eukaryotes and Archaea developed separately from a modified eubacterium.

Diagram of the origin of life with the Eukaryotes appearing early, not derived from Prokaryotes, as proposed by Richard Egel in 2012. This view implies that the UCA was relatively large and complex.

Alternative proposals include:

The chronocyte hypothesis postulates that a primitive eukaryotic cell was formed by the endosymbiosis of both archaea and bacteria by a third type of cell, termed a chronocyte.
The universal common ancestor (UCA) of the current tree of life was a complex organism that survived a mass extinction event rather than an early stage in the evolution of life. Eukaryotes and in particular akaryotes (Bacteria and Archaea) evolved through reductive loss, so that similarities result from differential retention of original features.

Assuming no other group is involved, there are three possible phylogenies for the Bacteria, Archaea and Eukaryota in which each is monophyletic. These are labelled 1 to 3 in the table below. The eocyte hypothesis is a modification of hypothesis 2 in which the Archaea are paraphyletic.
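The count of three follows from simple combinatorics: with three monophyletic groups, a rooted tree is fixed by choosing which two groups are sister to each other. The sketch below enumerates those choices in Newick notation. The function name is hypothetical, and the order in which the trees are printed is not meant to match any numbering used in the text.

```python
from itertools import combinations

DOMAINS = ("Archaea", "Bacteria", "Eukaryota")


def rooted_topologies(taxa):
    """List the rooted binary trees for three taxa as Newick strings."""
    trees = []
    for pair in combinations(taxa, 2):
        # The taxon left out of the sister pair branches off first.
        outgroup = next(t for t in taxa if t not in pair)
        trees.append(f"(({pair[0]},{pair[1]}),{outgroup});")
    return trees


for tree in rooted_topologies(DOMAINS):
    print(tree)
# ((Archaea,Bacteria),Eukaryota);
# ((Archaea,Eukaryota),Bacteria);
# ((Bacteria,Eukaryota),Archaea);
```

The eocyte hypothesis is not one of these three, because it places the eukaryotes inside the Archaea, which makes the Archaea paraphyletic.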

In recent years, most researchers have favoured either the three domains (3D) or the eocyte hypothesis. An rRNA analysis supports the eocyte scenario, apparently with the eukaryote root in Excavata.

In this scenario, the Asgard group is seen as a sister taxon of the TACK group, which comprises Crenarchaeota (formerly named eocytes), Thaumarchaeota, and others.

In 2017, there was significant pushback against this scenario, with authors arguing that the eukaryotes did not emerge within the Archaea. Cunha et al. produced analyses supporting the three domains (3D) or Woese hypothesis (2 in the table above) and rejecting the eocyte hypothesis (4 above). Harish and Kurland found strong support for the earlier two empires (2D) or Mayr hypothesis (1 in the table above), based on analyses of the coding sequences of protein domains. They rejected the eocyte hypothesis as the least likely. A possible interpretation of their analysis is that the universal common ancestor (UCA) of the current tree of life was a complex organism that survived an evolutionary bottleneck, rather than a simpler organism arising early in the history of life.

Endomembrane system and mitochondria

The origins of the endomembrane system and mitochondria are also unclear. The phagotrophic hypothesis proposes that eukaryotic-type membranes lacking a cell wall originated first, with the development of endocytosis, whereas mitochondria were acquired by ingestion as endosymbionts. The syntrophic hypothesis proposes that the proto-eukaryote relied on the proto-mitochondrion for food, and so ultimately grew to surround it. Here the membranes originated after the engulfment of the mitochondrion, in part thanks to mitochondrial genes (the hydrogen hypothesis is one particular version).

In a study using genomes to construct supertrees, Pisani et al. (2007) suggest that, along with evidence that there was never a mitochondrion-less eukaryote, eukaryotes evolved from a syntrophy between an archaeon closely related to Thermoplasmatales and an α-proteobacterium, likely a symbiosis driven by sulfur or hydrogen. The mitochondrion and its genome are remnants of the α-proteobacterial endosymbiont.

Hypotheses

Different hypotheses have been proposed as to how eukaryotic cells came into existence. These hypotheses can be classified into two distinct classes – autogenous models and chimeric models.

Autogenous models

An autogenous model for the origin of eukaryotes.

Autogenous models propose that a proto-eukaryotic cell containing a nucleus existed first, and later acquired mitochondria. According to this model, a large prokaryote developed invaginations in its plasma membrane in order to obtain enough surface area to service its cytoplasmic volume. As the invaginations differentiated in function, some became separate compartments, giving rise to the endomembrane system, including the endoplasmic reticulum, Golgi apparatus, nuclear membrane, and single-membrane structures such as lysosomes. Mitochondria are proposed to come from the endosymbiosis of an aerobic proteobacterium, and it is assumed that all the eukaryotic lineages that did not acquire mitochondria became extinct. Chloroplasts came about from another endosymbiotic event involving cyanobacteria. Since all eukaryotes have mitochondria, but not all have chloroplasts, the serial endosymbiosis theory proposes that mitochondria came first.

Chimeric models

Chimeric models claim that two prokaryotic cells existed initially – an archaeon and a bacterium. These cells underwent a merging process, either by a physical fusion or by endosymbiosis, thereby leading to the formation of a eukaryotic cell. Within these chimeric models, some studies further claim that mitochondria originated from a bacterial ancestor while others emphasize the role of endosymbiotic processes behind the origin of mitochondria.

Based on the process of mutualistic symbiosis, the hypotheses can be categorized as – the serial endosymbiotic theory (SET), the hydrogen hypothesis (mostly a process of symbiosis where hydrogen transfer takes place among different species), and the syntrophy hypothesis.

According to serial endosymbiotic theory (championed by Lynn Margulis), a union between a motile anaerobic bacterium (like Spirochaeta) and a thermoacidophilic crenarchaeon (like Thermoplasma, which is sulfidogenic in nature) gave rise to the present-day eukaryotes. This union established a motile organism capable of living in the already existing acidic and sulfurous waters. Oxygen is known to cause toxicity to organisms that lack the required metabolic machinery. Thus, the archaeon provided the bacterium with a highly beneficial reduced environment (sulfur and sulfate were reduced to sulfide). In microaerophilic conditions, oxygen was reduced to water, thereby creating a mutual benefit platform. The bacterium, on the other hand, contributed the necessary fermentation products and electron acceptors to the archaeon, along with its motility, thereby giving the combined organism the ability to swim. From a consortium of bacterial and archaeal DNA originated the nuclear genome of eukaryotic cells. Spirochetes gave rise to the motile features of eukaryotic cells. Endosymbiotic unifications of the ancestors of alpha-proteobacteria and cyanobacteria led to the origin of mitochondria and plastids respectively. For example, Thiodendron has been known to have originated via an ectosymbiotic process based on a similar syntrophy of sulfur existing between the two types of bacteria, Desulphobacter and Spirochaeta. However, such an association based on motile symbiosis has never been observed in practice. Also, there is no evidence of archaeans and spirochetes adapting to intense acid-based environments.

In the hydrogen hypothesis, the symbiotic linkage of an anaerobic and autotrophic methanogenic archaeon (host) with an alpha-proteobacterium (the symbiont) gave rise to the eukaryotes. The host utilized hydrogen (H₂) and carbon dioxide (CO₂) to produce methane, while the symbiont, capable of aerobic respiration, expelled H₂ and CO₂ as byproducts of an anaerobic fermentation process. The host's methanogenic environment worked as a sink for H₂, which resulted in heightened bacterial fermentation. Endosymbiotic gene transfer (EGT) acted as a catalyst for the host to acquire the symbiont's carbohydrate metabolism and turn heterotrophic in nature. Subsequently, the host's methane-forming capability was lost. Thus, the origins of the heterotrophic organelle (symbiont) are identical to the origins of the eukaryotic lineage. In this hypothesis, the presence of H₂ represents the selective force that forged eukaryotes out of prokaryotes.

The syntrophy hypothesis was developed in contrast to the hydrogen hypothesis and proposes the existence of two symbiotic events. According to this theory, the origin of eukaryotic cells was based on metabolic symbiosis (syntrophy) between a methanogenic archaeon and a delta-proteobacterium. This syntrophic symbiosis was initially facilitated by H₂ transfer between different species under anaerobic environments. In earlier stages, an alpha-proteobacterium became a member of this integration, and later developed into the mitochondrion. Gene transfer from a delta-proteobacterium to an archaeon led to the methanogenic archaeon developing into a nucleus. The archaeon constituted the genetic apparatus, while the delta-proteobacterium contributed towards the cytoplasmic features. This theory incorporates two selective forces at the time of nucleus evolution – (a) presence of metabolic partitioning to avoid the harmful effects of the co-existence of anabolic and catabolic cellular pathways, and (b) prevention of abnormal protein biosynthesis due to a vast spread of introns in the archaeal genes after acquiring the mitochondrion and losing methanogenesis.

Human Microbiome Project

From Wikipedia, the free encyclopedia

Human Microbiome Project (HMP)

Owner	US National Institutes of Health
Launched	2007
Website	hmpdacc.org

The Human Microbiome Project (HMP) was a United States National Institutes of Health (NIH) research initiative to improve understanding of the microbial flora involved in human health and disease. Launched in 2007, the first phase (HMP1) focused on identifying and characterizing human microbial flora. The second phase, known as the Integrative Human Microbiome Project (iHMP), launched in 2014 with the aim of generating resources to characterize the microbiome and elucidating the roles of microbes in health and disease states. The program received $170 million in funding from the NIH Common Fund from 2007 to 2016.

Important components of the HMP were culture-independent methods of microbial community characterization, such as metagenomics (which provides a broad genetic perspective on a single microbial community), as well as extensive whole genome sequencing (which provides a "deep" genetic perspective on certain aspects of a given microbial community, i.e. of individual bacterial species). The latter served as reference genomic sequences — 3000 such sequences of individual bacterial isolates are currently planned — for comparison purposes during subsequent metagenomic analysis. The project also financed deep sequencing of bacterial 16S rRNA sequences amplified by polymerase chain reaction from human subjects.

Introduction

Depiction of prevalences of various classes of bacteria at selected sites on human skin

Prior to the HMP launch, it was often reported in popular media and scientific literature that there are about 10 times as many microbial cells and 100 times as many microbial genes in the human body as there are human cells; this figure was based on estimates that the human microbiome includes around 100 trillion bacterial cells and an adult human typically has around 10 trillion human cells. In 2014 the American Academy of Microbiology published a FAQ that emphasized that the number of microbial cells and the number of human cells are both estimates, and noted that recent research had arrived at a new estimate of the number of human cells at around 37 trillion cells, meaning that the ratio of microbial to human cells is probably about 3:1. In 2016 another group published a new estimate of the ratio as being roughly 1:1 (1.3:1, with "an uncertainty of 25% and a variation of 53% over the population of standard 70 kg males").
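The ratios quoted above follow directly from the cell-count estimates. The short Python sketch below reproduces the arithmetic for the two estimates whose inputs are given in the text; the function and constant names are illustrative only, and all inputs are rough estimates rather than measurements.

```python
# Cell-count estimates quoted in the paragraph above (all are rough estimates).
MICROBIAL_CELLS = 100e12      # ~100 trillion bacterial cells
HUMAN_CELLS_OLD = 10e12       # ~10 trillion human cells (older estimate)
HUMAN_CELLS_REVISED = 37e12   # ~37 trillion human cells (revised estimate)


def microbe_to_human_ratio(microbial_cells: float, human_cells: float) -> float:
    """Return the number of microbial cells per human cell."""
    return microbial_cells / human_cells


old_ratio = microbe_to_human_ratio(MICROBIAL_CELLS, HUMAN_CELLS_OLD)
revised_ratio = microbe_to_human_ratio(MICROBIAL_CELLS, HUMAN_CELLS_REVISED)

print(f"older estimate:   {old_ratio:.1f}:1")      # 10.0:1
print(f"revised estimate: {revised_ratio:.1f}:1")  # 2.7:1, i.e. roughly 3:1
```

The 2016 figure of 1.3:1 is not recomputed here because the text does not give the cell counts behind it.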

Despite the staggering number of microbes in and on the human body, little was known about their roles in human health and disease. Many of the organisms that make up the microbiome have not been successfully cultured, identified, or otherwise characterized. Organisms thought to be found in the human microbiome, however, may generally be categorized as bacteria, members of domain Archaea, yeasts, and single-celled eukaryotes as well as various helminth parasites and viruses, the latter including viruses that infect the cellular microbiome organisms (e.g., bacteriophages). The HMP set out to discover and characterize the human microbiome, emphasizing oral, skin, vaginal, gastrointestinal, and respiratory sites.

The HMP will address some of the most inspiring, vexing and fundamental scientific questions today. Importantly, it also has the potential to break down the artificial barriers between medical microbiology and environmental microbiology. It is hoped that the HMP will not only identify new ways to determine health and predisposition to diseases but also define the parameters needed to design, implement and monitor strategies for intentionally manipulating the human microbiota, to optimize its performance in the context of an individual's physiology.

The HMP has been described as "a logical conceptual and experimental extension of the Human Genome Project." In 2007 the HMP was listed on the NIH Roadmap for Medical Research as one of the New Pathways to Discovery. Organized characterization of the human microbiome is also being done internationally under the auspices of the International Human Microbiome Consortium. The Canadian Institutes of Health Research, through the CIHR Institute of Infection and Immunity, is leading the Canadian Microbiome Initiative to develop a coordinated and focused research effort to analyze and characterize the microbes that colonize the human body and their potential alteration during chronic disease states.

Contributing Institutions

The HMP involved participation from many research institutions, including Stanford University, the Broad Institute, Virginia Commonwealth University, Washington University, Northeastern University, MIT, the Baylor College of Medicine, and many others. Contributions included data evaluation, construction of reference sequence data sets, ethical and legal studies, technology development, and more.

Phase One (2007-2014)

The HMP1 included research efforts from many institutions. The HMP1 set the following goals:

Develop a reference set of microbial genome sequences and perform preliminary characterization of the human microbiome
Explore the relationship between disease and changes in the human microbiome
Develop new technologies and tools for computational analysis
Establish a resource repository
Study the ethical, legal, and social implications of human microbiome research

Phase Two (2014-2016)

In 2014, the NIH launched the second phase of the project, known as the Integrative Human Microbiome Project (iHMP). The goal of the iHMP was to produce resources to create a complete characterization of the human microbiome, with a focus on understanding the presence of microbiota in health and disease states. The project mission, as stated by the NIH, was as follows:

The iHMP will create integrated longitudinal datasets of biological properties from both the microbiome and host from three different cohort studies of microbiome-associated conditions using multiple "omics" technologies.

The project encompassed three sub-projects carried out at multiple institutions. Study methods included 16S rRNA gene profiling, whole metagenome shotgun sequencing, whole genome sequencing, metatranscriptomics, metabolomics/lipidomics, and immunoproteomics. The key findings of the iHMP are due for publishing in 2018.

Pregnancy & Preterm Birth

The Vaginal Microbiome Consortium team at Virginia Commonwealth University led research on the Pregnancy & Preterm Birth project with a goal of understanding how the microbiome changes during the gestational period and influences the neonatal microbiome. The project was also concerned with the role of the microbiome in the occurrence of preterm births, which, according to the CDC, account for nearly 10% of all births and constitute the second leading cause of neonatal death. The project received $7.44 million in NIH funding.

Onset of Inflammatory Bowel Disease (IBD)

The Inflammatory Bowel Disease Multi'omics Data (IBDMDB) team was a multi-institution group of researchers focused on understanding how the gut microbiome changes longitudinally in adults and children suffering from IBD. IBD is an inflammatory autoimmune disorder that manifests as either Crohn's disease or ulcerative colitis and affects about one million Americans. Research participants included cohorts from Massachusetts General Hospital, Emory University Hospital/Cincinnati Children's Hospital, and Cedars-Sinai Medical Center.

Onset of Type 2 Diabetes (T2D)

Researchers from Stanford University and the Jackson Laboratory for Genomic Medicine worked together to perform a longitudinal analysis of the biological processes that occur in the microbiome of patients at risk for Type 2 Diabetes. T2D affects nearly 20 million Americans, with at least 79 million pre-diabetic patients, and is partially characterized by marked shifts in the microbiome compared to healthy individuals. The project aimed to identify molecules and signaling pathways that play a role in the etiology of the disease.

Achievements

The impact to date of the HMP may be partially assessed by examination of research sponsored by the HMP. Over 190 peer-reviewed publications are listed on the HMP website from June 2009 through August 2012.

Major categories of work funded by HMP include:

Development of new database systems allowing efficient organization, storage, access, search and annotation of massive amounts of data. These include IMG, the Integrated Microbial Genomes database and comparative analysis system; IMG/M, a related system that integrates metagenome data sets with isolate microbial genomes from the IMG system; CharProtDB, a database of experimentally characterized protein annotations; and the Genomes OnLine Database (GOLD), for monitoring the status of genomic and metagenomic projects worldwide and their associated metadata.
Development of tools for comparative analysis that facilitate the recognition of common patterns, major themes and trends in complex data sets. These include RAPSearch2, a fast and memory-efficient protein similarity search tool for next-generation sequencing data; Boulder ALignment Editor (ALE), a web-based RNA alignment tool; WebMGA, a customizable web server for fast metagenomic sequence analysis; and DNACLUST, a tool for accurate and efficient clustering of phylogenetic marker genes.
Development of new methods and systems for assembly of massive sequence data sets. No single assembly algorithm addresses all the known problems of assembling short-length sequences, so next-generation assembly programs such as AMOS are modular, offering a wide range of tools for assembly. Novel algorithms have been developed for improving the quality and utility of draft genome sequences.
Assembly of a catalog of sequenced reference genomes of pure bacterial strains from multiple body sites, against which metagenomic results can be compared. The original goal of 600 genomes has been far surpassed; the current goal is for 3000 genomes to be in this reference catalog, sequenced to at least a high-quality draft stage. As of March 2012, 742 genomes have been cataloged.
Establishment of the Data Analysis and Coordination Center (DACC), which serves as the central repository for all HMP data.
Various studies exploring legal and ethical issues associated with whole genome sequencing research.

Developments funded by HMP include:

New predictive methods for identifying active transcription factor binding sites.
Identification, on the basis of bioinformatic evidence, of a widely distributed, ribosomally produced electron carrier precursor.
Time-lapse "moving pictures" of the human microbiome.
Identification of unique adaptations adopted by segmented filamentous bacteria (SFB) in their role as gut commensals. SFB are medically important because they stimulate T helper 17 cells, thought to play a key role in autoimmune disease.
Identification of factors distinguishing the microbiota of healthy and diseased gut.
Identification of a hitherto unrecognized dominant role of Verrucomicrobia in soil bacterial communities.
Identification of factors determining the virulence potential of Gardnerella vaginalis strains in vaginosis.
Identification of a link between oral microbiota and atherosclerosis.
Demonstration that pathogenic species of Neisseria involved in meningitis, septicemia, and sexually transmitted disease exchange virulence factors with commensal species.

Milestones

Reference database established

On 13 June 2012, a major milestone of the HMP was announced by the NIH director Francis Collins. The announcement was accompanied by a series of coordinated articles published on the same day in Nature and several other journals, including those of the Public Library of Science (PLoS). By mapping the normal microbial make-up of healthy humans using genome sequencing techniques, the researchers of the HMP have created a reference database and established the boundaries of normal microbial variation in humans.

From 242 healthy U.S. volunteers, more than 5,000 samples were collected from tissues at 15 (men) to 18 (women) body sites such as the mouth, nose, skin, lower intestine (stool) and vagina. All the DNA, human and microbial, was analyzed with DNA sequencing machines. The microbial genome data were extracted by identifying the bacteria-specific ribosomal RNA, 16S rRNA. The researchers calculated that more than 10,000 microbial species occupy the human ecosystem, and they identified 81–99% of the genera. In addition to establishing the human microbiome reference database, the HMP also discovered several "surprises", which include:

Microbes contribute more genes responsible for human survival than humans' own genes. It is estimated that bacterial protein-coding genes are 360 times more abundant than human genes.
Microbial metabolic activities, for example digestion of fats, are not always provided by the same bacterial species. The presence of the activities seems to matter more.
Components of the human microbiome change over time, affected by a patient's disease state and medication. However, the microbiome eventually returns to a state of equilibrium, even though the composition of bacterial types has changed.

Clinical application

Among the first clinical applications utilizing the HMP data, as reported in several PLoS papers, the researchers found a shift to lower species diversity in the vaginal microbiome of pregnant women in preparation for birth, and a high viral DNA load in the nasal microbiome of children with unexplained fevers. Other studies using the HMP data and techniques include work on the role of the microbiome in various diseases of the digestive tract, skin, and reproductive organs, and in childhood disorders.

Pharmaceutical application

Pharmaceutical microbiologists have considered the implications of the HMP data in relation to the presence / absence of 'objectionable' microorganisms in non-sterile pharmaceutical products and in relation to the monitoring of microorganisms within the controlled environments in which products are manufactured. The latter also has implications for media selection and disinfectant efficacy studies.

Human Genome Project

From Wikipedia, the free encyclopedia

Logo of the HGP – the Vitruvian Man by Leonardo da Vinci.

The Human Genome Project (HGP) was an international scientific research project with the goal of determining the sequence of nucleotide base pairs that make up human DNA, and of identifying and mapping all of the genes of the human genome from both a physical and a functional standpoint. It remains the world's largest collaborative biological project. The US government took up the idea in 1984, when planning started; the project formally launched in 1990 and was declared complete on April 14, 2003. Funding came from the US government through the National Institutes of Health (NIH) as well as numerous other groups from around the world. A parallel project was conducted outside government by the Celera Corporation, or Celera Genomics, which was formally launched in 1998. Most of the government-sponsored sequencing was performed in twenty universities and research centers in the United States, the United Kingdom, Japan, France, Germany, Spain and China.

The Human Genome Project originally aimed to map the nucleotides contained in a human haploid reference genome (more than three billion). The "genome" of any given individual is unique; mapping the "human genome" involved sequencing a small number of individuals and then assembling these together to get a complete sequence for each chromosome. Therefore, the finished human genome is a mosaic, not representing any one individual.

History

The Human Genome Project was a publicly funded project initiated in 1990 with the objective of determining the DNA sequence of the entire euchromatic human genome within 15 years. In May 1985, Robert Sinsheimer organized a workshop to discuss sequencing the human genome, but for a number of reasons the NIH was uninterested in pursuing the proposal. The following March, the Santa Fe Workshop was organized by Charles DeLisi and David Smith of the Department of Energy's Office of Health and Environmental Research (OHER). At the same time Renato Dulbecco proposed whole genome sequencing in an essay in Science. James Watson followed two months later with a workshop held at the Cold Spring Harbor Laboratory.

The fact that the Santa Fe workshop was motivated and supported by a Federal Agency opened a path, albeit a difficult and tortuous one, for converting the idea into a public policy in the United States. In a memo to the Assistant Secretary for Energy Research (Alvin Trivelpiece), Charles DeLisi, who was then Director of the OHER, outlined a broad plan for the project. This started a long and complex chain of events which led to approved reprogramming of funds that enabled the OHER to launch the Project in 1986, and to recommend the first line item for the HGP, which was in President Reagan's 1988 budget submission, and ultimately approved by the Congress. Of particular importance in Congressional approval was the advocacy of Senator Peter Domenici, whom DeLisi had befriended. Domenici chaired the Senate Committee on Energy and Natural Resources, as well as the Budget Committee, both of which were key in the DOE budget process. Congress added a comparable amount to the NIH budget, thereby beginning official funding by both agencies.
Alvin Trivelpiece sought and obtained the approval of DeLisi's proposal by Deputy Secretary William Flynn Martin. This chart was used in the spring of 1986 by Trivelpiece, then Director of the Office of Energy Research in the Department of Energy, to brief Martin and Under Secretary Joseph Salgado regarding his intention to reprogram $4 million to initiate the project with the approval of Secretary Herrington. This reprogramming was followed by a line item budget of $16 million in the Reagan Administration’s 1987 budget submission to Congress. It subsequently passed both Houses. The Project was planned for 15 years.

Candidate technologies were already being considered for the proposed undertaking at least as early as 1985.

In 1990, the two major funding agencies, DOE and NIH, developed a memorandum of understanding in order to coordinate plans and set the clock for the initiation of the Project to 1990. At that time, David Galas was Director of the renamed “Office of Biological and Environmental Research” in the U.S. Department of Energy’s Office of Science and James Watson headed the NIH Genome Program. In 1993, Aristides Patrinos succeeded Galas and Francis Collins succeeded James Watson, assuming the role of overall Project Head as Director of the U.S. National Institutes of Health (NIH) National Center for Human Genome Research (which would later become the National Human Genome Research Institute). A working draft of the genome was announced in 2000 and the papers describing it were published in February 2001. A more complete draft was published in 2003, and genome "finishing" work continued for more than a decade.

The $3-billion project was formally founded in 1990 by the US Department of Energy and the National Institutes of Health, and was expected to take 15 years. In addition to the United States, the international consortium comprised geneticists in the United Kingdom, France, Australia, China and myriad other spontaneous relationships.

Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as major advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by U.S. President Bill Clinton and the British Prime Minister Tony Blair on June 26, 2000). This first available rough draft assembly of the genome was completed by the Genome Bioinformatics Group at the University of California, Santa Cruz, primarily led by then graduate student Jim Kent. Ongoing sequencing led to the announcement of the essentially complete genome on April 14, 2003, two years earlier than planned. In May 2006, another milestone was passed on the way to completion of the project, when the sequence of the last chromosome was published in Nature.

State of completion

The project was not able to sequence all the DNA found in human cells. It sequenced only euchromatic regions of the genome, which make up 92% of the human genome. The other regions, called heterochromatic, are found in centromeres and telomeres, and were not sequenced under the project.

The Human Genome Project was declared complete in April 2003. An initial rough draft of the human genome was available in June 2000; by February 2001 a working draft had been completed and published, followed by the final sequencing and mapping of the human genome on April 14, 2003. This was reported to cover 99% of the euchromatic human genome with 99.99% accuracy. A major quality assessment of the human genome sequence, published on May 27, 2004, indicated that over 92% of sampling exceeded 99.99% accuracy, which was within the intended goal. Further analyses and papers on the HGP continue to appear.

Applications and proposed benefits

The sequencing of the human genome holds benefits for many fields, from molecular medicine to human evolution. Applications of the sequence include: genotyping of specific viruses to direct appropriate treatment; identification of mutations linked to different forms of cancer; the design of medication and more accurate prediction of its effects; advances in forensic science; biofuels and other energy applications; agriculture, animal husbandry, and bioprocessing; risk assessment; and bioarcheology, anthropology and evolution. Another proposed benefit is the commercial development of genomics research related to DNA-based products, a multibillion-dollar industry.

The sequence of the DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) house the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the UCSC Genome Browser at the University of California, Santa Cruz, and Ensembl present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyze the data, because the data itself is difficult to interpret without such programs. Generally speaking, advances in genome sequencing technology have followed Moore’s Law, a concept from computer science which states that integrated circuits can increase in complexity at an exponential rate. This means that the speeds at which whole genomes can be sequenced can increase at a similar rate, as was seen during the development of the above-mentioned Human Genome Project.

Techniques and analysis

The process of identifying the boundaries between genes and other features in a raw DNA sequence is called genome annotation and is in the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. Beginning in 2008, a new technology known as RNA-seq was introduced that allowed scientists to directly sequence the messenger RNA in cells. This replaced previous methods of annotation, which relied on inherent properties of the DNA sequence, with direct measurement, which was much more accurate. Today, annotation of the human genome and other genomes relies primarily on deep sequencing of the transcripts in every human tissue using RNA-seq. These experiments have revealed that over 90% of genes contain at least one and usually several alternative splice variants, in which the exons are combined in different ways to produce 2 or more gene products from the same locus.
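To illustrate how one locus can yield several gene products, the Python sketch below enumerates candidate transcripts for a hypothetical four-exon gene. It assumes a deliberately simplified model that is not taken from the text: the first and last exons are always kept, and each internal exon may be either included or skipped. The exon names and the function `candidate_isoforms` are made up for the example; real splicing is far more constrained and also involves other mechanisms.

```python
from itertools import combinations


def candidate_isoforms(exons):
    """Enumerate exon-skipping isoforms under a simplified model.

    Assumption: the first and last exons are always retained, and every
    internal exon is independently included or skipped. Exon order is kept.
    """
    first, *internal, last = exons
    isoforms = []
    # Go from "keep all internal exons" down to "skip all internal exons".
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append([first, *kept, last])
    return isoforms


exons = ["E1", "E2", "E3", "E4"]  # hypothetical gene with four exons
isoforms = candidate_isoforms(exons)
for isoform in isoforms:
    print("-".join(isoform))
```

With two optional internal exons this toy model gives four candidate transcripts, from the full-length E1-E2-E3-E4 down to E1-E4. Which of these actually occur in a tissue is what RNA-seq measures.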

The genome published by the HGP does not represent the sequence of every individual's genome. It is the combined mosaic of a small number of anonymous donors, all of European origin. The HGP genome is a scaffold for future work in identifying differences among individuals. Subsequent projects sequenced the genomes of multiple distinct ethnic groups, though as of today there is still only one "reference genome."

Findings

Key findings of the draft (2001) and complete (2004) genome sequences include:

There are approximately 22,300 protein-coding genes in human beings, the same range as in other mammals.
The human genome has significantly more segmental duplications (nearly identical, repeated sections of DNA) than had been previously suspected.
At the time when the draft sequence was published, fewer than 7% of protein families appeared to be vertebrate specific.

Accomplishment

The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London

The Human Genome Project was started in 1990 with the goal of sequencing and identifying all three billion chemical units in the human genetic instruction set, finding the genetic roots of disease and then developing treatments. It is considered a megaproject because the human genome has approximately 3.3 billion base pairs. With the sequence in hand, the next step was to identify the genetic variants that increase the risk for common diseases like cancer and diabetes.

It was far too expensive at that time to think of sequencing patients’ whole genomes. So the National Institutes of Health embraced the idea for a "shortcut", which was to look just at sites on the genome where many people have a variant DNA unit. The theory behind the shortcut was that, since the major diseases are common, so too would be the genetic variants that caused them. Natural selection keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. In 2002 the National Institutes of Health started a $138 million project called the HapMap to catalog the common variants in European, East Asian and African genomes.

The genome was broken into smaller pieces, approximately 150,000 base pairs in length. These pieces were then ligated into a type of vector known as "bacterial artificial chromosomes", or BACs, which are derived from bacterial chromosomes that have been genetically engineered. The vectors containing the genes can be inserted into bacteria, where they are copied by the bacterial DNA replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The assembled 150,000-base-pair pieces were then joined to reconstruct the chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing.
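The hierarchical idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the pipeline the HGP used: the 24-base "genome", the chunk size of 12 (standing in for the roughly 150,000-base-pair BAC inserts), the 6-base reads, and the greedy overlap assembler are all invented for the example. The chunks here do not overlap and their order is simply remembered, which stands in for the mapping step.

```python
import random

# Toy "genome", chosen so that no 3-base substring repeats. This guarantees
# that every suffix/prefix overlap of length >= 3 is a genuine one.
GENOME = "ACGTTGCATCGGATTACAGCTGAC"
CHUNK_SIZE = 12   # stand-in for a ~150,000 bp BAC insert
READ_LEN = 6      # stand-in for a shotgun read
READ_STEP = 3     # adjacent reads therefore overlap by 3 bases


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_len=3):
    """Greedily merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Step 1: break the genome into large chunks whose order is known (the "map").
chunks = [GENOME[i:i + CHUNK_SIZE] for i in range(0, len(GENOME), CHUNK_SIZE)]

# Step 2: shotgun each chunk into short overlapping reads in random order,
# then assemble each chunk on its own.
rng = random.Random(0)
assembled_chunks = []
for chunk in chunks:
    reads = [chunk[i:i + READ_LEN]
             for i in range(0, len(chunk) - READ_LEN + 1, READ_STEP)]
    rng.shuffle(reads)
    assembled_chunks.append(assemble(reads)[0])

# Step 3: join the assembled chunks in map order.
reconstructed = "".join(assembled_chunks)
print(reconstructed == GENOME)
```

Assembling each chunk separately keeps the overlap search small and local, which is the practical appeal of the hierarchical approach. Real genomes contain repeats, sequencing errors, and overlapping clones, none of which this sketch handles.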

Funding came from the US government through the National Institutes of Health and from a UK charity organization, the Wellcome Trust, as well as numerous other groups from around the world. The funding supported a number of large sequencing centers including those at Whitehead Institute, the Wellcome Sanger Institute (then called The Sanger Centre) based at the Wellcome Genome Campus, Washington University in St. Louis, and Baylor College of Medicine.

The United Nations Educational, Scientific and Cultural Organization (UNESCO) served as an important channel for the involvement of developing countries in the Human Genome Project.

Public versus private approaches

In 1998, a similar, privately funded quest was launched by the American researcher Craig Venter and his firm Celera Genomics. Venter was a scientist at the NIH during the early 1990s when the project was initiated. The $300 million Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project. The Celera approach was able to proceed at a much more rapid rate and at a lower cost than the public project because it relied upon data made available by the publicly funded project.

Celera used a technique called whole genome shotgun sequencing, employing pairwise end sequencing, which had been used to sequence bacterial genomes of up to six million base pairs in length, but not for anything nearly as large as the three billion base pair human genome.
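As a rough illustration of pairwise end sequencing, the Python sketch below samples fragments directly from a whole toy genome (with no prior clone map) and reads both ends of each fragment. It is not Celera's pipeline: the genome size, fragment length, read length and the names paired_end_reads and mate_distance are hypothetical, and reads are assumed to be error-free.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(genome, n_pairs, insert_len, read_len, rng):
    """Sample random fragments and read read_len bases from each end."""
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_len + 1)
        fragment = genome[start:start + insert_len]
        left = fragment[:read_len]
        # The second read comes from the opposite strand of the same fragment.
        right = reverse_complement(fragment[-read_len:])
        pairs.append((left, right))
    return pairs

def mate_distance(genome, pair):
    """Distance between the start positions of the two mates on the genome."""
    left, right = pair
    return genome.find(reverse_complement(right)) - genome.find(left)

rng = random.Random(1)
toy_genome = "".join(rng.choice("ACGT") for _ in range(2000))
pairs = paired_end_reads(toy_genome, n_pairs=50, insert_len=300,
                         read_len=30, rng=rng)
distances = {mate_distance(toy_genome, p) for p in pairs}
print(distances)
```

The point of reading both ends is that the two reads of a pair are known to lie a fixed distance apart and on opposite strands. An assembler can use that constraint to order and orient pieces that do not overlap directly, which matters far more in a three-billion-base-pair genome than in a bacterial one.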

Celera initially announced that it would seek patent protection on "only 200–300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100–300 targets. The firm eventually filed preliminary ("place-holder") patent applications on 6,500 whole or partial genes. Celera also promised to publish its findings in accordance with the terms of the 1996 "Bermuda Statement", by releasing new data annually (the HGP released its new data daily), although, unlike the publicly funded project, it would not permit free redistribution or scientific use of the data. For this reason, the publicly funded competitors were compelled to release the first draft of the human genome before Celera. On July 7, 2000, the UCSC Genome Bioinformatics Group released a first working draft on the web. The scientific community downloaded about 500 GB of information from the UCSC genome server in the first 24 hours of free and unrestricted access.

In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days.

Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts. Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts covered about 83% of the genome (90% of the euchromatic regions, with 150,000 gaps and with the order and orientation of many segments not yet established). In February 2001, at the time of the joint publications, press releases announced that the project had been completed by both groups. Improved drafts were announced in 2003 and 2005, bringing the completed portion to approximately 92% of the sequence as of this writing.
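The two coverage figures quoted for the 2001 drafts can be related with one line of arithmetic. The short Python sketch below assumes that both percentages describe the same drafts and that essentially all drafted sequence lay in euchromatic regions; the variable names are illustrative, and the result is an implied estimate, not a figure stated in the publications.

```python
# Figures quoted above for the February 2001 drafts.
draft_share_of_genome = 0.83       # fraction of the whole genome covered
draft_share_of_euchromatin = 0.90  # fraction of euchromatic regions covered

# Assumption: the drafted bases are (almost) all euchromatic, so
# covered = 0.90 * euchromatin = 0.83 * genome.
implied_euchromatic_share = draft_share_of_genome / draft_share_of_euchromatin

print(f"{implied_euchromatic_share:.1%}")  # 92.2%
```

Under these assumptions, euchromatin would make up roughly 92% of the genome, which is why 90% coverage of the euchromatic regions corresponds to only about 83% of the genome as a whole.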

Genome donors

In the IHGSC international public-sector HGP, researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Pieter J. de Jong's. Much of the sequence (>70%) of the reference genome produced by the public HGP came from a single anonymous male donor from Buffalo, New York (code name RP11).

HGP scientists used white blood cells from the blood of two male and two female donors (randomly selected from 20 of each) – each donor yielding a separate DNA library. One of these libraries (RP11) was used considerably more than others, due to quality considerations. One minor technical issue is that male samples contain just over half as much DNA from the sex chromosomes (one X chromosome and one Y chromosome) compared to female samples (which contain two X chromosomes). The other 22 chromosomes (the autosomes) are the same for both sexes.

Although the main sequencing phase of the HGP has been completed, studies of DNA variation continued in the International HapMap Project, whose goal was to identify patterns of single-nucleotide polymorphism (SNP) groups (called haplotypes, or “haps”). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.

In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of which were selected for use.

In 2007, a team led by Jonathan Rothberg published James Watson's entire genome, unveiling the six-billion-nucleotide genome of a single individual for the first time.

Developments

The work on interpretation and analysis of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology. Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy-to-administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, hemostasis disorders, cystic fibrosis, liver diseases and many others. Also, research into the etiologies of cancers, Alzheimer's disease and other areas of clinical interest is considered likely to benefit from genome information, which may lead in the long term to significant advances in their management.

There are also many tangible benefits for biologists. For example, a researcher investigating a certain form of cancer may have narrowed down their search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes or to genes in mice, yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, diseases associated with this gene, and other data types. Further, a deeper understanding of disease processes at the level of molecular biology may suggest new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that may not have been possible without it.

The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data in this project.

The project inspired and paved the way for genomic work in other fields, such as agriculture. For example, by studying the genetic composition of Triticum aestivum, the world’s most commonly used bread wheat, great insight has been gained into the ways that domestication has impacted the evolution of the plant. Which loci are most susceptible to manipulation, and how does this play out in evolutionary terms? Genetic sequencing has allowed these questions to be addressed for the first time, as specific loci can be compared in wild and domesticated strains of the plant. This will allow for advances in genetic modification in the future which could yield healthier, more disease-resistant wheat crops.

Ethical, legal and social issues

At the onset of the Human Genome Project, several ethical, legal, and social concerns were raised with regard to how increased knowledge of the human genome could be used to discriminate against people. One of the main concerns was the fear that employers would refuse to hire people, and health insurance companies would refuse to insure them, because of a health concern indicated by their genes. In 1996 the United States passed the Health Insurance Portability and Accountability Act (HIPAA), which protects against the unauthorized and non-consensual release of individually identifiable health information to any entity not actively engaged in the provision of healthcare services to a patient.

Along with identifying all of the approximately 20,000–25,000 genes in the human genome, the Human Genome Project also sought to address the ethical, legal, and social issues that were created by the onset of the project. To that end, the Ethical, Legal, and Social Implications (ELSI) program was founded in 1990. Five percent of the annual budget was allocated to address the ELSI arising from the project. This budget started at approximately $1.57 million in 1990 and had increased to approximately $18 million by 2014.

Whilst the project may offer significant benefits to medicine and scientific research, some authors have emphasized the need to address the potential social consequences of mapping the human genome. "Molecularising disease and their possible cure will have a profound impact on what patients expect from medical help and the new generation of doctors' perception of illness."
