A Medley of Potpourri

Wednesday, August 6, 2014

Ebola virus

From Wikipedia, the free encyclopedia

Species Zaire ebolavirus

Virus classification
Group:	Group V ((-)ssRNA)
Order:	Mononegavirales
Family:	Filoviridae
Genus:	Ebolavirus
Species:	Zaire ebolavirus
Member virus (Abbreviation)
Ebola virus (EBOV)

Ebola virus (EBOV; formerly designated Zaire ebolavirus) is the single member of the virological species Zaire ebolavirus, which is included in the genus Ebolavirus, family Filoviridae (whose members are called filoviruses),^[1] order Mononegavirales.^[2] Zaire ebolavirus is the most dangerous of the five known species of the genus Ebolavirus, four of which cause Ebola virus disease.^[2] The virus causes an extremely severe hemorrhagic fever in humans and other primates. EBOV is a select agent, a World Health Organization Risk Group 4 Pathogen (requiring Biosafety Level 4-equivalent containment), a U.S. National Institutes of Health/National Institute of Allergy and Infectious Diseases Category A Priority Pathogen, a U.S. Centers for Disease Control and Prevention (CDC) Category A Bioterrorism Agent, and listed as a Biological Agent for Export Control by the Australia Group.

The name Zaire ebolavirus is derived from Zaire, the country (now the Democratic Republic of the Congo) in which Ebola virus was first discovered, and the taxonomic suffix ebolavirus (which denotes an ebolavirus species).^[2]

The EBOV genome is approximately 19 kb in length. It encodes seven structural proteins: nucleoprotein (NP), polymerase cofactor (VP35), matrix protein (VP40), glycoprotein (GP), transcription activator (VP30), VP24, and RNA polymerase (L).^[3]

Structure

EBOV carries a negative-sense RNA genome in virions that are cylindrical or tubular and contain viral envelope, matrix, and nucleocapsid components. The cylinders are generally about 80 nm in diameter and carry a virally encoded glycoprotein (GP) that projects as 7–10 nm spikes from their lipid bilayer surface.^[4] The cylinders vary in length, typically 800 nm but sometimes up to 1000 nm. The outer viral envelope of the virion is derived by budding from domains of the host cell membrane into which the GP spikes were inserted during their biosynthesis.^[citation needed] Individual GP molecules are spaced about 10 nm apart.^[citation needed] Viral proteins VP40 and VP24 are located in the matrix space between the envelope and the nucleocapsid (see following).^[5]

At the center of the virion is the nucleocapsid, which consists of a series of viral proteins attached to an 18–19 kb linear, negative-sense RNA that is neither 3′-polyadenylated nor 5′-capped (see following).^[citation needed] The RNA is helically wound and complexed with the NP, VP35, VP30, and L proteins;^[6]^[better source needed] this helix has a diameter of 80 nm and contains a central channel 20–30 nm in diameter.

The overall shape of the virions after purification and visualization (e.g., by ultracentrifugation and electron microscopy, respectively) varies considerably; simple cylinders are far less prevalent than structures showing reversed direction, branches, and loops (i.e., U-, shepherd's crook-, 9- or eye bolt-shapes, or other or circular/coiled appearances), the origin of which may be in the laboratory techniques applied.^[7] The characteristic "threadlike" structure is, however, a more general morphologic characteristic of filoviruses (alongside their GP-decorated viral envelope, RNA nucleocapsid, etc.).^[8]

Genome

Each virion contains one molecule of linear, single-stranded, negative-sense RNA, 18,959 to 18,961 nucleotides in length. The 3′ terminus is not polyadenylated and the 5′ end is not capped. It was found that 472 nucleotides from the 3′ end and 731 nucleotides from the 5′ end are sufficient for replication.^[9] The genome codes for seven structural proteins and one non-structural protein. The gene order is 3′ – leader – NP – VP35 – VP40 – GP/sGP – VP30 – VP24 – L – trailer – 5′, with the leader and trailer being non-transcribed regions that carry important signals controlling transcription, replication, and packaging of the viral genomes into new virions. The genomic RNA by itself is not infectious: because EBOV is a negative-sense RNA virus, viral proteins, among them the RNA-dependent RNA polymerase, are needed both to transcribe the genome into mRNAs and to replicate it. Sections of the NP and L genes from filoviruses have been identified as endogenous in the genomes of several groups of small mammals.^[10]

Entry

Host-encoded Niemann–Pick C1 (NPC1), a cholesterol transporter protein, appears to be essential for entry of Ebola virions into the host cell and for the virus's ultimate replication.^[11]^[12] In one study, mice heterozygous for NPC1 (carrying one functional and one disrupted copy of the gene) were protected from lethal challenge with mouse-adapted Ebola virus.^[11] In another study, small molecules were shown to inhibit Ebola virus infection by preventing the viral envelope glycoprotein (GP) from binding to NPC1.^[12]^[13] Hence, NPC1 was shown to be critical for entry of this filovirus, because it mediates infection by binding directly to viral GP.^[12]

When cells from Niemann–Pick type C patients, which lack this transporter, were exposed to Ebola virus in the laboratory, the cells survived and appeared impervious to the virus, further indicating that Ebola virus relies on NPC1 to enter cells.^[citation needed] It has been conjectured that mutations in the human NPC1 gene could make some individuals resistant to this deadly viral disease.^[citation needed]^[speculation?] The same studies^[which?] described similar results regarding NPC1's role in entry of Marburg virus, a related filovirus. A further study presented evidence that NPC1 is a critical receptor mediating Ebola virus infection through direct binding to the viral GP, and that the second "lysosomal" domain of NPC1 mediates this binding.^[14] Together, these studies suggest NPC1 may be a potential therapeutic target for an anti-Ebola antiviral drug.^[citation needed]

Replication

Being acellular, viruses such as Ebola do not replicate through any type of cell division; rather, they use a combination of host- and virally encoded enzymes, alongside host cell structures, to produce multiple copies of themselves; these then self-assemble into viral macromolecular structures in the host cell.^[6]^{[better source needed]} Specific steps for Ebola virus include:^{[citation needed]}

The virus attaches to host receptors through the glycoprotein (GP) surface peplomer and is endocytosed into macropinosomes in the host cell.^[15]^{[non-primary source needed]}
The viral membrane fuses with the vesicle membrane, and the nucleocapsid is released into the cytoplasm.
The encapsidated, negative-sense genomic ssRNA is used as a template (read 3′ to 5′) for the synthesis of polyadenylated, monocistronic mRNAs, each of which encodes a single viral protein.
Using the host cell's ribosomes, tRNA molecules, etc., the mRNA is translated into individual viral proteins.
Viral proteins are processed: the glycoprotein precursor (GP0) is cleaved to GP1 and GP2, which are then heavily glycosylated using cellular enzymes and substrates. These two molecules assemble, first into heterodimers and then into trimers, to give the surface peplomers. The secreted glycoprotein (sGP) precursor is cleaved to sGP and delta peptide, both of which are released from the cell.^[citation needed]
As viral protein levels rise, a switch occurs from translation to replication. Using the negative-sense genomic RNA as a template, a complementary +ssRNA is synthesized; this is then used as a template for the synthesis of new genomic (-)ssRNA, which is rapidly encapsidated.
The newly formed nucleocapsids and envelope proteins associate at the host cell's plasma membrane; budding occurs, destroying the cell.
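The replication steps above rely on base pairing twice: the (-)ssRNA genome templates a complementary (+)ssRNA, which in turn templates a new (-)ssRNA. A minimal Python sketch of that logic follows. The 12-nucleotide sequence is invented for illustration and is not real EBOV sequence, and the sketch models only Watson-Crick pairing (no transcription, capping, or polyadenylation).

```python
# Watson-Crick pairing for RNA bases (uracil takes the place of thymine).
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand synthesized on `rna` as template, written 5'->3'.

    The polymerase reads the template 3'->5' and builds an antiparallel,
    complementary strand, so the product is the reverse complement."""
    return "".join(RNA_PAIRS[base] for base in reversed(rna))


# Hypothetical 12-nt stretch of (-)-sense genome, written 5'->3' (not real EBOV sequence).
genome = "AUGGCUAACCGU"
antigenome = reverse_complement(genome)      # (+)ssRNA replication intermediate
new_genome = reverse_complement(antigenome)  # fresh (-)ssRNA copy

print(antigenome)
print(new_genome == genome)
```

Applying the reverse complement twice returns the original strand, which is why the (+)ssRNA intermediate can serve as the template for new genomic RNA.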

Types

The five characterised Ebola species are:

Zaire ebolavirus (EBOV; previously ZEBOV): Also known simply as the Zaire virus, ZEBOV has the highest case-fatality rate of the ebolaviruses, up to 90% in some epidemics, with an average case fatality rate of approximately 83% over 27 years. There have been more outbreaks of Zaire ebolavirus than of any other species. The first outbreak occurred on 26 August 1976 in Yambuku.^[16] The first recorded case was Mabalo Lokela, a 44‑year-old schoolteacher. The symptoms resembled malaria, and subsequent patients received quinine. Transmission has been attributed to reuse of unsterilized needles and close personal contact.
Sudan ebolavirus (SUDV; previously SEBOV): Like the Zaire virus, SEBOV emerged in 1976; it was at first assumed to be identical to the Zaire species.^[17] SEBOV is believed to have broken out first among cotton factory workers in Nzara, Sudan (now South Sudan), with the first reported case being a worker exposed to a potential natural reservoir. The virus was not found in any of the local animals and insects that were tested in response, and the carrier is still unknown. The lack of barrier nursing (or "bedside isolation") facilitated the spread of the disease. The most recent outbreak occurred in May 2004, when twenty confirmed cases and five deaths were reported in Yambio County, Sudan (now South Sudan). The average fatality rates for SEBOV were 54% in 1976, 68% in 1979, and 53% in 2000 and 2001.
Reston ebolavirus (RESTV; previously REBOV): Discovered in 1989 during an outbreak of simian hemorrhagic fever virus (SHFV) in crab-eating macaques at Hazleton Laboratories (now Covance). Since the initial outbreak in Reston, Virginia, it has been found in non-human primates in Pennsylvania, Texas, and Siena, Italy. In each case, the affected animals had been imported from a facility in the Philippines,^[18] where the virus has also infected pigs.^[19] Despite having a Biosafety status of Level‑4 and its apparent pathogenicity in monkeys, REBOV did not cause disease in exposed human laboratory workers.^[20]
Côte d'Ivoire ebolavirus (TAFV; previously CIEBOV): Also referred to as Taï Forest ebolavirus and by the English place name, "Ivory Coast", it was first discovered among chimpanzees from the Taï Forest in Côte d'Ivoire, Africa, in 1994. Necropsies showed blood within the heart was brown, no obvious marks were seen on the organs, and one necropsy showed lungs filled with blood. Studies of tissue taken from the chimpanzees showed results similar to human cases during the 1976 Ebola outbreaks in Zaire and Sudan. As more dead chimpanzees were discovered, many tested positive for Ebola using molecular techniques. Experts believed the source of the virus was the meat of infected Western Red Colobus monkeys, upon which the chimpanzees preyed. One of the scientists performing the necropsies on the infected chimpanzees contracted Ebola. She developed symptoms similar to those of dengue fever approximately a week after the necropsy, and was transported to Switzerland for treatment. She was discharged from the hospital after two weeks and had fully recovered six weeks after the infection.^[21]
Bundibugyo ebolavirus (BDBV; previously BEBOV): On 24 November 2007, the Uganda Ministry of Health confirmed an outbreak of Ebolavirus in the Bundibugyo District. After confirmation of samples tested by the United States National Reference Laboratories and the CDC, the World Health Organization confirmed the presence of the new species. On 20 February 2008, the Uganda Ministry officially announced the end of the epidemic in Bundibugyo, with the last infected person discharged on 8 January 2008.^[22] An epidemiological study conducted by WHO and Uganda Ministry of Health scientists determined there were 116 confirmed and probable cases of the new Ebola species, and that the outbreak had a mortality rate of 34% (39 deaths). In 2012, there was an outbreak of Bundibugyo ebolavirus in a northeastern province of the Democratic Republic of the Congo. There were 15 confirmed cases and 10 fatalities.^[23]

History

Zaire ebolavirus is pronounced /zɑːˈɪər iːˈboʊləvaɪərəs/ (zah-EER ee-BOH-lə-vy-rəs). Strictly speaking, the pronunciation of "Ebola virus" (/iːˌboʊlə ˈvaɪərəs/) should be distinct from that of the genus-level taxonomic designation "ebolavirus/Ebolavirus/ebolavirus", as "Ebola" is named for the tributary of the Congo River that is pronounced "Ébola" in French,^[24] whereas "ebola-virus" is an "artificial contraction" of the words "Ebola" and "virus," written without a diacritical mark for ease of use by scientific databases and English speakers. According to the rules for taxon naming established by the International Committee on Taxonomy of Viruses (ICTV), the name Zaire ebolavirus is always to be capitalized, italicized, and to be preceded by the word "species". The names of its members (Zaire ebolaviruses) are to be capitalized, are not italicized, and used without articles.^[2]

Ebola virus (abbreviated EBOV) was first described in 1976.^[25]^[26]^[27] Today, the International Committee on Taxonomy of Viruses lists the virus as the single member of the species Zaire ebolavirus, which is included in the genus Ebolavirus, family Filoviridae, order Mononegavirales.
The name Ebola virus is derived from the Ebola River (a river that was at first thought to be in close proximity to the area in the Democratic Republic of the Congo, previously called Zaire, where the first recorded Ebola virus disease outbreak occurred) and the taxonomic suffix virus.^[2]

The species was introduced in 1998 as Zaire Ebola virus.^[28]^[29] In 2002, the name was changed to Zaire ebolavirus.^[30]^[31]

Previous names

Ebola virus was first introduced as a possible new "strain" of Marburg virus in 1977 by two different research teams.^[25]^[26] At the same time, a third team introduced the name Ebola virus.^[27] In 2000, the virus name was changed to Zaire Ebola virus,^[32]^[33] and in 2005 to Zaire ebolavirus.^[30]^[34] However, most scientific articles continued to refer to Ebola virus or used the terms Ebola virus and Zaire ebolavirus in parallel. Consequently, in 2010, the name Ebola virus was reinstated.^[2] Previous abbreviations for the virus were EBOV-Z (for Ebola virus Zaire) and most recently ZEBOV (for Zaire Ebola virus or Zaire ebolavirus). In 2010, EBOV was reinstated as the abbreviation for the virus.^[2]

Species inclusion criteria

A virus of the genus Ebolavirus is a member of the species Zaire ebolavirus if:

it is found in the Democratic Republic of the Congo, Gabon, or the Republic of the Congo
it has a genome with two or three gene overlaps (VP35/VP40, GP/VP30, VP24/L)
it has a genomic sequence that differs from the type virus by less than 30%

A virus of the species Zaire ebolavirus is an Ebola virus if it has the properties of Zaire ebolaviruses and if its genome diverges from that of the prototype Zaire ebolavirus, Ebola virus variant Mayinga (EBOV/May), by ≤10% at the nucleotide level.^[2]
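The ≤10% rule compares nucleotide sequences position by position. Below is a minimal sketch of that comparison as an uncorrected pairwise distance (p-distance) on sequences that are assumed to be already aligned. The function names, the decision to skip gapped columns, and the toy sequences are illustrative assumptions, not the ICTV's actual procedure.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    # Assumption: columns with a gap ('-') in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def meets_divergence_cutoff(candidate, prototype, cutoff):
    """True if `candidate` diverges from `prototype` by at most `cutoff`."""
    return p_distance(candidate, prototype) <= cutoff


# Hypothetical aligned stretch standing in for the prototype EBOV/May genome.
prototype = "ACGTACGTAC"
variant = "ACGTACGTAA"  # 1 mismatch in 10 positions

print(p_distance(variant, prototype))
print(meets_divergence_cutoff(variant, prototype, 0.10))
```

A real genome comparison would first need a whole-genome alignment, and the species-level criterion in the list uses a strict "less than 30%" bound rather than ≤.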

Epidemiology

EBOV is one of the four ebolaviruses that cause Ebola virus disease (EVD) in humans (also often referred to in the literature as Ebola hemorrhagic fever, EHF). In the past, EBOV has caused the following EVD outbreaks:

Ebola virus disease (EVD) outbreaks due to Ebola virus (EBOV) infection
Year	Geographic location	Human cases/deaths (case-fatality rate)
1976	Yambuku, Zaire	318/280 (88%)
1976	Sudan, Sudan	284/151 (53%)
1977	Bonduni, Zaire	1/1 (100%)
1988	Porton Down, United Kingdom	1/0 (0%) [laboratory accident]
1994–1995	Woleu-Ntem and Ogooué-Ivindo Provinces, Gabon	52/32 (62%)
1995	Kikwit, Zaire	317/245 (77%)
1996	Mayibout 2, Gabon	31/21 (68%)
1996	Sergiyev Posad, Russia	1/1 (100%) [laboratory accident]
1996–1997	Ogooué-Ivindo Province, Gabon; Cuvette-Ouest Department, Republic of the Congo	62/46 (74%)
2001–2002	Ogooué-Ivindo Province, Gabon; Cuvette-Ouest Department, Republic of the Congo	124/97 (78%)
2002	Ogooué-Ivindo Province, Gabon; Cuvette-Ouest Department, Republic of the Congo	11/10 (91%)
2002–2003	Cuvette-Ouest Department, Republic of the Congo; Ogooué-Ivindo Province, Gabon	143/128 (90%)
2003–2004	Cuvette-Ouest Department, Republic of the Congo	35/29 (83%)
2004	Koltsovo, Russia	1/1 (100%) [laboratory accident]
2005	Cuvette-Ouest Department, Republic of the Congo	11/9 (82%)
2008–2009	Kasai Occidental Province, Democratic Republic of the Congo	32/15 (47%)
2014	Guinea, Sierra Leone, Liberia (2014 West Africa Ebola outbreak)	1711/932 (54%) (6 August 2014)
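Each case-fatality rate in the table is deaths divided by cases, expressed as a percentage. A short Python sketch recomputing a few rows (the values are copied from the table; the function name is illustrative):

```python
def case_fatality_rate(cases, deaths):
    """Deaths as a percentage of reported cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must be between 0 and cases")
    return 100.0 * deaths / cases


# (cases, deaths) taken from the outbreak table
outbreaks = {
    "1976 Yambuku, Zaire": (318, 280),
    "1995 Kikwit, Zaire": (317, 245),
    "2014 West Africa (as of 6 August 2014)": (1711, 932),
}

for name, (cases, deaths) in outbreaks.items():
    print(f"{name}: {case_fatality_rate(cases, deaths):.0f}%")
```

For an outbreak still in progress, such as the 2014 row dated 6 August 2014, this simple ratio is only a snapshot: some reported cases had not yet resolved, so the final rate may differ.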

DNA sequencing

Condensed from Wikipedia, the free encyclopedia


DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology that is used to determine the order of the four bases—adenine, guanine, cytosine, and thymine—in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Knowledge of DNA sequences has become indispensable for basic biological research and for numerous applied fields such as diagnostics, biotechnology, forensic biology, and biological systematics. The speed of modern DNA sequencing technology has been instrumental in determining the complete DNA sequences, or genomes, of numerous types and species of life, including the human genome and the genomes of many other animal, plant, and microbial species.

An example of the results of automated chain-termination DNA sequencing.

The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with automated analysis,^[1] DNA sequencing has become easier and orders of magnitude faster.^[2]

Basic methods

Maxam-Gilbert sequencing

Allan Maxam and Walter Gilbert published a DNA sequencing method in 1977 based on chemical modification of DNA and subsequent cleavage at specific bases.^[15] Also known as chemical sequencing, this method allowed purified samples of double-stranded DNA to be used without further cloning. This method's use of radioactive labeling and its technical complexity discouraged extensive use after refinements in the Sanger methods had been made.

Maxam-Gilbert sequencing requires radioactive labeling at one 5' end of the DNA and purification of the DNA fragment to be sequenced. Chemical treatment then generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). The concentration of the modifying chemicals is controlled to introduce on average one modification per DNA molecule. Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.^[15]
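Reading a Maxam-Gilbert gel amounts to checking, at each fragment length, which of the four lanes shows a band: a band in both the G and A+G lanes means G, a band in A+G alone means A, a band in both C and C+T means C, and a band in C+T alone means T. The Python sketch below simulates idealized, noise-free lanes and reads them back; the helper names and the one-band-per-position model are simplifying assumptions.

```python
# The four reactions and the bases each one cleaves.
LANES = {"G": {"G"}, "A+G": {"A", "G"}, "C": {"C"}, "C+T": {"C", "T"}}


def simulate_gel(seq):
    """Return {lane: set of band positions} for an idealized gel.

    Simplification: a cut at the base with 0-based index i yields a labeled
    fragment whose band position on the gel is i."""
    return {lane: {i for i, base in enumerate(seq) if base in cleaved}
            for lane, cleaved in LANES.items()}


def read_gel(gel, length):
    """Call each position from the pattern of lanes that show a band there."""
    calls = []
    for i in range(length):
        if i in gel["G"]:
            calls.append("G")  # band in G lane (also present in A+G)
        elif i in gel["A+G"]:
            calls.append("A")  # band in A+G lane only
        elif i in gel["C"]:
            calls.append("C")  # band in C lane (also present in C+T)
        elif i in gel["C+T"]:
            calls.append("T")  # band in C+T lane only
        else:
            calls.append("N")  # no band: position cannot be called
    return "".join(calls)


gel = simulate_gel("GATTACA")
print(read_gel(gel, 7))
```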

Chain-termination methods

The chain-termination method developed by Frederick Sanger and coworkers in 1977 soon became the method of choice, owing to its relative ease and reliability.^[14]^[35] When invented, the chain-terminator method used fewer toxic chemicals and lower amounts of radioactivity than the Maxam and Gilbert method. Because of its comparative ease, the Sanger method was soon automated and was the method used in the first generation of DNA sequencers.

Sanger sequencing was the prevailing method from the 1980s until the mid-2000s. Over that period, great advances were made in the technique, such as fluorescent labelling, capillary electrophoresis, and general automation. These developments allowed much more efficient sequencing and lower costs. The Sanger method, in mass-production form, produced the first human genome in 2001, ushering in the age of genomics. Later in the decade, however, radically different approaches reached the market, bringing the cost per genome down from $100 million in 2001 to $10,000 in 2011.^[36]
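In the chain-termination method, a small amount of a dideoxynucleotide (ddNTP), which lacks the 3′-hydroxyl group needed for further extension, is added to each of four reactions, so each reaction yields fragments that end at one particular base; ordering all fragments by length spells out the newly synthesized strand. Below is a minimal, idealized Python sketch. It assumes perfect termination and perfect size resolution, and writes the template 3′→5′ so that the new strand lines up with it position by position.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_reactions(template_3to5):
    """Simulate the four ddNTP reactions on a template written 3'->5'.

    The new strand grows 5'->3' opposite the template; a fragment ends
    wherever the incorporated base matches the reaction's ddNTP.
    Returns {ddNTP base: fragment lengths in increasing order}."""
    new_strand = "".join(COMPLEMENT[b] for b in template_3to5)
    return {dd: [i + 1 for i, b in enumerate(new_strand) if b == dd]
            for dd in "ACGT"}


def read_ladder(reactions):
    """Order all fragments by length and report which reaction produced each."""
    bands = sorted((length, dd)
                   for dd, lengths in reactions.items()
                   for length in lengths)
    return "".join(dd for _, dd in bands)


reactions = sanger_reactions("TACGGA")
print(reactions)
print(read_ladder(reactions))  # the new strand, i.e. the template's complement
```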

Advanced methods and de novo sequencing

Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping DNA regions.

Large-scale sequencing often aims at sequencing very long DNA pieces, such as whole chromosomes, although large-scale sequencing can also be used to generate very large numbers of short sequences, such as found in phage display. For longer targets such as chromosomes, common approaches consist of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA may then be cloned into a DNA vector and amplified in a bacterial host such as Escherichia coli. Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. Studies have shown that adding a size selection step to collect DNA fragments of uniform size can improve sequencing efficiency and accuracy of the genome assembly. In these studies, automated sizing has proven to be more reproducible and precise than manual gel sizing.^[37]^[38]^[39]

The term "de novo sequencing" specifically refers to methods used to determine the sequence of DNA with no previously known sequence. De novo translates from Latin as "from the beginning". Gaps in the assembled sequence may be filled by primer walking. The different strategies have different tradeoffs in speed and accuracy; shotgun methods are often used for sequencing large genomes, but their assembly is complex and difficult, particularly because sequence repeats often cause gaps in the genome assembly.

Most sequencing approaches use an in vitro cloning step to amplify individual DNA molecules, because their molecular detection methods are not sensitive enough for single molecule sequencing.
Emulsion PCR^[40] isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule, which are then immobilized for later sequencing. Emulsion PCR is used in the methods developed by Margulies et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as "Polony sequencing"), and SOLiD sequencing (developed by Agencourt, later Applied Biosystems, now Life Technologies).^[32]^[41]^[42]

Shotgun sequencing

Shotgun sequencing is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes. This method requires the target DNA to be broken into random fragments. After sequencing individual fragments, the sequences can be reassembled on the basis of their overlapping regions.^[43]
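Reassembly from overlapping regions can be illustrated with a greedy assembler that repeatedly merges the pair of reads with the longest suffix/prefix overlap. This is a teaching sketch, not the algorithm any particular assembler uses: it assumes error-free reads from a single strand, and it is exactly the kind of approach that sequence repeats confuse, as noted above for shotgun assembly. The function names and the minimum overlap of 3 are illustrative choices.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the read pair with the longest overlap; return the contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing left overlaps: separate contigs remain (assembly gaps)
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["TTGCAT", "CATGAC", "ACGTTG"]))
```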

Massively parallel signature sequencing (MPSS)

The first of the next-generation sequencing technologies, massively parallel signature sequencing (or MPSS), was developed in the 1990s at Lynx Therapeutics, a company founded in 1992 by Sydney Brenner and Sam Eletr. MPSS was a bead-based method that used a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides, which made it susceptible to sequence-specific bias or loss of specific sequences. Because the technology was so complex, MPSS was performed only 'in-house' by Lynx Therapeutics, and no DNA sequencing machines were sold to independent laboratories. Lynx Therapeutics merged with Solexa (later acquired by Illumina) in 2004, leading to the development of sequencing-by-synthesis, a simpler approach acquired from Manteia Predictive Medicine, which rendered MPSS obsolete.
However, the essential properties of the MPSS output were typical of later "next-generation" data types, including hundreds of thousands of short DNA sequences. In the case of MPSS, these were typically used for sequencing cDNA for measurements of gene expression levels.^[65]

Polony sequencing

The Polony sequencing method, developed in the laboratory of George M. Church at Harvard, was among the first next-generation sequencing systems and was used to sequence a full genome in 2005. It combined an in vitro paired-tag library with emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence an E. coli genome at an accuracy of >99.9999% and a cost approximately 1/9 that of Sanger sequencing.^[66] The technology was licensed to Agencourt Biosciences, subsequently spun out into Agencourt Personal Genomics, and eventually incorporated into the Applied Biosystems SOLiD platform, now owned by Life Technologies, which was itself recently bought by Thermo Fisher Scientific.

454 pyrosequencing

A parallelized version of pyrosequencing was developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picoliter-volume wells each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.^[32] This technology provides intermediate read length and price per base compared to Sanger sequencing on one end and Solexa and SOLiD on the other.^[33]

Illumina (Solexa) sequencing

Solexa, now part of Illumina, was founded by Shankar Balasubramanian and David Klenerman in 1998 and developed a sequencing method based on reversible dye-terminator technology and engineered polymerases.^[67] The terminator chemistry was developed internally at Solexa, and the concept of the Solexa system was invented by Balasubramanian and Klenerman of Cambridge University's chemistry department. In 2004, Solexa acquired the company Manteia Predictive Medicine to gain a massively parallel sequencing technology based on "DNA clusters", which involves the clonal amplification of DNA on a surface. The cluster technology was co-acquired with Lynx Therapeutics of California. Solexa Ltd. later merged with Lynx to form Solexa Inc.

An Illumina HiSeq 2500 sequencer

In this method, DNA molecules and primers are first attached to a slide and amplified with polymerase so that local clonal DNA colonies, later called "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides; then the dye, along with the terminal 3′ blocker, is chemically removed from the DNA, allowing the next cycle to begin. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a later time, allowing very large arrays of DNA colonies to be captured by sequential images taken with a single camera.

An Illumina MiSeq sequencer

Decoupling the enzymatic reaction and the image capture allows for optimal throughput and theoretically unlimited sequencing capacity. With an optimal configuration, the ultimately reachable instrument throughput is thus dictated solely by the analog-to-digital conversion rate of the camera, multiplied by the number of cameras and divided by the number of pixels per DNA colony required for visualizing them optimally (approximately 10 pixels/colony). In 2012, with cameras operating at more than 10 MHz A/D conversion rates and available optics, fluidics and enzymatics, throughput can be multiples of 1 million nucleotides/second, corresponding roughly to 1 human genome equivalent at 1x coverage per hour per instrument, and 1 human genome re-sequenced (at approx. 30x) per day per instrument (equipped with a single camera).^[68]
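The throughput ceiling described above is a simple product and quotient. The Python sketch below plugs in the 2012 figures from the text (10 MHz A/D rate, one camera, about 10 pixels per colony). The human genome size of 3.1 billion bases is an assumption added here for the coverage estimate, and the function name is illustrative.

```python
# Assumption: approximate size of the haploid human genome, in bases.
HUMAN_GENOME_BASES = 3.1e9


def max_throughput(adc_rate_hz, n_cameras, pixels_per_colony):
    """Upper bound on nucleotides read per second: imaging one colony costs
    `pixels_per_colony` A/D conversions and yields one base per cycle."""
    return adc_rate_hz * n_cameras / pixels_per_colony


nt_per_s = max_throughput(adc_rate_hz=10e6, n_cameras=1, pixels_per_colony=10)
coverage_per_hour = nt_per_s * 3600 / HUMAN_GENOME_BASES
coverage_per_day = nt_per_s * 86400 / HUMAN_GENOME_BASES
print(f"{nt_per_s:,.0f} nt/s; {coverage_per_hour:.1f}x per hour; "
      f"{coverage_per_day:.0f}x per day")
```

With these inputs the sketch gives 1 million nucleotides per second, about 1.2x human genome coverage per hour and about 28x per day, in line with the "1x per hour" and "approx. 30x per day" figures quoted above.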

SOLiD sequencing

Library preparation for the SOLiD platform

Applied Biosystems' (now a Life Technologies brand) SOLiD technology employs sequencing by ligation. Here, a pool of all possible oligonucleotides of a fixed length is labeled according to the sequenced position. Oligonucleotides are annealed and ligated; because DNA ligase preferentially ligates matching sequences, the resulting signal reveals the nucleotide at that position. Before sequencing, the DNA is amplified by emulsion PCR. The resulting beads, each carrying copies of a single DNA molecule, are deposited on a glass slide.^[69] The result is reads in quantities and lengths comparable to those of Illumina sequencing.^[33] This sequencing-by-ligation method has been reported to have some issues sequencing palindromic sequences.^[64]

Ion Torrent semiconductor sequencing

Ion Torrent Systems Inc. (now owned by Life Technologies) developed a system that uses standard sequencing chemistry with a novel, semiconductor-based detection system. This method detects the hydrogen ions released during DNA polymerization, as opposed to the optical methods used in other sequencing systems. A microwell containing a template DNA strand to be sequenced is flooded with a single type of nucleotide. If the introduced nucleotide is complementary to the leading template nucleotide, it is incorporated into the growing complementary strand. This releases a hydrogen ion that triggers a hypersensitive ion sensor, indicating that a reaction has occurred. If homopolymer repeats are present in the template sequence, multiple nucleotides will be incorporated in a single cycle, releasing a corresponding number of hydrogen ions and producing a proportionally higher electronic signal.^[70]
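Because each flow offers only one nucleotide type, the raw output is a series of per-flow signals whose size tracks the length of any homopolymer run. The Python sketch below simulates idealized, noise-free signals for a given sequence of incorporated bases and decodes them back. The cyclic flow order "TACG", the integer signal model, and the function names are assumptions for illustration.

```python
def simulate_flows(incorporated, flow_order="TACG"):
    """Return (nucleotide, signal) for each flow until `incorporated` is fully read.

    Idealized model: the signal equals the number of bases added in that flow,
    i.e. the length of the homopolymer run matching the flowed nucleotide."""
    if set(incorporated) - set(flow_order):
        raise ValueError("sequence contains bases that are never flowed")
    flows = []
    pos = 0
    flow = 0
    while pos < len(incorporated):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(incorporated) and incorporated[pos] == base:
            run += 1  # each incorporation releases one hydrogen ion
            pos += 1
        flows.append((base, run))
        flow += 1
    return flows


def decode_flows(flows):
    """Rebuild the sequence by repeating each flowed base `signal` times."""
    return "".join(base * signal for base, signal in flows)


flows = simulate_flows("TTAGGGC")
print(flows)
print(decode_flows(flows))
```

In real data the signal is noisy and only roughly proportional to run length, so long homopolymers are harder to count than this idealized model suggests.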

Sequencing of the TAGGCT template with IonTorrent, PacBioRS and GridION

DNA nanoball sequencing

DNA nanoball sequencing is a type of high throughput sequencing technology used to determine the entire genomic sequence of an organism. The company Complete Genomics uses this technology to sequence samples submitted by independent researchers. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Unchained sequencing by ligation is then used to determine the nucleotide sequence.^[71] This method of DNA sequencing allows large numbers of DNA nanoballs to be sequenced per run and at low reagent costs compared to other next generation sequencing platforms.^[72] However, only short sequences of DNA are determined from each DNA nanoball which makes mapping the short reads to a reference genome difficult.^[71] This technology has been used for multiple genome sequencing projects and is scheduled to be used for more.^[73]
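Why short reads are harder to map can be seen with a toy reference containing a repeat: a short read may match several positions equally well, while a longer read that extends past the repeat has a unique placement. The sketch below uses naive exact string matching, not the alignment software any real pipeline uses, and the reference sequence is hypothetical.

```python
def exact_hits(reference, read):
    """Return every 0-based start position where `read` occurs exactly in `reference`."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)  # allow overlapping occurrences
    return hits


reference = "GATTACAGATTACAGGC"  # hypothetical toy reference with a repeat
print(exact_hits(reference, "GATTACA"))    # short read: two candidate positions
print(exact_hits(reference, "GATTACAGG"))  # longer read: placement is unique
```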

Heliscope single molecule sequencing

Heliscope sequencing is a method of single-molecule sequencing developed by Helicos Biosciences.
It uses DNA fragments with added poly-A tail adapters which are attached to the flow cell surface.
The next steps involve extension-based sequencing with cyclic washes of the flow cell with fluorescently labeled nucleotides (one nucleotide type at a time, as with the Sanger method). The reads are performed by the Heliscope sequencer. The reads are short, up to 55 bases per run, but recent improvements allow more accurate reads of homopolymer stretches (runs of a single nucleotide type).^[74]^[75]
This sequencing method and equipment were used to sequence the genome of the M13 bacteriophage.^[76]

Single molecule real time (SMRT) sequencing

SMRT sequencing is based on the sequencing by synthesis approach. The DNA is synthesized in zero-mode waveguides (ZMWs), small well-like containers with the capturing tools located at the bottom of the well. Sequencing is performed using an unmodified polymerase (attached to the ZMW bottom) and fluorescently labelled nucleotides flowing freely in the solution. The wells are constructed so that only fluorescence occurring at the bottom of the well is detected. The fluorescent label is detached from the nucleotide upon its incorporation into the DNA strand, leaving an unmodified DNA strand. According to Pacific Biosciences, the developer of SMRT technology, this methodology allows detection of nucleotide modifications (such as cytosine methylation) through observation of polymerase kinetics. This approach allows reads of 20,000 nucleotides or more, with average read lengths of 5 kilobases.^[59]^[77]
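
One way to picture detection through polymerase kinetics is to compare the interpulse duration (IPD), the time between successive fluorescence pulses, at each position with the IPD measured on unmodified control DNA, since a modified base can slow the polymerase. The sketch below is an illustration under stated assumptions rather than Pacific Biosciences' method: the `flag_modified` function, the ratio threshold of 2.0 and the IPD values are all hypothetical.

```python
from statistics import mean

def flag_modified(ipds: list[list[float]], control_ipds: list[list[float]],
                  threshold: float = 2.0) -> list[int]:
    """Return positions whose mean IPD is at least `threshold` times the control mean.

    Hypothetical helper: ipds[i] holds repeated IPD measurements at position i.
    """
    flagged = []
    for pos, (observed, control) in enumerate(zip(ipds, control_ipds)):
        # IPD ratio: how much slower the polymerase was here than on control DNA.
        ratio = mean(observed) / mean(control)
        if ratio >= threshold:
            flagged.append(pos)
    return flagged

sample = [[0.50, 0.60, 0.55], [2.10, 1.90, 2.40], [0.40, 0.50, 0.45]]
control = [[0.50, 0.55, 0.60], [0.60, 0.50, 0.55], [0.50, 0.40, 0.45]]
print(flag_modified(sample, control))  # [1]
```

The threshold trades sensitivity against false positives; any real cut-off would have to be calibrated on known modified and unmodified DNA.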

Methods in development

DNA sequencing methods currently under development include labeling the DNA polymerase,^[78] reading the sequence as a DNA strand transits through nanopores,^[79]^[80] and microscopy-based techniques, such as atomic force microscopy or transmission electron microscopy, that are used to identify the positions of individual nucleotides within long DNA fragments (>5,000 bp) by labeling nucleotides with heavier elements (e.g., halogens) for visual detection and recording.^[81]^[82] Third-generation technologies aim to increase throughput and to reduce both time to result and cost by eliminating the need for excessive reagents and harnessing the processivity of DNA polymerase.^[83]

Nanopore DNA sequencing

This method is based on the readout of electrical signals produced as nucleotides pass through alpha-hemolysin pores covalently bound to cyclodextrin. DNA passing through the nanopore changes the ion current through the pore. This change depends on the shape, size and length of the DNA sequence. Each type of nucleotide blocks the ion flow through the pore for a different period of time. The method has development potential because it does not require modified nucleotides; however, single-nucleotide resolution is not yet available.^[84]
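
The idea that each nucleotide blocks the pore for a characteristic time can be sketched as a nearest-match classifier. This is a deliberately simplified model: the dwell times in the hypothetical `REFERENCE_DWELL_MS` table are invented for illustration, and in practice the signal is influenced by more than one nucleotide in the pore at a time.

```python
# Hypothetical characteristic blockade times (ms) for each nucleotide; invented values.
REFERENCE_DWELL_MS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def call_bases(dwell_times_ms: list[float]) -> str:
    """Assign each blockade event to the nucleotide with the closest reference dwell time."""
    return "".join(
        min(REFERENCE_DWELL_MS, key=lambda base: abs(REFERENCE_DWELL_MS[base] - t))
        for t in dwell_times_ms
    )

print(call_bases([1.1, 3.9, 2.8, 2.1]))  # ATGC
```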

The two main areas of nanopore sequencing in development are solid-state nanopore sequencing and protein-based nanopore sequencing. Protein nanopore sequencing uses the membrane protein complexes α-hemolysin and MspA (Mycobacterium smegmatis porin A), which show great promise given their ability to distinguish between individual nucleotides and groups of nucleotides.^[85] Solid-state nanopore sequencing, by contrast, uses synthetic materials such as silicon nitride and aluminum oxide, and it is preferred for its superior mechanical strength and its thermal and chemical stability.^[86] The fabrication method is critical for this type of sequencing because a nanopore array can contain hundreds of pores with diameters smaller than eight nanometers.^[85]

The concept originated from the idea that single-stranded DNA or RNA molecules can be electrophoretically driven in a strict linear sequence through a biological pore, which can be less than eight nanometers in diameter, and can be detected because the molecules alter the ionic current flowing through the pore as they move through it. The pore contains a detection region capable of recognizing different bases; each base generates a distinct, time-specific signal as it crosses the pore, and these signals, which correspond to the sequence of bases, are then evaluated.^[86] Precise control over DNA transport through the pore is crucial for success. Various enzymes, such as exonucleases and polymerases, have been positioned near the pore's entrance to moderate this process.^[87]
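
Before time-specific signals can be evaluated, one common first step is to split a raw current trace into discrete events. The sketch below uses a simple jump threshold to do this; the `segment` function, the 5.0 threshold and the sample trace are hypothetical, and practical pipelines use more robust statistical change-point detection.

```python
from statistics import mean

def segment(trace: list[float], jump: float = 5.0) -> list[tuple[float, int]]:
    """Split a current trace into events wherever consecutive samples differ by more than `jump`.

    Returns (mean current, number of samples) for each event.
    """
    events = []
    start = 0
    for i in range(1, len(trace)):
        if abs(trace[i] - trace[i - 1]) > jump:
            piece = trace[start:i]
            events.append((mean(piece), len(piece)))
            start = i
    if trace:
        tail = trace[start:]  # the last event runs to the end of the trace
        events.append((mean(tail), len(tail)))
    return events

trace = [100.0, 101.0, 99.0, 60.0, 61.0, 59.0, 60.0, 80.0, 81.0]
print(segment(trace))  # [(100.0, 3), (60.0, 4), (80.5, 2)]
```

The sample count of each event stands in for its duration, which is the quantity the blockade-time idea relies on.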

Tunnelling currents DNA sequencing

Another approach measures the electrical tunnelling currents across single-stranded DNA as it moves through a channel. Depending on its electronic structure, each base affects the tunnelling current differently, allowing differentiation between bases.^[88]

The use of tunnelling currents has the potential to sequence orders of magnitude faster than ionic current methods, and the sequencing of several DNA oligomers and microRNA has already been achieved.^[89]

Sequencing by hybridization

Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. A strong hybridization signal from a given spot on the array indicates that the spot's known sequence is present in the DNA being sequenced.^[90]

This method of sequencing uses the binding characteristics of a library of short single-stranded DNA molecules (oligonucleotides), also called DNA probes, to reconstruct a target DNA sequence. Non-specific hybrids are removed by washing, and the target DNA is eluted.^[91] Hybrids are rearranged so that the DNA sequence can be reconstructed. The benefit of this sequencing type is its ability to capture a large number of targets with homogeneous coverage.^[92] However, large amounts of chemicals and starting DNA are usually required, although with the advent of solution-based hybridization much less equipment and fewer chemicals are necessary.^[91]
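
The reconstruction step has a classic textbook formulation: the set of probes that hybridized is treated as the k-mer "spectrum" of the target, and the sequence is recovered as an Eulerian path through a de Bruijn graph whose nodes are (k-1)-mers and whose edges are the k-mers. The sketch below assumes a non-empty, error-free spectrum in which each k-mer occurs once; it illustrates that formulation and is not the procedure of any particular platform.

```python
from collections import defaultdict

def reconstruct(spectrum: list[str]) -> str:
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path (Hierholzer's algorithm)."""
    graph = defaultdict(list)   # (k-1)-mer -> list of successor (k-1)-mers
    balance = defaultdict(int)  # out-degree minus in-degree of each node
    for kmer in spectrum:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    # An Eulerian path starts at the node with one more outgoing than incoming edge.
    start = next((node for node, b in balance.items() if b == 1), spectrum[0][:-1])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())         # dead end: node is finished
    path.reverse()
    # Spell the sequence: first node, then the last character of each following node.
    return path[0] + "".join(node[-1] for node in path[1:])

print(reconstruct(["GGC", "TAG", "GCT", "AGG"]))  # TAGGCT
```

Repeated (k-1)-mers make the answer ambiguous: ATGGCGTGCA and ATGCGTGGCA share the same 3-mer spectrum, so the sketch may return either one.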

Sequencing with mass spectrometry

Mass spectrometry may be used to determine DNA sequences. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry, or MALDI-TOF MS, has specifically been investigated as an alternative to gel electrophoresis for visualizing DNA fragments. With this method, DNA fragments generated by chain-termination sequencing reactions are compared by mass rather than by size. Each nucleotide has a different mass, and this difference is detectable by mass spectrometry. Single-nucleotide mutations in a fragment can be detected more easily with MS than by gel electrophoresis alone. MALDI-TOF MS can detect differences between RNA fragments more easily, so researchers may sequence DNA indirectly with MS-based methods by first converting it to RNA.^[93]
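
The size of the mass differences involved can be estimated from approximate average residue masses. The values in `RESIDUE_MASS` and the -61.96 Da end-group correction are commonly used approximations for oligonucleotide molecular weight, not values from the cited study, and the sequences are hypothetical.

```python
from itertools import combinations

# Approximate average residue masses (Da) for nucleotides within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

def oligo_mass(seq: str) -> float:
    """Approximate average mass (Da) of a linear oligo lacking a 5' phosphate."""
    # -61.96 Da is the end-group correction commonly paired with these residue masses.
    return sum(RESIDUE_MASS[base] for base in seq) - 61.96

wild_type = "ACGTACGT"
mutant = "ACGTACGA"  # hypothetical single T->A substitution
print(round(oligo_mass(mutant) - oligo_mass(wild_type), 2))  # 9.01

# Mass shift for every possible single substitution, smallest first.
shifts = sorted(
    (round(abs(RESIDUE_MASS[x] - RESIDUE_MASS[y]), 2), x + "<->" + y)
    for x, y in combinations("ACGT", 2)
)
print(shifts[0])  # (9.01, 'A<->T')
```

In this table the smallest single-substitution shift is A<->T at about 9 Da, which gives a sense of the mass resolution the method needs.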

The higher resolution of DNA fragments permitted by MS-based methods is of special interest to researchers in forensic science, as they may wish to find single-nucleotide polymorphisms in human DNA samples to identify individuals. These samples may be highly degraded, so forensic researchers often prefer mitochondrial DNA for its higher stability and its applications in lineage studies. MS-based sequencing methods have been used to compare the sequences of human mitochondrial DNA from samples in a Federal Bureau of Investigation database^[94] and from bones found in mass graves of World War I soldiers.^[95]

Early chain-termination and TOF MS methods demonstrated read lengths of up to 100 base pairs.^[96] Researchers have been unable to exceed this average read size; like chain-termination sequencing alone, MS-based DNA sequencing may not be suitable for large de novo sequencing projects. Even so, a recent study did use the short sequence reads and mass spectrometry to compare single-nucleotide polymorphisms in pathogenic Streptococcus strains.^[97]

Microfluidic Sanger sequencing

In microfluidic Sanger sequencing, the entire thermocycling amplification of DNA fragments, as well as their separation by electrophoresis, is done on a single glass wafer (approximately 10 cm in diameter), reducing both reagent usage and cost.^[98] In some instances, researchers have shown that they can increase the throughput of conventional sequencing through the use of microchips.^[99] Further research is still needed to make this use of the technology effective.

Microscopy-based techniques

This approach directly visualizes the sequence of DNA molecules using electron microscopy. The first identification of DNA base pairs within intact DNA molecules has been demonstrated by enzymatically incorporating modified bases that contain atoms of increased atomic number, allowing direct visualization and identification of individually labeled bases within a synthetic 3,272 base-pair DNA molecule and a 7,249 base-pair viral genome.^[100]

RNAP sequencing

This method is based on the use of RNA polymerase (RNAP), which is attached to a polystyrene bead. One end of the DNA to be sequenced is attached to another bead, and both beads are placed in optical traps. RNAP motion during transcription brings the beads closer together, and this change in their relative distance can be recorded at single-nucleotide resolution. The sequence is deduced from four readouts, each with a lowered concentration of one of the four nucleotide types, similarly to the Sanger method.^[101]

RNA polymerase is attached to one polystyrene bead, and the distal end of a DNA fragment is attached to another. Each bead is then held in an optical trap that levitates it.
The interactions between the RNAP and the DNA change the length of the DNA between the two beads. This change is then measured precisely, giving single-base resolution on a single DNA molecule. The measurement is repeated four times, each time with a lower concentration of one of the four nucleotides, which resembles the four separate reactions of the Sanger sequencing method. Sequence information is then deduced by comparing the known sequence regions with the unknown sequence regions.^[102]
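
When one nucleotide is scarce, the polymerase tends to pause where it must incorporate that nucleotide, so each of the four runs marks the positions of one base. Combining the readouts can be sketched as follows, under strong simplifying assumptions: pauses have already been converted from bead displacement into exact template positions, and every position pauses in at most one run. The function `sequence_from_pauses` and the example data are hypothetical; bases are reported as the RNA nucleotides A, C, G and U that RNAP incorporates.

```python
def sequence_from_pauses(pauses: dict[str, set[int]], length: int) -> str:
    """Merge four limited-nucleotide runs into one sequence (hypothetical helper).

    pauses[base] holds the template positions at which RNAP paused in the run
    where `base` was supplied at a lowered concentration. Positions that never
    paused are reported as 'N'.
    """
    calls = ["N"] * length
    for base, positions in pauses.items():
        for pos in positions:
            if calls[pos] != "N":
                raise ValueError(f"position {pos} paused in more than one run")
            calls[pos] = base
    return "".join(calls)

runs = {"A": {0, 4}, "C": {2}, "G": {1, 5}, "U": {3}}
print(sequence_from_pauses(runs, 6))  # AGCUAG
```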

In vitro virus high-throughput sequencing

A method has been developed to analyze full sets of protein interactions using a combination of 454 pyrosequencing and an in vitro virus mRNA display method. Specifically, this method covalently links proteins of interest to the mRNAs encoding them, then detects the mRNA pieces using reverse transcription PCR. The mRNA may then be amplified and sequenced. The combined method was named IVV-HiTSeq and can be performed under cell-free conditions, though its results may not be representative of in vivo conditions.^[103]

Development initiatives

Total cost of sequencing a human genome over time as calculated by the NHGRI.

In October 2006, the X Prize Foundation established an initiative to promote the development of full genome sequencing technologies, called the Archon X Prize, intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."^[104]
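
The prize criteria translate into concrete per-genome figures. The sketch below works through that arithmetic; the haploid genome length of about 3.1 billion bases is an assumption not stated in the prize rules, and only the 98% of the genome that must be covered is counted toward the error budget.

```python
# Approximate haploid human genome length in bases (an assumption, not part of the prize rules).
GENOME_SIZE = 3.1e9

genomes, days = 100, 10          # "100 human genomes within 10 days"
max_error_rate = 1 / 100_000     # "no more than one error in every 100,000 bases"
min_coverage = 0.98              # "covering at least 98% of the genome"
max_cost_per_genome = 10_000     # USD, "recurring cost of no more than $10,000"

genomes_per_day = genomes / days
max_errors_per_genome = round(min_coverage * GENOME_SIZE * max_error_rate)
max_total_cost = genomes * max_cost_per_genome

print(genomes_per_day)          # 10.0
print(max_errors_per_genome)    # 30380
print(max_total_cost)           # 1000000
```

Even at the required accuracy, each genome could still contain roughly 30,000 errors under this assumed genome size.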

Each year the National Human Genome Research Institute, or NHGRI, promotes grants for new research and developments in genomics. 2010 grants and 2011 candidates include continuing work in microfluidic, polony and base-heavy sequencing methodologies.^[105]

Computational challenges

The sequencing technologies described here produce raw data that need to be assembled into longer sequences such as complete genomes (sequence assembly). There are many computational challenges in achieving this, such as evaluating the raw sequence data, which is done by programs and algorithms such as Phred and Phrap. Other challenges involve repetitive sequences, which often prevent complete genome assemblies because they occur in many places in the genome; as a consequence, many sequences may not be assigned to particular chromosomes. The production of raw sequence data is only the beginning of its detailed bioinformatic analysis.^[106] New methods for sequencing and for correcting sequencing errors have also been developed.^[107] [1]
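
Phred summarizes its evaluation of each base call as a quality score Q = -10 log10(P), where P is the estimated probability that the call is wrong, so Q30 corresponds to one expected error in 1,000 calls. The functions below are an illustrative sketch of that relationship, not Phred's own code; `expected_errors` is a hypothetical convenience that sums per-base error probabilities over a read.

```python
import math

def phred_quality(error_prob: float) -> float:
    """Phred quality score: Q = -10 * log10(P), with P the probability that a base call is wrong."""
    return -10 * math.log10(error_prob)

def error_probability(q: float) -> float:
    """Inverse of phred_quality: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def expected_errors(qualities: list[float]) -> float:
    """Expected number of wrong calls in a read: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)

print(round(phred_quality(0.001), 6))               # 30.0
print(round(expected_errors([30, 30, 20, 10]), 6))  # 0.112
```

Because the scale is logarithmic, a single low-quality base (Q10 here) contributes most of the expected errors in the read.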


Climate change scenario