From Wikipedia, the free encyclopedia
Third-generation sequencing (also known as long-read sequencing) is a class of DNA sequencing methods currently under active development. Third generation sequencing works by reading the nucleotide sequences at the single molecule level, in contrast to existing methods that require breaking long strands of DNA into small segments then inferring nucleotide sequences by amplification and synthesis. Critical challenges exist in the engineering of the necessary molecular instruments for whole genome sequencing to make the technology commercially available.

Second-generation sequencing, often referred to as Next-generation sequencing (NGS), has dominated the DNA sequencing space since its development. It has dramatically reduced the cost of DNA sequencing by enabling a massively-paralleled approach capable of producing large numbers of reads at exceptionally high coverages throughout the genome.

Since eukaryotic genomes contain many repetitive regions, a major limitation to this class of sequencing methods is the length of reads it produces. Briefly, second generation sequencing works by first amplifying the DNA molecule and then conducting sequencing by synthesis. The collective fluorescent signal resulting from synthesizing a large number of amplified identical DNA strands allows the inference of nucleotide identity. However, due to random errors, DNA synthesis between the amplified DNA strands would become progressively out-of-sync. Quickly, the signal quality deteriorates as the read-length grows. In order to preserve read quality, long DNA molecules must be broken up into small segments, resulting in a critical limitation of second generation sequencing technologies. Computational efforts aimed to overcome this challenge often rely on approximative heuristics that may not result in accurate assemblies.

By enabling direct sequencing of single DNA molecules, third generation sequencing technologies have the capability to produce substantially longer reads than second generation sequencing. Such an advantage has critical implications for both genome science and the study of biology in general. However, third generation sequencing data have much higher error rates than previous technologies, which can complicate downstream genome assembly and analysis of the resulting data. These technologies are undergoing active development and it is expected that there will be improvements to the high error rates. For applications that are more tolerant to error rates, such as structural variant calling, third generation sequencing has been found to outperform existing methods.

Current technologies