
Wednesday, February 19, 2025

Ancient DNA

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Ancient_DNA
Cross-linked DNA extracted from the 4,000-year-old liver of the ancient Egyptian priest Nekht-Ankh

Ancient DNA (aDNA) is DNA isolated from ancient sources (typically specimens, but also environmental DNA). Due to degradation processes (including cross-linking, deamination and fragmentation), ancient DNA is more degraded than contemporary genetic material. Genetic material has been recovered from paleo/archaeological and historical skeletal material, mummified tissues, archival collections of non-frozen medical specimens, preserved plant remains, ice and permafrost cores, marine and lake sediments, and excavation dirt.

Even under the best preservation conditions, there is an upper boundary of 0.4–1.5 million years for a sample to contain sufficient DNA for sequencing technologies. The oldest DNA sequenced from a physical specimen comes from mammoth molars in Siberia that are over one million years old. In 2022, two-million-year-old genetic material was recovered from sediments in Greenland; it is currently considered the oldest DNA discovered.

History of ancient DNA studies

1980s

Quagga (Equus quagga quagga), an extinct sub-species of zebra.

The first study of what would come to be called aDNA was conducted in 1984, when Russ Higuchi and colleagues at the University of California, Berkeley reported that traces of DNA from a museum specimen of the Quagga not only remained in the specimen over 150 years after the death of the individual, but could be extracted and sequenced. Over the next two years, through investigations into natural and artificially mummified specimens, Svante Pääbo confirmed that this phenomenon was not limited to relatively recent museum specimens but could apparently be replicated in a range of mummified human samples that dated as far back as several thousand years.

The laborious processes required at the time to sequence such DNA (through bacterial cloning) acted as an effective brake on the study of ancient DNA (aDNA) and the field of museomics. However, with the development of the polymerase chain reaction (PCR) in the late 1980s, the field began to progress rapidly. Double-primer PCR amplification of aDNA (jumping-PCR) can produce highly skewed and non-authentic sequence artifacts; a multiple-primer, nested PCR strategy was used to overcome these shortcomings.

1990s

A dipteran fly (Mycetophilidae) from the Eocene (40–50 million years ago) in a piece of transparent Baltic amber, along with other smaller inclusions. Shown under daylight (large photograph) and under UV light (small photograph).

The post-PCR era heralded a wave of publications as numerous research groups claimed success in isolating aDNA. Soon a series of incredible findings was published, claiming that authentic DNA could be extracted from specimens that were millions of years old, into the realm of what Lindahl (1993b) labelled antediluvian DNA. The majority of such claims were based on the retrieval of DNA from organisms preserved in amber. Insects such as stingless bees, termites, and wood gnats, as well as plant and bacterial sequences, were said to have been extracted from Dominican amber dating to the Oligocene epoch. Still older material, weevils encased in Lebanese amber dating to the Cretaceous, reportedly also yielded authentic DNA. Claims of DNA retrieval were not limited to amber.

Reports of several sediment-preserved plant remains dating to the Miocene were published. Then, in 1994, Woodward et al. reported what were called at the time the most exciting results to date: mitochondrial cytochrome b sequences that had apparently been extracted from dinosaur bones dating to more than 80 million years ago. When two further studies in 1995 reported dinosaur DNA sequences extracted from a Cretaceous egg, it seemed that the field would revolutionize knowledge of the Earth's evolutionary past. Even these extraordinary ages were topped by the claimed retrieval of 250-million-year-old halobacterial sequences from halite.

The development of a better understanding of the kinetics of DNA preservation, the risks of sample contamination and other complicating factors led the field to view these results more skeptically. Numerous careful attempts failed to replicate many of the findings, and all of the decade's claims of multi-million year old aDNA would come to be dismissed as inauthentic.

2000s

Single primer extension amplification was introduced in 2007 to address postmortem DNA modification damage. Since 2009, the field of aDNA studies has been revolutionized by the introduction of much cheaper research techniques. The use of high-throughput next-generation sequencing (NGS) techniques has been essential for reconstructing the genomes of ancient or extinct organisms. A single-stranded DNA (ssDNA) library preparation method has also sparked great interest among aDNA researchers.

Svante Pääbo (left) with his medal for the Nobel Prize in Physiology or Medicine.

In addition to these technical innovations, the start of the decade saw the field begin to develop better standards and criteria for evaluating DNA results, as well as a better understanding of the potential pitfalls.

2020s

In the autumn of 2022, the Nobel Prize in Physiology or Medicine was awarded to Svante Pääbo "for his discoveries concerning the genomes of extinct hominins and human evolution". On 7 December 2022, a study in Nature reported that two-million-year-old genetic material had been found in Greenland; it is currently considered the oldest DNA discovered.

Problems and errors

Degradation processes

Due to degradation processes (including cross-linking, deamination and fragmentation), ancient DNA is of lower quality than modern genetic material. The damage characteristics and ability of aDNA to survive through time restricts possible analyses and places an upper limit on the age of successful samples. There is a theoretical correlation between time and DNA degradation, although differences in environmental conditions complicate matters. Samples subjected to different conditions are unlikely to predictably align to a uniform age-degradation relationship. The environmental effects may even matter after excavation, as DNA decay-rates may increase, particularly under fluctuating storage conditions. Even under the best preservation conditions, there is an upper boundary of 0.4 to 1.5 million years for a sample to contain sufficient DNA for contemporary sequencing technologies.

Research into the decay of mitochondrial and nuclear DNA in moa bones has modelled mitochondrial DNA degradation to an average length of 1 base pair after 6,830,000 years at −5 °C. The decay kinetics have been measured by accelerated aging experiments, further displaying the strong influence of storage temperature and humidity on DNA decay. Nuclear DNA degrades at least twice as fast as mtDNA. Early studies that reported recovery of much older DNA, for example from Cretaceous dinosaur remains, may have stemmed from contamination of the sample.
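
To make this decay relationship concrete, the following is a minimal, illustrative Python sketch of a first-order fragmentation model of the kind underlying such estimates; the per-site rate constant is a hypothetical value chosen only so that the expected fragment length falls to about 1 base pair after 6.83 million years, matching the figure quoted above for moa bone at −5 °C.

    # Minimal sketch of first-order DNA fragmentation kinetics (illustrative only).
    # Assumes each phosphodiester bond breaks independently at a constant rate k,
    # so the expected fragment length after time t is roughly 1 / (k * t).
    # The rate constant below is a hypothetical value chosen so that fragments
    # reach ~1 bp after 6.83 million years, as in the moa-bone model at -5 C.

    K_PER_SITE_PER_YEAR = 1.0 / 6_830_000  # hypothetical per-site fragmentation rate

    def expected_fragment_length(age_years: float) -> float:
        """Expected mean fragment length (bp) after age_years of decay."""
        return 1.0 / (K_PER_SITE_PER_YEAR * age_years)

    if __name__ == "__main__":
        for age in (10_000, 100_000, 1_000_000, 6_830_000):
            print(f"{age:>9,} years: ~{expected_fragment_length(age):,.0f} bp")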

Age limit

A critical review of the ancient DNA literature through the development of the field highlights that few studies have succeeded in amplifying DNA from remains older than several hundred thousand years. A greater appreciation for the risks of environmental contamination and studies on the chemical stability of DNA have raised concerns over previously reported results. The alleged dinosaur DNA was later revealed to be human Y-chromosome sequence. The DNA reported from encapsulated halobacteria has been criticized based on its similarity to modern bacteria, which hints at contamination; alternatively, the sequences may be the product of long-term, low-level metabolic activity.

aDNA may contain a large number of postmortem mutations, increasing with time. Some regions of the molecule are more susceptible to this degradation, allowing erroneous sequence data to bypass the statistical filters used to check the validity of data. Because of sequencing errors, great caution should be applied to the interpretation of population size estimates. Substitutions resulting from deamination of cytosine residues are vastly over-represented in ancient DNA sequences; miscoding of C to T and G to A accounts for the majority of errors.

Contamination

Another problem with ancient DNA samples is contamination by modern human DNA and by microbial DNA (most of which is also ancient). New methods have emerged in recent years to prevent possible contamination of aDNA samples, including conducting extractions under extremely sterile conditions, using special adapters to identify endogenous molecules of the sample (distinguishing them from those introduced during analysis), and applying bioinformatic analyses to the resulting sequences, based on known reads, in order to approximate rates of contamination.

Authentication of aDNA

Development in the aDNA field in the 2000s increased the importance of authenticating recovered DNA to confirm that it is indeed ancient and not the result of recent contamination. As DNA degrades over time, the nucleotides that make up the DNA may change, especially at the ends of the DNA molecules. The deamination of cytosine to uracil at the ends of DNA molecules has become a means of authentication. During DNA sequencing, the DNA polymerases incorporate an adenine (A) across from the uracil (U), leading to cytosine (C) to thymine (T) substitutions in the aDNA data. These substitutions increase in frequency as the sample gets older. The frequency of this C-to-T damage can be measured with software such as mapDamage2.0 or PMDtools, or interactively with metaDMG. Due to hydrolytic depurination, DNA suffers single-stranded breaks and fragments into smaller pieces. Combined with the damage pattern, this short fragment length can also help differentiate between modern and ancient DNA.
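
As an illustration of the idea behind such damage profiling, the following is a minimal Python sketch that tallies C-to-T mismatches by read position from aligned read/reference pairs; the input pairs are hypothetical, and real tools such as mapDamage2.0 work on full BAM alignments and account for strand, indels and base qualities.

    # Minimal sketch of how C-to-T damage frequencies are profiled by position,
    # the signal used by tools such as mapDamage2.0 or PMDtools (illustrative only).
    from collections import Counter

    def ct_profile(read_ref_pairs, window=10):
        """Fraction of reference C positions read as T, per 5' read position."""
        ct = Counter()       # position -> C->T mismatches observed
        c_total = Counter()  # position -> reference C positions observed
        for read, ref in read_ref_pairs:
            for i, (r, q) in enumerate(zip(ref[:window], read[:window])):
                if r == "C":
                    c_total[i] += 1
                    if q == "T":
                        ct[i] += 1
        return {i: ct[i] / c_total[i] for i in c_total if c_total[i]}

    # Hypothetical aligned (read, reference) pairs: damage inflates C->T near the 5' end.
    pairs = [("TTACGGA", "CTACGGA"), ("TGACCGA", "CGACCGA"), ("CGATTGA", "CGATTGA")]
    print(ct_profile(pairs))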

Non-human aDNA

Despite the problems associated with aDNA, a wide and ever-increasing range of aDNA sequences have now been published from a range of animal and plant taxa. Tissues examined include artificially or naturally mummified animal remains, bone, shells, paleofaeces, alcohol preserved specimens, rodent middens, dried plant remains, and recently, extractions of animal and plant DNA directly from soil samples.

In June 2013, a group of researchers including Eske Willerslev, Marcus Thomas Pius Gilbert and Ludovic Orlando of the Centre for GeoGenetics, Natural History Museum of Denmark at the University of Copenhagen, announced that they had sequenced the DNA of a 560,000–780,000-year-old horse, using material extracted from a leg bone found buried in permafrost in Canada's Yukon territory. A German team also reported in 2013 the reconstructed mitochondrial genome of a bear, Ursus deningeri, more than 300,000 years old, demonstrating that authentic ancient DNA can be preserved for hundreds of thousands of years outside of permafrost. Even older nuclear DNA sequences were reported in 2021 from the permafrost-preserved teeth of two Siberian mammoths, both over a million years old.

Researchers in 2016 measured chloroplast DNA in marine sediment cores and found diatom DNA dating back 1.4 million years. This DNA had a half-life of up to 15,000 years, significantly longer than in previous research. Kirkpatrick's team also found that DNA decayed at a constant half-life only until about 100,000 years, after which it followed a slower, power-law decay rate.

Human aDNA

Map of human fossils with an age of at least ~40,000 years that yielded genome-wide data

Due to the considerable anthropological, archaeological, and public interest directed toward human remains, they have received significant attention from the DNA community. There are also more serious contamination issues, since the specimens belong to the same species as the researchers collecting and evaluating the samples.

Sources

Due to the morphological preservation in mummies, many studies from the 1990s and 2000s used mummified tissue as a source of ancient human DNA. Examples include both naturally preserved specimens, such as Ötzi the Iceman, frozen in a glacier, and bodies preserved through rapid desiccation at high altitude in the Andes, as well as chemically treated tissue such as the mummies of ancient Egypt. However, mummified remains are a limited resource. The majority of human aDNA studies have focused on extracting DNA from two sources that are much more common in the archaeological record: bones and teeth. The bone most often used for DNA extraction is the petrous bone, since its dense structure provides good conditions for DNA preservation. Several other sources have also yielded DNA, including paleofaeces and hair. Contamination remains a major problem when working on ancient human material.

Ancient pathogen DNA has been successfully retrieved from samples dating to more than 5,000 years old in humans and as long as 17,000 years ago in other species. In addition to the usual sources of mummified tissue, bones and teeth, such studies have also examined a range of other tissue samples, including calcified pleura, tissue embedded in paraffin, and formalin-fixed tissue. Efficient computational tools have been developed for pathogen and microorganism aDNA analyses at small (QIIME) and large (FALCON) scales.

Results

Taking preventive measures against such contamination, a 2012 study analyzed bone samples of a Neanderthal group in El Sidrón cave, finding new insights on potential kinship and genetic diversity from the aDNA. In November 2015, scientists reported finding a 110,000-year-old tooth containing DNA from the Denisovan hominin, an extinct species of human in the genus Homo.

The research has added new complexity to the peopling of Eurasia. A study from 2018 showed that a Bronze Age mass migration had greatly impacted the genetic makeup of the British Isles, bringing with it the Bell Beaker culture from mainland Europe.

It has also revealed new information about links between the ancestors of Central Asians and the indigenous peoples of the Americas. In Africa, older DNA degrades quickly due to the warmer tropical climate, although ancient DNA samples as old as 8,100 years were reported in September 2017.

Moreover, ancient DNA has helped researchers to estimate modern human divergence. By sequencing African genomes from three Stone Age hunter-gatherers (2,000 years old) and four Iron Age farmers (300 to 500 years old), Schlebusch and colleagues were able to push back the date of the earliest divergence between human populations to 350,000 to 260,000 years ago.

As of 2021, the oldest completely reconstructed human genomes are ~45,000 years old. Such genetic data provide insights into migration and genetic history, for example of Europe, including interbreeding between archaic and modern humans, such as admixture between early European modern humans and Neanderthals.

Transgene


From Wikipedia, the free encyclopedia

A transgene is a gene that has been transferred naturally, or by any of a number of genetic engineering techniques, from one organism to another. The introduction of a transgene, in a process known as transgenesis, has the potential to change the phenotype of an organism. Transgene describes a segment of DNA containing a gene sequence that has been isolated from one organism and is introduced into a different organism. This non-native segment of DNA may either retain the ability to produce RNA or protein in the transgenic organism or alter the normal function of the transgenic organism's genome. In general, the DNA is incorporated into the organism's germ line. For example, in higher vertebrates this can be accomplished by injecting the foreign DNA into the nucleus of a fertilized ovum. This technique is routinely used to introduce human disease genes or other genes of interest into strains of laboratory mice to study the function or pathology involved with that particular gene.

The construction of a transgene requires the assembly of a few main parts. The transgene must contain a promoter, a regulatory sequence that determines where and when the transgene is active; an exon, a protein-coding sequence usually derived from the cDNA for the protein of interest; and a stop sequence. These parts are typically combined in a bacterial plasmid, and the coding sequences are typically chosen from transgenes with previously known functions.
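
As a loose illustration of this modular assembly (not an actual cloning protocol), the following Python sketch represents a construct as an ordered list of parts; all sequence fragments and part names are hypothetical placeholders.

    # Minimal sketch of assembling the parts of a transgene construct in order
    # (promoter, protein-coding sequence from cDNA, stop/polyadenylation signal).
    # Sequence fragments and names below are hypothetical placeholders, not real parts.
    from dataclasses import dataclass

    @dataclass
    class Part:
        name: str
        sequence: str  # 5' -> 3'

    def assemble_transgene(parts):
        """Concatenate parts in the given order into a single construct sequence."""
        return "".join(p.sequence for p in parts)

    construct = assemble_transgene([
        Part("promoter", "TTGACA" + "N" * 20 + "TATAAT"),   # placeholder regulatory region
        Part("coding_sequence_cDNA", "ATG" + "GCT" * 10),   # placeholder open reading frame
        Part("stop_and_polyA", "TAA" + "AATAAA"),           # stop codon + polyA signal
    ])
    print(len(construct), "bp assembled")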

Transgenic or genetically modified organisms, be they bacteria, viruses or fungi, serve many research purposes. Transgenic plants, insects, fish and mammals (including humans) have been bred. Transgenic plants such as corn and soybean have replaced wild strains in agriculture in some countries (e.g. the United States). Transgene escape from GMO crops, with persistence and invasiveness, has been documented since 2001. Transgenic organisms pose ethical questions and may cause biosafety problems.

History

The idea of shaping an organism to fit a specific need is not new. However, until the late 1900s, farmers and scientists could breed new strains of a plant or organism only from closely related species, because the DNA had to be compatible for offspring to be able to reproduce.

In the 1970s and 1980s, scientists passed this hurdle by inventing procedures for combining the DNA of two vastly different species with genetic engineering. The organisms produced by these procedures were termed transgenic. Transgenesis resembles gene therapy in that both transform cells for a specific purpose. However, they differ in their aims: gene therapy seeks to cure a defect in cells, whereas transgenesis seeks to produce a genetically modified organism by incorporating the specific transgene into every cell and changing the genome. Transgenesis therefore changes the germ cells, not only the somatic cells, so that the transgenes are passed down to the offspring when the organisms reproduce. Transgenes can alter the genome by blocking the function of a host gene; they can either replace the host gene with one that codes for a different protein, or introduce an additional gene.

The first transgenic organism was created in 1974 when Annie Chang and Stanley Cohen expressed Staphylococcus aureus genes in Escherichia coli. In 1978, yeast cells were the first eukaryotic organisms to undergo gene transfer. Mouse cells were first transformed in 1979, followed by mouse embryos in 1980. Most of the earliest transformations were performed by microinjection of DNA directly into cells. Scientists later developed other methods, such as incorporating transgenes into retroviruses and then infecting cells; electroinfusion, which uses an electric current to pass foreign DNA through the cell wall; biolistics, the procedure of shooting DNA bullets into cells; and delivering DNA into the newly fertilized egg.

The first transgenic animals were only intended for genetic research to study the specific function of a gene, and by 2003, thousands of genes had been studied.

Use in plants

A variety of transgenic plants have been designed for agriculture to produce genetically modified crops, such as corn, soybean, rapeseed oil, cotton, rice and more. As of 2012, these GMO crops were planted on 170 million hectares globally.

Golden rice

One example of a transgenic plant species is golden rice. In 1997, five million children developed xerophthalmia, a medical condition caused by vitamin A deficiency, in Southeast Asia alone. Of those children, a quarter of a million went blind. To combat this, scientists used biolistics to insert the daffodil phytoene synthase gene into rice cultivars indigenous to Asia. The daffodil insertion increased the production of β-carotene. The product was a transgenic rice variety rich in β-carotene, a precursor of vitamin A, called golden rice. Little is known about the impact of golden rice on xerophthalmia because anti-GMO campaigns have prevented its full commercial release into agricultural systems in need.

Transgene escape

The escape of genetically-engineered plant genes via hybridization with wild relatives was first discussed and examined in Mexico and Europe in the mid-1990s. There is agreement that escape of transgenes is inevitable, even "some proof that it is happening". Up until 2008 there were few documented cases.

Corn

Corn sampled in 2000 from the Sierra Juarez, Oaxaca, Mexico contained a transgenic 35S promoter, while a large sample taken by a different method from the same region in 2003 and 2004 did not. A sample from another region from 2002 also did not, but directed samples taken in 2004 did, suggesting transgene persistence or re-introduction. A 2009 study found recombinant proteins in 3.1% and 1.8% of samples, most commonly in southeast Mexico. Seed and grain import from the United States could explain the frequency and distribution of transgenes in west-central Mexico, but not in the southeast. Also, 5.0% of corn seed lots in Mexican corn stocks expressed recombinant proteins despite the moratorium on GM crops.

Cotton

In 2011, transgenic cotton was found in Mexico among wild cotton, after 15 years of GMO cotton cultivation.

Rapeseed (canola)

Transgenic rapeseed, Brassica napus – hybridized with a native Japanese species, Brassica rapa – was found in Japan in 2011 after having been identified in 2006 in Québec, Canada. The plants were persistent over a six-year study period, without herbicide selection pressure and despite hybridization with the wild form. This was the first report of the introgression – the stable incorporation of genes from one gene pool into another – of an herbicide-resistance transgene from Brassica napus into the wild-form gene pool.

Creeping bentgrass

Transgenic creeping bentgrass, engineered to be glyphosate-tolerant as "one of the first wind-pollinated, perennial, and highly outcrossing transgenic crops", was planted in 2003 as part of a large (about 160 ha) field trial in central Oregon near Madras. In 2004, its pollen was found to have reached wild-growing bentgrass populations up to 14 kilometres away. Cross-pollinating Agrostis gigantea was even found at a distance of 21 kilometres. The grower, Scotts Company, could not remove all genetically engineered plants, and in 2007 the U.S. Department of Agriculture fined Scotts $500,000 for noncompliance with regulations.

Risk assessment

Long-term monitoring and control of a particular transgene has been shown not to be feasible. The European Food Safety Authority published guidance for risk assessment in 2010.

Use in mice

Genetically modified mice are the most common animal model for transgenic research. Transgenic mice are currently being used to study a variety of diseases including cancer, obesity, heart disease, arthritis, anxiety, and Parkinson's disease. The two most common types of genetically modified mice are knockout mice and oncomice. Knockout mice are a type of mouse model that uses transgenic insertion to disrupt an existing gene's expression. In order to create knockout mice, a transgene with the desired sequence is inserted into an isolated mouse blastocyst using electroporation. Then, homologous recombination occurs naturally within some cells, replacing the gene of interest with the designed transgene. Through this process, researchers were able to demonstrate that a transgene can be integrated into the genome of an animal, serve a specific function within the cell, and be passed down to future generations.

Oncomice are another type of genetically modified mouse, created by inserting transgenes that increase the animal's vulnerability to cancer. Cancer researchers utilize oncomice to study the profiles of different cancers in order to apply this knowledge to human studies.

Use in Drosophila

Multiple studies have been conducted concerning transgenesis in Drosophila melanogaster, the fruit fly. This organism has been a helpful genetic model for over 100 years, due to its well-understood developmental pattern. The transfer of transgenes into the Drosophila genome has been performed using various techniques, including P element, Cre-loxP, and ΦC31 insertion. The most practiced method used thus far to insert transgenes into the Drosophila genome utilizes P elements. The transposable P elements, also known as transposons, are segments of DNA that are translocated into the genome without the presence of a complementary sequence in the host's genome. P elements are administered in pairs, which flank the DNA insertion region of interest. Additionally, P elements often consist of two plasmid components, one known as the P element transposase and the other the P transposon backbone. The transposase plasmid drives the transposition of the P transposon backbone, containing the transgene of interest and often a marker, between the two terminal sites of the transposon. Success of this insertion results in the irreversible addition of the transgene of interest into the genome. While this method has proven effective, the insertion sites of the P elements are often uncontrollable, resulting in unfavorable, random insertion of the transgene into the Drosophila genome.

To improve the location and precision of the transgenic process, an enzyme known as Cre has been introduced. Cre has proven to be a key element in a process known as recombinase-mediated cassette exchange (RMCE). While it has been shown to have a lower efficiency of transgenic transformation than the P element transposases, Cre greatly lessens the labor-intensive work of balancing random P insertions. Cre aids in the targeted transgenesis of the DNA segment of interest, as it supports the mapping of the transgene insertion sites, known as loxP sites. These sites, unlike P elements, can be specifically inserted to flank a chromosomal segment of interest, aiding in targeted transgenesis. The Cre recombinase catalyzes cleavage of the base pairs present at the carefully positioned loxP sites, permitting more specific insertion of the transgenic donor plasmid of interest.

To overcome the limitations and low yields that transposon-mediated and Cre-loxP transformation methods produce, the bacteriophage ΦC31 has recently been utilized. Recent breakthrough studies involve the microinjection of the bacteriophage ΦC31 integrase, which shows improved transgene insertion of large DNA fragments that are unable to be transposed by P elements alone. This method involves the recombination between an attachment (attP) site in the phage and an attachment site in the bacterial host genome (attB). Compared to usual P element transgene insertion methods, ΦC31 integrates the entire transgene vector, including bacterial sequences and antibiotic resistance genes. Unfortunately, the presence of these additional insertions has been found to affect the level and reproducibility of transgene expression.

Use in livestock and aquaculture

One agricultural application is to selectively breed animals for particular traits: transgenic cattle with an increased muscle phenotype have been produced by overexpressing a short hairpin RNA with homology to the myostatin mRNA using RNA interference. Transgenes are also being used to produce milk with high levels of particular proteins, or silk proteins in the milk of goats. Another agricultural application is to breed animals that are resistant to diseases, or animals for biopharmaceutical production.

Future potential

The application of transgenes is a rapidly growing area of molecular biology. As of 2005, it was predicted that 300,000 lines of transgenic mice would be generated over the next two decades. Researchers have identified many applications for transgenes, particularly in the medical field. Scientists are focusing on the use of transgenes to study the function of the human genome in order to better understand disease, on adapting animal organs for transplantation into humans, and on the production of pharmaceutical products such as insulin, growth hormone, and blood anti-clotting factors from the milk of transgenic cows.

As of 2004, there were five thousand known genetic diseases, and the potential to treat these diseases using transgenic animals is perhaps one of the most promising applications of transgenes. There is potential to use human gene therapy to replace a mutated gene with an unmutated copy of a transgene in order to treat the genetic disorder, for example through the use of Cre-lox recombination or gene knockout. Moreover, genetic disorders are being studied through the use of transgenic mice, pigs, rabbits, and rats. Transgenic rabbits have been created to study inherited cardiac arrhythmias, as the rabbit heart resembles the human heart much more closely than the mouse heart does. More recently, scientists have also begun using transgenic goats to study genetic disorders related to fertility.

Transgenes may be used for xenotransplantation of pig organs. Through the study of xeno-organ rejection, it was found that acute rejection of the transplanted organ occurs upon the organ's contact with blood from the recipient, due to the recognition of foreign antigens on endothelial cells of the transplanted organ. Scientists have identified the antigen in pigs that causes this reaction, and are therefore able to transplant the organ without immediate rejection by removing the antigen. However, the antigen begins to be expressed later on, and rejection occurs; therefore, further research is being conducted. Transgenic microorganisms capable of producing catalytic proteins or enzymes are also used to increase the rate of industrial reactions.

Ethical controversy

Transgene use in humans is currently fraught with issues. Transfer of genes into human cells has not yet been perfected. The most famous example involved certain patients developing T-cell leukemia after being treated for X-linked severe combined immunodeficiency (X-SCID). This was attributed to the close proximity of the inserted gene to the LMO2 promoter, which controls the transcription of the LMO2 proto-oncogene.

Computer cluster

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Computer_cluster
Technicians working on a large Linux cluster at the Chemnitz University of Technology, Germany
Sun Microsystems Solaris Cluster, with In-Row cooling
The Taiwania series of supercomputers uses cluster architecture.

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.

The components of a cluster are usually connected to each other through fast local area networks, with each node (computer used as a server) running its own instance of an operating system. In most circumstances, all of the nodes use the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application Resources (OSCAR)), different operating systems can be used on each computer, or different hardware.

Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

Computer clusters emerged as a result of the convergence of a number of computing trends, including the availability of low-cost microprocessors, high-speed networks, and software for high-performance distributed computing. They have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world, such as IBM's Sequoia. Prior to the advent of clusters, single-unit fault-tolerant mainframes with modular redundancy were employed; but the lower upfront cost of clusters and the increased speed of network fabric have favoured their adoption. In contrast to high-reliability mainframes, clusters are cheaper to scale out, but also have increased complexity in error handling, as in clusters error modes are not opaque to running programs.

Basic concepts

A simple, home-built Beowulf cluster

The desire to get more computing power and better reliability by orchestrating a number of low-cost commercial off-the-shelf computers has given rise to a variety of architectures and configurations.

The computer clustering approach usually (but not always) connects a number of readily available computing nodes (e.g. personal computers used as servers) via a fast local area network. The activities of the computing nodes are orchestrated by "clustering middleware", a software layer that sits atop the nodes and allows the users to treat the cluster as by and large one cohesive computing unit, e.g. via a single system image concept.

Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers. It is distinct from other approaches such as peer-to-peer or grid computing which also use many nodes, but with a far more distributed nature.

A computer cluster may be a simple two-node system which just connects two personal computers, or may be a very fast supercomputer. A basic approach to building a cluster is that of a Beowulf cluster which may be built with a few personal computers to produce a cost-effective alternative to traditional high-performance computing. An early project that showed the viability of the concept was the 133-node Stone Soupercomputer. The developers used Linux, the Parallel Virtual Machine toolkit and the Message Passing Interface library to achieve high performance at a relatively low cost.

Although a cluster may consist of just a few personal computers connected by a simple network, the cluster architecture may also be used to achieve very high levels of performance. The TOP500 organization's semiannual list of the 500 fastest supercomputers often includes many clusters, e.g. the world's fastest machine in 2011 was the K computer which has a distributed memory, cluster architecture.

History

A VAX 11/780, c. 1977, as used in early VAXcluster development

Greg Pfister has stated that clusters were not invented by any specific vendor but by customers who could not fit all their work on one computer, or needed a backup. Pfister estimates the date as some time in the 1960s. The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law.

The history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster.

The first production system designed as a cluster was the Burroughs B5700 in the mid-1960s. This allowed up to four computers, each with either one or two processors, to be tightly coupled to a common disk storage subsystem in order to distribute the workload. Unlike standard multiprocessor systems, each computer could be restarted without disrupting overall operation.

Tandem NonStop II circa 1980

The first commercial loosely coupled clustering product was Datapoint Corporation's "Attached Resource Computer" (ARC) system, developed in 1977, and using ARCnet as the cluster interface. Clustering per se did not really take off until Digital Equipment Corporation released their VAXcluster product in 1984 for the VMS operating system. The ARC and VAXcluster products not only supported parallel computing, but also shared file systems and peripheral devices. The idea was to provide the advantages of parallel processing, while maintaining data reliability and uniqueness. Two other noteworthy early commercial clusters were the Tandem NonStop (a 1976 high-availability commercial product) and the IBM S/390 Parallel Sysplex (circa 1994, primarily for business use).

Within the same time frame, while computer clusters used parallelism outside the computer on a commodity network, supercomputers began to use parallelism within a single machine. Following the success of the CDC 6600 in 1964, the Cray 1 was delivered in 1976 and introduced internal parallelism via vector processing. While early supercomputers excluded clusters and relied on shared memory, in time some of the fastest supercomputers (e.g. the K computer) relied on cluster architectures.

Attributes of clusters

A load balancing cluster with two servers and N user stations

Computer clusters may be configured for different purposes, ranging from general-purpose business needs such as web-service support to computation-intensive scientific calculations. The attributes described below are not exclusive; in either case, a cluster may also use a high-availability approach.

"Load-balancing" clusters are configurations in which cluster-nodes share computational workload to provide better overall performance. For example, a web server cluster may assign different queries to different nodes, so the overall response time will be optimized. However, approaches to load-balancing may significantly differ among applications, e.g. a high-performance cluster used for scientific computations would balance load with different algorithms from a web-server cluster which may just use a simple round-robin method by assigning each new request to a different node.

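For illustration, the following minimal Python sketch shows the simple round-robin assignment mentioned above; the node names and request identifiers are hypothetical.

    # Minimal sketch of round-robin load balancing (illustrative only; real
    # director-based clusters track node health and load rather than just cycling).
    import itertools

    nodes = ["node-1", "node-2", "node-3"]   # hypothetical cluster nodes
    next_node = itertools.cycle(nodes)       # endless round-robin iterator

    def assign(request_id: str) -> str:
        """Route each new request to the next node in turn (content-agnostic)."""
        return next(next_node)

    for request_id in ("req-a", "req-b", "req-c", "req-d"):
        print(request_id, "->", assign(request_id))
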
Computer clusters are used for computation-intensive purposes, rather than handling IO-oriented operations such as web service or databases. For instance, a computer cluster might support computational simulations of vehicle crashes or weather. Very tightly coupled computer clusters are designed for work that may approach "supercomputing".

"High-availability clusters" (also known as failover clusters, or HA clusters) improve the availability of the cluster approach. They operate by having redundant nodes, which are then used to provide service when system components fail. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure. There are commercial implementations of High-Availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux operating system.

Benefits

Clusters are primarily designed with performance in mind, but installations are based on many other factors. Fault tolerance (the ability of a system to continue operating despite a malfunctioning node) enables scalability, and in high-performance situations, allows for a low frequency of maintenance routines, resource consolidation (e.g., RAID), and centralized management. Advantages include enabling data recovery in the event of a disaster and providing parallel data processing and high processing capacity.

Clusters provide scalability through the ability to add nodes horizontally. This means that more computers may be added to the cluster to improve its performance, redundancy and fault tolerance. This can be an inexpensive solution for a higher-performing cluster compared to scaling up a single node in the cluster. This property of computer clusters can allow larger computational loads to be executed by a larger number of lower-performing computers.

Adding a new node to a cluster does not require taking the entire cluster down, so reliability increases. A single node can be taken down for maintenance while the rest of the cluster takes on the load of that individual node.

A large number of computers clustered together lends itself to the use of distributed file systems and RAID, both of which can increase the reliability and speed of a cluster.

Design and configuration

A typical Beowulf configuration

One of the issues in designing a cluster is how tightly coupled the individual nodes may be. For instance, a single computer job may require frequent communication among nodes: this implies that the cluster shares a dedicated network, is densely located, and probably has homogeneous nodes. The other extreme is where a computer job uses one or few nodes, and needs little or no inter-node communication, approaching grid computing.

In a Beowulf cluster, the application programs never see the computational nodes (also called slave computers) but only interact with the "Master" which is a specific computer handling the scheduling and management of the slaves. In a typical implementation the Master has two network interfaces, one that communicates with the private Beowulf network for the slaves, the other for the general purpose network of the organization. The slave computers typically have their own version of the same operating system, and local memory and disk space. However, the private slave network may also have a large and shared file server that stores global persistent data, accessed by the slaves as needed.

A special purpose 144-node DEGIMA cluster is tuned to running astrophysical N-body simulations using the Multiple-Walk parallel tree code, rather than general purpose scientific computations.

Due to the increasing computing power of each generation of game consoles, a novel use has emerged where they are repurposed into high-performance computing (HPC) clusters. Some examples of game console clusters are Sony PlayStation clusters and Microsoft Xbox clusters. Another example of a consumer product used for computation is the Nvidia Tesla Personal Supercomputer workstation, which uses multiple graphics accelerator processor chips. Besides game consoles, high-end graphics cards can also be used instead. The use of graphics cards (or rather their GPUs) to do calculations for grid computing is vastly more economical than using CPUs, despite being less precise. However, when using double-precision values, they become as precise to work with as CPUs and are still much less costly in purchase price.

Computer clusters have historically run on separate physical computers with the same operating system. With the advent of virtualization, cluster nodes may run on separate physical computers with different operating systems that are overlaid with a virtual layer to look similar. The cluster may also be virtualized in various configurations as maintenance takes place; an example implementation is Xen as the virtualization manager with Linux-HA.

Data sharing and communication

Data sharing

A NEC Nehalem cluster

As computer clusters were appearing during the 1980s, so were supercomputers. One of the elements that distinguished these classes of machines at the time was that early supercomputers relied on shared memory. Clusters do not typically use physically shared memory, and many supercomputer architectures have also abandoned it.

However, the use of a clustered file system is essential in modern computer clusters. Examples include the IBM General Parallel File System, Microsoft's Cluster Shared Volumes or the Oracle Cluster File System.

Message passing and communication

Two widely used approaches for communication between cluster nodes are MPI (Message Passing Interface) and PVM (Parallel Virtual Machine).

PVM was developed at the Oak Ridge National Laboratory around 1989, before MPI was available. PVM must be directly installed on every cluster node and provides a set of software libraries that present the node as a "parallel virtual machine". PVM provides a run-time environment for message passing, task and resource management, and fault notification. PVM can be used by user programs written in C, C++, or Fortran, among others.

MPI emerged in the early 1990s out of discussions among 40 organizations. The initial effort was supported by ARPA and the National Science Foundation. Rather than starting anew, the design of MPI drew on various features available in commercial systems of the time. The MPI specifications then gave rise to specific implementations, which typically use TCP/IP and socket connections. MPI is now a widely available communications model that enables parallel programs to be written in languages such as C, Fortran and Python. Thus, unlike PVM, which provides a concrete implementation, MPI is a specification that has been implemented in systems such as MPICH and Open MPI.
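
As a brief illustration of the MPI programming model, the following sketch uses the mpi4py Python binding (assuming it and an MPI implementation such as MPICH or Open MPI are installed); each rank sums a disjoint slice of a range and rank 0 combines the partial sums.

    # Minimal MPI sketch using the mpi4py binding (illustrative only).
    # Run across cluster nodes with an MPI launcher, e.g.: mpiexec -n 4 python sum_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each process sums a disjoint slice of 0..999,999.
    n = 1_000_000
    partial = sum(range(rank, n, size))

    total = comm.reduce(partial, op=MPI.SUM, root=0)  # combine partial sums on rank 0
    if rank == 0:
        print("total =", total)  # 499999500000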

Cluster management

Low-cost and low energy tiny-cluster of Cubieboards, using Apache Hadoop on Lubuntu
A pre-release sample of the Ground Electronics/AB Open Circumference C25 cluster computer system, fitted with 8x Raspberry Pi 3 Model B+ and 1x UDOO x86 boards

One of the challenges in the use of a computer cluster is the cost of administering it, which can at times be as high as the cost of administering N independent machines if the cluster has N nodes. In some cases this provides an advantage to shared memory architectures with lower administration costs. This has also made virtual machines popular, due to the ease of administration.

Task scheduling

When a large multi-user cluster needs to access very large amounts of data, task scheduling becomes a challenge. In a heterogeneous CPU-GPU cluster with a complex application environment, the performance of each job depends on the characteristics of the underlying cluster. Therefore, mapping tasks onto CPU cores and GPU devices provides significant challenges. This is an area of ongoing research; algorithms that combine and extend MapReduce and Hadoop have been proposed and studied.

Node failure management

When a node in a cluster fails, strategies such as "fencing" may be employed to keep the rest of the system operational. Fencing is the process of isolating a node or protecting shared resources when a node appears to be malfunctioning. There are two classes of fencing methods; one disables a node itself, and the other disallows access to resources such as shared disks.

The STONITH method stands for "Shoot The Other Node In The Head", meaning that the suspected node is disabled or powered off. For instance, power fencing uses a power controller to turn off an inoperable node.

The resource fencing approach disallows access to resources without powering off the node. This may include persistent reservation fencing via SCSI-3, fibre channel fencing to disable the fibre channel port, or global network block device (GNBD) fencing to disable access to the GNBD server.
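
As a rough illustration of these two fencing classes, the following Python sketch contrasts node-level (STONITH) and resource-level fencing; power_off and revoke_disk_access are hypothetical stand-ins for a power-controller call and a SCSI-3 reservation or switch-port action, not the API of any real cluster manager.

    # Minimal sketch of the two fencing classes described above (illustrative only;
    # real cluster managers use dedicated fence agents).

    def power_off(node: str) -> None:
        print(f"STONITH: powering off {node} via power controller")

    def revoke_disk_access(node: str) -> None:
        print(f"Resource fencing: revoking {node}'s access to shared disks")

    def fence(node: str, method: str = "stonith") -> None:
        """Isolate a suspected-failed node before failing its services over."""
        if method == "stonith":
            power_off(node)            # node-level fencing
        else:
            revoke_disk_access(node)   # resource-level fencing

    fence("node-2", method="stonith")
    fence("node-3", method="resource")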

Software development and administration

Parallel programming

Load balancing clusters such as web servers use cluster architectures to support a large number of users and typically each user request is routed to a specific node, achieving task parallelism without multi-node cooperation, given that the main goal of the system is providing rapid user access to shared data. However, "computer clusters" which perform complex computations for a small number of users need to take advantage of the parallel processing capabilities of the cluster and partition "the same computation" among several nodes.

Automatic parallelization of programs remains a technical challenge, but parallel programming models can be used to effectuate a higher degree of parallelism via the simultaneous execution of separate portions of a program on different processors.

Debugging and monitoring

Developing and debugging parallel programs on a cluster requires parallel language primitives and suitable tools such as those discussed by the High Performance Debugging Forum (HPDF) which resulted in the HPD specifications. Tools such as TotalView were then developed to debug parallel implementations on computer clusters which use Message Passing Interface (MPI) or Parallel Virtual Machine (PVM) for message passing.

The University of California, Berkeley Network of Workstations (NOW) system gathers cluster data and stores them in a database, while a system such as PARMON, developed in India, allows visually observing and managing large clusters.

Application checkpointing can be used to restore a given state of the system when a node fails during a long multi-node computation. This is essential in large clusters, given that as the number of nodes increases, so does the likelihood of node failure under heavy computational loads. Checkpointing can restore the system to a stable state so that processing can resume without needing to recompute results.
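
The following is a minimal Python sketch of the checkpoint/restore idea, under the assumption that job state can simply be serialized to shared storage; real long-running multi-node jobs typically checkpoint through their MPI library or a dedicated framework, and the file name used here is hypothetical.

    # Minimal sketch of application checkpointing (illustrative only).
    import os, pickle

    CHECKPOINT = "state.pkl"  # hypothetical checkpoint file on shared storage

    def load_state():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT, "rb") as f:
                return pickle.load(f)          # resume from last stable state
        return {"step": 0, "partial_result": 0}

    def save_state(state):
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)              # persist progress so far

    state = load_state()
    for step in range(state["step"], 1000):
        state["partial_result"] += step        # stand-in for real computation
        state["step"] = step + 1
        if step % 100 == 0:
            save_state(state)                  # periodic checkpoint
    save_state(state)
    print("done:", state["partial_result"])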

Implementations

The Linux world supports various cluster software. For application clustering, there are distcc and MPICH. Linux Virtual Server and Linux-HA are director-based clusters that allow incoming requests for services to be distributed across multiple cluster nodes. MOSIX, LinuxPMI, Kerrighed and OpenSSI are full-blown clusters integrated into the kernel that provide automatic process migration among homogeneous nodes. OpenSSI, openMosix and Kerrighed are single-system image implementations.

Microsoft Windows Compute Cluster Server 2003, based on the Windows Server platform, provides pieces for high-performance computing such as the job scheduler, the MSMPI library and management tools.

gLite is a set of middleware technologies created by the Enabling Grids for E-sciencE (EGEE) project.

Slurm is also used to schedule and manage some of the largest supercomputer clusters (see the TOP500 list).

Other approaches

Although most computer clusters are permanent fixtures, attempts at flash mob computing have been made to build short-lived clusters for specific computations. However, larger-scale volunteer computing systems such as BOINC-based systems have had more followers.

Humanized mouse

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Humanized_mouse

A humanized mouse is a genetically modified mouse that has functioning human genes, cells, tissues and/or organs. Humanized mice are commonly used as small animal models in biological and medical research for human therapeutics.

A humanized mouse or a humanized mouse model is one that has been xenotransplanted with human cells and/or engineered to express human gene products, so as to be utilized for gaining relevant insights in the in vivo context for understanding of human-specific physiology and pathologies. Several human biological processes have been explored using animal models like rodents and non-human primates. In particular, small animals such as mice are advantageous in such studies owing to their small size, brief reproductive cycle, easy handling, and their genomic and physiological similarities with humans; moreover, these animals can also be genetically modified easily. Nevertheless, there are several incongruities between these animal systems and those of humans, especially with regard to the components of the immune system. To overcome these limitations, and to realize the full potential of animal models to enable researchers to get a clear picture of the nature and pathogenesis of immune responses mounted against human-specific pathogens, humanized mouse models have been developed. Such mouse models have also become an integral aspect of preclinical biomedical research.

History

The discovery of the athymic mouse, commonly known as the nude mouse, and of the SCID mouse were major events that paved the way for humanized mouse models. Nude mice were the earliest immunodeficient mouse model: a mutation in the Foxn1 gene on chromosome 11 results in impaired thymus development, leading to a deficiency in mature T lymphocytes. These mice primarily produced IgM and had minimal or no IgA; as a result, they did not exhibit a rejection response to allogeneic tissue. Commonly utilized strains included BALB/c-nu, Swiss-nu, NC-nu, and NIH-nu, which were extensively employed in research on immune diseases and tumors. However, because they retain B cells and NK cells, nude mice are unable to fully support engraftment of human immune cells, making them unsuitable as an ideal humanized mouse model. The SCID mouse model was derived by backcrossing C57BL/Ka and BALB/c mice and features a loss-of-function mutation in the PRKDC gene. The PRKDC gene product is necessary for resolving breaks in DNA strands during the development of T cells and B cells, so a dysfunctional PRKDC gene leads to impaired development of T and B lymphocytes, giving rise to severe combined immunodeficiency (SCID). In spite of the effort put into developing this mouse model, poor engraftment of human hematopoietic stem cells (HSCs) remained a major limitation that called for further advancement in the development of humanized mouse models.

The next big step in the development of humanized mouse models came with transfer of the scid mutation onto a non-obese diabetic (NOD) background. This resulted in the creation of NOD-scid mice, which lacked T cells, B cells, and NK cells and permitted a slightly higher level of human cell reconstitution. Nevertheless, a major breakthrough in this field came with the introduction of a mutated IL-2 receptor gamma chain (IL2rg) gene into the NOD-scid model. This led to the creation of the NOD-scid-γcnull mouse (NCG, NSG or NOG) models, which were found to have defective signaling of the interleukins IL-2, IL-4, IL-7, IL-9, IL-15 and IL-21. Researchers evolved this NSG model by knocking out the RAG1 and RAG2 genes (recombination activating genes), resulting in a RAGnull version of the NSG model that was devoid of major cells of the immune system, including natural killer cells, B lymphocytes, T lymphocytes, macrophages and dendritic cells, producing the greatest immunodeficiency in mouse models so far. The limitation of this model was that it lacked human leukocyte antigen (HLA). As a consequence, human T cells engrafted into the mice failed to recognize human antigen-presenting cells, which resulted in defective immunoglobulin class switching and improper organization of the secondary lymphoid tissue.

To circumvent this limitation, the next development came with the introduction of transgenes encoding for HLA I and HLA II in the NSG RAGnull model that enabled buildout of human T-lymphocyte repertoires as well as the respective immune responses. Mice with such human genes are technically human-animal hybrids.

Types

Engrafting an immunodeficient mouse with functional human cells can be achieved by intravenous injections of human cells and tissue into the mouse, and/or creating a genetically modified mouse from human genes. These models have been instrumental in studying human diseases, immune responses, and therapeutic interventions. This section highlights the various humanized mice models developed using the different methods.

Hu-PBL-scid model

The human peripheral blood lymphocyte-severe combined immunodeficiency (hu-PBL-scid) mouse model has been employed in a diverse array of research, encompassing investigations into Epstein-Barr virus (EBV)-associated lymphoproliferative disease, toxoplasmosis, human immunodeficiency virus (HIV) infection, and autoimmune diseases. These studies have highlighted the effectiveness of the hu-PBL-SCID mouse model in examining various facets of human diseases, including pathogenesis, immune responses, and therapeutic interventions. Furthermore, the model has been utilized to explore genetic and molecular factors linked to neuropsychiatric disorders such as schizophrenia, offering valuable insights into the pathophysiology and potential therapeutic targets for these conditions. The model is developed by intravenously injecting human PBMCs, obtained from consenting adult donors, into immunodeficient mice. The advantages of this method are that it is a comparatively easy technique, the model takes relatively little time to establish, and it exhibits functional memory T cells. It is particularly effective for modelling graft-versus-host disease. The model lacks engraftment of B lymphocytes and myeloid cells. Other limitations are that it is suitable only for short-term experiments (<3 months) and that the model itself might develop graft-versus-host disease.

Hu-SRC-scid model

The humanized severe combined immunodeficiency (SCID) mouse model, also known as the hu-SRC-scid model, has been extensively utilized in various research areas, including immunology, infectious diseases, cancer, and drug development. This model has been instrumental in studying the human immune response to xenogeneic and allogeneic decellularized biomaterials, providing valuable insights into the biocompatibility and gene expression regulation of these materials. Hu-SRC-scid mice are developed by engrafting CD34+ human hematopoietic stem cells, obtained from human fetal liver, bone marrow or umbilical cord blood, into immunodeficient mice via intravenous injection. The advantages of this model are that it offers multilineage development of hematopoietic cells and generation of a naïve immune system, and, if engraftment is carried out by intrahepatic injection of newborn mice within 72 hours of birth, it can lead to enhanced human cell reconstitution. Limitations of the model are that it takes a minimum of 10 weeks for cell differentiation to occur and that it harbors low levels of human red blood cells, polymorphonuclear leukocytes, and megakaryocytes.

BLT (bone marrow/liver/thymus) model

The BLT model is constituted by engraftment with human HSCs, bone marrow, liver, and thymus. The engraftment is carried out by implanting liver and thymus tissue under the kidney capsule and by transplanting HSCs obtained from fetal liver. The BLT model has a complete and fully functional human immune system with HLA-restricted T lymphocytes. The model also comprises a mucosal system similar to that of humans. Moreover, among all models, the BLT model has the highest level of human cell reconstitution.

However, since it requires surgical implantation, this model is the most difficult and time-consuming to develop. Other drawbacks associated with the model are that it portrays weak immune responses to xenobiotics, sub-optimal class switching and may develop GvHD.

Transplanted human organoids

Bio- and electrical engineers have shown that human cerebral organoids transplanted into mice functionally integrate with their visual cortex. Such models may raise similar ethical issues as organoid-based humanization of other animals.

Mouse-human hybrid

A mouse-human hybrid is a genetically modified mouse whose genome has both mouse and human genes, thus being a murine form of a human-animal hybrid. For example, genetically modified mice may be born with human leukocyte antigen genes in order to provide a more realistic environment when introducing human white blood cells into them in order to study immune system responses. One such application is the identification of hepatitis C virus (HCV) peptides that bind to HLA, and that can be recognized by the human immune system, thereby potentially being targets for future vaccines against HCV.

Established models for human diseases

Several mechanisms underlying human maladies are not fully understood. Utilization of humanized mice models in this context allows researchers to determine and unravel important factors that bring about the development of several human diseases and disorders falling under the categories of infectious disease, cancer, autoimmunity, and GvHD.

Infectious diseases

Among the human-specific infectious pathogens studied on humanized mice models, the human immunodeficiency virus has been successfully studied. Besides this, humanized models for studying Ebola virus, Hepatitis B, Hepatitis C, Kaposi's sarcoma-associated herpesvirus, Leishmania major, malaria, and tuberculosis have been reported by various studies.

NOD/scid mice models for dengue virus and varicella-zoster virus, and a Rag2null𝛾cnull model for studying influenza virus have also been developed.

Cancers

On the basis of the type of human cells or tissues used for engraftment, humanized mouse models for cancer can be classified as patient-derived xenografts (PDX) or cell line-derived xenografts. PDX models are considered to retain the parental malignancy characteristics to a greater extent and are therefore regarded as the more powerful tool for evaluating the effect of anticancer drugs in pre-clinical studies. Humanized mouse models for studying cancers of various organs have been designed. A mouse model for the study of breast cancer has been generated by intrahepatic engraftment of SK-BR-3 cells in NSG mice. Similarly, NSG mice intravenously engrafted with patient-derived AML cells, and those engrafted (via subcutaneous, intravenous or intra-pancreatic injection) with patient-derived pancreatic tumors, have been developed for the study of leukemia and pancreatic cancer, respectively. Several other humanized rodent models for the study of cancer and cancer immunotherapy have also been reported.

Autoimmune diseases

Problems posed by the differences between the human and rodent immune systems have been overcome using a few strategies, enabling researchers to study autoimmune disorders with humanized models. As a result, the use of humanized mouse models has extended to various areas of immunology and disease research, for instance the study of human-tropic pathogens, liver cancer models, and the comparison of mouse models to human diseases. NSG mice engrafted with PBMCs and administered myelin antigens in Freund's adjuvant, together with antigen-pulsed autologous dendritic cells, have been used to study multiple sclerosis. Similarly, NSG mice engrafted with hematopoietic stem cells and administered pristane have been used to study lupus erythematosus, and NOG mice engrafted with PBMCs have been used to study mechanisms of allograft rejection in vivo. The development of humanized mouse models has significantly advanced the study of autoimmune disorders, providing a platform for investigating human diseases, immune responses, and therapeutic interventions, bridging the gap between human and rodent immune systems and offering valuable insights into disease pathogenesis and potential therapeutic strategies.

Knockout mouse

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Knockout_mouse   ...