Thursday, April 17, 2025

Origin of replication

From Wikipedia, the free encyclopedia
Models for bacterial (A) and eukaryotic (B) DNA replication initiation. A) Circular bacterial chromosomes contain a cis-acting element, the replicator, that is located at or near replication origins. i) The replicator recruits initiator proteins in a DNA sequence-specific manner, which results in melting of the DNA helix and loading of the replicative helicase onto each of the single DNA strands (ii). iii) Assembled replisomes bidirectionally replicate DNA to yield two copies of the bacterial chromosome. B) Linear eukaryotic chromosomes contain many replication origins. Initiator binding (i) facilitates replicative helicase loading (ii) onto duplex DNA to license origins. iii) A subset of loaded helicases is activated for replisome assembly. Replication proceeds bidirectionally from origins and terminates when replication forks from adjacent active origins meet (iv).

The origin of replication (also called the replication origin) is a particular sequence in a genome at which replication is initiated. Propagation of the genetic material between generations requires timely and accurate duplication of DNA by semiconservative replication prior to cell division to ensure each daughter cell receives the full complement of chromosomes. This can either involve the replication of DNA in living organisms such as prokaryotes and eukaryotes, or that of DNA or RNA in viruses, such as double-stranded RNA viruses. Synthesis of daughter strands starts at discrete sites, termed replication origins, and proceeds in a bidirectional manner until all genomic DNA is replicated. Despite the fundamental nature of these events, organisms have evolved surprisingly divergent strategies that control replication onset. Although the organization, structure, and recognition of replication origins vary from species to species, some common characteristics are shared.

Features

A key prerequisite for DNA replication is that it must occur with extremely high fidelity and efficiency exactly once per cell cycle to prevent the accumulation of genetic alterations with potentially deleterious consequences for cell survival and organismal viability. Incomplete, erroneous, or untimely DNA replication events can give rise to mutations, chromosomal polyploidy or aneuploidy, and gene copy number variations, each of which in turn can lead to diseases, including cancer. To ensure complete and accurate duplication of the entire genome and the correct flow of genetic information to progeny cells, all DNA replication events are not only tightly regulated with cell cycle cues but are also coordinated with other cellular events such as transcription and DNA repair. Additionally, origin sequences commonly have high AT-content across all kingdoms, since repeats of adenine and thymine are easier to separate because their base stacking interactions are not as strong as those of guanine and cytosine.
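The AT-richness mentioned above can be made concrete with a small sketch. The function names, the window size, and the threshold below are illustrative assumptions, not values taken from the literature; the code simply reports sequence windows whose fraction of A and T bases reaches a chosen cutoff.

```python
def at_content(seq: str) -> float:
    """Return the fraction of A/T bases in seq (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq: str, window: int = 20, threshold: float = 0.75):
    """Yield (start, at_fraction) for each window meeting the AT threshold.

    window and threshold are arbitrary illustrative defaults.
    """
    for start in range(len(seq) - window + 1):
        frac = at_content(seq[start:start + window])
        if frac >= threshold:
            yield start, frac


if __name__ == "__main__":
    toy = "GCGCATATATGCGC"  # made-up sequence, not a real origin
    print(list(at_rich_windows(toy, window=4, threshold=1.0)))
```

A scan like this only flags candidate regions; AT-richness alone does not identify an origin.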

DNA replication is divided into different stages. During initiation, the replication machineries – termed replisomes – are assembled on DNA in a bidirectional fashion. These assembly loci constitute the start sites of DNA replication or replication origins. In the elongation phase, replisomes travel in opposite directions with the replication forks, unwinding the DNA helix and synthesizing complementary daughter DNA strands using both parental strands as templates. Once replication is complete, specific termination events lead to the disassembly of replisomes. As long as the entire genome is duplicated before cell division, one might assume that the location of replication start sites does not matter; yet, it has been shown that many organisms use preferred genomic regions as origins. The necessity to regulate origin location likely arises from the need to coordinate DNA replication with other processes that act on the shared chromatin template to avoid DNA strand breaks and DNA damage.
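To illustrate why origin placement affects how quickly a chromosome is duplicated, here is a toy calculation. It assumes a linear chromosome, origins that all fire at the same moment, and a constant fork speed; none of these simplifications comes from the text, and the function name and units are hypothetical.

```python
def replication_time(length: float, origins: list[float], fork_speed: float) -> float:
    """Time until a linear chromosome is fully copied in a toy model.

    Assumptions (illustrative only): every origin fires at t = 0, forks move
    bidirectionally at a constant fork_speed, and forks from adjacent origins
    terminate where they meet.
    """
    if not origins:
        raise ValueError("at least one origin is required")
    if fork_speed <= 0:
        raise ValueError("fork_speed must be positive")
    pts = sorted(origins)
    if pts[0] < 0 or pts[-1] > length:
        raise ValueError("origins must lie within the chromosome")
    # At the chromosome ends a single fork covers the whole distance.
    longest = max(pts[0], length - pts[-1])
    # Between neighbouring origins two converging forks share the distance.
    for left, right in zip(pts, pts[1:]):
        longest = max(longest, (right - left) / 2)
    return longest / fork_speed


if __name__ == "__main__":
    # Same chromosome and fork speed, different origin layouts.
    print(replication_time(1000, [500], 10))
    print(replication_time(1000, [250, 750], 10))
```

In this model, evenly spaced origins minimize the longest stretch any fork has to cover, which is one way to see why origin location is not irrelevant.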

Replicon model

More than five decades ago, Jacob, Brenner, and Cuzin proposed the replicon hypothesis to explain the regulation of chromosomal DNA synthesis in E. coli. The model postulates that a diffusible, trans-acting factor, a so-called initiator, interacts with a cis-acting DNA element, the replicator, to promote replication onset at a nearby origin. Once bound to replicators, initiators (often with the help of co-loader proteins) deposit replicative helicases onto DNA, which subsequently drive the recruitment of additional replisome components and the assembly of the entire replication machinery. The replicator thereby specifies the location of replication initiation events, and the chromosome region that is replicated from a single origin or initiation event is defined as the replicon.

A fundamental feature of the replicon hypothesis is that it relies on positive regulation to control DNA replication onset, which can explain many experimental observations in bacterial and phage systems. For example, it accounts for the failure of extrachromosomal DNAs without origins to replicate when introduced into host cells. It further rationalizes plasmid incompatibilities in E. coli, where certain plasmids destabilize each other's inheritance due to competition for the same molecular initiation machinery. By contrast, a model of negative regulation (analogous to the replicon-operator model for transcription) fails to explain the above findings. Nonetheless, research subsequent to Jacob's, Brenner's and Cuzin's proposal of the replicon model has discovered many additional layers of replication control in bacteria and eukaryotes that comprise both positive and negative regulatory elements, highlighting both the complexity and the importance of restricting DNA replication temporally and spatially.

The concept of the replicator as a genetic entity has proven very useful in the quest to identify replicator DNA sequences and initiator proteins in prokaryotes, and to some extent also in eukaryotes, although the organization and complexity of replicators differ considerably between the domains of life. While bacterial genomes typically contain a single replicator that is specified by consensus DNA sequence elements and that controls replication of the entire chromosome, most eukaryotic replicators – with the exception of budding yeast – are not defined at the level of DNA sequence; instead, they appear to be specified combinatorially by local DNA structural and chromatin cues. Eukaryotic chromosomes are also much larger than their bacterial counterparts, raising the need for initiating DNA synthesis from many origins simultaneously to ensure timely replication of the entire genome. Additionally, many more replicative helicases are loaded than activated to initiate replication in a given cell cycle. The context-driven definition of replicators and selection of origins suggests a relaxed replicon model in eukaryotic systems that allows for flexibility in the DNA replication program. Although replicators and origins can be spaced physically apart on chromosomes, they often co-localize or are located in close proximity; for simplicity, we will thus refer to both elements as ‘origins’ throughout this review. Taken together, the discovery and isolation of origin sequences in various organisms represents a significant milestone towards gaining mechanistic understanding of replication initiation. In addition, these accomplishments had profound biotechnological implications for the development of shuttle vectors that can be propagated in bacterial, yeast and mammalian cells.

Bacterial

Origin organization and recognition in bacteria. A) Schematic of the architecture of E. coli origin oriC, Thermotoga maritima oriC, and the bipartite origin in Helicobacter pylori. The DUE is flanked on one side by several high- and weak-affinity DnaA-boxes as indicated for E. coli oriC. B) Domain organization of the E. coli initiator DnaA. Magenta circle indicates the single-strand DNA binding site. C) Models for origin recognition and melting by DnaA. In the two-state model (left panel), the DnaA protomers transition from a dsDNA binding mode (mediated by the HTH-domains recognizing DnaA-boxes) to an ssDNA binding mode (mediated by the AAA+ domains). In the loop-back model, the DNA is sharply bent backwards onto the DnaA filament (facilitated by the regulatory protein IHF) so that a single protomer binds both duplex and single-stranded regions. In either instance, the DnaA filament melts the DNA duplex and stabilizes the initiation bubble prior to loading of the replicative helicase (DnaB in E. coli). HTH – helix-turn-helix domain, DUE – DNA unwinding element, IHF – integration host factor.

Most bacterial chromosomes are circular and contain a single origin of chromosomal replication (oriC). Bacterial oriC regions are surprisingly diverse in size (ranging from 250 bp to 2 kbp), sequence, and organization; nonetheless, their ability to drive replication onset typically depends on sequence-specific readout of consensus DNA elements by the bacterial initiator, a protein called DnaA. Origins in bacteria are either continuous or bipartite and contain three functional elements that control origin activity: conserved DNA repeats that are specifically recognized by DnaA (called DnaA-boxes), an AT-rich DNA unwinding element (DUE), and binding sites for proteins that help regulate replication initiation. Interactions of DnaA both with the double-stranded (ds) DnaA-box regions and with single-stranded (ss) DNA in the DUE are important for origin activation and are mediated by different domains in the initiator protein: a Helix-turn-helix (HTH) DNA binding element and an ATPase associated with various cellular activities (AAA+) domain, respectively. While the sequence, number, and arrangement of origin-associated DnaA-boxes vary throughout the bacterial kingdom, their specific positioning and spacing in a given species are critical for oriC function and for productive initiation complex formation.

Among bacteria, E. coli is a particularly powerful model system to study the organization, recognition, and activation mechanism of replication origins. E. coli oriC comprises an approximately 260 bp region containing four types of initiator binding elements that differ in their affinities for DnaA and their dependencies on the co-factor ATP. DnaA-boxes R1, R2, and R4 constitute high-affinity sites that are bound by the HTH domain of DnaA irrespective of the nucleotide-binding state of the initiator. By contrast, the I, τ, and C-sites, which are interspersed between the R-sites, are low-affinity DnaA-boxes and associate preferentially with ATP-bound DnaA, although ADP-DnaA can substitute for ATP-DnaA under certain conditions. Binding of the HTH domains to the high- and low-affinity DnaA recognition elements promotes ATP-dependent higher-order oligomerization of DnaA's AAA+ modules into a right-handed filament that wraps duplex DNA around its outer surface, thereby generating superhelical torsion that facilitates melting of the adjacent AT-rich DUE. DNA strand separation is additionally aided by direct interactions of DnaA's AAA+ ATPase domain with triplet repeats, so-called DnaA-trios, in the proximal DUE region. The engagement of single-stranded trinucleotide segments by the initiator filament stretches DNA and stabilizes the initiation bubble by preventing reannealing. The DnaA-trio origin element is conserved in many bacterial species, indicating it is a key element for origin function. After melting, the DUE provides an entry site for the E. coli replicative helicase DnaB, which is deposited onto each of the single DNA strands by its loader protein DnaC.
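Sequence-specific recognition of DnaA-boxes can be mimicked computationally by a motif scan. The sketch below assumes the commonly cited 9-bp consensus TTATNCACA for high-affinity E. coli DnaA-boxes, which is not stated in the text above; the function names are hypothetical, and low-affinity sites that deviate from the consensus would not be found this way.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_boxes(seq: str, consensus: str = "TTATNCACA"):
    """Return sorted (start, strand, match) hits for a consensus on both strands.

    consensus is an assumed motif in which N stands for any base. start is the
    0-based leftmost position of the hit on the given (forward) sequence.
    """
    seq = seq.upper()
    # A lookahead lets the regex report overlapping matches as well.
    pattern = re.compile("(?=(" + consensus.replace("N", "[ACGT]") + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert the position on the reverse strand to forward coordinates.
        start = len(seq) - m.start() - len(consensus)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGTTATCCACAGG"  # made-up sequence containing one consensus box
    print(find_boxes(toy))
```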

Although the different DNA binding activities of DnaA have been extensively studied biochemically and various apo, ssDNA-, or dsDNA-bound structures have been determined, the exact architecture of the higher-order DnaA-oriC initiation assembly remains unclear. Two models have been proposed to explain the organization of essential origin elements and DnaA-mediated oriC melting. The two-state model assumes a continuous DnaA filament that switches from a dsDNA binding mode (the organizing complex) to an ssDNA binding mode in the DUE (the melting complex). By contrast, in the loop-back model, the DNA is sharply bent in oriC and folds back onto the initiator filament so that DnaA protomers simultaneously engage double- and single-stranded DNA regions. Elucidating how exactly oriC DNA is organized by DnaA remains thus an important task for future studies. Insights into initiation complex architecture will help explain not only how origin DNA is melted, but also how a replicative helicase is loaded directionally onto each of the exposed single DNA strands in the unwound DUE, and how these events are aided by interactions of the helicase with the initiator and specific loader proteins.

Archaeal

Origin organization and recognition in archaea. A) The circular chromosome of Sulfolobus solfataricus contains three different origins. B) Arrangement of initiator binding sites at two S. solfataricus origins, oriC1 and oriC2. Orc1-1 association with ORB elements is shown for oriC1. Recognition elements for additional Orc1/Cdc6 paralogs are also indicated, while WhiP binding sites have been omitted. C) Domain architecture of archaeal Orc1/Cdc6 paralogs. The orientation of ORB elements at origins leads to directional binding of Orc1/Cdc6 and MCM loading in between opposing ORBs (in B). (m)ORB – (mini-)origin recognition box, DUE – DNA unwinding element, WH – winged-helix domain.

Archaeal replication origins share some but not all of the organizational features of bacterial oriC. Unlike bacteria, Archaea often initiate replication from multiple origins per chromosome (one to four have been reported); yet, archaeal origins also bear specialized sequence regions that control origin function. These elements include both DNA sequence-specific origin recognition boxes (ORBs or miniORBs) and an AT-rich DUE that is flanked by one or several ORB regions. ORB elements display a considerable degree of diversity in terms of their number, arrangement, and sequence, both among different archaeal species and among different origins in a single species. An additional degree of complexity is introduced by the initiator, Orc1/Cdc6 in archaea, which binds to ORB regions. Archaeal genomes typically encode multiple paralogs of Orc1/Cdc6 that vary substantially in their affinities for distinct ORB elements and that differentially contribute to origin activities. In Sulfolobus solfataricus, for example, three chromosomal origins have been mapped (oriC1, oriC2, and oriC3), and biochemical studies have revealed complex binding patterns of initiators at these sites. The cognate initiator for oriC1 is Orc1-1, which associates with several ORBs at this origin. OriC2 and oriC3 are bound by both Orc1-1 and Orc1-3. Conversely, a third paralog, Orc1-2, footprints at all three origins but has been postulated to negatively regulate replication initiation. Additionally, the WhiP protein, an initiator unrelated to Orc1/Cdc6, has been shown to bind all origins as well and to drive origin activity of oriC3 in the closely related Sulfolobus islandicus. 
Because archaeal origins often contain several adjacent ORB elements, multiple Orc1/Cdc6 paralogs can be simultaneously recruited to an origin and oligomerize in some instances; however, in contrast to bacterial DnaA, formation of a higher-order initiator assembly does not appear to be a general prerequisite for origin function in the archaeal domain.

Structural studies have provided insights into how archaeal Orc1/Cdc6 recognizes ORB elements and remodels origin DNA. Orc1/Cdc6 paralogs are two-domain proteins and are composed of a AAA+ ATPase module fused to a C-terminal winged-helix fold. DNA-complexed structures of Orc1/Cdc6 revealed that ORBs are bound by an Orc1/Cdc6 monomer despite the presence of inverted repeat sequences within ORB elements. Both the ATPase and winged-helix regions interact with the DNA duplex but contact the palindromic ORB repeat sequence asymmetrically, which orients Orc1/Cdc6 in a specific direction on the repeat. Interestingly, the DUE-flanking ORB or miniORB elements often have opposite polarities, which predicts that the AAA+ lid subdomains and the winged-helix domains of Orc1/Cdc6 are positioned on either side of the DUE in a manner where they face each other. Since both regions of Orc1/Cdc6 associate with a minichromosome maintenance (MCM) replicative helicase, this specific arrangement of ORB elements and Orc1/Cdc6 is likely important for loading two MCM complexes symmetrically onto the DUE. Surprisingly, while the ORB DNA sequence determines the directionality of Orc1/Cdc6 binding, the initiator makes relatively few sequence-specific contacts with DNA. However, Orc1/Cdc6 severely underwinds and bends DNA, suggesting that it relies on a mix of both DNA sequence and context-dependent DNA structural features to recognize origins. Notably, base pairing is maintained in the distorted DNA duplex upon Orc1/Cdc6 binding in the crystal structures, whereas biochemical studies have yielded contradictory findings as to whether archaeal initiators can melt DNA similarly to bacterial DnaA. 
Although the evolutionary kinship of archaeal and eukaryotic initiators and replicative helicases indicates that archaeal MCM is likely loaded onto duplex DNA (see next section), the temporal order of origin melting and helicase loading, as well as the mechanism for origin DNA melting, in archaeal systems remains therefore to be clearly established. Likewise, how exactly the MCM helicase is loaded onto DNA needs to be addressed in future studies.

Eukaryotic

Origin organization and recognition in eukaryotes. Specific DNA elements and epigenetic features involved in ORC recruitment and origin function are summarized for S. cerevisiae, S. pombe, and metazoan origins. A schematic of the ORC architecture is also shown, highlighting the arrangement of the AAA+ and winged-helix domains into a pentameric ring that encircles origin DNA. Ancillary domains of several ORC subunits involved in targeting ORC to origins are included. Other regions in ORC subunits may also be involved in initiator recruitment, either by directly or indirectly associating with partner proteins. A few examples are listed. Note that the BAH domain in S. cerevisiae Orc1 binds nucleosomes but does not recognize H4K20me2.
BAH – bromo-adjacent homology domain, WH – winged-helix domain, TFIIB – transcription factor II B-like domain in Orc6, G4 – G quadruplex, OGRE – origin G-rich repeated element. ORC gene names are indicated by a single number; e.g. 3 refers to ORC3.

Origin organization, specification, and activation in eukaryotes are more complex than in bacterial or archaeal domains and significantly deviate from the paradigm established for prokaryotic replication initiation. The large genome sizes of eukaryotic cells, which range from 12 Mbp in S. cerevisiae to more than 100 Gbp in some plants, necessitate that DNA replication starts at several hundred (in budding yeast) to tens of thousands (in humans) origins to complete DNA replication of all chromosomes during each cell cycle. With the exception of S. cerevisiae and related Saccharomycotina species, eukaryotic origins do not contain consensus DNA sequence elements but their location is influenced by contextual cues such as local DNA topology, DNA structural features, and chromatin environment.

Eukaryotic origin function relies on a conserved initiator protein complex to load replicative helicases onto DNA during the late M and G1 phases of the cell cycle, a step known as origin licensing. In contrast to their bacterial counterparts, replicative helicases in eukaryotes are loaded onto origin duplex DNA in an inactive, double-hexameric form and only a subset of them (10-20% in mammalian cells) is activated during any given S phase, events that are referred to as origin firing.

The location of active eukaryotic origins is therefore determined on at least two different levels, origin licensing to mark all potential origins, and origin firing to select a subset that permits assembly of the replication machinery and initiation of DNA synthesis. The extra licensed origins serve as backup and are activated only upon slowing or stalling of nearby replication forks, ensuring that DNA replication can be completed when cells encounter replication stress. In the absence of stress, firing of extra origins is suppressed by a replication-associated signaling mechanism. Together, the excess of licensed origins and the tight cell cycle control of origin licensing and firing embody two important strategies to prevent under- and overreplication and to maintain the integrity of eukaryotic genomes.
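The two-level scheme of licensing and firing can be sketched as a simple partition of licensed sites. This is a deliberately naive model: it picks firing origins uniformly at random, whereas real origin choice is influenced by chromatin context and timing. The function name, the default fraction, and the seed parameter are assumptions made for illustration.

```python
import random


def split_licensed_origins(licensed, fraction=0.15, seed=None):
    """Partition licensed origins into (fired, dormant) lists.

    fraction is the share of licensed origins that fire; the default sits in
    the 10-20% range quoted for mammalian cells. Selection is uniform random,
    which is a simplification and not a claim about the real mechanism.
    """
    licensed = list(licensed)
    if not licensed:
        return [], []
    rng = random.Random(seed)
    n_fire = min(len(licensed), max(1, round(fraction * len(licensed))))
    fired = sorted(rng.sample(licensed, n_fire))
    fired_set = set(fired)
    # Everything licensed but not fired stays available as a backup origin.
    dormant = sorted(pos for pos in licensed if pos not in fired_set)
    return fired, dormant


if __name__ == "__main__":
    sites = list(range(0, 1000, 10))  # 100 hypothetical licensed positions
    fired, dormant = split_licensed_origins(sites, fraction=0.15, seed=1)
    print(len(fired), len(dormant))
```

In such a model the dormant list is the pool from which backup origins would be drawn if a nearby fork stalled.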

Early studies in S. cerevisiae indicated that replication origins in eukaryotes might be recognized in a DNA-sequence-specific manner analogously to those in prokaryotes. In budding yeast, the search for genetic replicators led to the identification of autonomously replicating sequences (ARS) that support efficient DNA replication initiation of extrachromosomal DNA. These ARS regions are approximately 100-200 bp long and exhibit a multipartite organization, containing A, B1, B2, and sometimes B3 elements that together are essential for origin function. The A element encompasses the conserved 11 bp ARS consensus sequence (ACS), which, in conjunction with the B1 element, constitutes the primary binding site for the heterohexameric origin recognition complex (ORC), the eukaryotic replication initiator. Within ORC, five subunits are predicated on conserved AAA+ ATPase and winged-helix folds and co-assemble into a pentameric ring that encircles DNA. In budding yeast ORC, DNA binding elements in the ATPase and winged-helix domains, as well as adjacent basic patch regions in some of the ORC subunits, are positioned in the central pore of the ORC ring such that they aid the DNA-sequence-specific recognition of the ACS in an ATP-dependent manner. By contrast, the roles of the B2 and B3 elements are less clear. The B2 region is similar to the ACS in sequence and has been suggested to function as a second ORC binding site under certain conditions, or as a binding site for the replicative helicase core. Conversely, the B3 element recruits the transcription factor Abf1, albeit B3 is not found at all budding yeast origins and Abf1 binding does not appear to be strictly essential for origin function.

Origin recognition in eukaryotes other than S. cerevisiae or its close relatives does not conform to the sequence-specific read-out of conserved origin DNA elements. Pursuits to isolate specific chromosomal replicator sequences more generally in eukaryotic species, either genetically or by genome-wide mapping of initiator binding or replication start sites, have failed to identify clear consensus sequences at origins. Thus, sequence-specific DNA-initiator interactions in budding yeast signify a specialized mode for origin recognition in this system rather than an archetypal mode for origin specification across the eukaryotic domain. Nonetheless, DNA replication does initiate at discrete sites that are not randomly distributed across eukaryotic genomes, arguing that alternative means determine the chromosomal location of origins in these systems. These mechanisms involve a complex interplay between DNA accessibility, nucleotide sequence skew (both AT-richness and CpG islands have been linked to origins), nucleosome positioning, epigenetic features, DNA topology and certain DNA structural features (e.g., G4 motifs), as well as regulatory proteins and transcriptional interference. Importantly, origin properties vary not only between different origins in an organism and among species, but some can also change during development and cell differentiation. The chorion locus in Drosophila follicle cells constitutes a well-established example for spatial and developmental control of initiation events. This region undergoes DNA-replication-dependent gene amplification at a defined stage during oogenesis and relies on the timely and specific activation of chorion origins, which in turn is regulated by origin-specific cis-elements and several protein factors, including the Myb complex, E2F1, and E2F2. 
This combinatorial specification and multifactorial regulation of metazoan origins has complicated the identification of unifying features that determine the location of replication start sites across eukaryotes more generally.

To facilitate replication initiation and origin recognition, ORC assemblies from various species have evolved specialized auxiliary domains that are thought to aid initiator targeting to chromosomal origins or chromosomes in general. For example, the Orc4 subunit in S. pombe ORC contains several AT-hooks that preferentially bind AT-rich DNA, while in metazoan (animal) ORC the TFIIB-like domain of Orc6 is thought to perform a similar function. Metazoan Orc1 proteins also harbor a bromo-adjacent homology (BAH) domain that interacts with H4K20me2-nucleosomes. Particularly in mammalian cells, H4K20 methylation has been reported to be required for efficient replication initiation, and Orc1's BAH domain facilitates ORC association with chromosomes and Epstein-Barr virus origin-dependent replication. Therefore, it is intriguing to speculate that both observations are mechanistically linked at least in a subset of metazoa, but this possibility needs to be further explored in future studies. In addition to the recognition of certain DNA or epigenetic features, ORC also associates directly or indirectly with several partner proteins that could aid initiator recruitment, including LRWD1, PHIP (or DCAF14), and HMGA1a, among others. Interestingly, Drosophila ORC, like its budding yeast counterpart, bends DNA, and negative supercoiling has been reported to enhance DNA binding of this complex, suggesting that DNA shape and malleability might influence the location of ORC binding sites across metazoan genomes. A molecular understanding for how ORC's DNA binding regions might support the read out of structural properties of the DNA duplex in metazoans rather than of specific DNA sequences as in S. cerevisiae awaits high-resolution structural information of DNA-bound metazoan initiator assemblies. 
Likewise, whether and how different epigenetic factors contribute to initiator recruitment in metazoan systems is poorly defined and is an important question that needs to be addressed in more detail.

Once recruited to origins, ORC and its co-factors Cdc6 and Cdt1 drive the deposition of the minichromosome maintenance 2-7 (Mcm2-7) complex onto DNA. Like the archaeal replicative helicase core, Mcm2-7 is loaded as a head-to-head double hexamer onto DNA to license origins. In S-phase, Dbf4-dependent kinase (DDK) and Cyclin-dependent kinase (CDK) phosphorylate several Mcm2-7 subunits and additional initiation factors to promote the recruitment of the helicase co-activators Cdc45 and GINS, DNA melting, and ultimately bidirectional replisome assembly at a subset of the licensed origins. In both yeast and metazoans, origins are free or depleted of nucleosomes, a property that is crucial for Mcm2-7 loading, indicating that chromatin state at origins regulates not only initiator recruitment but also helicase loading. A permissive chromatin environment is further important for origin activation and has been implicated in regulating both origin efficiency and the timing of origin firing. Euchromatic origins typically contain active chromatin marks, replicate early, and are more efficient than late-replicating, heterochromatic origins, which conversely are characterized by repressive marks. Not surprisingly, several chromatin remodelers and chromatin-modifying enzymes have been found to associate with origins and certain initiation factors, but how their activities impact different replication initiation events remains largely obscure. Remarkably, cis-acting “early replication control elements” (ERCEs) have recently also been identified to help regulate replication timing and to influence 3D genome architecture in mammalian cells. Understanding the molecular and biochemical mechanisms that orchestrate this complex interplay between 3D genome organization, local and higher-order chromatin structure, and replication initiation is an exciting topic for further studies.

Why have metazoan replication origins diverged from the DNA sequence-specific recognition paradigm that determines replication start sites in prokaryotes and budding yeast? Observations that metazoan origins often co-localize with promoter regions in Drosophila and mammalian cells and that replication-transcription conflicts due to collisions of the underlying molecular machineries can lead to DNA damage suggest that proper coordination of transcription and replication is important for maintaining genome stability. Recent findings also point to a more direct role of transcription in influencing the location of origins, either by inhibiting Mcm2-7 loading or by repositioning of loaded Mcm2-7 on chromosomes. Sequence-independent (but not necessarily random) initiator binding to DNA additionally allows for flexibility in specifying helicase loading sites and, together with transcriptional interference and the variability in activation efficiencies of licensed origins, likely determines origin location and contributes to the co-regulation of DNA replication and transcriptional programs during development and cell fate transitions. Computational modeling of initiation events in S. pombe, as well as the identification of cell-type specific and developmentally-regulated origins in metazoans, are in agreement with this notion. However, a large degree of flexibility in origin choice also exists among different cells within a single population, albeit the molecular mechanisms that lead to the heterogeneity in origin usage remain ill-defined. Mapping origins in single cells in metazoan systems and correlating these initiation events with single-cell gene expression and chromatin status will be important to elucidate whether origin choice is purely stochastic or controlled in a defined manner.

Viral

HHV-6 genome
Genome of human herpesvirus-6, a member of the Herpesviridae family. The origin of replication is labeled as "OOR."

Viruses often possess a single origin of replication.

A variety of proteins have been described as being involved in viral replication. For instance, polyomaviruses utilize host cell DNA polymerases, which attach to a viral origin of replication if the T antigen is present.

Variations

Although DNA replication is essential for genetic inheritance, defined, site-specific replication origins are technically not a requirement for genome duplication as long as all chromosomes are copied in their entirety to maintain gene copy numbers. Certain bacteriophages and viruses, for example, can initiate DNA replication by homologous recombination independent of dedicated origins. Likewise, the archaeon Haloferax volcanii uses recombination-dependent initiation to duplicate its genome when its endogenous origins are deleted. Similar non-canonical initiation events through break-induced or transcription-initiated replication have been reported in E. coli and S. cerevisiae. Nonetheless, despite the ability of cells to sustain viability under these exceptional circumstances, origin-dependent initiation is a common strategy universally adopted across different domains of life.

In addition, detailed studies of replication initiation have focused on a limited number of model systems. The extensively studied fungi and metazoa are both members of the opisthokont supergroup and exemplify only a small fraction of the evolutionary landscape in the eukaryotic domain. Comparably few efforts have been directed at other eukaryotic model systems, such as kinetoplastids or Tetrahymena. Surprisingly, these studies have revealed interesting differences both in origin properties and in initiator composition compared to yeast and metazoans.

Phylogenetics

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Phylogenetics

In biology, phylogenetics (/ˌfaɪloʊdʒəˈnɛtɪks, -lə-/) is the study of the evolutionary history of life using genetics, which is known as phylogenetic inference. It establishes the relationships between organisms using empirical data and observed heritable traits of DNA sequences, protein amino acid sequences, and morphology. The result is a phylogenetic tree, a diagram depicting the hypothetical relationships between organisms and their evolutionary history.

The tips of a phylogenetic tree can be living taxa or fossils, which represent the present time or "end" of an evolutionary lineage, respectively. A phylogenetic diagram can be rooted or unrooted. A rooted tree diagram indicates the hypothetical common ancestor of the tree. An unrooted tree diagram (a network) makes no assumption about the ancestral line, and does not show the origin or "root" of the taxa in question or the direction of inferred evolutionary transformations.

In addition to their use for inferring phylogenetic patterns among taxa, phylogenetic analyses are often employed to represent relationships among genes or individual organisms. Such uses have become central to understanding biodiversity, evolution, ecology, and genomes.

Phylogenetics is a component of systematics that uses similarities and differences of the characteristics of species to interpret their evolutionary relationships and origins. Phylogenetics focuses on whether the characteristics of a species reinforce a phylogenetic inference that it diverged from the most recent common ancestor of a taxonomic group.

In the field of cancer research, phylogenetics can be used to study the clonal evolution of tumors and molecular chronology, predicting and showing how cell populations vary throughout the progression of the disease and during treatment, using whole genome sequencing techniques. The evolutionary processes behind cancer progression are quite different from those in most species and are important to phylogenetic inference; these differences manifest in several areas: the types of aberrations that occur, the rates of mutation, the high heterogeneity (variability) of tumor cell subclones, and the absence of genetic recombination.

Phylogenetics can also aid in drug design and discovery. Phylogenetics allows scientists to organize species and can show which species are likely to have inherited particular traits that are medically useful, such as producing biologically active compounds, that is, compounds that have effects on the human body. For example, in drug discovery, venom-producing animals are particularly useful. Venoms from these animals have yielded several important drugs, e.g., ACE inhibitors and Prialt (Ziconotide). To find new venoms, scientists turn to phylogenetics to screen for closely related species that may have the same useful traits. The phylogenetic tree shows which species of fish have an origin of venom, and which related fish may carry the trait. Using this approach in studying venomous fish, biologists are able to identify the fish species that may be venomous. Biologists have used this approach in many groups, such as snakes and lizards. In forensic science, phylogenetic tools are useful to assess DNA evidence for court cases. The simple phylogenetic tree of viruses A-E shows the relationships between viruses, e.g., all viruses are descendants of Virus A.

HIV forensics uses phylogenetic analysis to track the differences in HIV genes and determine the relatedness of two samples. Phylogenetic analysis has been used in criminal trials to exonerate or hold individuals. HIV forensics does have its limitations, i.e., it cannot be the sole proof of transmission between individuals and phylogenetic analysis which shows transmission relatedness does not indicate direction of transmission.

Taxonomy and classification

One small clade of fish, showing how venom has evolved multiple times.

Taxonomy is the identification, naming, and classification of organisms. Compared to systemization, classification emphasizes whether a species has characteristics of a taxonomic group. The Linnaean classification system developed in the 1700s by Carolus Linnaeus is the foundation for modern classification methods. Linnaean classification relies on an organism's phenotype or physical characteristics to group and organize species. With the emergence of biochemistry, organism classifications are now usually based on phylogenetic data, and many systematists contend that only monophyletic taxa should be recognized as named groups. The degree to which classification depends on inferred evolutionary history differs depending on the school of taxonomy: phenetics ignores phylogenetic speculation altogether, trying to represent the similarity between organisms instead; cladistics (phylogenetic systematics) tries to reflect phylogeny in its classifications by only recognizing groups based on shared, derived characters (synapomorphies); evolutionary taxonomy tries to take into account both the branching pattern and "degree of difference" to find a compromise between them.

Inference of a phylogenetic tree

Usual methods of phylogenetic inference involve computational approaches implementing the optimality criteria and methods of parsimony, maximum likelihood (ML), and MCMC-based Bayesian inference. All these depend upon an implicit or explicit mathematical model describing the evolution of characters observed.
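As an illustration of the parsimony criterion, the following minimal sketch scores a fixed tree with the Fitch procedure: each internal node takes the intersection of its children's state sets when they overlap, and otherwise takes their union at the cost of one change. The nested-tuple tree encoding, the function names, and the four-taxon alignment are assumptions invented for this example; real analyses also search over tree topologies, which is not shown here.

```python
def fitch_score(tree, states):
    """Return (state set at this node, minimum number of changes below it).

    tree   : nested 2-tuples of tip names, e.g. (("A", "B"), ("C", "D"))
    states : dict mapping each tip name to its observed character state
    """
    if isinstance(tree, str):          # a tip: its set is the observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:                         # children overlap: no extra change
        return common, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch score over every column of an alignment (tip -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Hypothetical toy alignment, invented for illustration only.
alignment = {"A": "AAGT", "B": "AAGC", "C": "ACGC", "D": "ACTC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 3
print(parsimony_length(tree_2, alignment))  # 4
```

Under the parsimony criterion, tree_1 would be preferred for this toy data because it requires fewer changes.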

Phenetics, popular in the mid-20th century but now largely obsolete, used distance matrix-based methods to construct trees based on overall similarity in morphology or similar observable traits (i.e. in the phenotype or the overall similarity of DNA, not the DNA sequence), which was often assumed to approximate phylogenetic relationships.

Prior to 1950, phylogenetic inferences were generally presented as narrative scenarios. Such methods are often ambiguous and lack explicit criteria for evaluating alternative hypotheses.

Impacts of taxon sampling

In phylogenetic analysis, taxon sampling selects a small group of taxa to represent the evolutionary history of its broader population. This process is also known as stratified sampling or clade-based sampling. The practice arises because resources are too limited to compare and analyze every species within a target population. The construction and accuracy of phylogenetic trees vary with the representative group selected, which affects the phylogenetic inferences derived from them.

Unavailable datasets, such as an organism's incomplete DNA and protein amino acid sequences in genomic databases, directly restrict taxonomic sampling. Consequently, inadequate taxon sampling is a significant source of error in phylogenetic analysis. Accuracy may be improved by increasing the number of genetic samples within the monophyletic group. Conversely, increasing sampling from outgroups extraneous to the target stratified population may decrease accuracy. Long branch attraction, in which unrelated branches are incorrectly grouped together and so imply a shared evolutionary history, is one proposed explanation for this effect.

Percentage of inter-ordinal branches reconstructed with a constant number of bases and four phylogenetic tree construction models: neighbor-joining (NJ), minimum evolution (ME), unweighted maximum parsimony (MP), and maximum likelihood (ML). The plot demonstrates that phylogenetic analysis with fewer taxa and more genes per taxon matches the replicable consensus tree more often. The dotted line marks equal accuracy for the two taxon sampling methods. Figure is property of Michael S. Rosenberg and Sudhir Kumar as presented in the journal article Taxon Sampling, Bioinformatics, and Phylogenomics.

It is debated whether increasing the number of taxa sampled improves phylogenetic accuracy more than increasing the number of genes sampled per taxon. The two sampling strategies differ in the number of nucleotide sites used in a sequence alignment, which may contribute to the disagreements. For example, phylogenetic trees constructed from a larger number of total nucleotides are generally more accurate, as supported by the bootstrap replicability of phylogenetic trees under random sampling.
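Bootstrap replicability can be sketched as follows: alignment columns are resampled with replacement, the analysis is repeated on each pseudo-replicate, and the fraction of replicates that recover a given grouping is reported. This is only a minimal sketch. It stands in a very simple "closest pair by p-distance" check for full tree inference, and the alignment, function names, replicate count, and seed are all assumptions made for illustration.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Pair of taxa with the smallest p-distance (first pair in name order on ties)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def pair_support(alignment, pair, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical toy alignment, invented for illustration only.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACTTGA",
    "D": "TCGAACTTGC",
}

print(closest_pair(alignment))               # ('A', 'B')
print(pair_support(alignment, ("A", "B")))   # a value between 0 and 1
```

With more sites in the alignment, the resampled distances vary less between replicates, which is one way to read the observation that more total nucleotides tend to give more replicable trees.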

The graphic presented in Taxon Sampling, Bioinformatics, and Phylogenomics compares the correctness of phylogenetic trees generated using fewer taxa and more sites per taxon on the x-axis to more taxa and fewer sites per taxon on the y-axis. With fewer taxa, more genes are sampled amongst the taxonomic group; in comparison, with more taxa added to the taxonomic sampling group, fewer genes are sampled. Each method has the same total number of nucleotide sites sampled. Furthermore, the dotted line represents a 1:1 accuracy between the two sampling methods. As seen in the graphic, most of the plotted points are located below the dotted line, which indicates gravitation toward increased accuracy when sampling fewer taxa with more sites per taxon. The research performed utilizes four different phylogenetic tree construction models to verify the theory: neighbor-joining (NJ), minimum evolution (ME), unweighted maximum parsimony (MP), and maximum likelihood (ML). In the majority of models, sampling fewer taxa with more sites per taxon demonstrated higher accuracy.

Generally, with the alignment of a relatively equal number of total nucleotide sites, sampling more genes per taxon has higher bootstrapping replicability than sampling more taxa. However, unbalanced datasets within genomic databases make increasing the gene comparison per taxon in uncommonly sampled organisms increasingly difficult.

History

Overview

The term "phylogeny" derives from the German Phylogenie, introduced by Haeckel in 1866, and the Darwinian approach to classification became known as the "phyletic" approach. The principle of parsimony that underlies much phylogenetic inference can be traced back to Aristotle, who wrote in his Posterior Analytics, "We may assume the superiority ceteris paribus [other things being equal] of the demonstration which derives from fewer postulates or hypotheses."

Ernst Haeckel's recapitulation theory

The modern concept of phylogenetics evolved primarily as a disproof of a previously widely accepted theory. During the late 19th century, Ernst Haeckel's recapitulation theory, or "biogenetic fundamental law", was widely popular. It was often expressed as "ontogeny recapitulates phylogeny", i.e. the development of a single organism during its lifetime, from germ to adult, successively mirrors the adult stages of successive ancestors of the species to which it belongs. But this theory has long been rejected. Instead, ontogeny evolves – the phylogenetic history of a species cannot be read directly from its ontogeny, as Haeckel thought would be possible, but characters from ontogeny can be (and have been) used as data for phylogenetic analyses; the more closely related two species are, the more apomorphies their embryos share.

Timeline of key points

Branching tree diagram from Heinrich Georg Bronn's work (1858)
Phylogenetic tree suggested by Haeckel (1866)
  • 14th century, lex parsimoniae (parsimony principle), William of Ockham, English philosopher, theologian, and Franciscan friar, precursor concept, although the idea goes back to Aristotle. Ockham introduced the concept of Occam's razor, the problem-solving principle that recommends searching for explanations constructed with the smallest possible set of elements. Though he did not use these exact words, the principle can be summarized as "Entities must not be multiplied beyond necessity." The principle advocates that when presented with competing hypotheses about the same prediction, one should prefer the one that requires the fewest assumptions.
  • 1763, Bayesian probability, Rev. Thomas Bayes, a precursor concept. Bayesian probability began a resurgence in the 1950s, allowing scientists in the computing field to pair traditional Bayesian statistics with other more modern techniques. It is now used as a blanket term for several related interpretations of probability as an amount of epistemic confidence.
  • 18th century, Pierre Simon (Marquis de Laplace), perhaps first to use ML (maximum likelihood), precursor concept. His work gave way to the Laplace distribution, which can be directly linked to least absolute deviations.
  • 1809, evolutionary theory, Philosophie Zoologique, Jean-Baptiste de Lamarck, precursor concept. It was foreshadowed in the 17th and 18th centuries by Voltaire, Descartes, and Leibniz; Leibniz even proposed evolutionary changes to account for observed gaps, suggesting that many species had become extinct, others had transformed, and different species that share common traits may at one time have been a single race. It was also foreshadowed by some early Greek philosophers, such as Anaximander in the 6th century BC and the atomists of the 5th century BC, who proposed rudimentary theories of evolution.
  • 1837, Darwin's notebooks show an evolutionary tree
  • 1840, American Geologist Edward Hitchcock published what is considered to be the first paleontological "Tree of Life". Many critiques, modifications, and explanations would follow.
    This chart displays one of the first published attempts at a paleontological "Tree of Life" by Geologist Edward Hitchcock. (1840)
  • 1843, distinction between homology and analogy (the latter now referred to as homoplasy), Richard Owen, precursor concept. Homology is the term used to characterize the similarity of features that can be parsimoniously explained by common ancestry. Homoplasy is the term used to describe a feature that has been gained or lost independently in separate lineages over the course of evolution.
  • 1858, paleontologist Heinrich Georg Bronn (1800–1862) published a hypothetical tree illustrating the paleontological "arrival" of new, similar species following the extinction of an older species. Bronn did not propose a mechanism responsible for such phenomena, precursor concept.
  • 1858, elaboration of evolutionary theory, Darwin and Wallace, also in Origin of Species by Darwin the following year, precursor concept.
  • 1866, Ernst Haeckel first publishes his phylogeny-based evolutionary tree, precursor concept. Haeckel introduces the now-disproved recapitulation theory. He introduced the term "Cladus" as a taxonomic category just below subphylum.
  • 1893, Dollo's Law of Character State Irreversibility, precursor concept. Dollo's Law of Irreversibility states that "an organism never comes back exactly to its previous state due to the indestructible nature of the past, it always retains some trace of the transitional stages through which it has passed."
  • 1912, ML (maximum likelihood) recommended, analyzed, and popularized by Ronald Fisher, precursor concept. Fisher is one of the main contributors to the early 20th-century revival of Darwinism, and has been called the "greatest of Darwin's successors" for his contributions to the revision of the theory of evolution and his use of mathematics to combine Mendelian genetics and natural selection in the 20th century "modern synthesis".
  • 1921, Tillyard uses term "phylogenetic" and distinguishes between archaic and specialized characters in his classification system.
  • 1940, Lucien Cuénot coined the term "clade": "terme nouveau de clade (du grec κλάδος, branche) [A new term clade (from the Greek word klados, meaning branch)]". He used it for evolutionary branching.
  • 1947, Bernhard Rensch introduced the term Kladogenesis in his German book Neuere Probleme der Abstammungslehre Die transspezifische Evolution, translated into English in 1959 as Evolution Above the Species Level (still using the same spelling).
  • 1949, Jackknife resampling, Maurice Quenouille (foreshadowed in '46 by Mahalanobis and extended in '58 by Tukey), precursor concept.
  • 1950, Willi Hennig's classic formalization. Hennig is considered the founder of phylogenetic systematics, and published his first works in German in this year. He also asserted a version of the parsimony principle, stating that the presence of apomorphous characters in different species 'is always reason for suspecting kinship, and that their origin by convergence should not be presumed a priori'. This has been considered a foundational view of phylogenetic inference.
  • 1952, William Wagner's ground plan divergence method.
  • 1957, Julian Huxley adopted Rensch's terminology as "cladogenesis" with a full definition: "Cladogenesis I have taken over directly from Rensch, to denote all splitting, from subspeciation through adaptive radiation to the divergence of phyla and kingdoms." With it he introduced the word "clades", defining it as: "Cladogenesis results in the formation of delimitable monophyletic units, which may be called clades."
  • 1960, Arthur Cain and Geoffrey Ainsworth Harrison coined "cladistic" to mean evolutionary relationship.
  • 1963, first attempt to use ML (maximum likelihood) for phylogenetics, Edwards and Cavalli-Sforza.
  • 1965
    • Camin-Sokal parsimony, first parsimony (optimization) criterion and first computer program/algorithm for cladistic analysis both by Camin and Sokal.
    • Character compatibility method, also called clique analysis, introduced independently by Camin and Sokal (loc. cit.) and E. O. Wilson.
  • 1966
    • English translation of Hennig.
    • "Cladistics" and "cladogram" coined (Webster's, loc. cit.)
  • 1969
    • Dynamic and successive weighting, James Farris.
    • Wagner parsimony, Kluge and Farris.
    • CI (consistency index), Kluge and Farris.
    • Introduction of pairwise compatibility for clique analysis, Le Quesne.
  • 1970, Wagner parsimony generalized by Farris.
  • 1971
    • First successful application of ML (maximum likelihood) to phylogenetics (for protein sequences), Neyman.
    • Fitch parsimony, Walter M. Fitch. These gave way to the most basic ideas of maximum parsimony. Fitch is known for his work on reconstructing phylogenetic trees from protein and DNA sequences. His definition of orthologous sequences has been referenced in many research publications.
    • NNI (nearest neighbour interchange), first branch-swapping search strategy, developed independently by Robinson and Moore et al.
    • ME (minimum evolution), Kidd and Sgaramella-Zonta (it is unclear if this is the pairwise distance method or related to ML as Edwards and Cavalli-Sforza call ML "minimum evolution").
  • 1972, Adams consensus, Adams.
  • 1976, prefix system for ranks, Farris.
  • 1977, Dollo parsimony, Farris.
  • 1979
    • Nelson consensus, Nelson.
    • MAST (maximum agreement subtree), also called GAS (greatest agreement subtree), a consensus method, Gordon.
    • Bootstrap, Bradley Efron, precursor concept.
  • 1980, PHYLIP, first software package for phylogenetic analysis, Joseph Felsenstein. PHYLIP is a free computational phylogenetics package of programs for inferring evolutionary trees (phylogenies). One of its programs, "drawgram", draws rooted trees. An example drawgram is shown in the figure below.
  • 1981
    • Majority consensus, Margush and MacMorris.
    • Strict consensus, Sokal and Rohlf.
    • First computationally efficient ML (maximum likelihood) algorithm, Felsenstein. Felsenstein created the Felsenstein Maximum Likelihood method, used for the inference of phylogeny, which evaluates a hypothesis about evolutionary history in terms of the probability that the proposed model and the hypothesized history would give rise to the observed data set.
      This image depicts a PHYLIP-generated drawgram. This drawgram is an example of one of the possible trees the software is capable of generating.
  • 1982
    • PHYSIS, Mikevich and Farris
    • Branch and bound, Hendy and Penny
  • 1985
    • First cladistic analysis of eukaryotes based on combined phenotypic and genotypic evidence, Diana Lipscomb.
    • First issue of Cladistics.
    • First phylogenetic application of bootstrap, Felsenstein.
    • First phylogenetic application of jackknife, Scott Lanyon.
  • 1986, MacClade, Maddison and Maddison.
  • 1987, neighbor-joining method, Saitou and Nei.
  • 1988
    • Hennig86 (version 1.5), Farris.
    • Bremer support (decay index), Bremer.
  • 1989
    • RI (retention index), RCI (rescaled consistency index), Farris.
    • HER (homoplasy excess ratio), Archie.
  • 1990
    • combinable components (semi-strict) consensus, Bremer.
    • SPR (subtree pruning and regrafting), TBR (tree bisection and reconnection), Swofford and Olsen.
  • 1991
    • DDI (data decisiveness index), Goloboff.
    • First cladistic analysis of eukaryotes based only on phenotypic evidence, Lipscomb.
  • 1993, implied weighting, Goloboff.
  • 1994, reduced consensus: RCC (reduced cladistic consensus) for rooted trees, Wilkinson.
  • 1995, reduced consensus RPC (reduced partition consensus) for unrooted trees, Wilkinson.
  • 1996, first working methods for BI (Bayesian inference), independently developed by Li, by Mau, and by Rannala and Yang, all using MCMC (Markov chain Monte Carlo).
  • 1998, TNT (Tree Analysis Using New Technology), Goloboff, Farris, and Nixon.
  • 1999, Winclada, Nixon.
  • 2003, symmetrical resampling, Goloboff.
  • 2004, 2005, similarity metric (using an approximation to Kolmogorov complexity) or NCD (normalized compression distance), Li et al., Cilibrasi and Vitanyi.
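The normalized compression distance named in the last timeline entry can be written as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the length of the compressed input. The sketch below is a minimal illustration that assumes zlib as the compressor; any real compressor only approximates the ideal (uncomputable) Kolmogorov complexity, and the function names are invented for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest that y adds little information beyond x;
    values near 1 suggest the two inputs share little structure.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A typical use would be to compute ncd for every pair of sequences and feed the resulting distance matrix to a distance-based tree-building method. Short inputs give noisy values because compressor overhead dominates, so the sketch is best read as a demonstration of the formula.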

Uses of phylogenetic analysis

Pharmacology

One use of phylogenetic analysis involves the pharmacological examination of closely related groups of organisms. Advances in cladistic analysis through faster computer programs and improved molecular techniques have increased the precision of phylogenetic determination, allowing for the identification of species with pharmacological potential.

Historically, phylogenetic screens for pharmacological purposes were used in a basic manner, such as studying the Apocynaceae family of plants, which includes alkaloid-producing species like Catharanthus, known for producing vincristine, an antileukemia drug. Modern techniques now enable researchers to study close relatives of a species to uncover either a higher abundance of important bioactive compounds (e.g., species of Taxus for taxol) or natural variants of known pharmaceuticals (e.g., species of Catharanthus for different forms of vincristine or vinblastine).

Biodiversity

Phylogenetic analysis has also been applied to biodiversity studies within the fungi. Phylogenetic analysis helps understand the evolutionary history of various groups of organisms, identify relationships between different species, and predict future evolutionary changes. Emerging imagery systems and new analysis techniques allow for the discovery of more genetic relationships in biodiverse fields, which can aid in conservation efforts by identifying rare species that could benefit ecosystems globally.

Phylogenetic Subtree of fungi containing different biodiverse sections of the fungi group.

Infectious disease epidemiology

Whole-genome sequence data from outbreaks or epidemics of infectious diseases can provide important insights into transmission dynamics and inform public health strategies. Traditionally, studies have combined genomic and epidemiological data to reconstruct transmission events. However, recent research has explored deducing transmission patterns solely from genomic data using phylodynamics, which involves analyzing the properties of pathogen phylogenies. Phylodynamics uses theoretical models to compare predicted branch lengths with actual branch lengths in phylogenies to infer transmission patterns. Additionally, coalescent theory, which describes probability distributions on trees based on population size, has been adapted for epidemiological purposes. Another source of information within phylogenies that has been explored is "tree shape." These approaches, while computationally intensive, have the potential to provide valuable insights into pathogen transmission dynamics.
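To make the link between population size and tree shape concrete, the sketch below uses the standard Kingman coalescent for a constant-size haploid population. That model is an assumption chosen for illustration and is not specified in the text above. Under it, while k lineages remain, the expected waiting time to the next coalescence is 2N / (k(k - 1)) generations, so the expected height of a tree on n samples is 2N(1 - 1/n). The function names are hypothetical.

```python
def expected_interval(k, pop_size):
    """Expected generations during which exactly k lineages remain.

    Assumes a constant-size haploid population of `pop_size` individuals,
    where each pair of lineages coalesces at rate 1 / pop_size per generation.
    """
    return 2 * pop_size / (k * (k - 1))


def expected_tree_height(n_samples, pop_size):
    """Expected time to the most recent common ancestor of n_samples lineages."""
    return sum(expected_interval(k, pop_size) for k in range(2, n_samples + 1))


# Larger populations give proportionally deeper expected trees.
for pop_size in (100, 1000):
    print(pop_size, expected_tree_height(10, pop_size))
```

Comparing branch lengths predicted in this way with those observed in a pathogen phylogeny is the general idea behind the model-based inference described above; epidemiological adaptations replace the constant population size with a changing number of infected hosts.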

Pathogen Transmission Trees

The structure of the host contact network significantly impacts the dynamics of outbreaks, and management strategies rely on understanding these transmission patterns. Pathogen genomes spreading through different contact network structures, such as chains, homogeneous networks, or networks with super-spreaders, accumulate mutations in distinct patterns, resulting in noticeable differences in the shape of phylogenetic trees, as illustrated in Fig. 1. Researchers have analyzed the structural characteristics of phylogenetic trees generated from simulated bacterial genome evolution across multiple types of contact networks. By examining simple topological properties of these trees, researchers can classify them into chain-like, homogeneous, or super-spreading dynamics, revealing transmission patterns. These properties form the basis of a computational classifier used to analyze real-world outbreaks. Computational predictions of transmission dynamics for each outbreak often align with known epidemiological data.

Graphical Representation of Phylogenetic Tree analysis

Different transmission networks result in quantitatively different tree shapes. To determine whether tree shapes captured information about underlying disease transmission patterns, researchers simulated the evolution of a bacterial genome over three types of outbreak contact networks—homogeneous, super-spreading, and chain-like. They summarized the resulting phylogenies with five metrics describing tree shape. Figures 2 and 3 illustrate the distributions of these metrics across the three types of outbreaks, revealing clear differences in tree topology depending on the underlying host contact network.

Super-spreader networks give rise to phylogenies with higher Colless imbalance, longer ladder patterns, lower Δw, and deeper trees than those from homogeneous contact networks. Trees from chain-like networks are less variable, deeper, more imbalanced, and narrower than those from other networks.
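Of the tree-shape metrics mentioned, Colless imbalance is simple to compute: it sums, over every internal node of a rooted binary tree, the absolute difference between the number of tips on its two sides. The sketch below assumes a nested-tuple tree encoding and uses hypothetical tip labels; it does not reproduce the classifier or the other metrics from the study.

```python
def colless(tree):
    """Return (number of tips, Colless imbalance) for a rooted binary tree.

    tree: nested 2-tuples with strings as tips, e.g. (("A", "B"), ("C", "D")).
    """
    if isinstance(tree, str):          # a single tip contributes no imbalance
        return 1, 0
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    # add this node's own imbalance to that of its two subtrees
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


balanced = (("A", "B"), ("C", "D"))    # fully symmetric four-tip tree
ladder = ((("A", "B"), "C"), "D")      # ladder-like (caterpillar) four-tip tree

print(colless(balanced)[1])  # 0
print(colless(ladder)[1])    # 3
```

The ladder-like tree scores higher, in line with the statement that chain-like and super-spreader outbreaks tend to yield more imbalanced phylogenies than homogeneous ones.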

Scatter plots can be used to visualize the relationship between two variables in pathogen transmission analysis, such as the number of infected individuals and the time since infection. These plots can help identify trends and patterns, such as whether the spread of the pathogen is increasing or decreasing over time, and can highlight potential transmission routes or super-spreader events. Box plots displaying the range, median, quartiles, and potential outliers of datasets can also be valuable for analyzing pathogen transmission data, helping to identify important features in the data distribution. They may be used to quickly identify differences or similarities in the transmission data.

Disciplines other than biology

Phylogeny of Indo-European languages

Phylogenetic tools and representations (trees and networks) can also be applied to philology, the study of the evolution of oral languages and written text and manuscripts, such as in the field of quantitative comparative linguistics.

Computational phylogenetics can be used to investigate a language as an evolutionary system. The evolution of human language closely corresponds with human biological evolution, which allows phylogenetic methods to be applied. The concept of a "tree" serves as an efficient way to represent relationships between languages and language splits. It also serves as a way of testing hypotheses about the connections and ages of language families. For example, relationships among languages can be shown by using cognates as characters. The phylogenetic tree of Indo-European languages shows the relationships between several of the languages in a timeline, as well as the similarity between words and word order.

There are three types of criticism of using phylogenetics in philology. The first argues that languages and species are different entities, so the same methods cannot be used to study both. The second concerns how phylogenetic methods are applied to linguistic data. The third concerns the types of data used to construct the trees.

Bayesian phylogenetic methods, which are sensitive to how treelike the data are, allow for the reconstruction of relationships among languages, locally and globally. The two main reasons for the use of Bayesian phylogenetics are that (1) diverse scenarios can be included in calculations and (2) the output is a sample of trees rather than a single tree claimed to be true.

The same process can be applied to texts and manuscripts. In paleography, the study of historical writings and manuscripts, texts were replicated by scribes who copied from their source, and alterations, i.e., 'mutations', occurred when the scribe did not precisely copy the source.

Phylogenetics has been applied to archaeological artefacts such as the early hominin hand-axes, late Palaeolithic figurines, Neolithic stone arrowheads, Bronze Age ceramics, and historical-period houses. Bayesian methods have also been employed by archaeologists in an attempt to quantify uncertainty in the tree topology and divergence times of stone projectile point shapes in the European Final Palaeolithic and earliest Mesolithic.

Cephalization

From Wikipedia, the free encyclopedia

A lobster is heavily cephalized, with eyes, antennae, multiple mouthparts, and the brain (inside the armoured exoskeleton), all concentrated at the animal's head end.

Cephalization is an evolutionary trend in animals that, over a sufficient number of generations, concentrates the special sense organs and nerve ganglia towards the front of the body where the mouth is located, often producing an enlarged head. This is associated with the animal's movement direction and bilateral symmetry. Cephalization of the nervous system has led to the formation of a brain with varying degrees of functional centralization in three phyla of bilaterian animals, namely the arthropods, cephalopod molluscs, and vertebrates. Hox genes organise aspects of cephalization in the bilaterians.

Bilateria

Idealised bilaterian body plan. With a cylindrical body (in the main clade, the nephrozoa) and a direction of travel, the animal has head and tail ends, favouring cephalization by natural selection. Sense organs, brain, and mouth form the basis of the head.

Cephalization is both a characteristic feature of any animal that habitually moves in one direction, thereby gaining a front end, and an evolutionary trend which created the head of these animals. In practice, this primarily means the bilaterians, a large group containing the majority of animal phyla. These have the ability to move, using muscles, and a body plan with a front end that encounters stimuli first as the animal moves forwards, and accordingly has evolved to contain many of the body's sense organs, able to detect light, chemicals, and gravity. There is often a collection of nerve cells able to process the information from these sense organs, forming a brain in several phyla and one or more ganglia (clusters of nerve cells) in others.

Complex active bodies

The philosopher Michael Trestman noted that three bilaterian phyla, namely the arthropods, the molluscs in the shape of the cephalopods, and the chordates, were distinctive in having "complex active bodies", something that the acoels and flatworms did not have. Any such animal, whether predator or prey, has to be aware of its environment—to catch its prey, or to evade its predators. These groups are exactly those that are most highly cephalized. These groups, however, are not closely related: in fact, they represent widely separated branches of the Bilateria, as shown on the phylogenetic tree; their lineages split hundreds of millions of years ago. Other (less cephalized) phyla are omitted for clarity.

Arthropods

In arthropods, cephalization progressed with the gradual incorporation of trunk segments into the head region. This was advantageous because it allowed for the evolution of more effective mouth-parts for capturing and processing food. Insects are strongly cephalized, their brain made of three fused ganglia attached to the ventral nerve cord, which in turn has a pair of ganglia in each segment of the thorax and abdomen, the parts of the trunk behind the head. The insect head is an elaborate structure made of several segments fused rigidly together, and equipped with both simple and compound eyes, and multiple appendages including sensory antennae and complex mouthparts (maxillae and mandibles).

Cephalopods like this cuttlefish have advanced 'camera' eyes. The cuttlefish has a W-shaped pupil.

Cephalopods

Cephalopods including the octopus, squid, cuttlefish and nautilus are the most intelligent of molluscs. They are highly cephalized, with well-developed senses, including advanced 'camera' eyes and large brains.

Vertebrates

Cephalization in vertebrates, the group that includes mammals, birds, reptiles, amphibians and fishes, has been studied extensively. The heads of vertebrates are complex structures, with distinct sense organs for sight, olfaction, and hearing, and a large, multi-lobed brain protected by a skull of bone or cartilage. Cephalochordates like the lancelet (Amphioxus), a small fishlike animal with very little cephalization, are closely related to vertebrates but do not have these structures. In the 1980s, the new head hypothesis proposed that the vertebrate head is an evolutionary novelty resulting from the emergence of neural crest and cranial placodes (thickened areas of the embryonic ectoderm layer), which result in the formation of all sense organs outside the brain. However, in 2014, a transient larval tissue of the lancelet was found to be virtually indistinguishable from the neural crest-derived cartilage (which becomes bone in jawed animals) that forms the vertebrate skull, suggesting that persistence of this tissue and expansion into the entire head space could be a viable evolutionary route to forming the vertebrate head. Advanced vertebrates have increasingly elaborate brains.

Idealised vertebrate body plan, showing brain and sense organs at the head end

Anterior Hox genes

Bilaterians have many more Hox genes controlling development, including that of the front of the body, than do the less cephalized Cnidaria (two Hox clusters) and the Acoelomorpha (three Hox clusters). In the vertebrates, duplication resulted in the four Hox clusters (HoxA to HoxD) of mammals and birds, while another duplication gave teleost fishes eight Hox clusters. Some of these genes, those responsible for the front (anterior) of the body, helped to create the heads of both arthropods and vertebrates. However, the Hox1-5 genes were already present in ancestral arthropods and vertebrates that did not have complex head structures. The Hox genes therefore most likely assisted in cephalization of these two bilaterian groups independently by convergent evolution, resulting in similar gene networks.

Partly cephalized phyla

The gold-speckled flatworm, Thysanozoon nigropapillosum, is somewhat cephalized, with a distinct head end (at right) which has pseudotentacles and a photoreceptive eyespot.

The Acoela are basal bilaterians, part of the Xenacoelomorpha. They are small and simple animals with flat bodies. They have slightly more nerve cells at the head end than elsewhere, not forming a distinct and compact brain. This represents an early stage in cephalization.

Also among the bilaterians, Platyhelminthes (flatworms) have a more complex nervous system than the Acoela, and are lightly cephalized, for instance having an eyespot above the brain, near the front end.

Among animals without bilateral symmetry, the Cnidaria, such as the radially symmetrical (roughly cylindrical) Hydrozoa, show some degree of cephalization. The Anthomedusae have a head end with their mouth, photoreceptor cells, and a concentration of nerve cells.

Landscape-scale conservation

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Landscape-scale_conservation ...