Monday, February 3, 2014

A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life

 

Abstract

 
Assembling the Tree Of Life (TOL) faces the pressing challenge of incorporating a rapidly growing number of sequenced genomes. This problem is exacerbated by the fact that different sets of genes are informative at different evolutionary scales. Here, we present a novel phylogenetic approach (Nested Phylogenetic Reconstruction) in which each tree node is optimized based on the genes shared at that taxonomic level. We apply this procedure to reconstruct a 216-species eukaryotic TOL and compare it with a standard concatenation-based approach. The resulting topology is highly accurate and reveals general trends such as the relationship between branch lengths and genome content in eukaryotes. The approach lends itself to continuous update, and we show this by adding 29 and 173 newly sequenced species in two consecutive steps. The proposed approach, which has been implemented in a fully automated pipeline, enables the reconstruction and continuous update of highly resolved phylogenies of sequenced organisms.
Cite this as
Huerta-Cepas J, Marcet-Houben M, Gabaldón T. (2014) A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life. PeerJ PrePrints 2:e223v1
 

Supplemental Information

Schematic representation of the nested phylogenetic reconstruction approach.

First, a starting unrooted tree including all species is reconstructed (iteration 0, red node in panel A) using a Gene Concatenation Methodology (GCM, panel C). GCM comprises: C1) searching for groups of one-to-one orthologs (Ortholog Groups, OGs), C2) reconstruction of a multiple sequence alignment for each OG, C3) phylogenetic reconstruction for each single OG, C4) concatenation of the OG alignments, and C5) species tree reconstruction based on the concatenated alignment. Second, the resulting tree is split into two well-supported clades, each defining a subset of species. GCM is then applied to each of the new sets of organisms, with four extra species included as rooting anchors. As a result, two new trees are obtained (iteration 1, blue nodes in panel A). Subsequently, each of the new sub-trees is rooted using its anchor species (C6) and split into its two major clades (C7). The four resulting partitions (iteration 2, green nodes in panel A) are used to continue the same procedure until the recomputed partitions reach a given size limit (number of species) (panel B). An animation showing how the tree is re-shaped at each iteration can be seen at http://tol.cgenomics.org/TOL_animation.gif
DOI: 10.7287/peerj.preprints.223v1/supp-1
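
The control flow of the nested procedure in the caption above can be sketched in a few lines of Python. This is only an illustration of the recursion (split a partition into two clades, recompute each clade with rooting anchors, stop at a size limit), not the authors' pipeline. Every name here (Node, nested_reconstruction, split_fn, toy_split, leaf_partitions) is hypothetical. The real work of steps C1 to C7 is hidden behind the split_fn callback, and the rule used to pick anchors (sister clade first, then inherited anchors) is an assumption, since the caption does not say how anchors are chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: given the species of a partition and its rooting
# anchors, return the two major clades of the tree inferred for that partition.
# In a real pipeline this would wrap steps C1-C7 (orthologs, alignments,
# concatenation, tree inference, rooting, splitting).
SplitFn = Callable[[List[str], List[str]], Tuple[List[str], List[str]]]


@dataclass
class Node:
    species: List[str]   # species covered by this partition
    anchors: List[str]   # extra species used only for rooting
    iteration: int       # 0 for the starting tree, then 1, 2, ...
    children: List["Node"] = field(default_factory=list)


def nested_reconstruction(species: Sequence[str], split_fn: SplitFn,
                          anchors: Sequence[str] = (), iteration: int = 0,
                          min_size: int = 4, n_anchors: int = 4) -> Node:
    """Recursively recompute each clade from the species it contains."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    node = Node(sorted(species), list(anchors), iteration)
    if len(node.species) <= min_size:
        return node  # size limit reached: do not recompute further

    clade_a, clade_b = split_fn(node.species, node.anchors)
    # Guard against splits that would make the recursion loop forever.
    if not clade_a or not clade_b:
        raise ValueError("split_fn must return two non-empty clades")
    if sorted(list(clade_a) + list(clade_b)) != node.species:
        raise ValueError("split_fn must partition the input species")

    for clade, sister in ((clade_a, clade_b), (clade_b, clade_a)):
        # Assumption: anchors come from the sister clade first, then from
        # the anchors already used one level up.
        child_anchors = (list(sister) + node.anchors)[:n_anchors]
        node.children.append(
            nested_reconstruction(clade, split_fn, child_anchors,
                                  iteration + 1, min_size, n_anchors))
    return node


def leaf_partitions(node: Node) -> List[List[str]]:
    """Return the terminal partitions in left-to-right order."""
    if not node.children:
        return [node.species]
    return [part for child in node.children for part in leaf_partitions(child)]


def toy_split(species: List[str], anchors: List[str]) -> Tuple[List[str], List[str]]:
    """Stand-in for real tree inference: cut the sorted species list in half."""
    ordered = sorted(species)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


if __name__ == "__main__":
    demo = nested_reconstruction([f"sp{i:02d}" for i in range(10)],
                                 toy_split, min_size=3)
    for part in leaf_partitions(demo):
        print(part)
```

Keeping tree inference behind a callback means the recursion can be exercised with a toy splitter, as here, before any alignment or tree-building software is plugged in.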

TOL analyses I

A, B) Grey lines represent the topological distance between reference trees and the TOL (A: Chordates; B: Fungi; see Figure S5). The black line represents the number of protein families used at each iteration. C) Number of NCBI taxonomic groups not recovered at each iteration.
DOI: 10.7287/peerj.preprints.223v1/supp-2
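
The caption above does not say which topological distance was used. One common choice for comparing a tree against a reference is the Robinson-Foulds distance, so the sketch below assumes that metric, in its rooted, clade-based form. Trees are written as nested tuples of species names, and the function names clades and rooted_rf are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a species name or a tuple of subtrees (rooted).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set found under every internal node of the tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


def rooted_rf(tree_a: Tree, tree_b: Tree) -> int:
    """Rooted Robinson-Foulds distance: clades present in only one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


reference = ((("A", "B"), "C"), ("D", "E"))
candidate = ((("A", "C"), "B"), ("D", "E"))
print(rooted_rf(reference, candidate))  # the trees disagree on clades {A,B} and {A,C}
```

A distance of zero means the two trees contain exactly the same clades; each conflicting grouping adds to the count.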

Supplementary data

Supplementary methods, figures and tables
DOI: 10.7287/peerj.preprints.223v1/supp-5

Additional Information

Competing Interests

The authors declare they have no competing interests.

Author Contributions

Jaime Huerta-Cepas conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Marina Marcet-Houben conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Toni Gabaldón conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Grant Disclosures

The following grant information was disclosed by the authors:
Spanish Ministry of Economy and Competitiveness (BIO2012-37161) and (JCI2010-07614)
The European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant agreement n. 310325.

Funding

We acknowledge funding from the Spanish Ministry of Economy and Competitiveness to TG (BIO2012-37161) and to JHC (Subprograma Juan de la Cierva: JCI2010-07614), and the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant agreement n. 310325. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Archetype

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Archetype The concept of an archetyp...