Estimating Species Trees

Author: L. Lacey Knowles
Publisher: John Wiley and Sons
Total Pages: 230
Release: 2011-05-09
Genre: Science
ISBN: 1118126025

Recent computational and modeling advances have produced methods for estimating species trees directly, avoiding the problems and limitations of the traditional phylogenetic paradigm where an estimated gene tree is equated with the history of species divergence. The overarching goal of the volume is to increase the visibility and use of these new methods by the entire phylogenetic community by specifically addressing several challenges: (i) firm understanding of the theoretical underpinnings of the methodology, (ii) empirical examples demonstrating the utility of the methodology as well as its limitations, and (iii) attention to technical aspects involved in the actual software implementation of the methodology. As such, this volume will not only be poised to become the quintessential guide to training the next generation of researchers, but it will also be instrumental in ushering in a new phylogenetic paradigm for the 21st century.

Estimation of Species Tree Using Approximate Bayesian Computation

Author: Hang Fan
Publisher:
Total Pages: 27
Release: 2010
Genre:
ISBN:

Abstract: Development of methods for estimating species trees from multilocus data is a current challenge in evolutionary biology. We propose a method for estimating the species tree topology and branch lengths using Approximate Bayesian Computation (ABC). The method takes as data a sample of observed gene tree topologies without branch lengths, and then iterates through the following sequence of steps: First, a randomly selected species tree is used to compute the distribution of gene tree topologies. This distribution is then compared to the observed gene tree topology frequencies, and if the fit between the observed and the predicted distribution is close enough, the proposed species tree is retained. Repeating this many times leads to a collection of retained species trees that are then used to form the estimate of the overall species tree. We test the performance of the method, which we call ST-ABC, using both simulated and empirical data. The simulation study examines both symmetric and asymmetric species trees over a range of branch lengths and sample sizes. The results from the simulation study show that the method performs very well, giving accurate estimates for both the topology and the branch lengths across the conditions studied, and that a sample size of 25 loci appears to be adequate for the method. Further, we apply the method to two empirical cases: a 4-taxon data set for primates and a 7-taxon data set for yeast. In both cases, we find that estimates obtained with ST-ABC agree with previous studies. Thus, our method is able to deal with complex data in a timely and efficient way. In addition, the method does not require sequence data, but rather uses the observed distribution of gene tree topologies. Therefore, this method provides a useful alternative to other currently available methods for species tree estimation.
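The propose, compare, and retain loop described in the abstract can be illustrated with a minimal rejection-ABC sketch. This is not the ST-ABC implementation. It assumes the simplest case of three taxa, where the gene tree topology distribution has a well-known closed form under the coalescent: the gene tree matching the species tree has probability 1 - (2/3)exp(-t), and each of the two discordant topologies has probability (1/3)exp(-t), with t the internal branch length in coalescent units. The uniform priors, the absolute-difference distance, the tolerance, the function names, and the observed counts are all illustrative assumptions.

```python
import math
import random
from collections import Counter

# The three possible rooted topologies for taxa A, B, C.
TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def gene_tree_probs(species_topology, t):
    """Gene tree topology probabilities for a 3-taxon species tree.

    t is the internal branch length in coalescent units.
    """
    discordant = math.exp(-t) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_topology else discordant
        for g in TOPOLOGIES
    }


def abc_rejection(observed_counts, n_proposals=20000, tolerance=0.05,
                  max_branch=5.0, seed=0):
    """Retain proposed species trees whose predicted gene tree topology
    distribution lies within `tolerance` of the observed frequencies."""
    rng = random.Random(seed)
    total = sum(observed_counts.values())
    observed = {g: observed_counts.get(g, 0) / total for g in TOPOLOGIES}
    retained = []
    for _ in range(n_proposals):
        # Assumed priors: uniform over topologies and over branch length.
        topology = rng.choice(TOPOLOGIES)
        t = rng.uniform(0.0, max_branch)
        predicted = gene_tree_probs(topology, t)
        # Assumed distance: sum of absolute frequency differences.
        distance = sum(abs(observed[g] - predicted[g]) for g in TOPOLOGIES)
        if distance <= tolerance:
            retained.append((topology, t))
    return retained


def summarize(retained):
    """Most frequently retained topology and its mean branch length."""
    counts = Counter(topology for topology, _ in retained)
    best = counts.most_common(1)[0][0]
    lengths = [t for topology, t in retained if topology == best]
    return best, sum(lengths) / len(lengths)


# Hypothetical data: gene tree topologies observed at 25 loci.
observed_counts = {"((A,B),C)": 20, "((A,C),B)": 3, "((B,C),A)": 2}
retained = abc_rejection(observed_counts)
best_topology, mean_branch = summarize(retained)
print(best_topology, round(mean_branch, 2))
```

Tightening the tolerance retains fewer proposals but concentrates them closer to the observed frequencies, so in practice the tolerance trades computation against approximation quality.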

Estimating Species Trees

Author: L. Lacey Knowles
Publisher: John Wiley & Sons
Total Pages: 332
Release: 2011-09-20
Genre: Science
ISBN: 1118211405

Recent computational and modeling advances have produced methods for estimating species trees directly, avoiding the problems and limitations of the traditional phylogenetic paradigm where an estimated gene tree is equated with the history of species divergence. The overarching goal of the volume is to increase the visibility and use of these new methods by the entire phylogenetic community by specifically addressing several challenges: (i) firm understanding of the theoretical underpinnings of the methodology, (ii) empirical examples demonstrating the utility of the methodology as well as its limitations, and (iii) attention to technical aspects involved in the actual software implementation of the methodology. As such, this volume will not only be poised to become the quintessential guide to training the next generation of researchers, but it will also be instrumental in ushering in a new phylogenetic paradigm for the 21st century.

Estimating Species Trees from Gene Trees Despite Gene Tree Incongruence Under Realistic Model Conditions

Author: Md. Shamsuzzoha Bayzid
Publisher:
Total Pages: 732
Release: 2016
Genre:
ISBN:

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. With the rapid growth rate of newly sequenced genomes, species tree inference from multiple genes has become one of the basic and popular tasks in comparative and evolutionary biology. However, combining data on multiple genes is not a trivial task, since genes evolve through biological processes that include deep coalescence (also known as incomplete lineage sorting (ILS)), duplication and loss, horizontal gene transfer, etc., so that the individual gene histories can differ from each other. In this dissertation, we focus on making advances in phylogenomic analyses with particular attention to gene tree discordance. In addition to gene tree discordance, we consider other challenging conditions that frequently arise in genome-scale data. One of these major challenges is incomplete gene trees, meaning that not all gene trees have individuals from all the species. We performed an extensive simulation study under the multi-species coalescent (MSC) model that shows that existing methods have poor accuracy when gene trees are incomplete. We formalized the optimal completion problem, which seeks to add the missing taxa (species) into the gene trees with respect to a species tree such that the distance (in terms of ILS) between the gene tree and the species tree is minimized. We developed an algorithm for solving this problem. We formalized optimization problems in the context of species tree estimation from a set of incomplete gene trees under the multi-species coalescent model, and proposed algorithms for solving these problems. We formulated different mathematical models for “gene loss” based on different reasons for incompleteness. Next, we addressed the Minimize Gene Duplication (MGD) problem, which seeks to find a species tree from a set of gene trees so as to minimize the total number of duplications needed to explain the evolutionary history. We proposed exact and heuristic algorithms to solve this NP-hard problem. Next, we showed in a comprehensive experimental study that existing methods are susceptible to poorly estimated gene trees in the presence of ILS. We proposed a new technique called “binning” that dramatically improves the performance of species tree estimation methods when gene trees are poorly estimated. We developed a novel technique called “naive binning” and subsequently proposed an improved version called “weighted statistical binning” to address the problem of gene tree estimation error. Finally, we addressed the computational challenges of reconstructing highly accurate species trees from large-scale genomic data. We developed divide-and-conquer-based meta-methods that can make existing methods scalable to very large datasets (in terms of the number of species). Overall, this dissertation contributes to understanding the limitations of the existing methods under realistic model conditions, developing new approaches to handle the challenging issues that frequently arise in phylogenomics, and improving and scaling the existing methods to larger datasets.
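To make the MGD objective concrete, the sketch below counts duplications for a fixed species tree using the standard LCA reconciliation rule: each gene tree node is mapped to the smallest species tree clade containing all of its species, and a node counts as a duplication when it maps to the same clade as one of its children. This is not the dissertation's exact or heuristic algorithm. The nested-tuple tree encoding, the function names, the example trees, and the brute-force comparison of three candidate species trees are illustrative assumptions that only work at toy scale.

```python
def leaf_set(tree):
    """Set of leaf labels of a rooted binary tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def species_clades(species_tree):
    """Leaf sets of every node (clade) in the species tree."""
    clades = []

    def walk(node):
        clades.append(leaf_set(node))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(species_tree)
    return clades


def lca_clade(clades, species):
    """Smallest species tree clade containing every species in `species`."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Number of duplication nodes under LCA reconciliation.

    Gene tree leaves are labeled with the species they were sampled from.
    """
    clades = species_clades(species_tree)

    def walk(node):
        # Returns (clade this node maps to, duplications in its subtree).
        if isinstance(node, str):
            return lca_clade(clades, frozenset([node])), 0
        left_map, left_dups = walk(node[0])
        right_map, right_dups = walk(node[1])
        here = lca_clade(clades, left_map | right_map)
        is_duplication = here == left_map or here == right_map
        return here, left_dups + right_dups + int(is_duplication)

    return walk(gene_tree)[1]


def mgd_score(gene_trees, species_tree):
    """Total duplications needed to reconcile all gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


# Hypothetical input: three gene trees over species A, B, C.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
best = min(candidates, key=lambda s: mgd_score(gene_trees, s))
print(best, mgd_score(gene_trees, best))
```

Enumerating candidates is only feasible for a handful of species, since the number of possible species trees grows super-exponentially; this is why the NP-hard general problem calls for the kind of exact and heuristic algorithms the abstract mentions.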

An Examination of the Transfer of Errors to Species Tree Estimation Caused by Model Selection in Gene Tree Estimation

Author: Nevada Basdeo
Publisher:
Total Pages: 130
Release: 2017
Genre:
ISBN:

Inference from phylogenetic trees is useful in forensic science, bioinformatics, identifying pathogens, and other applications. Thus, building accurate trees is important. Research on nucleotide substitution models has shown the models to be robust for estimating gene trees, but the effects on estimating species trees have not been examined. Cumulative errors in gene tree estimation can transfer over to species tree estimation. Even if the errors are small on each estimated gene tree, they can add up and have a significant impact on the accuracy of species tree estimation. In part one of this research, simulations were used to explore how wrongly specified models affect species tree estimation. In part two, data from Austrian finches were used to explore the error of estimation in 30 genes. We found that the models we used in the simulations were robust in species tree estimation. In the finch data, 24 of the 30 estimated genes had a significant chi-square, meaning the models did not fit the data well for those 24 genes. Genes with high GC content appear to have large residuals. Almost all of the residuals were positive, suggesting that the evolutionary models were underestimating the frequency of most patterns. Having a vast majority of the genes incorrectly modeled leads to the adage 'garbage in, garbage out,' in reference to building a species tree. For improvements, models should better address genes with high GC content and address the under-fitting issue. Due to computational constraints, the results of the simulations may have been affected by the sample size of genes. The simulations might need a bigger sample size of genes to detect an error in species tree estimation if a true error existed.
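The chi-square fit and residuals mentioned in the abstract can be illustrated with a small Pearson goodness-of-fit sketch. It is not the thesis's analysis. The pattern counts, the model probabilities, and the function name are hypothetical, and the critical value assumes the model probabilities were fixed in advance, so the degrees of freedom equal the number of patterns minus one. A positive residual means a pattern was observed more often than the model predicted, which is the sense in which a model "underestimates" a pattern's frequency.

```python
import math


def pearson_fit(observed, expected_probs):
    """Pearson chi-square statistic and per-pattern residuals.

    Each residual is (observed - expected) / sqrt(expected).
    """
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    chi_square = sum(r * r for r in residuals)
    return chi_square, residuals


# Hypothetical counts of four site patterns in one gene, and the
# probabilities a fitted substitution model assigns to them.
observed = [60, 25, 10, 5]
model_probs = [0.70, 0.15, 0.10, 0.05]

chi_square, residuals = pearson_fit(observed, model_probs)

# 7.815 is the 5% critical value of chi-square with 3 degrees of freedom.
significant = chi_square > 7.815
print(round(chi_square, 3), [round(r, 2) for r in residuals], significant)
```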

Species Tree Inference

Author: Laura Kubatko
Publisher: Princeton University Press
Total Pages: 352
Release: 2023-03-14
Genre: Science
ISBN: 0691207607

"Inferring evolutionary relationships among a collection of organisms -- that is, their relationship to each other on the tree of life -- remains a central focus of much of evolutionary biology as these relationships provide the background for key hypotheses. For example, support for different hypotheses about early animal evolution is contingent upon the phylogenetic relationships among the earliest animal lineages. Within the last 20 years, the field of phylogenetics has grown rapidly, both in the quantity of data available for inference and in the number of methods available for phylogenetic estimation. The authors' first book, "Estimating Species Trees: Practical and Theoretical Aspects", published in 2010, gave an overview of the state of phylogenetic practice for analyzing data at the time, but much has changed since then. The goal of this book is to serve as an updated reference on current methods within the field. The book is organized in three sections, the first of which provides an overview of the analytical and methodological developments of species tree inference. Section two focuses on empirical inference. Section three explores various applications of species trees in evolutionary biology. The combination of theoretical and empirical approaches is meant to provide readers with a level of knowledge of both the advances and limitations of species-tree inference that can help researchers in applying the methods, while also inspiring future advances among those researchers with an interest in methodological development"--

Handbook of Statistical Genomics

Author: David J. Balding
Publisher: John Wiley & Sons
Total Pages: 1828
Release: 2019-07-09
Genre: Science
ISBN: 1119429250

A timely update of a highly popular handbook on statistical genomics. This new, two-volume edition of a classic text provides a thorough introduction to statistical genomics, a vital resource for advanced graduate students, early-career researchers and new entrants to the field. It introduces new and updated information on developments that have occurred since the 3rd edition. Widely regarded as the reference work in the field, it features new chapters focusing on statistical aspects of data generated by new sequencing technologies, including sequence-based functional assays. It expands on previous coverage of the many processes between genotype and phenotype, including gene expression and epigenetics, as well as metabolomics. It also examines population genetics and evolutionary models and inference, with new chapters on the multi-species coalescent, admixture and ancient DNA, as well as genetic association studies including causal analyses and variant interpretation. The Handbook of Statistical Genomics focuses on explaining the main ideas, analysis methods and algorithms, citing key recent and historic literature for further details and references. It also includes a glossary of terms, acronyms and abbreviations, and features extensive cross-referencing between chapters, tying the different areas together. With heavy use of up-to-date examples and references to web-based resources, this continues to be a must-have reference in a vital area of research.
Provides much-needed, timely coverage of new developments in this expanding area of study.
Numerous, brand new chapters, for example covering bacterial genomics, microbiome and metagenomics.
Detailed coverage of application areas, with chapters on plant breeding, conservation and forensic genetics.
Extensive coverage of human genetic epidemiology, including ethical aspects.
Edited by one of the leading experts in the field along with rising stars as his co-editors.
Chapter authors are world-renowned experts in the field, and newly emerging leaders.
The Handbook of Statistical Genomics is an excellent introductory text for advanced graduate students and early-career researchers involved in statistical genetics.