Abstract
Exactly which tree(s) should we assume when testing evolutionary hypotheses? This question has plagued comparative biologists for decades. Though all phylogenetic comparative methods require input trees, we seldom know with certainty whether even a perfectly estimated tree (if this is possible in practice) is appropriate for our studied traits. Yet we also know that phylogenetic conflict is ubiquitous in modern comparative biology, and we are still learning about its dangers when testing evolutionary hypotheses. Here, we investigate the consequences of tree-trait mismatch for phylogenetic regression in the presence of gene tree–species tree conflict. Our simulation experiments reveal excessively high false positive rates for mismatched models with both small and large trees, simple and complex traits, and known and estimated phylogenies. In some cases, we find evidence of a directionality of error: assuming a species tree for traits that evolved according to a gene tree sometimes fares worse than the opposite. We also explore the impacts of tree choice using an expansive, cross-species gene expression dataset as an arguably “best-case” scenario in which one may have a better chance of matching tree with trait. As a path forward, we find promise in the application of a robust estimator as a potential, albeit imperfect, solution to some issues raised by tree mismatch. Collectively, our results emphasize the importance of careful study design for comparative methods, highlighting the need to fully appreciate the role of accurate and thoughtful phylogenetic modeling.
Details
Published: Feb 11, 2025
Vol/Issue: 42(3)
Funding: National Science Foundation; Arkansas Economic Development Commission
Cite This Article
Richard Adams, Jenniffer Roa Lozano, Mataya Duncan, et al. (2025). A Tale of Too Many Trees: A Conundrum for Phylogenetic Regression. Molecular Biology and Evolution, 42(3). https://doi.org/10.1093/molbev/msaf032