NEW PHYLOGENETIC COMPARATIVE APPROACHES FOR STUDYING VARIATION IN
RATES OF CONTINUOUS TRAIT EVOLUTION

By

Bruce Stagg Martin

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Plant Biology—Doctor of Philosophy
Ecology, Evolutionary Biology and Behavior—Dual Major

2024

ABSTRACT

Rates of phenotypic evolution vary tremendously across the tree of life, generating vast dis-

parities in phenotypic diversity across space, time, and taxa. Unfortunately, elucidating the factors

driving such “rate heterogeneity” remains challenging due to various methodological limitations.

In particular, most available methods for inferring variation in rates of continuous trait evolution

assume rates are either influenced by only a few factors (i.e., variables hypothesized to affect rates)

or change infrequently over the course of a clade’s history. However, rates of phenotypic evolution

are likely affected by a dynamic, tangled web of countless environmental, life history, and genetic

factors. By ignoring “residual” rate variation stemming from unobserved factors and assuming

relatively simple rate variation patterns, available methods for modeling continuous trait evolution

tend to underfit empirical data and mislead hypothesis testing by inflating support for complex

models assuming spurious factor-rate associations. Here, to address these challenges, I develop,

test, and apply new phylogenetic comparative methods capable of accurately inferring variation in

rates of continuous trait evolution and robustly testing for factor-rate associations.

In chapter 1, I develop a novel continuous trait evolution model whereby rates constantly

and incrementally change over time and across lineages, resulting in continuous, stochastic rate

variation across a clade with closely-related lineages more likely to exhibit similar rates. I im-

plement a Bayesian approach for fitting this model to empirical data in an R package, evorates

(https://github.com/bstaggmartin/evorates/), along with comprehensive tools for analyzing and vi-

sualizing model results. Through simulation, I demonstrate that this method yields accurate infer-

ences and can more reliably detect general decreases/increases in rates over time (i.e., “early/late

bursts” of trait evolution) than previous methods by accounting for residual rate variation around

overall time-dependent trends. Additionally, I use evorates to show that rates of body size evolu-

tion among whales and dolphins have generally declined over time yet exhibit substantial residual

variation, with oceanic dolphins and beaked whales exhibiting anomalously fast and slow rates,

respectively.

In chapter 2, I generalize stochastic character mapping or “simmapping”-based pipelines for

inferring relationships between rates of continuous trait evolution and discrete factors (e.g., habitat,

diet) to also accommodate continuous factors (e.g., temperature, generation time). Simmapping is

a popular method for imputing the uncertain evolutionary history of a trait (or factor) by sam-

pling probable histories along a phylogeny under a given trait evolution model. However, available

simmapping implementations only work with discrete variables. Accordingly, I develop a new

R package, contsimmap (https://github.com/bstaggmartin/contsimmap/), which implements both a

scalable algorithm for simmapping continuous variables and methods for inferring relationships

between simmapped continuous factors and continuous trait evolution dynamics. I go on to ver-

ify the accuracy and robustness of this new pipeline in estimating factor-rate relationships via an

extensive simulation study, even devising a pragmatic new approach to account for residual rate

variation, which was ultimately crucial for controlling the pipeline’s error rates. Lastly, I use the

pipeline to show that rates of leaf and flower evolution are heterogeneous yet unrelated to overall

size in a clade of eucalyptus trees ranging from ∼1 to nearly 100 meters in maximum height.

In chapter 3, I devise a new approach for inferring associations between discrete factors and

continuous trait evolution dynamics by jointly modeling the evolution of both discrete factors

and continuous traits under a unified process. A key advantage of this method is that it allows

the continuous trait data to directly influence the likelihood of different factor histories, enabling

inference of unobserved discrete factors or “hidden states” potentially driving residual rate variation

in continuous trait evolution. I implement this method in an R package, sce (https://github.com/

bstaggmartin/sce/), and show that the method can effectively detect and quantify heterogeneity in

rates of continuous trait evolution driven by both observed and unobserved factors under a wide

variety of simulated evolutionary scenarios. Further, I demonstrate the empirical utility of the new

method by using it to rigorously show that tropical sage lineages exhibit elevated rates of flower

size evolution compared to temperate lineages.

This thesis is dedicated to my wife, Eleanore Jeanne Ritter, who always believed in me even when
I didn’t believe in myself, and my dog, Mulberry Ritter-Martin, who reminded me to stop and
sniff the flowers from time to time. You two are my best friends in the entire world, and I would
not have gotten to this point without your generous love and support throughout this crazy
journey.


ACKNOWLEDGEMENTS

Science is, at its core, a cumulative and community-driven endeavor that thrives on open, sup-

portive communication and collaboration. As scientists, we all stand on the shoulders of countless

dreamers, thinkers, and innovators that came before us. Further, the legacy of our research only

reaches its full potential when embedded in the larger ecosystem of contemporary and histori-

cal work that motivates, contextualizes, and challenges our findings. Every individual has their

own way of interpreting and describing the world around them–it is only through the sharing and

reconciliation of all our unique perspectives that humanity achieves greater understanding of the

universe’s greatest mysteries. I digress–but, in any case, this dissertation would not have been pos-

sible without the greater scientific community, not to mention the direct support–both professional

and personal–of numerous mentors, colleagues, friends, and family.

I would first like to thank my advisor, Dr. Marjorie Weber, who gave me the freedom and

independence to chase my scientific passions along with the support to both turn my craziest ideas

into practical research projects and overcome the countless obstacles one faces in carrying out any

long-term research project. Marjorie, your unwavering emphasis on the importance of inclusivity,

creativity, and awe in scientific research continues to inspire me, and I will never be able to thank

you enough for all you’ve taught me over the course of my dissertation.

Next, I would like to thank the rest of my guidance committee–Dr. Gideon Bradburd, Dr. Jef-

frey Conner, Dr. Luke Harmon, and Dr. William Wetzel–who have all also provided invaluable

inspiration and support over the years. Gideon, I would not be where I am today if it weren’t

for your incredible mentorship in statistics, not to mention your wisdom in navigating the bound-

ary between empirical research and method development. Jeff, our conversations have repeatedly

challenged me to reconsider the “conventional macroevolutionary wisdom” I sometimes take for

granted, and I am undoubtedly a much more learned and critical evolutionary biologist for it. Luke,

your book was my initial ticket into the world of phylogenetic comparative methods development–

I greatly appreciate all the enthusiasm you’ve shown for my research over the years, and I look

forward to meeting up at Evolution and collaborating on research projects for many years to come.


Will, your excitement for science is positively infectious, and I’ve greatly appreciated our corre-

spondence over the years–teaching a statistics course with you was an absolute blast!

I am of course indebted to countless faculty and staff at Michigan State University beyond my

committee. In particular, I would like to thank Dr. Chad Niederhuth and Dr. Lars Brudvig for

providing me with both lab space and community after Marjorie took up her new position at the

University of Michigan. Additionally, the Michigan State University Herbarium was an invaluable

resource to me throughout graduate school, thanks in no small part to the incredible efforts of both

Matt Chansler and Dr. Alan Prather. I also want to thank Sara Kraeuter, whose devotion to helping

graduate students is nothing short of incredible, along with the rest of the Plant Biology front

office staff–Kelley Rose, Heather Stallone, Krystal Witt, and Trevor Simmons–who, along with

the tireless efforts of Dr. Andrea Case, have breathed new life into the Plant Biology community

following its pandemic slump.

Next, I want to thank Weber lab members both past and present, official and honorary. Thank

you to Vincent Pan, Caroline Edwards, Erika LaPlante, Michael Foisy, Carolyn Graham, Rosy

Glos, Sylvie Martin-Eberhardt, Abbey Soule, Dr. Ash Zemenick, Dr. Eric LoPresti, Dr. Andrew

Myers, among many others. There are too many of you and unfortunately not enough time and

space to write out all the things I could thank you for, but suffice to say I couldn’t have asked for

a more supportive and engaging lab community. Together, we form a collective force of scientific

passion, creativity, and brilliance to be reckoned with!

I am also grateful to the broader community of my graduate student and postdoctoral peers at

Michigan State University–special shout outs to Dr. Nate Catlin, Dr. Leslie Kollar, Riley Pizza,

Meaghan Clark, Olivia Fitch, Brooke Jeffery, Maya Wilson-Brown, Miles Roberts, Sophie Buysse,

Madison Plunkert, Andrew Bleich, Julia Brose–the list goes on and on. You all rock, and I’ll really

miss the found family dinners, Friday coffee hours, and DnD sessions. Thank you all for building

such an open, welcoming, and supportive community in which all could find a niche and thrive.

Outside of Michigan State University, I would not be where I am today without the support

of my undergraduate mentors based at Skidmore College.

In particular, I would like to thank


Dr. Elaine Larsen, Denise McQuade, and Dr. Patti Steinberger for initializing my training in

laboratory technician and teaching skills; Dr. Abbey Drake, Dr. Monica Raveret Richter, and Dr.

Corey Freeman-Gallant for providing me with fascinating and invaluable undergraduate research

opportunities; and Dr. Lucy Oremland and Dr. Julie Douglas for sparking my interest in the more

mathematical aspects of biology.

There are of course many others located at a wide variety of other institutions that played key

roles in the development of my academic career. Thanks to Dr. Michael Brett-Surman, who took

me on as a collections management intern at the Smithsonian Natural History Museum when I was

but a mere high school student, thus sparking a passion for museums and research collections that

persists to this day. I am also grateful for the formative research experiences afforded to me

through the Mountain Lake Biological Station Research Experience for Undergraduates program

under Dr. Chloe Lash and Charles Kwit, as well as through the School for Field Studies study

abroad program in Queensland, Australia under Dr. Catherine Pohlmann. By crystallizing my

interests in botany and entomology, these research experiences played vital roles in shaping my

future academic career.

I also really want to thank the various phylogenetic comparative developers and systematists

who have taken the time to talk to me throughout my graduate school career, offering valuable

insights and wisdom that helped me (and my research) get to where it is today. So thank you to

Dr. James Leebens-Mack, Dr. James Boyko, Dr. Josef Uyeda, Dr. Rosana Zenil-Ferguson, Dr.

Michael Landis, Dr. Jeremy Beaulieu, Dr. Daniel Caetano, Dr. Liam Revell, and Dr. Michael

Donoghue.

Additionally, I must thank my family for all their love and support. Thank you to my father,

Andy, for exposing me to the wonders of the natural world from a young age and always admitting

when he didn’t know the answers to my incessant questions. Dad, you taught me how to find

answers on my own and think for myself, nurturing my curiosity and contributing in no small part

to where I am today. Thank you to my mother, Karen, for always encouraging me to follow my

dreams while nonetheless emphasizing the importance of academics and hard work in achieving


those dreams. Mom, balancing idealism with pragmatism is often a difficult task, and I’m grateful

to have your teachings and example to follow in this regard. I also want to thank my siblings,

Bobby and Clare, as well as my extended family and in-laws, who always showed excitement for

my career choice and projects even as my research interests grew more esoteric.

Lastly, but of course not least, I extend my deepest, sincerest gratitude to my wife, Eleanore,

and dog, Mulberry. Both of you provided me with more companionship and encouragement than I

could have ever asked for throughout this process, both grounding me whenever my head “was in

the clouds” and building me back up in the moments I felt defeated. I certainly would not have

gotten anywhere close to this point without both of you in my life. Eleanore, you are the most

brilliant and generous person I know, and I will never be able to fully express how thankful I am

for all the support you gave me even as you arduously worked on your dissertation. As this chapter

of our lives draws to a close, I look forward to the future, excited to see what life has in store for

us.


TABLE OF CONTENTS

CHAPTER 1   MODELING THE EVOLUTION OF RATES OF CONTINUOUS TRAIT EVOLUTION
    1.1 Introduction
    1.2 Materials and Methods
    1.3 Results
    1.4 Discussion
    BIBLIOGRAPHY
    APPENDIX 1A   SUPPLEMENTAL TABLES AND FIGURES
    APPENDIX 1B   APPROXIMATING GEOMETRIC BROWNIAN MOTION TIME-AVERAGES
    APPENDIX 1C   AVERAGE CHANGES IN TRAIT EVOLUTION RATES
    APPENDIX 1D   PRIOR SENSITIVITY STUDY

CHAPTER 2   STOCHASTIC CHARACTER MAPPING OF CONTINUOUS TRAITS ON PHYLOGENIES
    2.1 Introduction
    2.2 Materials and Methods
    2.3 Results
    2.4 Discussion
    BIBLIOGRAPHY
    APPENDIX 2A   SUPPLEMENTAL TABLES AND FIGURES
    APPENDIX 2B   STOCHASTIC APPROXIMATION OF LIKELIHOOD FUNCTION GRADIENTS
    APPENDIX 2C   GENERATING CONTSIMMAPS UNDER EVORATES MODELS

CHAPTER 3   A NEW APPROACH FOR INFERRING STATE-DEPENDENT VARIATION IN CONTINUOUS TRAIT EVOLUTION DYNAMICS
    3.1 Introduction
    3.2 Materials and Methods
    3.3 Results
    3.4 Discussion
    BIBLIOGRAPHY
    APPENDIX 3A   SUPPLEMENTAL TABLES AND FIGURES
    APPENDIX 3B   PRUNING ALGORITHM DETAILS


CHAPTER 1

MODELING THE EVOLUTION OF RATES OF CONTINUOUS TRAIT EVOLUTION

This chapter has been published in Systematic Biology:

Martin B.S.¹, Bradburd G.S.², Harmon L.J.³, and Weber M.G.² 2023. Modeling the evolution of

rates of continuous trait evolution. Syst Biol 72:590–605.

¹Department of Plant Biology, Michigan State University, East Lansing, MI, USA

²Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA

³Department of Biological Sciences, University of Idaho, Moscow, ID, USA

1.1

Introduction

The rates at which traits evolve are markedly heterogeneous across the tree of life, as evidenced

by the uneven distribution of phenotypic diversity across space, time, and taxa (e.g., Simpson,

1944; Reaney et al., 2020; Brusatte et al., 2012; Chartier et al., 2021). While understanding the

drivers of such patterns can provide critical insights into macroevolutionary processes, general consensus

on what factors are most important in accelerating and decelerating trait evolution remains

elusive (Chira et al., 2018). There is a vast, interconnected web of factors hypothesized to affect

trait evolution rates, typically divided into extrinsic and intrinsic components. Extrinsic factors

relate to the environment of an evolving lineage, commonly including aspects of biogeography

like climate or habitat (e.g., Clavel and Morlon, 2017; Mihalitsis and Bellwood, 2019), as well as

interactions with other species (e.g., Slater, 2015; Borstein et al., 2019; Drury et al., 2021). Intrin-

sic factors instead involve properties of the evolving lineage itself, including life history attributes

such as behavior or developmental traits (e.g., Muñoz and Bodensteiner, 2019; Fabre et al., 2020)

and genetic features like trait heritability and effective population size (e.g., Arnold et al., 2008;

Villar et al., 2014). The effects of all these variables are interrelated and depend on the particular

traits being studied, further complicating matters (Cooper and Purvis, 2009; Muñoz et al., 2018;

see also Donoghue and Sanderson, 2015).

Unfortunately, the evolutionary histories of many factors hypothesized to affect trait evolution

rates are largely unobserved. Thus, methods testing for associations between rates and variables


of interest must first estimate the history of the explanatory variables themselves (but see Hansen

et al., 2022). This limits researchers to considering only a few, relatively simple hypotheses (Rev-

ell, 2013; Caetano and Harmon, 2019), causing trait evolution models to often underfit observed

data (Pennell et al., 2015; Chira and Thomas, 2016; Chira et al., 2018). This underfitting gener-

ally oversimplifies inferred rate variation patterns and artificially increases statistical support for

complex models which may imply spurious links between trait evolution rates and explanatory

variables (May and Moore, 2020; see also Rabosky and Goldberg, 2015; Beaulieu and O’Meara,

2016). Thus, these “hypothesis-driven” approaches to modeling trait evolution should be integrated

with “data-driven” approaches that agnostically model variation in trait evolution rates based on

observed trait data alone. Such approaches can account for rate variation unrelated to some focal

hypothesis, or even be used to generate novel hypotheses regarding what factors may have driven

inferred rate variation patterns (Uyeda et al., 2018; May and Moore, 2020; see also Beaulieu and

O’Meara, 2016).

Several data-driven methods for inferring trait evolution rates are already available and widely

used (Eastman et al., 2011; Thomas and Freckleton, 2012; Rabosky et al., 2014; Pagel et al.,

2022), but such methods generally work by splitting phylogenetic trees into subtrees and assigning

a unique rate to each subtree (sometimes termed “macroevolutionary regimes”). These models

implicitly assume trait evolution rates stay constant over long periods of time with sudden shifts in

particular lineages. This mode of rate variation would be expected if rates are primarily influenced

by only a few, discretely varying factors of large effect. However, this assumption could be prob-

lematic given the sheer number of factors hypothesized to affect trait evolution rates, as well as the

fact that many of these factors vary continuously (Cooper and Purvis, 2009). If rates are instead

affected by many factors, mostly with subtle effects, we would expect trait evolution rates to con-

stantly shift in small increments over time within a given lineage, resulting in gradually changing

rates over time and phylogenies. In other words, rates themselves would “evolve” and be similar,

but not identical, among closely-related lineages (i.e., phylogenetic autocorrelation; see Sakamoto

and Venditti, 2018). By assuming that rates change infrequently, current data-driven methods


likely oversimplify rate variation patterns, collapsing heterogeneous evolutionary processes into

homogeneous regimes (but see May and Moore, 2020; Fisher et al., 2021). To this end, Revell

(2021) recently developed a data-driven method that models trait evolution rates as gradually changing,

but this method is limited in requiring a priori specification of how much trait evolution rates vary

across the phylogeny. Further, the method offers no way to rigorously test whether lineages exhibit

different rates (Revell, 2021).

Notably, some hypothesis-driven methods model trait evolution rates as gradually changing

over time. However, such models most commonly assume that rates only follow a simple trend

of exponential decrease or increase over time (Blomberg et al., 2003; but see Clavel and Morlon,

2017; Slater et al., 2017). In this context, declining trait evolution rates, or “early bursts” (EB),

are often invoked as signatures of adaptive radiation (Harmon et al., 2010), while increasing trait

evolution rates, or “late bursts” (LB), are sometimes linked to processes like character displacement

(Weber et al., 2016; Skeels and Cardillo, 2019). Unfortunately, current methods lack statistical

power to detect decreasing trends in rates when just a few lineages deviate from an overall EB

pattern (Slater and Pennell, 2014). Essentially, by assuming a perfect correspondence between

time and rates across all lineages, inference under these methods is misled by subclades exhibiting

anomalously low or high trait evolution rates. New methods that explicitly model such “residual”

rate variation may more accurately detect general trends in trait evolution rates by accounting for

these anomalous lineages/subclades.

Here we develop a new, data-driven method that models trait evolution rates as gradually chang-

ing over time, ultimately resulting in stochastic, continuously distributed rates that are more similar

among closely-related lineages. We take advantage of recent developments in Bayesian inference

and develop new strategies for efficiently estimating autocorrelated rates on phylogenetic trees

while dealing with uncertain trait values, resulting in relatively fast, reliable inference. We call this

method (and its corresponding software implementation) “evolving rates” or evorates for short.

Evorates is both flexible and intuitive, allowing researchers to infer both how and where rates vary

on a phylogeny. Through simulation, we demonstrate that evorates recovers accurate parameter


estimates on ultrametric phylogenies spanning a range of sizes and that it is more sensitive and

robust in detecting trends in trait evolution rates than conventional EB/LB models. We also use

evorates to model body size evolution among extant whales and dolphins (order Cetacea) and find

evidence for declining rates of body size evolution and moderate rate heterogeneity in this clade,

unifying and expanding on previous results (Slater et al., 2010; Slater and Pennell, 2014; Sander

et al., 2021).

1.2 Materials and Methods

Evorates uses comparative data on a univariate continuous trait to infer how trait evolution

rates change over time as well as which lineages in a phylogeny exhibit anomalous rates. Here,

comparative data refers to a fixed, rooted phylogeny with branch lengths proportional to time and

trait values associated with its tips. We generally caution against using evorates with univariate

ordinations of multivariate trait data such as principal component scores because ordination can

bias rate inference from comparative data (Uyeda et al., 2015). Evorates is designed to work with

raw trait measurements; both missing data and multiple trait values per tip are allowed (i.e., tips

with 0 and > 1 observations, respectively). In the case of averaged trait measurements, estimated

mean trait values and standard errors can be used to specify normal priors on trait values at particular

tips. The current implementation also allows for assigning raw trait measurements and priors to

internal nodes, perhaps reflecting fossil data and/or strong prior beliefs, though we do

not test this feature here. Conditional on these trait data, evorates uses Bayesian inference to

estimate two key parameters governing the process of rate change: rate variance, controlling how

quickly rates diverge among independently evolving lineages, and a trend, determining whether

rates tend to decrease or increase over time. When rate variance is 0, rates do not accumulate

random variation over time and are constant across contemporaneous lineages. In this case, trait

evolution follows the same exact process as expected under a conventional EB/LB model, with

negative trends corresponding to EBs, no trend to Brownian motion (BM), and positive trends to

LBs. The method also infers branchwise rates, which are estimates of average trait evolution rates

along each branch in the phylogeny, indicating which lineages exhibit unusually low or high rates.
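As an illustration of how the rate variance and trend parameters interact, the following minimal sketch simulates a trait and its rate along a single lineage in discrete time steps. It is not the evorates implementation: the function name `simulate_lineage`, its arguments, and the time discretization are all assumptions made for this example.

```python
import math
import random


def simulate_lineage(total_time, n_steps, root_rate, rate_variance, trend, seed=1):
    """Simulate a trait and its evolving rate along one lineage (hypothetical helper).

    ln(rate) takes Gaussian steps with mean trend*dt and variance rate_variance*dt;
    the trait takes Gaussian steps with mean 0 and variance rate*dt.
    """
    rng = random.Random(seed)
    dt = total_time / n_steps
    log_rate = math.log(root_rate)
    trait = 0.0
    rates = [root_rate]
    for _ in range(n_steps):
        # Trait change over this step depends on the current rate.
        trait += rng.gauss(0.0, math.sqrt(math.exp(log_rate) * dt))
        # The rate itself then changes on the log (multiplicative) scale.
        log_rate += trend * dt + rng.gauss(0.0, math.sqrt(rate_variance * dt))
        rates.append(math.exp(log_rate))
    return trait, rates


# Rate variance of 0 with a negative trend: a deterministic "early burst".
_, eb_rates = simulate_lineage(2.0, 100, root_rate=1.0, rate_variance=0.0, trend=-0.5)
# Positive rate variance: rates wander around the same declining trend.
_, noisy_rates = simulate_lineage(2.0, 100, root_rate=1.0, rate_variance=0.2, trend=-0.5)
```

With the rate variance set to 0, the simulated rates follow an exact exponential decline, which corresponds to the conventional EB case described above; a positive rate variance adds random, multiplicative deviations around that trend.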


1.2.1 The Model

At its core, evorates works by extending a typical Brownian motion (BM) model of univariate

trait evolution to include stochastic, incremental changes in trait evolution rates, $\sigma^2$. Specifically,

$\sigma^2$ follows a process approximating geometric BM (GBM) with a constant rate, meaning

that $\ln(\sigma^2)$ follows a homogeneous BM-like process. GBM is a natural process to describe “rate

evolution” because it ensures rates stay positive and implies rates vary on a multiplicative, as op-

posed to additive, scale (Limpert et al., 2001; Gingerich, 2009). To render inference under this

model tractable, we treat it as a hierarchical model with a trait evolution process dependent on

the unknown–but estimable–branchwise rates, which are themselves dependent on a rate evolu-

tion process controlled by the estimated rate variance and trend parameters. The overall posterior

probability of the model can be summarized as:

$$P(\sigma^2, \theta \mid x, \psi) \propto P(x \mid \psi, \sigma^2)\, P(\sigma^2 \mid \psi, \theta)\, P(\theta) \tag{1}$$

where ψ is a phylogeny with e branches and n tips, $\sigma^2$ is an e-length vector of branchwise

rates, x is an n-length vector of trait values for each tip, and θ is a vector of parameters governing

the rate evolution process. Cases with missing data and multiple trait values per tip are covered in

a later section. In our notation, time is 0 at the root of the phylogeny and increases towards the tips.

$P(x \mid \psi, \sigma^2)$ is the likelihood of x given the trait evolution process, $P(\sigma^2 \mid \psi, \theta)$ is the probability of

branchwise rates given the rate evolution process, and $P(\theta)$ is the prior probability of the rate

evolution process parameters. We explicitly estimate and condition likelihood calculations on

branchwise rates (a type of “data augmentation”; see May and Moore, 2020) because the likelihood

of the trait data while marginalizing over branchwise rates (i.e., $P(x \mid \psi, \theta)$) does not follow a known

probability distribution and would require complex numerical approximations to compute. On the

other hand, $P(x \mid \psi, \sigma^2)$ follows a straightforward multivariate normal density:

$$x \sim \mathrm{MVN}(\alpha, C) \tag{2}$$


where α is a vector of the trait value at the root of the phylogeny repeated n times and C is an

n × n matrix. The entries of C are given by:

$$C_{i,j} = \sum_{k \in \mathrm{anc}(i,j)} \sigma^2_k t_k \tag{3}$$

where t is an e-length vector of branch lengths, i and j are indices denoting specific tips, k is

an index denoting a particular branch, and anc(i, j) is a function that returns all ancestral branches

shared by i and j. Note that when branchwise rates are constant across the tree, Ci, j is proportional

to the elapsed time between the root of the phylogeny and the most recent common ancestor of i

and j. Branchwise rates can be thought of as “squashing” and “stretching” the branch lengths of a

phylogeny, such that certain lineages have evolved for effectively shorter or longer amounts of time,

respectively.
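The construction of C in Eq. (3) can be sketched for a toy phylogeny as follows. This is an illustrative example only, not code from evorates: the tree ((A:1,B:1):1,C:2), the parent-index encoding of branches, and the function names are all assumptions introduced here.

```python
# Toy tree ((A:1,B:1):1,C:2). Each branch is described by the index of its
# parent branch (None if it descends directly from the root) and its length.
parent = [None, 0, 0, None]        # branch 0 leads to the ancestor of A and B
length = [1.0, 1.0, 1.0, 2.0]      # branch lengths t
tips = {"A": 1, "B": 2, "C": 3}    # tip name -> index of its terminal branch


def path_to_root(branch):
    """Return the set of branches between the root and `branch`, inclusive."""
    path = set()
    while branch is not None:
        path.add(branch)
        branch = parent[branch]
    return path


def trait_covariance(rates):
    """C[i][j] = sum of rate * length over branches ancestral to both tips (Eq. 3)."""
    names = sorted(tips)
    paths = {name: path_to_root(tips[name]) for name in names}
    return [[sum(rates[k] * length[k] for k in paths[a] & paths[b])
             for b in names] for a in names]


# Constant rates: entries equal the time from the root to each pair's common ancestor.
C_constant = trait_covariance([1.0, 1.0, 1.0, 1.0])
# A fourfold faster rate on the branch leading to A "stretches" only that branch.
C_fast_A = trait_covariance([1.0, 4.0, 1.0, 1.0])
```

In this toy case, raising the rate on the terminal branch leading to A changes only the variance of A (from 2 to 5) and leaves every covariance untouched, because that branch is not shared with any other tip.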

Unfortunately, there is no general solution for calculating $P(\sigma^2 \mid \psi, \theta)$ under a true GBM

process (Lepage et al., 2007), so we instead use a multivariate log-normal approximation (e.g.,

Dufresne, 2004; Welch and Waxman, 2008) of the distribution of branchwise rates and calculate

probabilities under this approximation. Briefly, this approximation decomposes branchwise rates

into their expected values, β , determined solely by the trend parameter, and a “noise” component,

γ, sampled from a multivariate normal distribution controlled by the rate variance parameter:

$$\ln(\sigma^2) \approx \beta + \gamma \tag{4}$$

Here, the noise component is approximate because it follows the distribution of geometric,

rather than arithmetic, averages of trait evolution rates along each branch assuming there is no

trend (i.e., $\overline{\ln(\sigma^2)}$ rather than $\ln(\overline{\sigma^2})$; see APPENDIX 1B for further details). The entries of β are

given by:

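$$
\beta = \ln(\sigma^2_0) +
\begin{cases}
0 & \text{if } \mu_{\sigma^2} = 0 \\
\ln(|\exp[\mu_{\sigma^2}\tau_2] - \exp[\mu_{\sigma^2}\tau_1]|) - \ln(|\mu_{\sigma^2}|) - \ln(t) & \text{if } \mu_{\sigma^2} \neq 0
\end{cases}
\tag{5}
$$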


where $\ln(\sigma^2_0)$ is the estimated rate at the root of the phylogeny, $\mu_{\sigma^2}$ is the trend parameter, t is

an e-length vector of branch lengths, and $\tau_1$ and $\tau_2$ are e-length vectors of the start and end times

of each branch in the phylogeny (Blomberg et al., 2003). The entries of γ are given by:

$$\gamma \sim \mathrm{MVN}(0, \sigma^2_{\sigma^2} D) \tag{6}$$

where 0 is a vector of 0s repeated e times, $\sigma^2_{\sigma^2}$ is the rate variance parameter, and D is an e × e

matrix. The entries of D are given by:

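$$
D_{i,j} = \sum_{k \in \mathrm{anc}(i,j)} t_k -
\begin{cases}
2t_i/3 & \text{if } i = j \\
t_i/2 & \text{if } i \in \mathrm{anc}(j,j) \\
t_j/2 & \text{if } j \in \mathrm{anc}(i,i) \\
0 & \text{if } i \neq j,\ i \notin \mathrm{anc}(j,j),\ j \notin \mathrm{anc}(i,i)
\end{cases}
\tag{7}
$$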

where i, j, and k are all indices denoting branches and anc(i, j) is a function that returns all

ancestral branches shared by i and j (Devreese et al., 2010; see APPENDIX 1B for further details).

Overall, this approximation closely matches the distribution of branchwise rates obtained via fine-

grained simulations of GBM on phylogenies under plausible parameter values and is negligibly

different from other computationally efficient approximations (e.g., Thorne et al., 1998; Lartillot

and Poujol, 2011; Revell, 2021; Figs. 1B.1–1B.14; Tables 1B.1–1B.3). We prefer this approxi-

mation because it is convenient to work with and directly focuses on estimating branchwise rates

rather than rates at the nodes of the phylogeny, which is what other strategies focus on.
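The two ingredients of this approximation, the expected values β of Eq. (5) and the matrix D of Eq. (7), can be computed for a toy phylogeny as sketched below. This is a hedged illustration rather than the evorates code: the tree ((A:1,B:1):1,C:2), the parent-index encoding of branches, and all function names are assumptions, and anc(i, i) is taken to include branch i itself.

```python
import math

# Toy tree ((A:1,B:1):1,C:2): parent branch index (None = descends from root)
# and length of each branch.
parent = [None, 0, 0, None]
length = [1.0, 1.0, 1.0, 2.0]


def ancestors(branch):
    """Branches on the path from the root to `branch`, including `branch`."""
    path = []
    while branch is not None:
        path.append(branch)
        branch = parent[branch]
    return path


def start_time(branch):
    """Time elapsed between the root and the start of `branch`."""
    return sum(length[k] for k in ancestors(branch)) - length[branch]


def expected_log_rates(log_root_rate, trend):
    """Eq. 5: expected branchwise rates on the log scale, given the trend."""
    beta = []
    for k, t in enumerate(length):
        if trend == 0:
            beta.append(log_root_rate)
        else:
            tau1 = start_time(k)
            tau2 = tau1 + t
            beta.append(log_root_rate
                        + math.log(abs(math.exp(trend * tau2) - math.exp(trend * tau1)))
                        - math.log(abs(trend)) - math.log(t))
    return beta


def noise_covariance():
    """Eq. 7: matrix D describing covariance among branchwise noise terms."""
    e = len(length)
    D = [[0.0] * e for _ in range(e)]
    for i in range(e):
        for j in range(e):
            shared = set(ancestors(i)) & set(ancestors(j))
            total = sum(length[k] for k in shared)
            if i == j:
                total -= 2 * length[i] / 3
            elif i in ancestors(j):      # branch i is ancestral to branch j
                total -= length[i] / 2
            elif j in ancestors(i):      # branch j is ancestral to branch i
                total -= length[j] / 2
            D[i][j] = total
    return D


beta_bm = expected_log_rates(0.0, 0.0)    # no trend: every branch shares the root rate
beta_eb = expected_log_rates(0.0, -1.0)   # negative trend: later branches are slower
D = noise_covariance()
```

In this example, the diagonal entry for a branch equals its start time plus one third of its length, so a branch of length 1 starting at the root has variance 1/3 (before scaling by the rate variance), and two sister branches covary by the time elapsed before their shared starting node.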

Under this approximation, the final expression for the posterior probability is:

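$$
P(\ln(\sigma^2), \alpha, \ln(\sigma^2_0), \sigma^2_{\sigma^2}, \mu_{\sigma^2} \mid x, \psi) \propto
\frac{\exp[-\frac{1}{2}(x - \alpha)' C^{-1} (x - \alpha)]}{\sqrt{(2\pi)^n |C|}}
\cdot
\frac{\exp[-\frac{1}{2}(\ln(\sigma^2) - \beta)' (\sigma^2_{\sigma^2} D)^{-1} (\ln(\sigma^2) - \beta)]}{\sigma^e_{\sigma^2} \sqrt{(2\pi)^e |D|}}
\cdot
P(\alpha, \ln(\sigma^2_0), \sigma^2_{\sigma^2}, \mu_{\sigma^2})
\tag{8}
$$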

7

1.2.2 Model Implementation

Evorates estimates the posterior distribution of parameters given a phylogeny and associated

trait data via Hamiltonian Monte Carlo (HMC) using the probabilistic programming language Stan,

interfaced through R (Carpenter et al., 2017; Stan Development Team, 2019, 2020). Unlike con-

ventional Markov Chain Monte Carlo algorithms like Metropolis-Hastings samplers, HMC uses

derivatives and physics simulations to efficiently explore posterior distributions, which is particu-

larly helpful for complex, high-dimensional posteriors (see Neal, 2011 and Hoffman and Gelman,

2014 for further information). To optimize sampling efficiency and avoid numerical issues, evo-

rates estimates branchwise rates with an uncentered parameterization (Betancourt and Girolami,

2019) and marginalizes over unobserved trait values at the root and tips of the tree (Freckleton,

2012; Hassler et al., 2022). Under an uncentered parameterization, the HMC algorithm does not

directly estimate branchwise rates, but instead estimates the distribution of e independent standard

normal random variables, z, which are transformed to follow the distribution of branchwise rates:

$$\ln(\sigma^2) = \sigma_{\sigma^2} L z + \beta \tag{9}$$

where L is the lower-triangular Cholesky factor of D (i.e., D = LL′; see Eq. (7)). This

parameterization is particularly efficient because it avoids having to repeatedly manipulate D to

calculate $P(\ln(\sigma^2) \mid \psi, \ln(\sigma^2_0), \sigma^2_{\sigma^2}, \mu_{\sigma^2})$.
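As a rough illustration of Eqs. (5), (7), and (9), the following NumPy sketch builds β and D for a small hypothetical tree and transforms standard normal draws into branchwise log rates. The tree encoding (a parent-branch index per branch, with -1 marking branches attached to the root), the function names, and the parameter values are assumptions made for this example and do not reflect the evorates API or its Stan code. The sketch also assumes that anc(i, i) includes branch i itself, which is the reading under which the diagonal of D reduces to the ancestral path length plus t_i/3.

```python
import numpy as np

def lineage(parent, i):
    """Branch i plus every branch ancestral to it (parent[i] == -1 at the root)."""
    path = set()
    while i != -1:
        path.add(i)
        i = parent[i]
    return path

def make_beta(ln_root_rate, trend, tau1, tau2):
    """Eq. (5): trend component of branchwise log rates."""
    t = tau2 - tau1
    if trend == 0:
        return np.full(len(t), float(ln_root_rate))
    return (ln_root_rate
            + np.log(np.abs(np.exp(trend * tau2) - np.exp(trend * tau1)))
            - np.log(np.abs(trend))
            - np.log(t))

def make_D(parent, t):
    """Eq. (7), reading anc(i, i) as including branch i itself (an assumption)."""
    e = len(t)
    paths = [lineage(parent, i) for i in range(e)]
    D = np.zeros((e, e))
    for i in range(e):
        for j in range(e):
            shared = sum(t[k] for k in paths[i] & paths[j])
            if i == j:
                correction = 2 * t[i] / 3
            elif i in paths[j]:      # i is ancestral to j
                correction = t[i] / 2
            elif j in paths[i]:      # j is ancestral to i
                correction = t[j] / 2
            else:
                correction = 0.0
            D[i, j] = shared - correction
    return D

# Hypothetical tree ((A,B),C); branches 0: root->AB, 1: AB->A, 2: AB->B, 3: root->C
parent = [-1, 0, 0, -1]
tau1 = np.array([0.0, 1.0, 1.0, 0.0])   # branch start times
tau2 = np.array([1.0, 2.0, 2.0, 2.0])   # branch end times
t = tau2 - tau1

beta = make_beta(ln_root_rate=0.0, trend=-0.5, tau1=tau1, tau2=tau2)
D = make_D(parent, t)

# Eq. (9): transform independent standard normal draws z into branchwise log rates
rate_var = 0.3                           # illustrative rate variance value
L = np.linalg.cholesky(D)
z = np.random.default_rng(1).standard_normal(len(t))
ln_rates = np.sqrt(rate_var) * (L @ z) + beta
```

Because D depends only on the tree, its Cholesky factor can be computed once and reused for every posterior draw, which is the efficiency gain described above.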

Evorates also uses Felsenstein’s pruning algorithm for quantitative traits to marginalize over

the trait value at the root of the phylogeny and avoid repeatedly inverting C when calculating

$P(x \mid \ln(\sigma^2))$ (Felsenstein, 1973; Freckleton, 2012; Caetano and Harmon, 2019). To simplify the

pruning algorithm implementation, any multifurcations in the phylogeny are converted to series

of bifurcations by adding additional “pseudo-branches” of length 0. This procedure does not alter

the resulting likelihood calculations (Felsenstein, 2008), and our implementation does not estimate

branchwise rates along pseudo-branches because these rates do not affect the likelihood of the

observed trait data.


1.2.3 Accommodating Missing Data and Multiple Observations

Incorporating uncertainty in observed trait values in comparative studies is especially impor-

tant for methods that model trait evolution rate variation because measurement error can inflate

estimates of evolutionary rates, particularly in young clades (Felsenstein, 2008). To prevent such

biases, evorates generally treats the mean trait values at the tips, x, as unknown parameters. We

marginalize over x given raw trait measurements, y (potentially including 0 or > 1 observations for

some tips), and “tip error” variances for each tip, $\sigma^2_y$. While we use the term “raw” trait measurement

for clarity, the data provided for certain tips could instead be the mean of a normal prior on

the trait value. Entries of $\sigma^2_y$ for such tips may be fixed to an associated variance for the prior. All

other entries of $\sigma^2_y$ are treated as unfixed, free parameters. To render the model more tractable, we

assume tip error variance is constant across all tips with unfixed variance.

To marginalize over the mean trait values at the tips, we modify the initialization of Felsen-

stein’s pruning algorithm (Felsenstein, 1973). Prior to pruning, we assign each tip the expectation

and variance of its mean trait value given its raw trait measurements. We then calculate each

tip’s partial likelihood from contrasts between its associated raw trait measurements given its error

variance, $\sigma^2_{y,i}$. Assuming the raw trait measurements are independently sampled from a normal

distribution with variance $\sigma^2_{y,i}$, the mean trait value’s expectation is simply the mean of the raw

trait measurements, $\bar{y}_i$, and its variance is given by $\sigma^2_{y,i}/m_i$, where $m_i$ is the number of raw trait

measurements (Felsenstein, 2008). Note that if there are no trait measurements for a particular

tip (i.e., mi = 0), the expectation of that tip’s true trait value is undefined with infinite variance

(Hassler et al., 2022).

Because there are no contrasts for tips with one or fewer raw trait measurements, the partial

likelihood associated with these tips is 1. Otherwise, we can derive a general formula for the partial

likelihood by considering each tip as a small subtree and applying Felsenstein’s pruning algorithm.

Specifically, each tip is treated as a star phylogeny consisting of $m_i$ “sub-tips” of length $\sigma^2_{y,i}$, with

trait values $y_i$ (Felsenstein, 1973, 2008):

$$P(y_i \mid \sigma^2_{y,i}) = \prod_{k=1}^{m_i - 1} \frac{\sqrt{k}}{\sigma_{y,i}\sqrt{2\pi(k+1)}} \exp\left[-\frac{k}{2(k+1)} \left(\frac{y_{i,k+1} - \bar{y}_{i,1:k}}{\sigma_{y,i}}\right)^2\right] \tag{10}$$

where i denotes a particular tip, $y_i$ is a vector of $m_i$ raw trait measurements for tip i, $\sigma^2_{y,i}$ is the

tip error variance for tip i, and $\bar{y}_{i,1:k}$ is the mean of measurements 1 through k in the vector $y_i$.
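A minimal sketch of the tip initialization described above may help make Eq. (10) concrete. The function names are hypothetical and the code is written for illustration only; it is not taken from evorates. The first function returns the expectation and variance assigned to a tip, and the second multiplies the contrast densities among replicate measurements.

```python
import numpy as np

def tip_summary(y, tip_var):
    """Expectation and variance of a tip's mean trait value given raw measurements."""
    m = len(y)
    if m == 0:
        return np.nan, np.inf          # no data: undefined expectation, infinite variance
    return float(np.mean(y)), tip_var / m

def tip_partial_likelihood(y, tip_var):
    """Eq. (10): product of contrast densities among the m raw measurements of one tip."""
    y = np.asarray(y, dtype=float)
    sd = np.sqrt(tip_var)
    lik = 1.0                          # stays 1 when there are 0 or 1 measurements
    for k in range(1, y.size):
        running_mean = y[:k].mean()    # mean of measurements 1 through k
        d = (y[k] - running_mean) / sd # y[k] is measurement k + 1 (0-based indexing)
        lik *= (np.sqrt(k) / (sd * np.sqrt(2 * np.pi * (k + 1)))
                * np.exp(-k / (2 * (k + 1)) * d ** 2))
    return lik

y_example = np.array([1.0, 2.0, 4.0])  # illustrative measurements for one tip
mean_example, var_example = tip_summary(y_example, tip_var=0.5)
lik_example = tip_partial_likelihood(y_example, tip_var=0.5)
```

The product does not depend on the order of the measurements: it equals the density of the measurements with their shared mean integrated out under a flat prior, so any ordering of the raw data gives the same partial likelihood.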

After initializing all tips in the phylogeny, Felsenstein’s pruning algorithm can be applied normally,

iterating over the internal nodes from the tips towards the root (e.g., Felsenstein, 1973;

Freckleton, 2012; Caetano and Harmon, 2019). The presence of missing data, however, will cause

some calculations to involve nodes with undefined expected trait values and infinite variance. Note

that these “data-deficient” nodes do not contribute information to the expectation and variance of

the trait value at their ancestral nodes. Thus, if both nodes descending from some focal node are

data deficient, the focal node will also be data deficient, with undefined expectation and infinite

variance. Otherwise, if only one descendant node is data deficient, the expectation and variance

of the trait value at the focal node is solely determined by the descendant node that is not data

deficient. Let the descendant, non-data-deficient node have expected trait value and variance $\hat{x}_i$

and $\sigma^2_{\hat{x}_i}$, respectively, and be connected to the focal node by a branch of length $t_i$ with branchwise

rate $\sigma^2_i$. The focal node’s expected trait value and variance will be $\hat{x}_i$ and $\sigma^2_{\hat{x}_i} + \sigma^2_i t_i$, respectively.

Whether one or both descendant nodes are data deficient, there is no contrast associated with the

focal node and the corresponding partial likelihood is 1.
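The rules for data-deficient nodes can be sketched as a small recursive pruning routine. This is an illustrative reimplementation under simplifying assumptions (a strictly bifurcating tree encoded as nested tuples, a flat prior on the root value, and hypothetical names throughout); it is not the evorates implementation.

```python
import math

def prune(node):
    """Return (expected value, variance, log-likelihood of contrasts) for a subtree.

    A tip is ("tip", mean, variance); use variance = math.inf for missing data.
    An internal node is ("node", (child, length, rate), (child, length, rate)).
    """
    if node[0] == "tip":
        _, mean, var = node
        return mean, var, 0.0
    results = []
    loglik = 0.0
    for child, length, rate in node[1:]:
        x, v, ll = prune(child)
        results.append((x, v + rate * length))  # add variance accrued along the branch
        loglik += ll
    (xa, va), (xb, vb) = results
    if math.isinf(va) and math.isinf(vb):
        return math.nan, math.inf, loglik       # focal node is also data deficient
    if math.isinf(va):
        return xb, vb, loglik                   # no contrast; partial likelihood is 1
    if math.isinf(vb):
        return xa, va, loglik
    total = va + vb
    loglik += -0.5 * (math.log(2 * math.pi * total) + (xa - xb) ** 2 / total)
    x = (xa * vb + xb * va) / total             # precision-weighted average
    v = va * vb / total
    return x, v, loglik

# Illustrative tree ((A,B),C) in which tip C has no measurements
tip_a = ("tip", 1.0, 0.0)
tip_b = ("tip", 2.0, 0.0)
tip_c = ("tip", math.nan, math.inf)
cherry = ("node", (tip_a, 1.0, 0.5), (tip_b, 1.0, 2.0))
tree = ("node", (cherry, 1.0, 1.0), (tip_c, 2.0, 1.0))

x_cherry, v_cherry, ll_cherry = prune(cherry)
x_root, v_root, ll_root = prune(tree)
```

In this example the log-likelihood at the root equals that of the (A,B) cherry alone, because the data-deficient tip contributes no contrast.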

In the case of univariate traits, tips with missing data have no effect on the likelihood of trait

data or parameter inference. However, by including missing data one can estimate posterior distri-

butions of the unobserved trait values at these tips (Goolsby, 2017; Hassler et al., 2022). Evorates

already includes functionality for sampling from the posterior distribution of trait values at all

nodes and tips in a phylogeny given a fitted model. The inclusion of additional branches could

theoretically affect the inferred rate evolution process because our GBM approximation improves

along shorter branches. However, inference using evorates is robust to whether rate evolution is

simulated under our GBM approximation or a true GBM process (Figs. 1B.10 and 1B.14; Tables

1B.1–1B.3), suggesting such effects are too minor to have practical consequences.


1.2.4 Priors

Despite their popularity, flat and uninformative priors tend to result in fat-tailed posteriors that

explore unrealistic regions of parameter space, and Bayesian statisticians have increasingly advo-

cated for the use of at least weakly informative priors in recent years (Lemoine, 2019). We follow

this advice, choosing default priors for evorates that modestly regularize parameter estimates, pro-

moting conservative inferences (i.e., little rate heterogeneity) while still allowing for a wide range

of evolutionary dynamics. We also conducted a prior sensitivity study to document the impact of

priors on inference using evorates (Figs. 1D.1–1D.7; Tables 1D.1–1D.12). Overall, evorates is

fairly robust to alternate prior specifications, provided that priors are not overly informative, and

the default priors appear adequate under a variety of conditions.

By default, a normal prior with mean 0 and standard deviation 10/T is placed on the trend

parameter ($\mu_{\sigma^2}$), while a half-Cauchy prior with scale 5/T is placed on rate variance ($\sigma^2_{\sigma^2}$), where

T is the height of the phylogeny. These priors are quite liberal: a trend of 10/T corresponds to an

$e^{10} \sim 20{,}000$-fold change in trait evolution rates over the timespan of a phylogeny, and data simulated

with a rate variance of 5/T on random trees with 50 tips or more (generated using the R

package ape version 5.6-2; Paradis and Schliep, 2019) typically yield branchwise rates spanning

two to four orders of magnitude. Of course, researchers may increase or decrease the standard devi-

ation/scale of these priors if a phylogeny spans an especially long or short timescale, respectively.

To penalize tip error variance ($\sigma^2_y$) estimates that are large relative to the scale of the observed

trait data, a half-Cauchy prior with scale $\sigma^2_{\text{raw}}/2$ is placed on tip error variance, where $\sigma^2_{\text{raw}}$ is the

variance of the trait data.

It is somewhat more challenging to pick a default prior for the rate at the root ($\sigma^2_0$) because this

parameter depends on both the timescale of the phylogeny and scale of the observed trait data. By

default, a log-normal prior with location $\ln(\sigma^2_{\text{raw}}/T)$ and scale 10 is placed on the root rate. This

prior is designed to regularize root rate estimation by roughly centering on trait evolution rates

that could give rise to the observed trait data with little rate heterogeneity. Notably, decreasing

and increasing trends will generally shift the location of this default prior downward and upwards,


respectively, relative to the true root rate. While more complex schemes for choosing a root rate

prior (perhaps based on phylogenetic independent contrasts) could help mitigate this issue, we

wanted to keep default prior settings as simple and transparent as possible. As a rule of thumb,

the scale of the root rate prior should be roughly equal to the maximum plausible change in trait

evolution rates over the timespan of a phylogeny. The default scale of 10, corresponding to an

$e^{10} \sim 20{,}000$-fold change in rates, is quite liberal and should suffice for most purposes. In any

case, we encourage researchers to alter the root rate prior to reflect biologically plausible trait

evolution rates when such information is available.
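To make the scaling of the default priors concrete, the sketch below collects the hyperparameters described in this section for a given tree height and set of trait values. The helper name, dictionary keys, and example inputs are hypothetical, and the use of the sample variance (ddof=1) for the variance of the trait data is an assumption; none of this is the evorates interface.

```python
import numpy as np

def default_prior_settings(tree_height, trait_values):
    """Default prior hyperparameters as described in the text (hypothetical helper)."""
    raw_var = np.var(trait_values, ddof=1)   # assumption: sample variance of the trait data
    return {
        "trend_normal_mean": 0.0,
        "trend_normal_sd": 10.0 / tree_height,
        "rate_var_half_cauchy_scale": 5.0 / tree_height,
        "tip_error_half_cauchy_scale": raw_var / 2.0,
        "root_rate_lognormal_location": np.log(raw_var / tree_height),
        "root_rate_lognormal_scale": 10.0,
    }

# Illustrative inputs: a tree 40 time units tall and four trait values
settings = default_prior_settings(tree_height=40.0, trait_values=[1.2, 0.7, 2.5, 1.9])

# A trend one prior standard deviation from 0, sustained over the whole tree,
# multiplies rates by exp(10), i.e. on the order of 20,000-fold
fold_change = np.exp(settings["trend_normal_sd"] * 40.0)
```

Dividing by the tree height keeps the implied total change in rates the same regardless of the time units of the phylogeny.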

1.2.5 Hypothesis Testing

We agree with other macroevolutionary biologists advocating for greater focus on interpreting

parameter estimates and effect sizes inferred by comparative models (e.g., Beaulieu and O’Meara,

2016). Nonetheless, assessing statistical support for particular hypotheses remains important for

biologically interpreting fitted models – particularly complex models with many parameters. In the

context of evorates, we focus on two main hypotheses: 1) that significant rate heterogeneity, independent

of any trend, occurred over the history of a clade ($\sigma^2_{\sigma^2} > 0$), and 2) that rates generally declined

or increased over time (i.e., $\mu_{\sigma^2} \neq 0$). Both hypotheses could be tested by fitting additional models

with constrained rate variance and/or trend parameters and comparing among unconstrained and

constrained models using Bayes factors. However, Bayes factor estimation requires additional,

time-consuming computation. Thus, we developed alternative approaches that only require the

posterior samples of a fitted, unconstrained model.

We use the posterior probability that $\mu_{\sigma^2} > 0$ to test for overall trends in rates. If the posterior

probability is 0.025 or less, we can conclude that there is substantial evidence that rates declined

over time, and vice versa if the posterior probability is 0.975 or above. This corresponds to a

two-tailed test with a significance level of 0.05. For rate variance, we instead use Savage-Dickey (SD)

ratios because rate variance is bounded at 0 and the posterior probability that $\sigma^2_{\sigma^2} > 0$ will always

be 1. SD ratios are ratios of the posterior to prior probability density at a particular parameter value

corresponding to a null hypothesis. If this ratio is sufficiently less than 1, the data have “pulled”


prior probability mass away from the null hypothesis, suggesting that the null hypothesis is likely

incorrect.

In general, a ratio of 1/3 or less is considered substantial evidence against the null

hypothesis (Kass and Raftery, 1995). We use log spline density estimation implemented in the R

package logspline (version 2.1.16) to estimate the posterior probability density at $\sigma^2_{\sigma^2} = 0$ (Stone

et al., 1997; Wagenmakers et al., 2010).
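The logic of the SD ratio can be sketched in a few lines. Note that this example swaps in a different density estimator from the one used here: instead of log spline density estimation, it uses a Gaussian kernel density estimate reflected at the 0 boundary, so its values would not match logspline output exactly. The function names and simulated "posterior" samples are hypothetical, and the prior is assumed to be the default half-Cauchy on rate variance.

```python
import numpy as np

def density_at_zero(samples):
    """Reflected Gaussian kernel density estimate at 0 for a nonnegative parameter."""
    x = np.asarray(samples, dtype=float)
    q25, q75 = np.percentile(x, [25, 75])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    h = 0.9 * spread * x.size ** (-0.2)          # rule-of-thumb bandwidth
    kernel = np.exp(-0.5 * (x / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return 2.0 * kernel.mean()                   # factor 2 reflects mass across the boundary

def half_cauchy_density_at_zero(scale):
    """Density of a half-Cauchy prior evaluated at 0."""
    return 2.0 / (np.pi * scale)

def savage_dickey_ratio(posterior_samples, prior_scale):
    """Posterior-to-prior density ratio at the null value of 0."""
    return density_at_zero(posterior_samples) / half_cauchy_density_at_zero(prior_scale)

rng = np.random.default_rng(7)
near_zero = np.abs(rng.normal(0.0, 0.05, size=6000))    # samples piled up against 0
away = rng.lognormal(mean=0.0, sigma=0.2, size=6000)    # samples well away from 0
ratio_near = savage_dickey_ratio(near_zero, prior_scale=5.0)
ratio_away = savage_dickey_ratio(away, prior_scale=5.0)
```

Under the 1/3 threshold, only the second set of samples would count as substantial evidence for rate variance greater than 0.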

Researchers may also wish to identify lineages evolving at anomalous rates. The most straight-

forward method to do so is to calculate the posterior probability that branchwise rates are greater

than some “background rate”, analogous to the approach for trends. In this paper, we define the

background trait evolution rate as the geometric mean of branchwise rates, weighted by their rela-

tive branch lengths. Rates are generally distributed with long right tails (Gingerich, 2009), partic-

ularly under our model whereby rate evolution follows a GBM-like process. Geometric means are

less sensitive than arithmetic means to extremely high, outlier rates associated with these long tails,

and are thus better-suited for rate comparisons. In the presence of a strong trend, only the oldest

and youngest lineages will generally exhibit anomalous rates, rendering anomalous rate detection

redundant with trend estimation. Thus, we define a helpful branchwise rate transformation, called

“detrending”, which further facilitates interpretation of evorates results. Specifically, branchwise

rates are detrended prior to calculating background rates and posterior probabilities by subtracting

β from branchwise rates on the natural log scale (see Eq. (5)). These detrended rates yield a new set

of transformed parameters, branchwise rate deviations, $\ln(\sigma^2_{\text{dev}})$, defined as the difference between

detrended branchwise rates and the background detrended rate on the natural log scale. When the

posterior probability that $\ln(\sigma^2_{\text{dev}}) > 0$ for a given branch is less than 0.025 or greater than 0.975, we

can conclude that trait evolution is anomalously slow or fast along that branch, respectively, given

the overall trend in rates through time. While we focus on comparing detrended branchwise and

background rates based on geometric means in the current paper, we note that evorates can also

compare untransformed branchwise and background rates based on either geometric or arithmetic

means per user specifications.
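The detrending and background-rate calculations can be illustrated with posterior draws stored as arrays. The sketch assumes that branchwise log rates and the trend component β are available as (samples × branches) arrays; the function name, array layout, and simulated values are hypothetical and do not reflect how evorates stores or processes its output.

```python
import numpy as np

def rate_deviation_summary(ln_rates, beta, branch_lengths, lower=0.025, upper=0.975):
    """Detrend branchwise log rates and flag anomalously slow or fast branches."""
    w = np.asarray(branch_lengths, dtype=float)
    w = w / w.sum()                              # relative branch lengths
    detrended = ln_rates - beta                  # subtract the trend component (Eq. 5)
    background = detrended @ w                   # log of the length-weighted geometric mean
    ln_dev = detrended - background[:, None]     # branchwise rate deviations, per sample
    pp_positive = (ln_dev > 0).mean(axis=0)      # posterior probability of a positive deviation
    return ln_dev, pp_positive, pp_positive < lower, pp_positive > upper

# Simulated stand-in for posterior samples: 2,000 draws for 5 equal-length branches,
# with the last branch evolving much faster than the others after detrending
rng = np.random.default_rng(3)
lengths = np.ones(5)
beta_draws = rng.normal(0.0, 0.1, size=(2000, 1)) * np.arange(5)
ln_rate_draws = beta_draws + np.array([0.0, 0.0, 0.0, 0.0, 3.0]) + rng.normal(0.0, 1.0, size=(2000, 5))

ln_dev, pp_positive, slow, fast = rate_deviation_summary(ln_rate_draws, beta_draws, lengths)
```

Because the background rate is computed separately for each posterior sample, the weighted deviations sum to 0 within every sample, and uncertainty in the background rate carries through to the posterior probabilities.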

Additionally, users may also calculate background trait evolution rates for subsets of branches


in a phylogeny, such that rates for specific lineages and/or subclades can be estimated and com-

pared. Some caution, however, is warranted in first identifying lineages exhibiting anomalous rates

and then testing for significant differences among them, as this could increase the risk of spuriously

detecting rate differences. This potential issue is not unique to evorates and applies to any data-

driven phylogenetic comparative method designed to identify shifts in evolutionary processes. In

practice, we recommend users mainly focus on interpreting comparisons between branchwise rates

and the overall background rate, calculating background rates for branch subsets only to effectively

summarize and communicate model results. Of course, it is also perfectly reasonable to compare

rates among specific lineages and/or subclades when these comparisons are planned prior to model

fitting and/or have biological justification (e.g., comparing background rates among lineages that

vary in some factor hypothesized to affect trait evolution rates).

Notably, relationships among Bayes factors, posterior probabilities, and frequentist p-values

are not necessarily straightforward and depend on sample size, priors, and posterior distribution

shape, among other factors (Held and Ott, 2018; Wagenmakers et al., 2022). The hypothesis

testing procedures we propose and test here are essentially useful heuristics developed to guide

researchers in interpreting models fit through evorates, and these heuristics are not formally equiv-

alent to conventional significance testing under a frequentist framework. Nonetheless, we use

terms like “hypothesis testing”, “null hypothesis”, and “significance” in describing and analyzing

the performance of these heuristics for ease of communication.

1.2.6 Simulation Study

To test the performance and accuracy of evorates, we applied it to continuous trait data sim-

ulated under the model of inference. We simulated data under all combinations of no, low, and

high rate variance ($\sigma^2_{\sigma^2} = 0, 3, 6$) and decreasing, constant, and increasing trends ($\mu_{\sigma^2} = -4, 0, 4$),

for a total of 9 trait evolution scenarios. We picked these values to simulate data that appeared

empirically plausible and represented a range of different trait evolution dynamics. Note that when

rate variance is 0, the resulting simulations evolve under EB, BM, or LB models of trait evolution

depending on the trend parameter. We simulated traits evolving along ultrametric, pure-birth phy-


logenies with 50, 100, and 200 tips generated using the R package phytools (version 1.0-1; Revell,

2012) to assess the effect of increasing sample size on model performance. While evorates can be

applied to non-ultrametric trees, we focus on ultrametric trees here to render the simulation study

more manageable. We simulated 10 phylogenies and associated trait data for each trait evolution

scenario and phylogeny size for a total of 270 simulations. In all cases, phylogenies were rescaled

to a total height of 1, ensuring the effect of parameters remained consistent across replicates. All

simulations were initialized with a trait and log rate value of 0 at the root. Because we focused on

the estimation of branchwise rate, rate variance, and trend parameters, we simulated trait data with

only 1 observation per tip and no tip error.

To quantitatively assess the simulation study results, we calculated the median absolute error

(MAE), breadth, and coverage of marginal posterior distributions for rate variance and trend pa-

rameters. Here, MAE is the median absolute difference between posterior samples and their corre-

sponding true, simulated value, such that larger MAEs are associated with less accurate posteriors.

We prefer median to mean absolute error because the former metric is less influenced by posterior

precision and more directly reflects variation in posterior accuracy. Breadth refers to the width of

the 95% equal-tailed interval (i.e., a type of credible interval that spans from the 2.5% to 97.5%

posterior quantiles, hereafter simply termed credible intervals) and measures posterior precision,

with smaller breadths corresponding to more precise (though not necessarily accurate) posteriors.

Lastly, coverage is a binary metric equal to one when the true value falls within the 95% credible

interval and zero otherwise. For branchwise rate parameters, we averaged the MAEs, breadths,

and coverage of all branchwise rate marginal posterior distributions (on the natural log scale) for

each model fit. Additionally, we calculated the statistical power and false positive error rate (i.e.,

type I error rate, hereafter error rate) of evorates for detecting significant rate variance and de-

creasing/increasing trends. Due to the continuous nature of branchwise rates, we assessed power

and error rates for detecting anomalous branchwise rates by calculating the proportion of times

a branch is detected as exhibiting anomalously slow or rapid trait evolution rates across different

values of true branchwise rate deviations.
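The three posterior summaries defined above are simple to compute from a vector of posterior samples. The following sketch uses a hypothetical function name and a toy set of evenly spaced "samples" purely for illustration; coverage uses strict inequalities, matching the description in Table 1.3.

```python
import numpy as np

def posterior_summaries(samples, true_value):
    """Median absolute error, 95% equal-tailed breadth, and coverage for one parameter."""
    samples = np.asarray(samples, dtype=float)
    mae = np.median(np.abs(samples - true_value))
    lower, upper = np.quantile(samples, [0.025, 0.975])
    breadth = upper - lower
    coverage = float(lower < true_value < upper)   # 1 if the interval contains the truth
    return mae, breadth, coverage

# Toy example: 1,001 evenly spaced values on [0, 1] standing in for posterior samples
toy_samples = np.linspace(0.0, 1.0, 1001)
mae, breadth, coverage = posterior_summaries(toy_samples, true_value=0.5)
missed = posterior_summaries(toy_samples, true_value=0.99)[2]
```

Averaging the coverage indicator across replicate fits gives the proportions reported for each scenario, which should sit near 0.95 for a well-calibrated method.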


1.2.7 Empirical Example

We applied evorates to model body size evolution in extant cetaceans using a recently estimated

timetree of both fossil and extant cetaceans (Lloyd and Slater, 2021), pruned to consist of 88 extant

species (we excluded one extant species, Balaenoptera brydei, due to its uncertain taxonomic

status; see Constantine et al., 2018), and associated trait data on log-transformed maximum female

body lengths for each species. Most body length data were compiled in a previous comparative

study, but we supplemented these data with published measurements for an additional 15 species

(Table 1A.1). We chose this example because previous research detected notable signatures of

declining body size evolution rates over time in this clade, despite conventional model selection

failing to yield support for an EB model of trait evolution. This puzzling result seems primarily

due to a few recently-evolved lineages exhibiting unusually rapid shifts in body size (Slater et al.,

2010; Slater and Pennell, 2014; see also Sander et al., 2021). While previous work used a mix

of simulation and outlier detection techniques to arrive at this conclusion, we predicted that our

method would identify these patterns in a more cohesive modeling framework.

1.2.8 HMC Configuration and Diagnostics

When fitting models to the simulated and empirical data, we ran 4 HMC chains, each consisting of

3,000 iterations. After discarding the first 1,500 iterations as warmup and checking for convergence,

chains were combined for a total of 6,000 HMC samples for each simulation. We repeated

this procedure while constraining the rate variance parameter to 0 to see if our method could detect

trends in trait evolution rates with more power than conventional EB/LB models. We set tip error

for the simulation study to 0 a priori because we do not focus on inference of this parameter here,

though we did allow the method to estimate tip error in the empirical example. For each model

fit, chains mixed well (greatest $\hat{R} \approx 1.013$) and achieved effective sample sizes of at least 3,000

for every parameter. Divergent transitions, a feature of HMC which can be indicative of sam-

pling problems, were relatively rare, with only six simulation model fits exhibiting 1-3 divergent

transitions. Overall, diagnostic tests suggested all HMC chains converged and sampled posterior

distributions thoroughly.


1.3 Results

1.3.1 Performance of Method

Overall, the method exhibited accurate inference and appropriate coverage for all parameters,

though posterior breadth was often quite large, especially for trees with 50 tips (Tables 1.1–1.3, Fig. 1.1).

Posterior accuracy and precision were highly dependent on trait evolution scenario and tree size.

In general, higher values of trends and rate variance were associated with larger posterior MAEs

and breadth for their respective parameters, such that increasing trends and high rate variance are

estimated with the least accuracy and precision. In some cases, higher trends seemed to increase

the MAEs and breadth of rate variance posteriors and vice versa, but this pattern was weak overall.

On the other hand, larger tree sizes resulted in smaller posterior MAEs and breadth, such that trees

with 200 tips yielded the most accurate, precise posteriors. Coverage for trend and rate variance

parameters across all trait evolution scenarios and tree sizes remained consistent at around the

theoretical expectation of 95%.

Figure 1.1 Relationship between simulated and estimated rate variance ($\sigma^2_{\sigma^2}$) and trend ($\mu_{\sigma^2}$) parameters.
Each point is the posterior median from a single fit, while the violins are combined
posterior distributions from all fits for a given trait evolution scenario. Vertical lines represent the
50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of these combined posteriors,
while horizontal lines represent positions of true, simulated values.

Both the statistical power and error rates of our method were appropriate for detecting trends

Table 1.1 Median absolute errors of rate variance, trend, and branchwise rate posteriors (i.e., median
absolute difference between posterior samples and their true, simulated values, a measure
of posterior distribution accuracy), averaged across replicates for each simulated trait evolution
scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend
parameters, respectively.

                           rate variance            trend                    branchwise rates
$\sigma^2_{\sigma^2}$ =    0       3       6        0       3       6        0       3       6

50 species
$\mu_{\sigma^2}$ = -4      0.66    1.96    2.55     1.36    1.29    1.83     0.47    0.81    1.00
$\mu_{\sigma^2}$ = 0       0.57    2.48    3.69     1.49    2.09    2.45     0.48    0.86    1.06
$\mu_{\sigma^2}$ = 4       0.99    1.75    3.00     2.06    2.79    2.91     0.60    0.87    1.01

100 species
$\mu_{\sigma^2}$ = -4      0.30    1.01    2.03     0.77    1.08    1.31     0.31    0.73    0.90
$\mu_{\sigma^2}$ = 0       0.37    1.62    2.37     1.12    1.20    1.59     0.37    0.76    0.89
$\mu_{\sigma^2}$ = 4       0.34    1.56    1.87     1.89    1.63    1.54     0.44    0.83    0.90

200 species
$\mu_{\sigma^2}$ = -4      0.13    1.27    1.50     0.77    0.95    1.25     0.24    0.66    0.80
$\mu_{\sigma^2}$ = 0       0.11    0.75    1.44     0.92    1.13    0.95     0.23    0.71    0.85
$\mu_{\sigma^2}$ = 4       0.18    0.82    1.69     1.00    1.13    1.35     0.27    0.72    0.84

and significant rate variance. In general, power increased with larger trees, while error rates re-

mained consistent. The ability of SD ratios to identify significant rate variance was particularly

impressive, erroneously detecting rate variance only once while exhibiting high power (Fig. 1.2).

Decreasing trends were notably easier to detect than increasing trends, particularly on small trees

(Fig. 1.3). Trend error rates consistently remained below ∼5%, and decreasing trends were never

mistaken for increasing trends and vice versa. Higher rate variance seemed to only slightly de-

crease the power to detect trends. Constraining rate variance to 0 resulted in either worse power

or higher error rates for detecting trends, depending on whether trends were decreasing or in-

creasing. As rate variance increased, the power of constrained models to detect decreasing trends

dramatically diminished. On the other hand, constrained models detected increasing trends with

greater power, at the cost of greatly inflated error rates. Overall, estimating rate variance allows for

more sensitive detection of declining trait evolution rates while better safeguarding against false

Table 1.2 Breadths of rate variance, trend, and branchwise rate posteriors (i.e., the difference between
the 97.5% and 2.5% quantiles of posterior samples, a measure of posterior distribution
precision), averaged across replicates for each simulated trait evolution scenario and tree size. $\sigma^2_{\sigma^2}$
and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend parameters, respectively.

                           rate variance            trend                    branchwise rates
$\sigma^2_{\sigma^2}$ =    0       3       6        0       3       6        0       3       6

50 species
$\mu_{\sigma^2}$ = -4      3.85    9.07    15.05    5.03    6.08    6.71     2.33    3.17    3.76
$\mu_{\sigma^2}$ = 0       3.65    10.07   14.82    5.92    8.26    8.28     2.29    3.41    3.90
$\mu_{\sigma^2}$ = 4       4.52    8.66    14.05    10.73   10.75   10.75    3.01    3.49    3.85

100 species
$\mu_{\sigma^2}$ = -4      1.56    5.60    8.53     3.27    4.65    4.84     1.66    2.92    3.35
$\mu_{\sigma^2}$ = 0       1.91    6.45    9.01     4.31    5.27    6.01     1.87    3.10    3.42
$\mu_{\sigma^2}$ = 4       1.69    6.47    8.39     7.61    8.42    7.39     2.06    3.32    3.60

200 species
$\mu_{\sigma^2}$ = -4      0.69    4.13    6.43     2.80    3.59    4.01     1.23    2.51    3.06
$\mu_{\sigma^2}$ = 0       0.62    4.23    6.21     3.39    3.99    4.06     1.18    2.72    3.23
$\mu_{\sigma^2}$ = 4       0.79    3.89    6.14     4.50    5.21    5.65     1.39    2.83    3.22

detection of increasing rates.

Figure 1.2 Power and error rates for the rate variance parameter ($\sigma^2_{\sigma^2}$). Lines depict changes in the
proportion of model fits that correctly showed evidence for rate variance significantly greater than
0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in light red) as a function of
tree size.

Branchwise rate estimation also generally displayed appropriate coverage, accuracy, and statis-

tical testing properties (Tables 1.1–1.3, Fig. 1.4). However, branchwise rate estimates were notice-

ably biased towards their overall mean (i.e., shrinkage). Linear regressions of median branchwise

Table 1.3 Coverage of rate variance, trend, and branchwise rate posteriors (i.e., proportion of times
the true, simulated value is greater than the 2.5% posterior distribution quantile and less than the
97.5% quantile) for each simulated trait evolution scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the
true, simulated values of rate variance and trend parameters, respectively.

                           rate variance            trend                    branchwise rates
$\sigma^2_{\sigma^2}$ =    0       3       6        0       3       6        0       3       6

50 species
$\mu_{\sigma^2}$ = -4      —       0.90    1.00     0.80    1.00    1.00     0.98    0.95    0.92
$\mu_{\sigma^2}$ = 0       —       0.90    0.90     1.00    0.90    0.80     0.99    0.96    0.92
$\mu_{\sigma^2}$ = 4       —       1.00    0.90     1.00    0.90    0.90     0.99    0.96    0.92

100 species
$\mu_{\sigma^2}$ = -4      —       1.00    0.90     1.00    1.00    1.00     1.00    0.97    0.92
$\mu_{\sigma^2}$ = 0       —       0.80    1.00     1.00    1.00    1.00     0.99    0.96    0.95
$\mu_{\sigma^2}$ = 4       —       0.90    1.00     0.90    1.00    1.00     0.97    0.95    0.96

200 species
$\mu_{\sigma^2}$ = -4      —       0.90    1.00     1.00    1.00    0.90     1.00    0.94    0.94
$\mu_{\sigma^2}$ = 0       —       1.00    1.00     0.90    0.90    1.00     0.99    0.95    0.94
$\mu_{\sigma^2}$ = 4       —       1.00    0.90     1.00    1.00    0.90     1.00    0.96    0.95

Figure 1.3 Power and error rates for the trend parameter ($\mu_{\sigma^2}$). Lines depict changes in the proportion
of model fits that correctly showed evidence for trends significantly less and greater than
0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in light red) as a function of
tree size. Results are shown for both models allowed to freely estimate rate variance ($\sigma^2_{\sigma^2}$) (i.e.,
unconstrained models, solid lines) and models with rate variance constrained to 0 (i.e., constrained
models, dashed lines). The latter models are identical to conventional early/late burst models.

rate estimates on simulated branchwise rates yield an average slope of about 0.8 (Fig. 1.5). A similar

pattern holds for linear regression of branchwise rate deviations (Fig. 1A.1). Branchwise rate

posteriors for simulations with no rate variance exhibited especially high accuracy, precision, and

coverage (notably above the theoretical expectation of 95%), perhaps due to the increased preci-

sion of rate variance posteriors under such trait evolution scenarios. In contrast to other parameters,

increasing tree size only slightly decreased posterior MAEs and breadth for branchwise rates. Af-

ter accounting for variation in simulated branchwise rate deviations, trait evolution scenario and

tree size had little effect on statistical power and error rates for detecting anomalous branchwise

rates. Averaging across all fits to simulations with significant rate variance detected, error rates

for detecting anomalous rates remained negligible, peaking at around 0.5% for branchwise rate

deviations around 0. In fact, this peak only increased to about 5% when we set the significant

posterior probability thresholds to 10% and 90% (Fig. 1A.2). The method was somewhat more

sensitive to positive than negative deviations, correctly and consistently detecting anomalous rates

with deviations more extreme than -4 (1/50th of background rate) or 3 (20 times background rate).

Figure 1.4 Power and error rates for branchwise rate parameters ($\ln \sigma^2$). Lines depict changes in
proportions of branchwise rates considered anomalously slow (in dark blue) or fast (in light red)
as a function of simulated rate deviations ($\ln \sigma^2_{\text{dev}}$). These results combine all fits to simulated
data that detected rate variance ($\sigma^2_{\sigma^2}$) significantly greater than 0. The proportions are equivalent
to power when the detected rate deviation is of the same sign as the true, simulated deviation (left
of 0 for anomalously slow rates in dark blue and right for anomalously fast rates in light red), and
to error rate when the detected and true rate deviations are of opposite signs. Here, significant rate
deviations for simulated rate deviations that are exactly 0 are considered errors regardless of sign.

Figure 1.5 Relationship between simulated and estimated branchwise rate parameters ($\ln \sigma^2$). For
each simulation and posterior sample, branchwise rates were first centered by subtracting their
mean. We estimated centered branchwise rates by taking the median of the centered posterior
samples. The solid line represents the position of the true centered branchwise rates, while the
shallower, dashed line represents the observed line of best fit for these data.
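The centering and summarizing steps described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy posterior draws are hypothetical, and the draws were built so that the centered estimates are exactly 0.8 times the true centered rates, mimicking the shrinkage slope reported above.

```python
from statistics import mean, median


def centered_estimates(draws):
    """Center each posterior draw by its own mean, then take per-branch medians.

    draws: list of posterior samples, each a list of ln-rates (one per branch).
    """
    centered = []
    for draw in draws:
        m = mean(draw)
        centered.append([x - m for x in draw])
    n_branches = len(draws[0])
    return [median(c[i] for c in centered) for i in range(n_branches)]


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical true centered ln-rates for three branches.
true_rates = [-2.0, 0.0, 2.0]
# Hypothetical posterior draws: 0.8 * true rates plus a different offset per draw.
draws = [[3.4, 5.0, 6.6], [5.4, 7.0, 8.6], [1.4, 3.0, 4.6]]
estimates = centered_estimates(draws)
slope = ols_slope(true_rates, estimates)
```

Centering within each draw removes the per-draw offset, so only relative differences among branches contribute to the estimates and to the fitted slope.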

1.3.2 Empirical Example

Overall, our model suggests that rates of body size evolution among extant cetaceans have

generally slowed down over time, with considerable divergence in rates of body size evolution

among key subclades (Fig. 1.6). We found marginally significant support for a decreasing trend in

rates over time, with rates declining by about 7% every million years (95% credible interval (CI):

0 - 15% decrease, posterior probability (PP) of increasing trend: 2.5%). We also infer a moderate

rate variance of about 0.06 per million years (CI: 0.01 - 0.22, SD ratio: 0.14). Combining these

two results, changes in body size evolution rates over a million year time interval are expected to

range from a 50% decrease to 60% increase for any particular lineage (Fig. 1.7).
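As a rough check on how the trend and rate variance combine, the fold-change in rate over an interval can be treated as log-normal. The sketch below plugs in point estimates only, so it ignores posterior uncertainty in both parameters and should give a somewhat narrower interval than the posterior-based range reported above. The function name is hypothetical, and reading "7% decline" as a fold-change of 0.93 per million years is an assumption.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fold_change_interval(mu, rate_var, dt=1.0, level=0.95):
    """Equal-tailed interval for exp(mu * dt + sqrt(rate_var * dt) * X), X ~ N(0, 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mu * dt
    spread = z * sqrt(rate_var * dt)
    return exp(center - spread), exp(center + spread)


# Assumption: a 7% decline per million years corresponds to mu = ln(0.93).
mu = log(1 - 0.07)
rate_var = 0.06  # point estimate of rate variance per million years
lower, upper = fold_change_interval(mu, rate_var)
```

With these point estimates the interval runs from a fold-change of roughly 0.58 to roughly 1.50; the wider reported range presumably reflects averaging over the full posterior.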

We also identify a few regions of the cetacean phylogeny where rates of body size evolution

seem to be especially low or high. After detrending, rates of body size evolution in the beaked

whale genus Mesoplodon are about 34% slower than the background rate (CI: 13 - 77%, PP of

positive rate deviation: < 1%). On the other end of the spectrum, some oceanic dolphin lineages

exhibit unusually rapid body size evolution rates. In particular, pilot whales and allies (subfamily

Globicephalinae) and the orca (Orcinus orca) lineage exhibit body size evolution rates about 160%

(CI: 10 - 900%, PP: 99%) and 200% (CI: 20 - 1300%, PP: 99%) higher than the background rate,

respectively. In fact, oceanic dolphins as a whole exhibit a marginally significant increase in body

size evolution rates, even after excluding the pilot whale subfamily and orca lineage (CI: 90 - 300% of

background rate, PP: 95%). Similarly, the blue whale (Balaenoptera musculus) lineage also exhibits

a marginally significant increase in body size evolution rate, about 140% (CI: -10 - 1000%, PP:

95%) higher than the background rate.

Figure 1.6 Phylogram of model results for cetacean body size data. Branch colors represent median
posterior estimates of branchwise rates ($\ln \sigma^2$) of body size evolution, with slower and faster rates
in dark blue and light red, respectively. The thinner, inset colors represent the posterior probability
that a branchwise rate is anomalously fast according to its rate deviation ($\ln \sigma^2_{\text{dev}}$), with lower and
higher posterior probabilities in light and dark gray, respectively.

Under the model with rate variance constrained to 0, rates of body size evolution decrease

by only about 4% every million years (95% CI: -1 - 10% decrease, PP of increasing trend:

7.3%). While only a slight difference, the trend parameter estimated under the full model yields a

marginally significant, two-tailed “p-value” of ∼5%, while the constrained model yields a decidedly

insignificant “p-value” of ∼15%. This is reflected in a conventional sample-size corrected

Akaike Information Criterion (AICc) comparison between simple BM and EB models of trait evo-

lution fitted via maximum likelihood using the R package geiger (version 2.0.7; Pennell et al.,

2014). In this case, a simple BM model receives nearly twice the AICc weight of an EB model

(65% vs. 35%).
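The two summary quantities in this paragraph follow from standard formulas, sketched below. The function names are hypothetical and the AICc values are invented, chosen only so that their difference (about 1.24) reproduces the reported weights. Doubling the smaller tail probability is an assumption about how the two-tailed "p-values" were obtained; it is consistent with the reported numbers.

```python
from math import exp


def akaike_weights(aicc_values):
    """Convert AICc scores into Akaike weights that sum to 1."""
    best = min(aicc_values)
    rel = [exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def two_tailed_p(pp_increasing):
    """Two-tailed analogue of a p-value from a one-sided posterior probability."""
    return 2 * min(pp_increasing, 1 - pp_increasing)


# Hypothetical AICc scores for BM and EB, differing by about 1.24 units.
bm_weight, eb_weight = akaike_weights([100.0, 101.24])

p_full = two_tailed_p(0.025)         # full model: PP of increasing trend 2.5%
p_constrained = two_tailed_p(0.073)  # constrained model: PP of increasing trend 7.3%
```

An AICc difference this small is conventionally read as weak support for either model, which is why the BM preference here is not strong evidence against a declining trend.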

Figure 1.7 The posterior probability distribution of fold-changes in cetacean body size evolution
rates ($\sigma^2$) per 1 million years. This distribution is given by $\exp[\mu_{\sigma^2} + \sigma_{\sigma^2} X]$, where $X$ is a random
variable drawn from a standard normal distribution. The gray filled-in portion represents the 95%
equal-tailed interval, while the vertical line represents the starting rate of 1.

1.4 Discussion

Here we implemented a novel data-driven method, evorates, for modeling stochastic, incre-

mental variation in trait evolution rates. Part of the power of evorates is its ability to infer trait

evolution rate variation independent of an a priori hypothesis on what factors influence rates. This

allows for detailed, hypothesis-free exploration of trait evolution rate variation across time and

taxa. Researchers may use such results to generate and refine hypotheses regarding what factors

have influenced trait evolution rates across the tree of life (e.g., Uyeda et al., 2018). Overall,

evorates performs well on simulated data, recovering accurate parameter estimates and exhibit-

ing appropriate statistical power and error rates for hypothesis testing. Further, the method shows

great promise for empirical macroevolutionary research, offering novel insights into the dynamics

of cetacean body size evolution – a notably well-studied system (e.g., Slater et al., 2010; Pyenson

and Sponberg, 2011; Montgomery et al., 2013; Slater and Pennell, 2014; Slater et al., 2017; Sander

et al., 2021). The results of our study also build on previous work in demonstrating that estimat-

ing time-independent rate heterogeneity is critical for accurately quantifying temporal dynamics

in trait evolution rates (Slater and Pennell, 2014). This finding has consequences for how EBs/LBs

of trait evolution are practically identified and conceptually defined.

The simulation study results showcase evorates' ability to recover accurate parameter estimates

across a range of tree sizes. Despite the high uncertainty of rate variance estimates under

some trait evolution scenarios, rate heterogeneity could still be correctly detected about 90% of

the time with an error rate substantially lower than 5%. Indeed, our hypothesis testing procedures

seem conservative in general, exhibiting relatively low error rates. While it could be beneficial to

relax significance thresholds for SD ratios and/or posterior probabilities for increased statistical

power, our hypothesis testing procedures seem sufficiently powered and we thus do not explore al-

ternative thresholds in great detail here (but see Fig. 1A.2). In any case, compared to conventional

EB/LB models, evorates can detect decreasing trends in trait evolution rates with greater sensitivity

and detect increasing trends with greater robustness. Notably, traits evolving with exponentially

increasing rates on an ultrametric phylogeny (i.e., an LB model) exhibit the same probability distri-

bution expected under a single-peak Ornstein-Uhlenbeck (OU) model, where traits evolve towards

some optimum at a constant rate (Blomberg et al., 2003). Therefore, the frequently observed sup-

port for single-peak OU models from ultrametric comparative data (e.g., Harmon et al., 2010; see

also Cooper et al., 2016; Landis and Schraiber, 2017) may partially result from autocorrelated rate

heterogeneity, which inflates support for LB/OU models based on our simulation study. Despite

their mathematical similarities, LB, OU, and our new models have distinct biological interpreta-

tions regarding the importance of rate heterogeneity and selective forces in shaping the patterns of

trait diversity within clades.

Interestingly, closer inspection of our simulation study results suggests that, in the presence of

rate heterogeneity, models with rate variance constrained to 0 (i.e., conventional EB/LB models)

estimate trend parameters corresponding to changes in average trait evolution rates over time. On

the other hand, unconstrained evorates models estimate trend parameters corresponding to changes

in median trait evolution rates over time, essentially determining whether most lineages in a clade

exhibit rate decreases or increases (Figs. 1C.3–1C.5; Tables 1C.1–1C.3). Counterintuitively, when

the trend parameter is only weakly negative relative to rate variance ($-\sigma^2_{\sigma^2}/2 < \mu_{\sigma^2} < 0$), it is
possible for a majority of lineages within a clade to exhibit declining trait evolution rates (i.e., an

EB according to evorates) while rates averaged across the entire clade increase over time (i.e., an

LB according to conventional methods). This occurs because rates evolve in a right-skewed man-

ner under our model–in other words, a few anomalous lineages/subclades tend to evolve extremely

high trait evolution rates in spite of declining rates among most other lineages, driving up a clade’s

overall average rate (Figs. 1C.1 and 1C.2). We note that evorates still returns estimates of average

changes in trait evolution rates per unit time via a simple parameter transformation ($\mu_{\sigma^2} + \sigma^2_{\sigma^2}/2$).
We choose to focus on the majority-based definition of EBs/LBs since, by accounting for anoma-

lous lineages/subclades exhibiting unusual rates, this definition better matches many macroevolu-

tionary biologists’ intuitive definition of EBs (Lloyd et al., 2012; Slater and Pennell, 2014; Benson

et al., 2014; Hopkins and Smith, 2015; Wright, 2017; Puttick, 2018).
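The distinction between median and average trends can be illustrated numerically. If log rates accumulate normally distributed change with mean $\mu_{\sigma^2} t$ and variance $\sigma^2_{\sigma^2} t$, the fold-change in rate is log-normal, with median $\exp(\mu_{\sigma^2} t)$ and mean $\exp[(\mu_{\sigma^2} + \sigma^2_{\sigma^2}/2) t]$. The sketch below uses hypothetical parameter values inside the weakly negative range and hypothetical function names; it is not drawn from the fitted models.

```python
from math import exp
import random
from statistics import mean, median


def expected_fold_changes(mu, rate_var, t=1.0):
    """Closed-form median and mean fold-change in rate after time t."""
    return exp(mu * t), exp((mu + rate_var / 2) * t)


def simulate_fold_changes(mu, rate_var, t=1.0, n=100_000, seed=1):
    """Draw log-normal fold-changes for n independent lineages."""
    rng = random.Random(seed)
    sd = (rate_var * t) ** 0.5
    return [exp(rng.gauss(mu * t, sd)) for _ in range(n)]


# Hypothetical values satisfying -rate_var / 2 < mu < 0.
mu, rate_var = -0.02, 0.06
median_fold, mean_fold = expected_fold_changes(mu, rate_var)

folds = simulate_fold_changes(mu, rate_var)
share_declining = sum(f < 1 for f in folds) / len(folds)
sim_median, sim_mean = median(folds), mean(folds)
```

Under these values most simulated lineages slow down while the average rate across lineages still rises, because the right tail of the log-normal pulls the mean above 1.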

Our empirical example with cetacean body size directly demonstrates the practical importance

of these nuances in defining EB/LB dynamics. We find substantial evidence that body size evolution

has slowed down in most cetacean lineages, despite the presence of “outlier” lineages exhibiting

relatively rapid rates. By contrast, we find little evidence for a decline in body size evolution

rates averaged across the clade (95% credible interval: 12% decrease - 5% increase in average rate

per million years, posterior probability of increasing average rate: 16%). This broadly agrees with

previous research, but evorates is able to offer novel insights and contextualize prior results by ex-

plicitly estimating branchwise rates in addition to overall trends (Slater and Pennell, 2014; Sander

et al., 2021). For example, Slater and Pennell (2014) identified the orca and pilot whale lineages

as outlier lineages exhibiting especially rapid rates of body size evolution. Our method recapitu-

lates these findings while suggesting oceanic dolphins as a whole represent a relatively recent burst

of body size evolution that has largely masked signals of an earlier burst towards the base of the

clade. Such findings more generally agree with recent suggestions that bursts of trait evolution

may be common but not limited to the base of “major” clades. This is likely due, in part, to major

clades being arbitrarily designated based on taxonomic rank (Puttick, 2018). Alternatively, some

propose that EBs may be hierarchical, with major clades exhibiting repeated bouts of rapid trait

diversification as competing, closely-related lineages partition niche space more finely over time

(Slater and Friscia, 2019). Ultimately, we are optimistic that evorates may be better able to resolve

how frequently bursts of trait evolution–early or not–occur across the tree of life compared to more

conventional methods.

The shrinkage of branchwise rates, whereby rate estimates are biased towards their overall

mean, is presumably due to the assumption that rates are autocorrelated under our model. Because

of this, rate estimates are partially informed by the rates in closely-related lineages, particularly

when closely-related lineages are better sampled (i.e., more related to taxa with sampled trait val-

ues and/or consisting of many short branch lengths). This “diffusion” of rates across the phylogeny

appears to cause under- and overestimation of unusually high and low rates, respectively. Fortu-

nately, this renders evorates conservative in terms of identifying anomalous trait evolution rates,

safeguarding against erroneous conclusions. In general, we view this behavior as a good com-

promise between model flexibility and robustness, allowing evorates to infer rate variation while

avoiding ascribing significance to noise in data. We note that rate variance estimates under our

model are largely unbiased, such that branchwise rates in a typical posterior sample should be as

variable as the true rates. Thus, taking the joint distribution of branchwise rates into account by an-

alyzing distributions of differences between rates, rather than just assessing marginal distributions

of rates, appears important in accurately interpreting results under our model. In any case, despite

this shrinkage phenomenon, the statistical power to identify overall rate heterogeneity and anoma-

lous rates with evorates appears comparable to that of previous data-driven methods (Eastman

et al., 2011).
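The value of working with joint rather than marginal posteriors can be shown with a small simulation. In the sketch below, two branchwise rates share a large per-draw component, so their marginal distributions overlap heavily even though one is consistently higher within each draw. All names and numbers are hypothetical and chosen only to make the contrast visible.

```python
import random


def prob_greater(samples_a, samples_b):
    """Share of paired posterior draws in which ln-rate A exceeds ln-rate B."""
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    return sum(d > 0 for d in diffs) / len(diffs)


rng = random.Random(42)
n_draws = 2000

# Hypothetical joint posterior: a shared per-draw component plus small branch-specific noise.
shared = [rng.gauss(0.0, 2.0) for _ in range(n_draws)]
rate_a = [s + 0.5 + rng.gauss(0.0, 0.1) for s in shared]
rate_b = [s + rng.gauss(0.0, 0.1) for s in shared]

# Using the pairing within draws respects the joint distribution.
pp_joint = prob_greater(rate_a, rate_b)

# Shuffling one vector breaks the pairing, which mimics comparing marginals only.
rate_b_shuffled = rate_b[:]
rng.shuffle(rate_b_shuffled)
pp_marginal = prob_greater(rate_a, rate_b_shuffled)
```

Here the paired comparison gives near-certain support for a higher rate on branch A, while the unpaired comparison gives only weak support, so a difference can be well supported even when the two marginal intervals overlap.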

Evorates is one of several recently developed methods that also estimate unique trait evolution

rates for each branch in a phylogeny but assume an alternative mode of rate change (May and

Moore, 2020; Fisher et al., 2021). These other methods assume that branchwise rates are inde-

pendently distributed according to a log-normal distribution. The method we develop here differs

from these “independent rate” (IR) models in assuming that rates evolve gradually and are thus

phylogenetically autocorrelated (see also Revell, 2021). Theoretically, trait evolution rates should

exhibit some degree of phylogenetic autocorrelation given that many factors hypothesized to af-

fect trait evolution rates themselves exhibit phylogenetic autocorrelation. Indeed, a recent study

found evidence for autocorrelation of trait evolution rates in a few vertebrate clades (Sakamoto

and Venditti, 2018), and autocorrelation has also been found in lineage diversification (Savolainen

et al., 2002; Caron and Pie, 2020) and molecular substitution rates (Lepage et al., 2006; Tao et al.,

2019). Notably, there is also no known rate evolution process that would produce independent,

log-normally distributed branchwise rates (Lepage et al., 2006, 2007). However, IR models could

outperform “autocorrelated rate” (AR) models in some instances due to their tremendous flexibil-

ity in modeling how rates vary over time and phylogenies. In general, we expect that IR models

will perform best in cases with many traits and/or non-ultrametric trees, where the flexibility of the

model can be tempered by rich information content in the data. More work testing for rate auto-

correlation or lack thereof in continuous trait data is needed as methods for inferring trait evolution

rate variation become more complex.

Revell (2021) independently developed a method, multirateBM, based on a model similar to the

one we introduce here, though evorates offers several key advantages. In particular, the maximum

likelihood (ML) implementation of multirateBM renders it impossible to estimate rate variance.

To do so, one would need to analytically marginalize over uncertainty in branchwise rates. Here,

we circumvent this issue by using Bayesian inference to numerically integrate over uncertainty in

branchwise rates. This is analogous to how ML implementations of mixed effect models analyti-

cally marginalize over uncertainty in random effects, while Bayesian implementations of the same

models sample random effects (Browne and Draper, 2006). Indeed, ML implementations of mixed

effect models that treat random effects as parameters would be unable to estimate random effect

variances due to the very same reasons multirateBM cannot estimate rate variance. Additionally,

our model has the added advantage of accommodating both trends in rates and uncertainty in tip

trait values. Lastly, we implement procedures to test the significance of rate heterogeneity, trends,

and anomalous trait evolution rates. While multirateBM offers a quick and convenient means for

comparative data exploration, our new method allows for more rigorous quantification and analysis

of rate evolutionary processes and patterns from comparative data.

There are a number of ways evorates might be improved or expanded. Assuming that trait

evolution rates for different traits are correlated with one another, using data on multiple traits

could improve inference of both the rate evolution process and branchwise rate parameters (May

and Moore, 2020). Another promising future direction is integration of evorates with hypothesis-

driven methods. This could be done post hoc by applying phylogenetic linear regression to “tip

rates” estimated under the model (e.g., Rabosky and Huang, 2016) or analyzing distributions of

branchwise rates associated with ancestral states estimated via stochastic character maps (Revell,

2013; but see May and Moore, 2020). Alternatively, one could explicitly model rates as the product

of both a stochastic rate evolution process and a deterministic function of some factor of interest.

We have already taken steps towards this model extension in our current implementation by al-

lowing rates to change as a deterministic function of time. Lastly, despite our focus on gradually

changing rates, trait evolution rates might also exhibit sudden shifts of large magnitude (“jumps”)

or short-lived fluctuations (“pulses”) in response to factors with particularly strong influence on

rates. It would be ideal – but difficult – to model rates as evolving gradually, while potentially

undergoing sudden jumps or pulses (e.g., Lartillot et al., 2016). An alternative strategy is develop-

ing methods to compare the fit of a model like ours against more conventional data-driven models

whereby rates jump or even Lévy models whereby rates pulse (Landis et al., 2013). Assessing

when and whether comparative data can distinguish between different modes of rate change will

be important for future research on the dynamics of trait evolution.

1.4.1 Conclusion

Here, we introduced evorates, a method that models gradual change, rather than abrupt shifts,

in continuous trait evolution rates from comparative data. Unlike nearly all other comparative

methods for inferring rate variation, evorates goes beyond identifying lineages exhibiting anoma-

lous rates by also estimating the process by which rates themselves evolve. Although there are

many potential modes of rate variation over time and phylogenies, our model estimates rate evo-

lution processes as the product of two parameters: one controlling how quickly rates accumulate

random variation, and another determining whether rates tend to decrease or increase over time.

The resulting method returns accurate estimates of evolutionary processes and provides a flexible

and intuitive means of detecting and analyzing trait evolution rate variation. Looking forward,

evorates has tremendous potential for improvement and elaboration, and we are optimistic that

the future of macroevolutionary biology will benefit from increased focus not only on how traits

evolve, but how the rates of trait evolution themselves evolve over time and taxa.

BIBLIOGRAPHY

Arnold P.W. and Heinsohn G.E. 1996. Phylogenetic status of the Irrawaddy dolphin Orcaella bre-

virostris (Owen in Gray): A cladistic analysis. Mem Queensl Mus 39:143–204.

Arnold S.J., Bürger R., Hohenlohe P.A., Ajie B.C., and Jones A.G. 2008. Understanding the evo-

lution and stability of the G-matrix. Evolution 62:2451–2461.

Baker A.N. 1981. The southern right whale dolphin Lissodelphis peronii (Lacépède) in Aus-

tralasian waters. Natl Mus NZ Rec 2:17–34.

Barros N.B. 1991. Recent cetacean records for southeastern Brazil. Mar Mamm Sci 7:296–306.

Beaulieu J.M. and O’Meara B.C. 2016. Detecting hidden diversification shifts in models of trait-

dependent speciation and extinction. Syst Biol 65:583–601.

Benson R.B.J., Campione N.E., Carrano M.T., Mannion P.D., Sullivan C., Upchurch P., and Evans
D.C. 2014. Rates of dinosaur body mass evolution indicate 170 million years of sustained eco-
logical innovation on the avian stem lineage. PLoS Biol 12:e1001853.

Betancourt M. and Girolami M. 2019. Hamiltonian Monte Carlo for hierarchical models. Pages 79–
97 in Current Trends in Bayesian Methodology with Applications (S. K. Upadhyay, U. Singh,
D. K. Dey, and A. Loganathan, eds.). Chapman and Hall/CRC Press, Boca Raton, FL.

Beygelzimer A., Kakadet S., Langford J., Arya S., Mount D., and Li S. 2022. FNN: Fast nearest

neighbor search algorithms and applications. R package version 1.1.3.1.

Blomberg S.P., Garland T. Jr, and Ives A.R. 2003. Testing for phylogenetic signal in comparative

data: Behavioral traits are more labile. Evolution 57:717–745.

Borstein S.R., Fordyce J.A., O’Meara B.C., Wainwright P.C., and McGee M.D. 2019. Reef fish

functional traits evolve fastest at trophic extremes. Nat Ecol Evol 3:191–199.

Browne W.J. and Draper D. 2006. A comparison of Bayesian and likelihood-based methods for

fitting multilevel models. Bayesian Anal 1:473–514.

Brusatte S.L., Butler R.J., Prieto-Márquez A., and Norell M.A. 2012. Dinosaur morphological

diversity and the end-Cretaceous extinction. Nat Commun 3:804.

Caetano D.S. and Harmon L.J. 2019. Estimating correlated rates of trait evolution with uncertainty.

Syst Biol 68:412–429.

Caron F.S. and Pie M.R. 2020. The phylogenetic signal of diversification rates. J Zoolog Syst Evol

Res 58:1432–1436.

Carpenter B., Gelman A., Hoffman M.D., Lee D., Goodrich B., Betancourt M., Brubaker M.,
Guo J., Li P., and Riddell A. 2017. Stan: A probabilistic programming language. J Stat Softw
76:1–32.

Charlton-Robb K., Gershwin L.A., Thompson R., Austin J., Owen K., and McKechnie S. 2011.
A new dolphin species, the Burrunan dolphin Tursiops australis sp. nov., endemic to southern
Australian coastal waters. PLoS One 6:e24047.

Chartier M., von Balthazar M., Sontag S., Löfstrand S., Palme T., Jabbour F., Sauquet H., and
Schönenberger J. 2021. Global patterns and a latitudinal gradient of flower disparity: Perspec-
tives from the angiosperm order Ericales. New Phytol 230:821–831.

Chira A.M., Cooney C.R., Bright J.A., Capp E.J.R., Hughes E.C., Moody C.J.A., Nouri L.O.,
Varley Z.K., and Thomas G.H. 2018. Correlates of rate heterogeneity in avian ecomorphological
traits. Ecol Lett 21:1505–1514.

Chira A.M. and Thomas G.H. 2016. The impact of rate heterogeneity on inference of phylogenetic

models of trait evolution. J Evol Biol 29:2502–2518.

Clavel J. and Morlon H. 2017. Accelerated body size evolution during cold climatic periods in the

Cenozoic. Proc Natl Acad Sci USA 114:4183–4188.

Constantine R., Iwata T., Nieukirk S.L., and Penry G.S. 2018. Future directions in research on

Bryde’s whales. Front Mar Sci 5:1–7.

Cooper N. and Purvis A. 2009. What factors shape rates of phenotypic evolution? A comparative

study of cranial morphology of four mammalian clades. J Evol Biol 22:1024–1035.

Cooper N., Thomas G.H., Venditti C., Meade A., and Freckleton R.P. 2016. A cautionary note on
the use of Ornstein Uhlenbeck models in macroevolutionary studies. Biol J Linn Soc 118:64–77.

Dalebout M.L., Mead J.G., Baker C.S., Baker A.N., and Helden A.L. 2002. A new species of
beaked whale Mesoplodon perrini sp. n. (Cetacea: Ziphiidae) discovered through phylogenetic
analyses of mitochondrial DNA sequences. Mar Mamm Sci 18:577–608.

Dalebout M.L., Baker C.S., Steel D., Thompson K., Robertson K.M., Chivers S.J., Perrin W.F.,
Goonatilake M., Anderson R.C., Mead J.G., Potter C.W., Thompson L., Jupiter D., and
Yamada T.K. 2014. Resurrection of Mesoplodon hotaula Deraniyagala 1963: A new species of
beaked whale in the tropical Indo-Pacific. Mar Mamm Sci 30:1081–1108.

Devreese J.P.A., Lemmens D., and Tempere J. 2010. Path integral approach to Asian options in the

Black-Scholes model. Physica A: Statistical Mechanics and its Applications 389:780–788.

Donoghue M.J. and Sanderson M.J. 2015. Confluence, synnovation, and depauperons in plant

diversification. New Phytol 207:260–274.

Drury J.P., Clavel J., Tobias J.A., Rolland J., Sheard C., and Morlon H. 2021. Tempo and mode of

morphological evolution are decoupled from latitude in birds. PLoS Biol 19:e3001270.

Dufresne D. 2004. The log-normal approximation in financial and other computations. Adv Appl

Probab 36:747–773.

Eastman J.M., Alfaro M.E., Joyce P., Hipp A.L., and Harmon L.J. 2011. A novel comparative
method for identifying shifts in the rate of character evolution on trees. Evolution 65:3578–
3589.

Fabre A.C., Bardua C., Bon M., Clavel J., Felice R.N., Streicher J.W., Bonnel J., Stanley E.L.,
Blackburn D.C., and Goswami A. 2020. Metamorphosis shapes cranial diversity and rate of
evolution in salamanders. Nat Ecol Evol 4:1129–1140.

Felsenstein J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous char-

acters. Am J Hum Genet 25:471–492.

Felsenstein J. 2008. Comparative methods with sampling error and within-species variation: Con-

trasts revisited and revised. Am Nat 171:713–725.

Fisher A.A., Ji X., Zhang Z., Lemey P., and Suchard M.A. 2021. Relaxed random walks at scale.

Syst Biol 70:258–267.

Fortune S.M.E., Moore M.J., Perryman W.L., and Trites A.W. 2021. Body growth of North Atlantic

right whales (Eubalaena glacialis) revisited. Mar Mamm Sci 37:433–447.

Freckleton R.P. 2012. Fast likelihood calculations for comparative analyses. Methods Ecol Evol

3:940–947.

Gill M.S., Tung Ho L.S., Baele G., Lemey P., and Suchard M.A. 2017. A relaxed directional

random walk model for phylogenetic trait evolution. Syst Biol 66:299–319.

Gingerich P.D. 2009. Rates of evolution. Annu Rev Ecol Evol Syst 40:657–675.

Goolsby E.W. 2017. Rapid maximum likelihood ancestral state reconstruction of continuous char-

acters: A rerooting-free algorithm. Ecol Evol 7:2791–2797.

Hansen T.F., Bolstad G.H., and Tsuboi M. 2022. Analyzing disparity and rates of morphological

evolution with model-based phylogenetic comparative methods. Syst Biol 71:1054–1072.

Harmon L.J., Losos J.B., Davies T.J., Gillespie R.G., Gittleman J.L., Jennings W.B.,
Kozak K.H., McPeek M.A., Moreno-Roark F., Near T.J., Purvis A., Ricklefs R.E., Schluter D.,
Schulte J.A. II, Seehausen O., Sidlauskas B.L., Torres-Carvajal O., Weir J.T., and Mooers A.Ø.
2010. Early bursts of body size and shape evolution are rare in comparative data. Evolution
64:2385–2396.

Hassler G., Tolkoff M.R., Allen W.L., Ho L.S.T., Lemey P., and Suchard M.A. 2022. Inferring
phenotypic trait evolution on large trees with many incomplete measurements. J Am Stat Assoc
117:678–692.

Held L. and Ott M. 2018. On p-values and Bayes factors. Annu Rev Stat Appl 5:393–419.

Hoffman M.D. and Gelman A. 2014. The No-U-Turn sampler: Adaptively setting path lengths in

Hamiltonian Monte Carlo. J Mach Learn Res 15:1593–1623.

Hopkins M.J. and Smith A.B. 2015. Dynamic evolutionary change in post-Paleozoic echinoids and
the importance of scale when interpreting changes in rates of evolution. Proc Natl Acad Sci USA
112:3758–3763.

Hunt G. 2006. Fitting and comparing models of phyletic evolution: Random walks and beyond.

Paleobiology 32:578–601.

Jefferson T.A. and Rosenbaum H.C. 2014. Taxonomic revision of the humpback dolphins (Sousa

spp.), and description of a new species from Australia. Mar Mamm Sci 30:1494–1541.

Kass R.E. and Raftery A.E. 1995. Bayes factors. J Am Stat Assoc 90:773–795.

Konishi K., Tamura T., Zenitani R., Bando T., Kato H., and Walløe L. 2008. Decline in energy
storage in the Antarctic minke whale (Balaenoptera bonaerensis) in the Southern Ocean. Polar
Biol 31:1509–1520.

Landis M.J. and Schraiber J.G. 2017. Pulsed evolution shaped modern vertebrate body sizes. Proc

Natl Acad Sci USA 114:13224–13229.

Landis M.J., Schraiber J.G., and Liang M. 2013. Phylogenetic analysis using Lévy processes:

finding jumps in the evolution of continuous traits. Syst Biol 62:193–204.

Lartillot N., Phillips M.J., and Ronquist F. 2016. A mixed relaxed clock model. Philos Trans R Soc

B 371:20150132.

Lartillot N. and Poujol R. 2011. A phylogenetic model for investigating correlated evolution of

substitution rates and continuous phenotypic characters. Mol Biol Evol 28:729–744.

Lemoine N.P. 2019. Moving beyond noninformative priors: why and how to choose weakly infor-

mative priors in Bayesian analyses. Oikos 128:912–928.

Lepage T., Bryant D., Philippe H., and Lartillot N. 2007. A general comparison of relaxed molec-

ular clock models. Mol Biol Evol 24:2669–2680.

Lepage T., Lawi S., Tupper P., and Bryant D. 2006. Continuous and tractable models for the

variation of evolutionary rates. Math Biosci 199:216–233.

Limpert E., Stahel W.A., and Abbt M. 2001. Log-normal distributions across the sciences: Keys

and clues. Bioscience 51:341–352.

Lloyd G.T. and Slater G.J. 2021. A total-group phylogenetic metatree for Cetacea and the impor-

tance of fossil data in diversification analyses. Syst Biol 70:922–939.

Lloyd G.T., Wang S.C., and Brusatte S.L. 2012. Identifying heterogeneity in rates of morpholog-
ical evolution: discrete character change in the evolution of lungfish (Sarcopterygii; Dipnoi).
Evolution 66:330–348.

Lodi L., Siciliano S., and Capistrano L. 1990. Mass stranding of Peponocephala electra (Cetacea,
Globicephalinae) on Piracanga Beach, Bahia, northeastern Brazil. Sci Rep Cetacean Res 1:79–
84.

May M.R. and Moore B.R. 2020. A Bayesian approach for inferring the impact of a discrete char-
acter on rates of continuous-character evolution in the presence of background-rate variation.
Syst Biol 69:530–544.

Mead J.G., Walker W.A., and Houck W.J. 1982. Biological observations on Mesoplodon carlhubbsi

(Cetacea, Ziphiidae). Smithson Contrib Zool Pages 1–25.

Mihalitsis M. and Bellwood D.R. 2019. Morphological and functional diversity of piscivorous

fishes on coral reefs. Coral Reefs 38:945–954.

Molina D.M. and Oporto J.A. 1993. Comparative study of dentine staining techniques to estimate
age in the Chilean dolphin, Cephalorhynchus eutropia (Gray, 1846). Aquat Mamm 19:45–48.

Montgomery S.H., Geisler J.H., McGowen M.R., Fox C., Marino L., and Gatesy J. 2013. The

evolutionary history of cetacean brain and body size. Evolution 67:3339–3353.

Muñoz M.M. and Bodensteiner B.L. 2019. Janzen’s hypothesis meets the Bogert effect: Connect-
ing climate variation, thermoregulatory behavior, and rates of physiological evolution. Integr
Org Biol 1:oby002.

Muñoz M.M., Hu Y., Anderson P.S.L., and Patek S.N. 2018. Strong biomechanical relationships

bias the tempo and mode of morphological evolution. Elife 7:e37621.

Neal R. 2011. MCMC using Hamiltonian dynamics. Pages 113–162 in Handbook of Markov
Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, eds.). Chapman and
Hall/CRC, Boca Raton, FL.

Pagel M., O’Donovan C., and Meade A. 2022. General statistical model shows that macroevolu-
tionary patterns and processes are consistent with Darwinian gradualism. Nat Commun 13:1113.

Paradis E. and Schliep K. 2019. ape 5.0: An environment for modern phylogenetics and evolution-

ary analyses in R. Bioinformatics 35:526–528.

Pennell M.W., Eastman J.M., Slater G.J., Brown J.W., Uyeda J.C., FitzJohn R.G., Alfaro M.E.,
and Harmon L.J. 2014. geiger v2.0: An expanded suite of methods for fitting macroevolutionary
models to phylogenetic trees. Bioinformatics 30:2216–2218.

Pennell M.W., FitzJohn R.G., Cornwell W.K., and Harmon L.J. 2015. Model adequacy and the

macroevolution of angiosperm functional traits. Am Nat 186:E33–50.

Plön S., Albrecht K.H., Cliff G., and Froneman P.W. 2012. Organ weights of three dolphin species
(Sousa chinensis, Tursiops aduncus and Delphinus capensis) from South Africa: Implications
for ecological adaptation? J Cetacean Res Manag 12:265–276.

Puttick M.N. 2018. Mixed evidence for early bursts of morphological evolution in extant clades. J

Evol Biol 31:502–515.

Pyenson N.D. and Sponberg S.N. 2011. Reconstructing body size in extinct crown Cetacea (Neo-
ceti) using allometry, phylogenetic methods and tests from the fossil record. J Mamm Evol
18:269.

Rabosky D.L., Donnellan S.C., Grundler M., and Lovette I.J. 2014. Analysis and visualization of
complex macroevolutionary dynamics: An example from Australian scincid lizards. Syst Biol
63:610–627.

Rabosky D.L. and Goldberg E.E. 2015. Model inadequacy and mistaken inferences of trait-

dependent speciation. Syst Biol 64:340–355.

Rabosky D.L. and Huang H. 2016. A robust semi-parametric test for detecting trait-dependent

diversification. Syst Biol 65:181–193.

Raj Pant S., Goswami A., and Finarelli J.A. 2014. Complex body size trends in the evolution of

sloths (Xenarthra: Pilosa). BMC Evol Biol 14:184.

Reaney A.M., Bouchenak-Khelladi Y., Tobias J.A., and Abzhanov A. 2020. Ecological and mor-
phological determinants of evolutionary diversification in Darwin’s finches and their relatives.
Ecol Evol 10:14020–14032.

Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things).

Methods Ecol Evol 3:217–223.

Revell L.J. 2013. A comment on the use of stochastic character maps to estimate evolutionary rate

variation in a continuously valued trait. Syst Biol 62:339–345.

Revell L.J. 2021. A variable-rate quantitative trait evolution model using penalized-likelihood.

PeerJ 9:e11997.


Reyes J.C., Mead J.G., and Van Waerebeek K. 1991. A new species of beaked whale Mesoplodon

peruvianus sp. n. (Cetacea: Ziphiidae) from Peru. Mar Mamm Sci 7:1–24.

Safak A. and Safak M. 2002. Moments of the sum of correlated log-normal random variables.
Pages 140–144 in Proceedings of IEEE Vehicular Technology Conference (VTC) vol. 1 IEEE.

Sakamoto M. and Venditti C. 2018. Phylogenetic non-independence in rates of trait evolution. Biol

Lett 14:20180502.

Sander P.M., Griebeler E.M., Klein N., Juarbe J.V., Wintrich T., Revell L.J., and Schmitz L. 2021.
Early giant reveals faster evolution of large body size in ichthyosaurs than in cetaceans. Science
374:eabf5787.

Savolainen V., Heard S.B., Powell M.P., Davies T.J., and Mooers A.Ø. 2002. Is cladogenesis
heritable? Syst Biol 51:835–843.

Simpson G.G. 1944. Tempo and Mode in Evolution. Columbia University Press, New York, NY.

Skeels A. and Cardillo M. 2019. Equilibrium and non-equilibrium phases in the radiation of Hakea

and the drivers of diversity in Mediterranean-type ecosystems. Evolution 73:1392–1410.

Slater G.J. 2015. Iterative adaptive radiations of fossil canids show no evidence for diversity-

dependent trait evolution. Proc Natl Acad Sci USA 112:4897–4902.

Slater G.J. and Friscia A.R. 2019. Hierarchy in adaptive radiation: A case study using the Carnivora

(Mammalia). Evolution 73:524–539.

Slater G.J., Goldbogen J.A., and Pyenson N.D. 2017. Independent evolution of baleen whale gi-

gantism linked to Plio-Pleistocene ocean dynamics. Proc R Soc B 284:20170546.

Slater G.J. and Pennell M.W. 2014. Robust regression and posterior predictive simulation increase

power to detect early bursts of trait evolution. Syst Biol 63:293–308.

Slater G.J., Price S.A., Santini F., and Alfaro M.E. 2010. Diversity versus disparity and the radia-

tion of modern cetaceans. Proc R Soc B 277:3097–3104.

Sookias R.B., Butler R.J., and Benson R.B.J. 2012. Rise of dinosaurs reveals major body-size
transitions are driven by passive processes of trait evolution. Proc R Soc B 279:2180–2187.

Stan Development Team. 2019. Stan Modeling Language Users Guide and Reference Manual.
Version 2.21.0.

Stan Development Team. 2020. RStan: The R interface to Stan. R package version 2.21.1.

Stone C.J., Hansen M.H., Kooperberg C., and Truong Y.K. 1997. Polynomial splines and their
tensor products in extended linear modeling. Ann Stat 25:1371–1425.

Tao Q., Tamura K., Battistuzzi F.U., and Kumar S. 2019. A machine learning method for detecting
autocorrelation of evolutionary rates in large phylogenies. Mol Biol Evol 36:811–824.

Thomas G.H. and Freckleton R.P. 2012. MOTMOT: Models of trait macroevolution on trees. Meth-

ods Ecol Evol 3:145–151.

Thompson K., Baker C.S., van Helden A., Patel S., Millar C., and Constantine R. 2012. The world’s

rarest whale. Curr Biol 22:R905–R906.

Thorne J.L., Kishino H., and Painter I.S. 1998. Estimating the rate of evolution of the rate of

molecular evolution. Mol Biol Evol 15:1647–1657.

Uyeda J.C., Caetano D.S., and Pennell M.W. 2015. Comparative analysis of principal components

can be misleading. Syst Biol 64:677–689.

Uyeda J.C., Zenil-Ferguson R., and Pennell M.W. 2018. Rethinking phylogenetic comparative

methods. Syst Biol 67:1091–1109.

Vehtari A., Gelman A., Simpson D., Carpenter B., and Bürkner P.C. 2021. Rank-normalization,
folding, and localization: An improved R̂ for assessing convergence of MCMC (with
discussion). Bayesian Anal 16:667–718.

Villar D., Flicek P., and Odom D.T. 2014. Evolution of transcription factor binding in metazoans–

mechanisms and functional implications. Nat Rev Genet 15:221–233.

Wagenmakers E.J., Gronau Q.F., Dablander F., and Etz A. 2022. The support interval. Erkenntnis

87:589–601.

Wagenmakers E.J., Lodewyckx T., Kuriyal H., and Grasman R. 2010. Bayesian hypothesis testing

for psychologists: a tutorial on the Savage-Dickey method. Cogn Psychol 60:158–189.

Weber M.G., Mitko L., Eltz T., and Ramírez S.R. 2016. Macroevolution of perfume signalling in

orchid bees. Ecol Lett 19:1314–1323.

Welch J.J. and Waxman D. 2008. Calculating independent contrasts for the comparative study of

substitution rates. J Theor Biol 251:667–678.

Wright D.F. 2017. Phenotypic innovation and adaptive constraints in the evolutionary radiation of

Palaeozoic crinoids. Sci Rep 7:13745.

Zhao P. and Lai L. 2021. On the convergence rates of KNN density estimation. Pages 2840–2845

in 2021 IEEE International Symposium on Information Theory (ISIT) IEEE.


APPENDIX 1A

SUPPLEMENTAL TABLES AND FIGURES

Figure 1A.1 Relationship between simulated and estimated branchwise rate deviation parameters
($\ln \sigma^2_{dev}$). The solid line represents the position of the true branchwise rate deviations, while the
shallower, dashed line represents the observed line of best fit for these data.

Figure 1A.2 Power and error rates for branchwise rate parameters ($\ln \sigma^2$) under relaxed
significance thresholds (posterior probability < 0.1 or > 0.9). Lines depict changes in proportions of
branchwise rates considered anomalously slow (in dark blue) or fast (in light red) as a function of
simulated rate deviations ($\ln \sigma^2_{dev}$). These results combine all fits to simulated data that detected
rate variance ($\sigma^2_{\sigma^2}$) significantly greater than 0. The proportions are equivalent to power when the
detected rate deviation is of the same sign as the true, simulated deviation (left of 0 for
anomalously slow rates in dark blue and right for anomalously fast rates in light red), and to error rate
when the detected and true rate deviations are of opposite signs. Here, significant rate deviations
for simulated rate deviations that are exactly 0 are considered errors regardless of sign.

Table 1A.1 Cetacean body length data and associated references used for empirical example.

species                      length (m)  reference
Balaena mysticetus           18.0        Slater et al., 2010
Balaenoptera acutorostrata   10.7        Slater et al., 2010
Balaenoptera bonaerensis     10.2        Konishi et al., 2008
Balaenoptera borealis        16.1        Slater et al., 2010
Balaenoptera edeni           15.4        Slater et al., 2010
Balaenoptera musculus        33.6        Slater et al., 2010
Balaenoptera omurai          10.7        Slater et al., 2010
Balaenoptera physalus        21.2        Slater et al., 2010
Berardius arnuxii            8.9         Slater et al., 2010
Berardius bairdii            12.0        Slater et al., 2010
Caperea marginata            6.2         Slater et al., 2010
Cephalorhynchus commersoni   1.5         Slater et al., 2010
Cephalorhynchus eutropia     1.5         Molina and Oporto, 1993
Cephalorhynchus heavisidii   1.7         Slater et al., 2010
Cephalorhynchus hectori      1.5         Slater et al., 2010
Delphinapterus leucas        3.8         Slater et al., 2010
Delphinus capensis           2.5         Plön et al., 2012
Delphinus delphis            2.3         Slater et al., 2010
Eschrichtius robustus        14.6        Slater et al., 2010
Eubalaena australis          13.9        Slater et al., 2010
Eubalaena glacialis          13.7        Slater et al., 2010
Eubalaena japonica           17.4        Fortune et al., 2021
Feresa attenuata             2.4         Slater et al., 2010
Globicephala macrorhynchus   4.8         Slater et al., 2010
Globicephala melas           5.1         Slater et al., 2010
Grampus griseus              3.7         Slater et al., 2010
Hyperoodon ampullatus        7.9         Slater et al., 2010
Hyperoodon planifrons        7.5         Slater et al., 2010
Indopacetus pacificus        7.2         Slater et al., 2010
Inia geoffrensis             2.0         Slater et al., 2010
Kogia breviceps              3.4         Slater et al., 2010
Kogia sima                   2.4         Slater et al., 2010
Lagenodelphis hosei          2.6         Slater et al., 2010
Lagenorhynchus albirostris   3.0         Slater et al., 2010
Leucopleurus acutus          2.4         Slater et al., 2010
Lipotes vexillifer           2.0         Slater et al., 2010
Lissodelphis borealis        2.3         Slater et al., 2010
Lissodelphis peronii         2.3         Baker, 1981
Megaptera novaeangliae       18.0        Slater et al., 2010
Mesoplodon bidens            5.1         Slater et al., 2010
Mesoplodon bowdoini          4.5         Slater et al., 2010
Mesoplodon carlhubbsi        5.3         Mead et al., 1982
Mesoplodon densirostris      4.7         Slater et al., 2010
Mesoplodon europaeus         5.2         Slater et al., 2010
Mesoplodon ginkgodens        4.9         Slater et al., 2010
Mesoplodon grayi             5.3         Slater et al., 2010
Mesoplodon hectori           4.4         Slater et al., 2010
Mesoplodon hotaula           4.8         Dalebout et al., 2014
Mesoplodon layardii          6.2         Slater et al., 2010
Mesoplodon mirus             5.1         Slater et al., 2010
Mesoplodon perrini           4.4         Dalebout et al., 2002
Mesoplodon peruvianus        3.7ᵃ        Reyes et al., 1991
Mesoplodon stejnegeri        5.7         Slater et al., 2010
Mesoplodon traversii         5.3         Thompson et al., 2012
Monodon monoceros            4.3         Slater et al., 2010
Neophocaena phocaenoides     1.4         Slater et al., 2010
Orcaella brevirostris        2.2         Slater et al., 2010
Orcaella heinsohni           2.2         Arnold and Heinsohn, 1996
Orcinus orca                 7.9         Slater et al., 2010
Peponocephala electra        2.8         Lodi et al., 1990
Phocoena dioptrica           1.9         Slater et al., 2010
Phocoena phocoena            1.9         Slater et al., 2010
Phocoena sinus               1.1         Slater et al., 2010
Phocoena spinipinnis         1.7         Slater et al., 2010
Phocoenoides dalli           1.9         Slater et al., 2010
Physeter macrocephalus       11.0        Slater et al., 2010
Platanista gangetica         2.5         Slater et al., 2010
Pontoporia blainvillii       1.5         Slater et al., 2010
Pseudorca crassidens         5.1         Slater et al., 2010
Sagmatias australis          2.1         Slater et al., 2010
Sagmatias cruciger           1.8         Slater et al., 2010
Sagmatias obliquidens        2.4         Slater et al., 2010
Sagmatias obscurus           1.9         Slater et al., 2010
Sotalia fluviatilis          1.5         Slater et al., 2010
Sotalia guianensis           2.1         Barros, 1991
Sousa chinensis              2.4         Slater et al., 2010
Sousa teuszii                2.5         Jefferson and Rosenbaum, 2014
Stenella attenuata           2.1         Slater et al., 2010
Stenella clymene             1.9         Slater et al., 2010
Stenella coeruleoalba        2.3         Slater et al., 2010
Stenella frontalis           2.1         Slater et al., 2010
Stenella longirostris        2.0         Slater et al., 2010
Steno bredanensis            2.6         Slater et al., 2010
Tasmacetus shepherdi         6.5         Slater et al., 2010
Tursiops aduncus             2.1         Slater et al., 2010
Tursiops australis           2.8ᵇ        Charlton-Robb et al., 2011
Tursiops truncatus           2.4         Slater et al., 2010
Ziphius cavirostris          6.4         Slater et al., 2010

ᵃ from male specimen because no mature females were measured
ᵇ sex not reported

APPENDIX 1B

APPROXIMATING GEOMETRIC BROWNIAN MOTION TIME-AVERAGES

Our model seeks to model rates (σ 2) as “evolving” under a trended Geometric Brownian mo-

tion (GBM)-like process, whereby the natural log of rates evolve in a trended Brownian motion

(BM)-like manner. Unfortunately, this requires an expression for the probability distribution of

GBM time-averages along each branch in the phylogeny. Expressions for such distributions are

infamously intractable, necessitating approximate solutions (Dufresne, 2004; Lepage et al., 2007).

For our model, we use a multivariate log-normal approximation to model rate time-averages along each branch (branchwise averages, $\bar{\sigma}^2$) based on two observations. First, as the rate variance parameter ($\sigma^2_{\sigma^2}$) approaches 0, rates ($\sigma^2$) will converge to following a simple exponential function with respect to time, $\sigma^2 = \sigma^2_0 \exp[\mu_{\sigma^2} t]$, where $\sigma^2_0$ is the starting rate, $\mu_{\sigma^2}$ is the trend, and $t$ is time.

In this case, the branchwise averages can be derived through integration and are equivalent to the

time-averaged rates expected under a conventional “early/late burst” (EB/LB) model (Blomberg

et al., 2003). Second, over short amounts of time and/or with low rate variance, the arithmetic and

geometric time-averages of a GBM process approach one another. The geometric time-average of a

GBM process is simply the exponentiated arithmetic time-average of the GBM process on the nat-

ural log scale, which has a straight-forward and tractable log-normal distribution (Devreese et al.,

2010). Thus, assuming that branch lengths in a phylogeny are typically short and rate variance is

relatively low, we can approximate the distribution of the natural log of branchwise averages by

adding multivariate normal “noise”, γ, to the natural log of branchwise averages expected under a

conventional EB/LB model, β . In other words:

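\begin{align}
\ln(\bar{\sigma}^2) &\approx \beta + \gamma \tag{1} \\
\beta &= \ln(\sigma^2_0) +
\begin{cases}
0 & \text{if } \mu_{\sigma^2} = 0 \\
\ln(|\exp[\mu_{\sigma^2} \tau_2] - \exp[\mu_{\sigma^2} \tau_1]|) - \ln(|\mu_{\sigma^2}|) - \ln(t) & \text{if } \mu_{\sigma^2} \neq 0
\end{cases} \tag{2} \\
\gamma &\sim \mathrm{MVN}(0, \sigma^2_{\sigma^2} D) \tag{3}
\end{align}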

as in the main text. Here, $t$ is a vector of branch lengths, $\tau_1$ and $\tau_2$ are vectors of the start and end times of each branch (i.e., $\tau_2 - \tau_1 = t$), and $D$ is the variance-covariance matrix of branchwise averages for a value evolving under an untrended BM process on a phylogeny. Let $\bar{x}$ and $t$ be vectors of time-averaged trait values and edge lengths, respectively, for three edges: two sister edges, $i$ and $j$, with ancestral edge, $k$. If traits evolve under an untrended BM process and the ancestral trait value of $k$ is fixed, the variances of $\bar{x}_i$ and $\bar{x}_j$ are $t_i/3 + t_k$ and $t_j/3 + t_k$, respectively. The covariance between $\bar{x}_i$ and $\bar{x}_j$ is simply $t_k$, and the covariance between either $\bar{x}_i$ or $\bar{x}_j$ and $\bar{x}_k$ is $t_k/2$ (Devreese et al., 2010). From this, we can derive an expression for the variance-covariance matrix of branchwise averages given an arbitrary phylogeny, as shown in the main text:

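\begin{equation}
D_{i,j} = \sum_{k \in \mathrm{anc}(i,j)} t_k -
\begin{cases}
2t_i/3 & \text{if } i = j \\
t_i/2 & \text{if } i \in \mathrm{anc}(j,j) \\
t_j/2 & \text{if } j \in \mathrm{anc}(i,i) \\
0 & \text{if } i \neq j,\ i \notin \mathrm{anc}(j,j),\ j \notin \mathrm{anc}(i,i)
\end{cases} \tag{4}
\end{equation}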
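As a concrete check of Eq. 4, the following minimal sketch builds D for a hypothetical three-edge tree (two sister edges, i and j, descending from an ancestral edge, k). The `parent` dictionary encoding, the edge lengths, and the function names are assumptions made purely for illustration and are not taken from the evorates package. The sketch also assumes that anc(i, j) denotes the edges shared by the root-ward lineages of i and j, including i or j themselves where applicable, since this reading reproduces the variances and covariances stated above.

```python
def lineage(edge, parent):
    """Return the edge itself plus every edge ancestral to it (walking root-ward)."""
    path = []
    while edge is not None:
        path.append(edge)
        edge = parent[edge]
    return path


def branchwise_average_cov(parent, length):
    """Variance-covariance matrix D of branchwise averages under untrended BM (Eq. 4).

    parent maps each edge to its ancestral edge (None for the root edge);
    length maps each edge to its length. Returns a dict keyed by (i, j).
    """
    edges = sorted(parent)
    D = {}
    for i in edges:
        anc_i = set(lineage(i, parent))
        for j in edges:
            anc_j = set(lineage(j, parent))
            # Sum of lengths of edges shared by both lineages: anc(i, j).
            shared = sum(length[k] for k in anc_i & anc_j)
            if i == j:
                correction = 2.0 * length[i] / 3.0
            elif i in anc_j:      # i is ancestral to j
                correction = length[i] / 2.0
            elif j in anc_i:      # j is ancestral to i
                correction = length[j] / 2.0
            else:                 # neither edge is ancestral to the other
                correction = 0.0
            D[i, j] = shared - correction
    return D


# Hypothetical tree: sister edges "i" and "j" descend from ancestral edge "k".
parent = {"k": None, "i": "k", "j": "k"}
length = {"k": 1.0, "i": 0.6, "j": 0.3}
D = branchwise_average_cov(parent, length)
```

Under these assumptions, the diagonal entries for i and j equal t_i/3 + t_k and t_j/3 + t_k, the entry for the sister pair equals t_k, and the entries pairing either sister with k equal t_k/2, matching the three-edge case described in the text.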

While this multivariate log-normal approximation is rough, we demonstrate here that it is

largely sufficient for our purposes. Notably, we are not the first to approximate GBM time-averages

using log-normal distributions in the context of comparative phylogenetics (Welch and Waxman,

2008). There are two other tractable strategies for approximating these distributions given in the

comparative phylogenetics literature. Both of these strategies use the fact that values at the nodes of

a phylogeny evolving under a GBM process follow an exact multivariate log-normal distribution,

and instead focus on estimating nodewise values. Branchwise averages are then approximated by

either averaging ancestral and descendant nodewise values for each edge (e.g., Thorne et al., 1998)

or via the maximum likelihood estimate of branchwise averages given the ancestral and descendant

nodewise values (e.g., Lartillot and Poujol, 2011; Revell, 2021). We term these strategies “end-

point averaging” and “endpoint integration”, respectively. We prefer the log-normal approximation

due to its convenient formulation and direct focus on estimating branchwise, rather than nodewise,

quantities. In the spirit of thoroughness, however, we conducted three simulation experiments to


investigate the relative performance of these different approximation strategies.

We first conducted a simple experiment where we simulated 100,000 GBM time-averages on

the natural log scale under each approximation strategy. We also estimated a “true” branchwise

average distribution for comparison by simulating 100,000 fine-grained GBM sample paths (1,000

time points) and taking the natural log of each sample path’s average. We repeated these simulations for each combination of trend ($\mu_{\sigma^2}$) and rate variance ($\sigma^2_{\sigma^2}$) parameter values used in the main text’s simulation study (Fig. 1B.1). All simulations were standardized to occur over a time interval

of 1, just as each phylogeny in our simulation study was rescaled to have a total height of 1. The

results below thus represent how “off” each approximation would be for a single branch spanning

the entire height of a phylogeny in our simulation study. The log-normal approximation notably

lacks a right skew characteristic of the true distribution and other approximations. The log-normal

approximation also appears to overestimate the variance of branchwise averages when trends are decreasing and to underestimate it when trends are increasing, particularly with high rate variance. On the other hand, the endpoint average approximation exhibits notable upward bias and

consistently underestimates branchwise average variance. Additionally, this approximation fails

to converge to the correct branchwise average when rate variance is 0. Lastly, the endpoint in-

tegration approximation exhibits no notable bias but underestimates branchwise average variance

in the case of no or decreasing trends. The accuracy of branchwise average variance under the

log-normal approximation might be improved by adapting the Fenton-Wilkinson approximation of

log-normal sums for GBM processes (Safak and Safak, 2002), but we did not explore this here.
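The single-branch comparison described above can be sketched in a few lines of plain Python. This is an illustrative reconstruction under stated assumptions, not the code used for Fig. 1B.1: it assumes the natural log of rates follows a BM with trend and variance parameters over a branch of length 1, uses far fewer sample paths and time points than the 100,000 and 1,000 reported in the text, and omits the endpoint integration strategy. All function names are hypothetical. For a single branch with a fixed starting rate, Eq. 4 reduces to a variance of t/3, which is what the log-normal draw uses.

```python
import math
import random


def beta_single_branch(sigma2_0, mu, tau1, tau2):
    """ln of the time-averaged rate under a deterministic exponential trend (Eq. 2)."""
    t = tau2 - tau1
    if mu == 0:
        return math.log(sigma2_0)
    return (math.log(sigma2_0)
            + math.log(abs(math.exp(mu * tau2) - math.exp(mu * tau1)))
            - math.log(abs(mu)) - math.log(t))


def simulate_log_rate_path(sigma2_0, mu, rate_var, n_steps, rng):
    """Discretized path of ln(rate) over [0, 1] (assumed form: BM with trend mu)."""
    dt = 1.0 / n_steps
    x = math.log(sigma2_0)
    path = [x]
    for _ in range(n_steps):
        x += mu * dt + rng.gauss(0.0, math.sqrt(rate_var * dt))
        path.append(x)
    return path


def log_true_average(path):
    """ln of the arithmetic time-average of the rate, via the trapezoid rule."""
    rates = [math.exp(v) for v in path]
    n = len(rates) - 1
    return math.log(sum((rates[s] + rates[s + 1]) / 2.0 for s in range(n)) / n)


def log_endpoint_average(path):
    """ln of the mean of the rates at the start and end of the branch."""
    return math.log((math.exp(path[0]) + math.exp(path[-1])) / 2.0)


def log_lognormal_approx(sigma2_0, mu, rate_var, rng):
    """Draw beta plus normal noise with variance rate_var * t / 3, for t = 1."""
    beta = beta_single_branch(sigma2_0, mu, 0.0, 1.0)
    return beta + rng.gauss(0.0, math.sqrt(rate_var / 3.0))


rng = random.Random(1)
mu, rate_var = 4.0, 3.0  # one parameter combination from the simulation study
paths = [simulate_log_rate_path(1.0, mu, rate_var, 200, rng) for _ in range(2000)]
draws = {
    "truth": [log_true_average(p) for p in paths],
    "endpoint average": [log_endpoint_average(p) for p in paths],
    "log-normal": [log_lognormal_approx(1.0, mu, rate_var, rng) for _ in range(2000)],
}
for name, values in draws.items():
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    print(f"{name}: mean = {mean:.2f}, variance = {var:.2f}")
```

Setting `rate_var` to 0 with a non-zero trend makes the simulated truth collapse to β, whereas the endpoint average does not, which is the failure to converge noted above.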

The above results help give a sense of where each approximation breaks down in parameter

space, yet poorly represent the practical behavior of each approximation. In the context of our

model, these approximations take place on individual branches of a phylogeny, which typically

span relatively short intervals of time. For our next simulation experiment, we scaled up to simulat-

ing sets of branchwise averages on entire phylogenies. For each parameter combination (excluding

combinations where rate variance is 0), we repeated the same simulations on 100 pure birth phylo-

genies with either 50, 100, or 200 species (generated using the R package phytools; Revell, 2012)


Figure 1B.1 Distributions of simulated branchwise averages under different approximation strate-
gies and the true distribution given parameter combinations used in the main text’s simulation
study. All simulations were run on single branches of length 1.

standardized to a height of 1. For each phylogeny, we simulated 1,000 sets of branchwise averages

under each approximation strategy, as well as fine-grained GBM sample paths (1,000 time points across each phylogeny’s entire height) representing the true distribution. Because these samples have a

high number of dimensions (one for each branch in a phylogeny), we visualized how well these

multivariate distributions match one another using summary statistics. Specifically, for each tree,

we recorded the correlation coefficients between the means/(co)variances of branchwise averages

simulated under each approximation strategy and the true distribution (Figs. 1B.2–1B.7). To have

a null expectation for these correlation coefficients, we also simulated a second true distribution

and estimated correlation coefficients for means/(co)variances between replicate true distributions.
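The summary-statistic comparison can be illustrated as follows. This is a minimal sketch, assuming each set of simulated branchwise averages is stored as a list of rows (one row per simulated set, one column per branch); the function names and the toy numbers are hypothetical and only show the mechanics of correlating means and (co)variances between two sets of samples.

```python
import math


def column_means(samples):
    """Mean simulated branchwise average for each branch (column)."""
    n = len(samples)
    return [sum(row[b] for row in samples) / n for b in range(len(samples[0]))]


def covariance_entries(samples):
    """Upper triangle (with diagonal) of the sample covariance matrix, flattened."""
    n, m = len(samples), len(samples[0])
    means = column_means(samples)
    entries = []
    for a in range(m):
        for b in range(a, m):
            cross = sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples)
            entries.append(cross / (n - 1))
    return entries


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summary_correlations(approx_samples, true_samples):
    """Correlations between means and between (co)variances of two sample sets."""
    r_mean = pearson(column_means(approx_samples), column_means(true_samples))
    r_cov = pearson(covariance_entries(approx_samples), covariance_entries(true_samples))
    return r_mean, r_cov


# Toy data: 4 simulated sets for a 3-branch tree; the "approximation" is an
# exact linear rescaling of the "truth", so both correlations should be 1.
true_samples = [[0.0, 1.0, 2.0], [1.0, 3.0, 2.5], [2.0, 2.0, 4.5], [3.0, 4.0, 3.0]]
approx_samples = [[2.0 * v + 1.0 for v in row] for row in true_samples]
r_mean, r_cov = summary_correlations(approx_samples, true_samples)
```

Comparing two replicate samples from the true distribution with the same function gives the null expectation described above.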

Overall, the results indicate that all approximations do a fairly good job at recapitulating the

means and (co)variances expected under the true distribution. The log-normal approximation notably exhibits uncorrelated means in the case of no trend, in contrast to other approximations. This

is due to the log-normal approximation lacking the right skew of the true distribution and other

approximations (Fig. 1B.1), which naturally inflates the means of branchwise average distribu-

tions along long branches. In the case of any trend, the endpoint average approximation exhibits

somewhat less strong correlations between branchwise average means compared to other approx-

imations. When rate variance is high, the log-normal approximation exhibits performance inter-

mediate between the endpoint average approximation and endpoint integration approximation/null

distribution. However, even the worst performing simulations nearly always exhibit strong corre-

lations in branchwise average means above 0.98. In contrast to means, correlations for branchwise

average (co)variances consistently varied between about 0.98-0.99 regardless of simulation param-

eters or approximation strategy, closely matching the null distribution.

Because GBM time-averages are non-normally distributed, we also sought a non-parametric

method of comparing samples from the approximations and true distributions. For this, we at-

tempted to use the R package FNN (Beygelzimer et al., 2022) to estimate Kullback-Leibler (KL)

divergence from each approximation to the true distribution. However, this estimator exhibited

severe numerical issues, like negative KL divergence estimates. Thus, we instead implemented a

crude K nearest neighbor probability density estimator (Zhao and Lai, 2021). For each tree in the

simulation experiment above, we used this estimator to calculate local probability densities under

each approximation and the true distribution around samples from a replicate true distribution. We

then calculated log ratios of the true densities to densities under each approximation and averaged

the distances between these log ratios and 0 (i.e., equal densities). These averaged distances give

a rough sense of how well the probability density of each approximation matches that of the true

distribution, with increased sampling in higher-density regions of the true distribution (Figs. 1B.8

and 1B.9). Overall, the average log density ratio distances under each approximation match the null distribution well. The endpoint average and log-normal approximations exhibit marginally

elevated distances in the case of non-zero trends and decreasing trends, respectively, likely due to

these approximations’ under/overestimation of branchwise average variance in certain regions of


Figure 1B.2 Distributions of correlation coefficients between mean simulated branchwise averages
under different approximation strategies and the true distribution with rate variance ($\sigma^2_{\sigma^2}$) set to 3.
All simulations were run on pure-birth phylogenies of height 1.

parameter space (Fig. 1B.1).
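For readers unfamiliar with this kind of comparison, the sketch below shows one standard form of a K nearest neighbor density estimator and the averaged absolute log density ratio built from it. It is an assumption-laden illustration rather than the estimator used for Figs. 1B.8 and 1B.9: the formula k / (n V_d r^d), with r the distance to the k-th nearest sample point and V_d the volume of the unit ball in d dimensions, is the textbook version, and the function names, the choice of k, and the toy data are hypothetical.

```python
import math


def knn_density(x, sample, k):
    """Crude K nearest neighbor density estimate at point x.

    x is a tuple of coordinates and sample is a list of such tuples.
    """
    d = len(x)
    dists = sorted(math.dist(x, s) for s in sample)
    r = dists[k - 1]  # distance to the k-th nearest sample point
    unit_ball = math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)
    return k / (len(sample) * unit_ball * r ** d)


def mean_abs_log_density_ratio(queries, true_sample, approx_sample, k):
    """Average distance from 0 of ln(true density / approximate density).

    Densities are evaluated at query points, which in the text come from a
    replicate sample of the true distribution.
    """
    total = 0.0
    for q in queries:
        log_ratio = (math.log(knn_density(q, true_sample, k))
                     - math.log(knn_density(q, approx_sample, k)))
        total += abs(log_ratio)
    return total / len(queries)


# Toy one-dimensional check: an evenly spaced grid on [0, 1] has density near 1.
grid = [(i / 1000,) for i in range(1001)]
density_mid = knn_density((0.5,), grid, 10)
same = mean_abs_log_density_ratio([(0.25,), (0.5,)], grid, grid, 10)
```

When the two samples are identical the averaged distance is exactly 0; larger values indicate a poorer match in the regions where the query points fall.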

Lastly, we redid our entire simulation study with trait evolution rates simulated as evolving

under a fine-grained GBM process (∼500 time points across entire phylogeny’s height). We

present all figures and tables for this simulation study below (Figs. 1B.10–1B.14; Tables 1.3,

1B.1 and 1B.2). In general, the results qualitatively match those of the simulation study presented

in the main text, and we feel confident that the log-normal approximation of branchwise aver-

ages is sufficient for our model. While there is some discrepancy in the statistical power of trend

detection compared to results in the main text, it is unlikely such discrepancies result from system-

atic bias. Notably, statistical power for trend detection even under conventional EB/LB models in

this simulation study also differs from the main text results, suggesting that any discrepancies are

attributable to variation in the simulated data.

Figure 1B.3 Distributions of correlation coefficients between mean simulated branchwise averages
under different approximation strategies and the true distribution with rate variance ($\sigma^2_{\sigma^2}$) set to 3.
All simulations were run on pure-birth phylogenies of height 1. Plots are zoomed in on
distributions close to 1.

Figure 1B.4 Distributions of correlation coefficients between mean simulated branchwise averages
under different approximation strategies and the true distribution with rate variance ($\sigma^2_{\sigma^2}$) set to 6.
All simulations were run on pure-birth phylogenies of height 1.

Figure 1B.5 Distributions of correlation coefficients between mean simulated branchwise averages
under different approximation strategies and the true distribution with rate variance ($\sigma^2_{\sigma^2}$) set to 6.
All simulations were run on pure-birth phylogenies of height 1. Plots are zoomed in on
distributions close to 1.

Figure 1B.6 Distributions of correlation coefficients between simulated branchwise average
(co)variances under different approximation strategies and the true distribution with rate variance
($\sigma^2_{\sigma^2}$) set to 3. All simulations were run on pure-birth phylogenies of height 1.

Figure 1B.7 Distributions of correlation coefficients between simulated branchwise average
(co)variances under different approximation strategies and the true distribution with rate variance
($\sigma^2_{\sigma^2}$) set to 6. All simulations were run on pure-birth phylogenies of height 1.

Figure 1B.8 Distributions of average log density ratio distances between simulated branchwise
average distributions under different approximation strategies and the true distribution with rate
variance ($\sigma^2_{\sigma^2}$) set to 3. Probability densities were estimated via K nearest neighbors. All
simulations were run on pure-birth phylogenies of height 1.

Figure 1B.9 Distributions of average log density ratio distances between simulated branchwise
average distributions under different approximation strategies and the true distribution with rate
variance ($\sigma^2_{\sigma^2}$) set to 6. Probability densities were estimated via K nearest neighbors. All
simulations were run on pure-birth phylogenies of height 1.

Figure 1B.10 Relationship between simulated and estimated rate variance ($\sigma^2_{\sigma^2}$) and trend ($\mu_{\sigma^2}$)
parameters. Each point is the posterior median from a single fit, while the violins are combined
posterior distributions from all fits for a given trait evolution scenario. Vertical lines represent the
50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of these combined posteriors,
while horizontal lines represent positions of true, simulated values.

Figure 1B.11 Power and error rates for the rate variance parameter ($\sigma^2_{\sigma^2}$). Lines depict changes in
the proportion of model fits that correctly showed evidence for rate variance significantly greater
than 0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in red) as a function of
tree size.

Figure 1B.12 Power and error rates for the trend parameter ($\mu_{\sigma^2}$). Lines depict changes in the
proportion of model fits that correctly showed evidence for trends significantly less than or greater
than 0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in light red) as a function
of tree size. Results are shown for both models allowed to freely estimate rate variance ($\sigma^2_{\sigma^2}$) (i.e.,
unconstrained models, solid lines) and models with rate variance constrained to 0 (i.e., constrained
models, dashed lines). The latter models are identical to conventional early/late burst models.

Figure 1B.13 Power and error rates for branchwise rate parameters ($\ln \sigma^2$). Lines depict changes
in proportions of branchwise rates considered anomalously slow (in dark blue) or fast (in light red)
as a function of simulated rate deviations ($\ln \sigma^2_{dev}$). These results combine all fits to simulated
data that detected rate variance ($\sigma^2_{\sigma^2}$) significantly greater than 0. The proportions are equivalent
to power when the detected rate deviation is of the same sign as the true, simulated deviation (left
of 0 for anomalously slow rates in dark blue and right for anomalously fast rates in light red), and
to error rate when the detected and true rate deviations are of opposite signs. Here, significant rate
deviations for simulated rate deviations that are exactly 0 are considered errors regardless of sign.

[Figure panels (trend power and error): proportion of fits with a decreasing trend ($\mu_{\sigma^2} < 0$) and with an increasing trend ($\mu_{\sigma^2} > 0$) against number of species (50, 100, 200), under no ($\sigma^2_{\sigma^2} = 0$), moderate ($\sigma^2_{\sigma^2} = 3$), and high ($\sigma^2_{\sigma^2} = 6$) simulated rate variance; legend: power, error, unconstrained, constrained.]
[Figure 1B.13 axes: x-axis, simulated rate deviation ($\ln \sigma^2_{\mathrm{dev}}$, -8 to 8); y-axis, proportion of significant rate deviations ($\ln \sigma^2_{\mathrm{dev}} \neq 0$); legend: deviations below and above 0.]

Figure 1B.14 Relationship between simulated and estimated branchwise rate parameters ($\ln \sigma^2$).
For each simulation and posterior sample, branchwise rates were first centered by subtracting their
mean. We estimated centered branchwise rates by taking the median of the centered posterior
samples. The solid line represents the position of the true centered branchwise rates, while the
shallower, dashed line represents the observed line of best fit for these data.
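
The centering and summarizing steps described in this caption can be sketched as follows. This is a minimal illustration and not the evorates implementation; the function name and the input layout (one list of branchwise ln-rates per posterior sample) are assumptions made for the example.

```python
from statistics import mean, median


def centered_rate_estimates(posterior_ln_rates):
    """Center each posterior sample, then take per-branch medians.

    posterior_ln_rates: list of posterior samples, each a list holding one
    ln-rate per branch (hypothetical layout chosen for this sketch).
    """
    centered = []
    for sample in posterior_ln_rates:
        sample_mean = mean(sample)  # mean across branches within one sample
        centered.append([rate - sample_mean for rate in sample])
    n_branches = len(centered[0])
    # Median of the centered samples, taken separately for every branch
    return [median([s[b] for s in centered]) for b in range(n_branches)]
```

The true, simulated branchwise rates would be centered in the same way (subtracting their mean across branches) before being compared with these estimates.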

Table 1B.1 Median absolute errors of rate variance, trend, and branchwise rate posteriors (i.e.,
median absolute difference between posterior samples and their true, simulated values, a measure
of posterior distribution accuracy), averaged across replicates for each simulated trait evolution
scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend
parameters, respectively.

σ 2
σ 2 =

rate variance
6
3
0

trend
3

0

6

branchwise rates
6
3

0

50 species

µσ 2 = -4
0
4

0.61
0.89
0.58

1.58
1.89
1.68

2.26
2.23
2.41

0.94 1.68
2.09
2.15

1.78
1.56 2.22
2.98 2.62

0.42
0.62
0.63

0.80
0.82
0.92

0.96
1.04
0.98

100 species

µσ 2 = -4
0
4

0.31
0.31
0.26

2.11
1.59
1.49

2.37
1.95
2.21

0.91 1.22
0.81
1.67

1.43
1.26 1.47
2.16 2.02

0.32
0.32
0.41

0.77
0.82
0.85

0.86
0.93
0.94

200 species

µσ 2 = -4
0
4

0.14
0.21
0.18

1.23
0.93
0.98

1.79
1.82
1.50

0.62 0.66
0.65
1.09

1.29
1.09 1.10
1.17 1.27

0.23
0.24
0.28

0.68
0.72
0.73

0.80
0.84
0.84

[Figure 1B.14 axes: x-axis, simulated branchwise rate (centered $\ln \sigma^2$, -8 to 8); y-axis, estimated branchwise rate (centered $\ln \sigma^2$, -8 to 8); legend: truth, best fit.]

Table 1B.2 Breadths of rate variance, trend, and branchwise rate posteriors (i.e., the difference
between the 97.5% and 2.5% quantiles of posterior samples, a measure of posterior distribution
precision), averaged across replicates for each simulated trait evolution scenario and tree size.
$\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend parameters, respectively.

σ 2
σ 2 =

rate variance
3

6

0

trend
3

branchwise rates
6
3

0

6

0

50 species

µσ 2 = -4
0
4

3.67
4.38
3.35

9.11
10.67
9.00

12.98
12.60
13.88

4.66
7.28
10.34

6.02
7.09
10.95

2.28
6.81
8.00
2.60
12.09 2.81

3.24
3.41
3.50

3.65
3.89
4.10

µσ 2 = -4
0
4

1.77
1.64
1.36

µσ 2 = -4
0
4

0.71
1.04
0.79

7.96
6.72
6.77

3.97
4.26
3.62

9.58
9.15
8.13

7.20
6.52
6.89

100 species

3.53
4.04
6.74

4.56
5.09
8.08

200 species

2.64
3.34
4.53

3.58
3.98
4.88

4.72
5.67
7.86

4.06
4.15
5.69

1.71
1.76
1.87

3.22
3.12
3.31

3.46
3.42
3.55

1.24
1.36
1.39

2.50
2.77
2.70

3.12
3.25
3.37

Table 1B.3 Coverage of rate variance, trend, and branchwise rate posteriors (i.e., proportion of
times the true, simulated value is greater than the 2.5% posterior distribution quantile and less than
the 97.5% quantile) for each simulated trait evolution scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate
the true, simulated values of rate variance and trend parameters, respectively.

σ 2
σ 2 =

rate variance
6
3
0

trend
3

0

6

branchwise rates
6
3

0

50 species

µσ 2 = -4 — 1.00 1.00
1.00
0 — 1.00
1.00
4 — 1.00

1.00
0.90
1.00

0.80
1.00
0.80

0.90
1.00
1.00

1.00 0.95 0.94
0.98 0.94
0.97
0.94 0.96
0.95

100 species

µσ 2 = -4 — 0.70 0.90
1.00
0 — 1.00
0.90
4 — 1.00

0.90
1.00
0.90

1.00
1.00
0.90

0.90
0.90
1.00

1.00 0.96 0.96
0.94 0.94
1.00
0.95 0.93
0.99

200 species

µσ 2 = -4 — 0.90 1.00
0.80
0 — 1.00
1.00
4 — 1.00

1.00
1.00
0.90

1.00
0.90
1.00

0.90
1.00
1.00

0.99 0.93 0.95
0.95 0.95
1.00
0.93 0.96
0.99
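
The three posterior summaries reported in Tables 1B.1–1B.3 (median absolute error, breadth, and coverage) can be sketched for a single parameter and a single model fit as follows. This is an illustrative sketch only: the function names are hypothetical, and the linear-interpolation quantile is an assumption that may differ slightly from the quantile rule used to produce the tables.

```python
from statistics import median


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of already-sorted values (assumed rule)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def posterior_summaries(samples, truth):
    """Accuracy, precision, and coverage summaries for one posterior."""
    ordered = sorted(samples)
    q_lo = quantile(ordered, 0.025)
    q_hi = quantile(ordered, 0.975)
    return {
        # median absolute difference between samples and the true value
        "median_abs_error": median([abs(x - truth) for x in samples]),
        # difference between the 97.5% and 2.5% posterior quantiles
        "breadth": q_hi - q_lo,
        # whether the true value lies strictly inside that interval
        "covered": q_lo < truth < q_hi,
    }
```

Averaging `median_abs_error` and `breadth` across replicates, and taking the proportion of replicates with `covered` equal to `True`, gives per-scenario values of the kind tabulated above.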


APPENDIX 1C

AVERAGE CHANGES IN TRAIT EVOLUTION RATES

Conventional early/late burst (EB/LB) models of trait evolution assume that rates follow
homogeneous, exponential declines or increases with respect to time (Blomberg et al., 2003). The
definition of EBs/LBs under such models is thus straightforward: any given time slice in a clade’s
history is associated with a single trait evolution rate, and these rates can only decrease, increase, or
stay the same. On the other hand, allowing for rate heterogeneity independent of overall temporal
trends means that any given time slice in a clade’s history is associated with a distribution of trait
evolution rates. Because of this, our new method allows for alternative definitions of EBs/LBs,
depending on how one summarizes these distributions. In the current study, we mainly consider
a definition based on whether the medians, or geometric means, of these distributions decrease or
increase over time (change per unit time given by $\mu_{\sigma^2}$, hereafter the “trend” parameter, as in the
main text). Alternatively, one could use a definition based on whether the averages, or arithmetic
means, of these distributions decrease or increase over time (change per unit time given by
$\mu_{\sigma^2} + \sigma^2_{\sigma^2}/2$, hereafter the “average change” parameter, $\delta_{\sigma^2}$).
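
The two definitions can be related through standard log-normal results. As a sketch, assume rates along a single lineage follow an exact GBM (the model here is only approximately GBM) starting from rate $\sigma^2_0$ at time 0; the symbol $\sigma^2(t)$ for the rate at time $t$ is introduced only for this illustration.

```latex
\begin{align*}
\ln \sigma^2(t) &\sim \mathcal{N}\!\left(\ln \sigma^2_0 + \mu_{\sigma^2}\, t,\ \sigma^2_{\sigma^2}\, t\right) \\
\operatorname{median}\!\left[\sigma^2(t)\right] &= \sigma^2_0 \exp\!\left(\mu_{\sigma^2}\, t\right) \\
\operatorname{E}\!\left[\sigma^2(t)\right] &= \sigma^2_0 \exp\!\left[\left(\mu_{\sigma^2} + \sigma^2_{\sigma^2}/2\right) t\right]
  = \sigma^2_0 \exp\!\left(\delta_{\sigma^2}\, t\right)
\end{align*}
```

Under these assumptions, the median declines whenever $\mu_{\sigma^2} < 0$, whereas the average declines only when $\mu_{\sigma^2} < -\sigma^2_{\sigma^2}/2$, so the two definitions disagree whenever $-\sigma^2_{\sigma^2}/2 < \mu_{\sigma^2} < 0$.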

We chose to focus on trend over average change estimation and define EBs/LBs based on the
trend parameter for a few reasons. First, average change is a composite of both the trend
and rate variance parameters, which complicates its interpretation. In general, it seems more
intuitive to consider the magnitude of deterministic changes in trait evolution rates (the trend
component) apart from the magnitude of stochastic changes (the rate variance component). Second,
because rates evolve in an approximately log-normal manner under our model, medians are a
natural, reliable way of summarizing their distributions, corresponding to the exponentiated average
of rates on the natural log scale. In contrast, the right skew of log-normal distributions causes raw
averages of trait evolution rates to be highly influenced by a few extreme outliers, particularly when
rate variance is high. For this reason, our model can produce trait evolution scenarios whereby
rates exhibit declines in the majority of lineages (directly related to changes in median rates) while
increasing on average (Figs. 1C.1 and 1C.2). Lastly, many macroevolutionary biologists consider
“accounting” for lineages/subclades exhibiting unusual trait evolution rates critical to elucidating
and understanding changes in rates over time (Lloyd et al., 2012; Slater and Pennell, 2014; Benson
et al., 2014; Hopkins and Smith, 2015; Wright, 2017; Puttick, 2018). This implies that many
empiricists intuitively define EBs/LBs based on majority changes in rates rather than changes in
average rates. Additionally, by log-transforming traits prior to analysis, many macroevolutionary
biologists implicitly use GBM processes to model trait evolution, just as we use an (approximate)
GBM process to model rate evolution here. In the context of trait evolution, the analogous trend
parameter is widely considered by empiricists and method developers alike to determine whether
a clade exhibits a directional “evolutionary trend” in traits, regardless of the estimated variance
parameter (Hunt, 2006; Raj Pant et al., 2014; Sookias et al., 2012; Gill et al., 2017).

Figure 1C.1 Distributions of 6,000 rates simulated as evolving under a GBM process with trend
of -0.015 and rate variance of 0.05 at various time points, with starting rate of 1 at time t = 0.
Parameter values were chosen to clearly illustrate how rates under our model may exhibit majority
declines while increasing on average due to the skewed nature of rate change. Solid and dashed
vertical lines represent the positions of median and average rate values, respectively, for each time
point.
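
The scenario in this caption can be reproduced approximately with a short simulation. The sketch below is not the code used to generate the figure: it assumes an exact GBM and draws each rate directly from its log-normal distribution at time t rather than simulating full paths. The function name, the seed, and the time points (2, 4, 8, and 16) are choices made for this illustration.

```python
import math
import random
from statistics import mean, median


def simulate_gbm_rates(n, t, trend, rate_var, start=1.0, seed=0):
    """Draw n rates at time t under an exact GBM (an assumption of this sketch).

    ln(rate) at time t is normal with mean ln(start) + trend * t and
    variance rate_var * t.
    """
    rng = random.Random(seed)
    sd = math.sqrt(rate_var * t)
    return [start * math.exp(rng.gauss(trend * t, sd)) for _ in range(n)]


for t in (2, 4, 8, 16):
    rates = simulate_gbm_rates(6000, t, trend=-0.015, rate_var=0.05)
    frac_up = sum(r > 1.0 for r in rates) / len(rates)
    print(f"t={t:2d}  median={median(rates):.3f}  "
          f"mean={mean(rates):.3f}  proportion>1={frac_up:.3f}")
```

With these parameter values, the trend is negative (-0.015) while the average change is positive (-0.015 + 0.05/2 = 0.01), so the simulated medians should fall below the starting rate of 1 while the simulated averages rise above it.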

[Figure 1C.1 panels: probability density of $\sigma^2$ at t = 2, t = 4, t = 8, and t = 16; legend: median, average.]

Figure 1C.2 Changes over time in the median and average of 6,000 rates simulated as evolving
under a GBM process with trend of -0.015 and rate variance of 0.05, with starting rate of 1 at time
t = 0. Parameter values were chosen to clearly illustrate how rates under our model may exhibit
majority declines while increasing on average due to the skewed nature of rate change. Solid and
dashed lines depict changes in median and average rate values, respectively, while the dotted line
depicts changes in the proportion of rates greater than the starting rate of 1.

[Figure 1C.2 axes: x-axis, time (0 to 200); left y-axis, median $\sigma^2$/proportion of $\sigma^2 > 1$; right y-axis, average $\sigma^2$; legend: median, average, proportion.]

Here, we briefly consider our new method’s performance with respect to estimating and
detecting average changes in trait evolution rates. Interestingly, our simulation study results revealed
that, in the presence of time-independent rate heterogeneity, conventional EB/LB models (equivalent
to our new models with rate variance constrained to 0) appear to estimate average change,
rather than trend parameters, as defined under our model (Figs. 1C.3 and 1C.4). We are not aware of
any previous research explicitly demonstrating this phenomenon. When comparing performance
of constrained to unconstrained models with respect to detecting significant average change (i.e.,
95% equal-tailed interval lies entirely below or above 0), we generally see only a modest reduction
in error rates and greatly reduced power to detect negative average change under the full,
unconstrained model (Fig. 1C.5). Nonetheless, inference of the average change parameter seems
substantially improved under unconstrained models (Tables 1C.1–1C.3). In the presence of
time-independent rate heterogeneity, constrained models tend to exhibit less accurate, overly narrow
posterior estimates of average change, particularly when the rate variance and trend parameters
are high, resulting in low posterior coverage. This warrants caution in interpreting the results
of conventional EB/LB models fitted to comparative data exhibiting substantial time-independent
rate heterogeneity, and we recommend estimating rate variance even when one’s only goal is to
estimate changes in average trait evolution rates over time.
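
The significance criterion used above (95% equal-tailed interval lying entirely below or above 0) can be sketched as follows for the average change parameter, computed sample by sample as trend plus half the rate variance. The function names and the interpolation rule for quantiles are assumptions of this sketch, not part of evorates.

```python
def average_change_samples(trend_samples, rate_var_samples):
    """Posterior samples of average change: trend + rate variance / 2."""
    return [mu + s2 / 2.0 for mu, s2 in zip(trend_samples, rate_var_samples)]


def equal_tailed_interval(samples, level=0.95):
    """Equal-tailed interval using linear-interpolation quantiles (assumed rule)."""
    ordered = sorted(samples)

    def q(p):
        pos = p * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    alpha = (1.0 - level) / 2.0
    return q(alpha), q(1.0 - alpha)


def classify_change(samples, level=0.95):
    """Label a posterior as significantly negative, positive, or neither."""
    lo, hi = equal_tailed_interval(samples, level)
    if hi < 0:
        return "negative"
    if lo > 0:
        return "positive"
    return "not significant"
```

For a constrained model, the rate variance samples would all be 0, so average change and trend coincide.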

Figure 1C.3 Relationship between simulated rate variance ($\sigma^2_{\sigma^2}$)/trend ($\mu_{\sigma^2}$) and estimated trend
parameters. Each point is the posterior median from a single fit, while the violins are combined
posterior distributions from all fits for a given trait evolution scenario. Vertical lines represent the
50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of these combined posteriors,
while horizontal lines represent positions of true, simulated values. Results for models with
estimated rate variance unconstrained and constrained to 0 are shown on top and bottom, respectively.

[Figure 1C.3 panels: unconstrained model (top) and constrained model ($\sigma^2_{\sigma^2} = 0$; bottom), arranged by simulated trend ($\mu_{\sigma^2}$ = -4, 0, 4) and simulated rate variance ($\sigma^2_{\sigma^2}$ = 0, 3, 6); legend: number of species (50, 100, 200).]

Figure 1C.4 Relationship between simulated rate variance ($\sigma^2_{\sigma^2}$)/trend ($\mu_{\sigma^2}$) and estimated average
change ($\delta_{\sigma^2}$) parameters. Each point is the posterior median from a single fit, while the violins
are combined posterior distributions from all fits for a given trait evolution scenario. Vertical lines
represent the 50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of these combined
posteriors, while horizontal lines represent positions of true, simulated values. Results for models
with estimated rate variance unconstrained and constrained to 0 are shown on top and bottom,
respectively.

[Figure 1C.4 panels: unconstrained model (top) and constrained model ($\sigma^2_{\sigma^2} = 0$; bottom), arranged by simulated trend ($\mu_{\sigma^2}$ = -4, 0, 4) and simulated rate variance ($\sigma^2_{\sigma^2}$ = 0, 3, 6); legend: number of species (50, 100, 200).]

Figure 1C.5 Power and error rates for the average change parameter ($\delta_{\sigma^2}$). Lines depict changes in the
proportion of model fits that correctly showed evidence for average change significantly less and
greater than 0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in light red) as a
function of tree size. Results are shown for both models allowed to freely estimate rate variance
($\sigma^2_{\sigma^2}$) (i.e., unconstrained models, solid lines) and models with rate variance constrained to 0 (i.e.,
constrained models, dashed lines). The latter models are identical to conventional early/late burst
models.

[Figure 1C.5 panels: proportion of fits with average change ($\delta_{\sigma^2}$) < 0 and > 0 against number of species (50, 100, 200), under no ($\sigma^2_{\sigma^2} = 0$; $\delta_{\sigma^2}$ = -4, 0, 4), moderate ($\sigma^2_{\sigma^2} = 3$; $\delta_{\sigma^2}$ = -2.5, 1.5, 5.5), and high ($\sigma^2_{\sigma^2} = 6$; $\delta_{\sigma^2}$ = -1, 3, 7) simulated rate variance; legend: power, error, unconstrained, constrained.]

Table 1C.1 Median absolute errors of average change posteriors (i.e., median absolute difference
between posterior samples and their true, simulated values, a measure of posterior distribution
accuracy) under models with rate variance unconstrained and constrained to 0, averaged across
replicates for each simulated trait evolution scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true,
simulated values of rate variance and trend parameters, respectively.

σ 2
σ 2 =

unconstrained
6
3
0

constrained
3
0

6

50 species

µσ 2 = -4
0
4

1.41
1.43
2.22

1.61
2.10
3.04

2.50
3.08
3.34

1.50
1.23
1.45 2.45
2.05 3.05

2.74
6.08
3.87

100 species

µσ 2 = -4
0
4

0.78
1.15
1.92

1.27
1.65
1.98

1.70
1.72
1.85

0.74
1.28
1.08 1.88
1.70 2.08

1.78
3.35
4.39

200 species

µσ 2 = -4
0
4

0.79
0.92
0.97

0.92
1.21
1.15

1.44
1.01
1.51

0.78
0.96
0.90 1.35
0.94 1.80

1.19
3.05
5.06

Table 1C.2 Breadths of average change posteriors (i.e., the difference between the 97.5% and 2.5%
quantiles of posterior samples, a measure of posterior distribution precision) under models with
rate variance unconstrained and constrained to 0, averaged across replicates for each simulated
trait evolution scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate
variance and trend parameters, respectively.

σ 2
σ 2 =

unconstrained
3

6

0

constrained
3

0

6

50 species

µσ 2 = -4
0
4

5.56
6.25
11.04

7.95
9.89
11.69

10.27
11.46
12.45

4.65
4.50
5.47
6.82
9.48 10.81

5.63
8.45
10.60

µσ 2 = -4
0
4

µσ 2 = -4
0
4

3.42
4.46
7.64

2.82
3.40
4.51

5.48
6.27
8.97

4.14
4.50
5.45

100 species

6.54
7.44
8.56

3.07
3.97
7.12

200 species

5.07
5.13
6.29

2.70
3.25
4.38

3.49
4.58
8.41

2.87
3.30
5.78

3.84
6.60
8.40

3.26
3.37
8.73

Table 1C.3 Coverage of average change posteriors (i.e., proportion of times the true, simulated
value is greater than the 2.5% posterior distribution quantile and less than the 97.5% quantile) under
models with rate variance unconstrained and constrained to 0 for each simulated trait evolution
scenario and tree size. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend
parameters, respectively.

σ 2
σ 2 =

unconstrained
6
3
0

constrained
3
0

6

50 species

µσ 2 = -4
0
4

0.90
1.00
1.00

1.00
1.00
1.00

0.90
1.00
0.90

0.60
0.80
0.90 0.80
1.00 1.00

0.50
0.40
0.80

100 species

µσ 2 = -4
0
4

1.00
1.00
0.90

1.00
0.90
1.00

0.90
0.90
1.00

1.00
0.80
0.90 0.60
0.90 0.90

0.60
0.60
0.60

200 species

µσ 2 = -4
0
4

1.00
0.90
1.00

1.00
1.00
1.00

0.90
1.00
1.00

1.00
0.90
0.90 0.60
1.00 0.90

0.80
0.10
0.50


APPENDIX 1D

PRIOR SENSITIVITY STUDY

To see how sensitive our method is to alternate prior specifications, we refit models to our
smallest simulations (50 tips) while varying prior settings. We focus on the smallest simulations
because priors are more influential when there are fewer data. In addition to refitting models
with default priors to each simulation (see Priors subsection of Materials and Methods section in
main text), we also refit models with “tight” and “loose” prior settings, whereby the priors for rate
variance ($\sigma^2_{\sigma^2}$), trend ($\mu_{\sigma^2}$), and root rate ($\sigma^2_0$) parameters were made more or less informative,
respectively. We did this by either reducing the prior scale parameter (i.e., standard deviation in
the case of normal distributions) 5-fold for more precise, informative priors or increasing it 3-fold
for more relaxed, uninformative priors (i.e., prior scales of $1/T$ for rate variance, $2/T$ for trend,
and 2 for root rate under the tight settings and $15/T$, $30/T$, and 30 under the loose settings, where
$T$ is the height of the phylogeny). Within each of these three prior settings (tight, default, or loose),
we additionally shifted the location of the root rate prior by either -3, 0, or 3, yielding a total of 9
prior settings. These shifts correspond to ∼20-fold changes in the expected root rate ($e^{3} \approx 20$).
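
The grid of prior settings described above can be enumerated as in the following sketch. The default scales used here ($5/T$, $10/T$, and 10) are not stated in this appendix; they are inferred from the tight and loose values given above (a 5-fold reduction and a 3-fold increase, respectively) and should be checked against the main text. The function and key names are hypothetical.

```python
import math
from itertools import product


def prior_settings(tree_height):
    """Enumerate the 9 prior settings (3 scale settings x 3 root rate shifts)."""
    # Default scales inferred from the tight/loose values (an assumption)
    default_scales = {
        "rate_variance": 5.0 / tree_height,
        "trend": 10.0 / tree_height,
        "root_rate": 10.0,
    }
    scale_factors = {"tight": 1.0 / 5.0, "default": 1.0, "loose": 3.0}
    root_rate_shifts = (-3, 0, 3)

    settings = []
    for (name, factor), shift in product(scale_factors.items(), root_rate_shifts):
        settings.append({
            "setting": name,
            "scales": {k: v * factor for k, v in default_scales.items()},
            "root_rate_shift": shift,
            # multiplicative change in the root rate implied by the shift
            "root_rate_fold_change": math.exp(shift),
        })
    return settings
```

For example, `prior_settings(1.0)` returns 9 dictionaries, with the loose settings carrying scales of 15, 30, and 30, and shifts of 3 implying a fold change of `math.exp(3)`, roughly 20.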

Because this simulation study design requires many more model fits compared to the main
text’s simulation study (9 trait evolution scenarios with 10 replicates refit under 9 different prior
settings, yielding 810 model fits), we only ran 2 Hamiltonian Monte Carlo chains consisting of
1,500 iterations for each model fit and discarded the first 750 iterations as warmup. Chains still
mixed relatively well despite their shorter length (greatest $\hat{R} \approx 1.021$), though effective sample sizes
were unsurprisingly lower compared to results in the main text. Nonetheless, bulk effective sample
sizes always exceeded the minimum recommended 100 per chain (Vehtari et al., 2021), and all tail
effective sample sizes exceeded 100. Divergent transitions remained relatively rare, with 18 fits exhibiting
a single divergent transition and another 4 with 2–5 each. Most low tail effective sample sizes
and divergent transitions were associated with loose prior settings, likely reflecting difficulty in
sampling the fat tails of posteriors under such priors.
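
As a rough illustration of the mixing diagnostic mentioned above, the sketch below computes a basic split-$\hat{R}$ for one parameter from its post-warmup draws (here, 2 chains of 750 draws each). This is the simpler, non-rank-normalized form of the diagnostic; the rank-normalized variant associated with Vehtari et al. (2021) and the bulk and tail effective sample sizes are not implemented here, and the function name is hypothetical.

```python
from statistics import mean, variance


def split_rhat(chains):
    """Basic split-R-hat: each chain is halved and halves are compared.

    chains: list of chains, each a list of post-warmup draws for one parameter.
    """
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    # Average within-half variance
    within = mean([variance(h) for h in halves])
    # Between-half variance of the half-chain means, scaled by n
    between = n * variance([mean(h) for h in halves])
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5
```

Values close to 1 indicate that the half-chains are sampling similar distributions, whereas values well above 1 indicate poor mixing.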

Overall results suggest that evorates is robust to alternate prior specifications unless the priors
are overly informative (Figs. 1D.1–1D.3; Tables 1D.1–1D.12). In particular, shifting the root rate
prior location had little effect on posterior distributions provided the prior’s scale was larger than
the shift magnitude (as in the case of default and loose prior settings). Unsurprisingly, posterior
precision generally decreased with more uninformative priors, and loose priors thus tended to
yield less accurate posteriors with higher median absolute errors. Counterintuitively, however,
default prior settings often resulted in more accurate posteriors than tight prior settings. In the
case of branchwise rates, this is likely due to lower estimates of rate variance under tight priors,
which increase the shrinkage of branchwise rate estimates (Fig. 1D.4). In the case of trend and root rate
inference, this phenomenon mostly occurred when the root rate prior and simulated trend “conflict”
by implying different patterns of rate change over time (e.g., a root rate prior shifted by -3 suggests
rates must have increased over time to yield the observed trait data, while a decreasing trend
implies the opposite). Accordingly, posterior coverage remained essentially constant at ∼95%
under default and loose prior settings, but dropped substantially (sometimes as low as 10%) under
tight prior settings when the root rate prior and simulated trend conflicted in this manner.

Despite the relatively inaccurate inferences of branchwise rate, root rate, and trend parameters
under overly informative priors, hypothesis testing was still largely reliable, albeit sometimes
underpowered, under all prior settings we considered. Across the board, error rates remained
conservative at around 5% or lower, with decreasing trends never mistaken for increasing trends and
vice versa. Error rates for detecting significant rate variance may be slightly inflated under tight
priors (Fig. 1D.5), perhaps due to tighter constraints on trend estimation forcing the model to instead
attribute apparent rate heterogeneity to rate variance. Nonetheless, power to detect significant rate
variance appears consistent regardless of prior settings. Notably, the same is true for anomalous
rate detection, despite the increasing shrinkage of branchwise rate estimates under tighter priors
(Fig. 1D.7). On the other hand, prior settings had considerable influence on power to detect trends
(Fig. 1D.6), with power generally increasing under looser priors, particularly when the root rate
prior shift and simulated trend both imply similar patterns of rate change over time (e.g., a root
rate prior shifted by 3 and decreasing trend).

Figure 1D.1 The effect of trait evolution scenario and prior settings on inference of the rate
variance parameter ($\sigma^2_{\sigma^2}$). Each point is the posterior median from a single fit, while the violins are
combined posterior distributions from all fits for a given trait evolution scenario and prior setting.
Vertical lines represent the 50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of
these combined posteriors, while horizontal lines represent positions of true, simulated values.

[Figure 1D.1 panels: tight, default, and loose priors; y-axis, estimated rate variance ($\sigma^2_{\sigma^2}$); arranged by simulated trend ($\mu_{\sigma^2}$ = -4, 0, 4) and simulated rate variance ($\sigma^2_{\sigma^2}$ = 0, 3, 6); legend: root rate ($\sigma^2_0$) prior shifted by -3, 0, 3.]

Figure 1D.2 The effect of trait evolution scenario and prior settings on inference of the trend
parameter ($\mu_{\sigma^2}$). Each point is the posterior median from a single fit, while the violins are combined
posterior distributions from all fits for a given trait evolution scenario and prior setting. Vertical
lines represent the 50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of these
combined posteriors, while horizontal lines represent positions of true, simulated values.

[Figure 1D.2 panels: tight, default, and loose priors; y-axis, estimated trend ($\mu_{\sigma^2}$); arranged by simulated trend ($\mu_{\sigma^2}$ = -4, 0, 4) and simulated rate variance ($\sigma^2_{\sigma^2}$ = 0, 3, 6); legend: root rate ($\sigma^2_0$) prior shifted by -3, 0, 3.]

Figure 1D.3 The effect of trait evolution scenario and prior settings on inference of the root rate
parameter ($\sigma^2_0$). Each point is the posterior median from a single fit, while the violins are combined
posterior distributions from all fits for a given trait evolution scenario and prior setting. Vertical
lines represent the 50% (thicker lines) and 95% equal-tailed intervals (thinner lines) of these
combined posteriors, while horizontal lines represent positions of true, simulated values.

[Figure 1D.3 panels: tight, default, and loose priors; y-axis, estimated root rate ($\ln \sigma^2_0$); arranged by simulated trend ($\mu_{\sigma^2}$ = -4, 0, 4) and simulated rate variance ($\sigma^2_{\sigma^2}$ = 0, 3, 6); legend: root rate ($\sigma^2_0$) prior shifted by -3, 0, 3.]

Figure 1D.4 Relationship between simulated and estimated branchwise rate parameters ($\ln \sigma^2$)
under different prior settings, with tight priors being the most informative and loose priors the least.
For each simulation and posterior sample, branchwise rates were first centered by subtracting their
mean. We estimated centered branchwise rates by taking the median of the centered posterior
samples. The solid line represents the position of the true centered branchwise rates, while the
shallower, dashed line represents the observed line of best fit for the data under each prior setting.
Note that tighter, more informative priors result in shallower best fit lines due to increased
shrinkage of branchwise rate estimates.

Table 1D.1 Median absolute errors of rate variance posteriors (i.e., median absolute difference
between posterior samples and their true, simulated values, a measure of posterior distribution
accuracy), averaged across replicates for each simulated trait evolution scenario and prior settings.
$\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend parameters, respectively,
while $\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

0.46
0.48
0.52

1.58
1.70
1.62

4.14
3.07
3.39

µσ 2 = -4
0
4

0.43
0.45
0.51

1.52
1.71
1.63

3.98
3.04
3.50

µσ 2 = -4
0
4

0.40
0.47
0.52

1.52
1.74
1.65

4.14
3.10
3.73

σ 2
0 prior shifted by -3
3.49
0.70 1.54
1.67 2.81
0.84
1.72 2.79
0.82

σ 2
0 prior shifted by 0
0.68 1.53
3.51
1.65 2.80
0.81
1.72 2.88
0.83

σ 2
0 prior shifted by 3
0.69 1.53
3.66
1.69 2.73
0.84
1.72 2.82
0.83

0.79
0.94
0.93

2.30
2.21
2.27

3.97
3.40
3.08

0.79
0.95
0.94

2.33
2.20
2.34

4.04
3.34
3.00

0.79
0.96
0.94

2.28
2.20
2.23

3.95
3.46
3.09

[Figure 1D.4 panels: tight, default, and loose prior settings; x-axis, simulated branchwise rate (centered $\ln \sigma^2$, -8 to 8); y-axis, estimated branchwise rate (centered $\ln \sigma^2$, -8 to 8); legend: truth, best fit.]

Table 1D.2 Median absolute errors of trend posteriors (i.e., median absolute difference between
posterior samples and their true, simulated values, a measure of posterior distribution accuracy),
averaged across replicates for each simulated trait evolution scenario and prior settings. $\sigma^2_{\sigma^2}$ and
$\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend parameters, respectively, while
$\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

2.03
1.28
1.34

2.34
1.32
2.22

2.68
1.07
2.18

µσ 2 = -4
0
4

1.63
0.91
2.06

1.88
1.07
3.04

2.16
0.94
2.97

µσ 2 = -4
0
4

1.28
0.88
2.94

1.50
1.16
3.91

1.69
1.12
3.77

σ 2
0 prior shifted by -3
1.32 1.57
1.65
2.22 2.11
1.55
2.33 2.78
2.75

σ 2
0 prior shifted by 0
1.30 1.60
1.61
2.21 2.05
1.54
2.30 2.77
2.61

σ 2
0 prior shifted by 3
1.62
1.32 1.58
2.14 1.98
1.51
2.35 2.68
2.51

1.35
1.65
4.24

1.64
2.45
2.88

1.64
2.53
3.86

1.32
1.64
4.07

1.61
2.43
2.85

1.66
2.51
3.82

1.34
1.64
4.15

1.62
2.50
2.78

1.62
2.50
3.68

Figure 1D.5 Power and error rates for the rate variance parameter ($\sigma^2_{\sigma^2}$). Lines depict changes in
the proportion of model fits that correctly showed evidence for rate variance significantly greater
than 0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in light red) as a function
of prior settings, with tight priors being the most informative and loose priors the least. Results are
also shown for fits with the location of the root rate ($\sigma^2_0$) prior shifted by -3 (solid lines), 0 (dashed
lines), and 3 (dotted lines) from the default setting.

[Figure 1D.5 panels: proportion of fits with rate variance ($\sigma^2_{\sigma^2} > 0$) against prior settings (tight, default, loose), under decreasing ($\mu_{\sigma^2} = -4$), no ($\mu_{\sigma^2} = 0$), and increasing ($\mu_{\sigma^2} = 4$) simulated trend; legend: power, error, root rate ($\sigma^2_0$) prior shifted by -3, 0, 3.]

Table 1D.3 Median absolute errors of branchwise rate posteriors (i.e., median absolute difference
between posterior samples and their true, simulated values, a measure of posterior distribution
accuracy), averaged across replicates for each simulated trait evolution scenario and prior settings.
$\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend parameters, respectively,
while $\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

0.53
0.44
0.46

0.87
0.76
0.83

0.98
0.94
0.93

µσ 2 = -4
0
4

0.47
0.40
0.52

0.83
0.73
0.88

0.94
0.95
0.99

µσ 2 = -4
0
4

0.43
0.39
0.61

0.79
0.73
0.95

0.92
0.97
1.06

σ 2
0 prior shifted by -3
0.90
0.48 0.83
0.83 1.01
0.52
0.87 1.02
0.64

σ 2
0 prior shifted by 0
0.48 0.83
0.90
0.82 1.01
0.51
0.87 1.01
0.63

σ 2
0 prior shifted by 3
0.48 0.82
0.90
0.82 1.01
0.51
0.87 1.00
0.62

0.50
0.54
0.82

0.86
0.87
0.95

0.91
1.07
1.16

0.49
0.53
0.80

0.86
0.87
0.95

0.91
1.07
1.16

0.50
0.54
0.81

0.86
0.88
0.94

0.91
1.06
1.14

Table 1D.4 Median absolute errors of root rate posteriors (i.e., median absolute difference between
posterior samples and their true, simulated values, a measure of posterior distribution accuracy),
averaged across replicates for each simulated trait evolution scenario and prior settings. $\sigma^2_{\sigma^2}$ and
$\mu_{\sigma^2}$ indicate the true, simulated values of rate variance and trend parameters, respectively, while
$\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

1.78
1.18
1.08

2.54
1.24
1.36

1.97
1.18
2.16

µσ 2 = -4
0
4

1.38
0.82
1.71

1.88
0.82
2.17

1.40
1.18
3.05

µσ 2 = -4
0
4

1.02
0.79
2.54

1.33
0.88
3.07

1.09
1.64
4.15

σ 2
0 prior shifted by -3
1.39
1.06 1.41
1.66 1.82
1.38
1.83 2.49
2.49

σ 2
0 prior shifted by 0
1.04 1.41
1.36
1.64 1.79
1.34
1.81 2.50
2.39

σ 2
0 prior shifted by 3
1.06 1.38
1.39
1.56 1.76
1.33
1.83 2.41
2.27

1.07
1.45
3.87

1.45
1.84
2.39

1.40
2.20
3.43

1.06
1.45
3.73

1.41
1.84
2.33

1.41
2.17
3.36

1.06
1.44
3.80

1.41
1.87
2.27

1.38
2.21
3.27

Table 1D.5 Breadths of rate variance posteriors (i.e., the difference between the 97.5% and 2.5%
quantiles of posterior samples, a measure of posterior distribution precision), averaged across
replicates for each simulated trait evolution scenario and prior settings. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true,
simulated values of rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to
alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
3

6

0

loose priors
3

0

6

µσ 2 = -4
0
4

2.31
2.42
2.40

8.74
6.13
7.14

11.61
10.76
11.94

µσ 2 = -4
0
4

2.20
2.23
2.40

7.73
6.34
7.04

11.23
10.18
11.75

µσ 2 = -4
0
4

2.10
2.26
2.50

7.82
6.10
6.90

11.03
10.49
12.57

σ 2
0 prior shifted by -3
4.86
13.13
10.48
3.83
12.73 5.33
8.76
4.24
3.89
13.33 4.81
9.60
σ 2
0 prior shifted by 0
13.05
10.45
3.84
4.60
13.17 5.14
8.42
4.21
3.93
13.65 4.78
9.44
σ 2
0 prior shifted by 3
12.66
10.31
3.82
4.74
12.68 5.21
8.34
4.02
13.47 4.90
9.28
4.04

14.17
11.29
12.18

15.74
16.48
16.82

14.78
11.20
12.35

15.98
16.64
16.75

14.44
11.50
12.18

16.06
16.57
17.02

Table 1D.6 Breadths of trend posteriors (i.e., the difference between the 97.5% and 2.5% quantiles
of posterior samples, a measure of posterior distribution precision), averaged across replicates for
each simulated trait evolution scenario and prior settings. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated
values of rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to alteration
of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
3

0

6

loose priors
3

0

6

σ 2
0 prior shifted by -3

µσ 2 = -4
0
4

3.62
4.42
4.90

4.34
4.95
5.19

4.61
4.80
5.38

µσ 2 = -4
0
4

3.63
4.23
4.64

4.40
4.84
5.00

4.64
4.76
5.18

µσ 2 = -4
0
4

3.64
4.22
4.63

4.37
4.66
4.90

4.65
4.66
5.20

6.68
6.18
8.67
8.60
10.74 12.73

4.87
6.77
12.14
σ 2
0 prior shifted by 0
6.73
6.25
8.46
8.52
10.51 12.32

4.81
6.77
11.57
σ 2
0 prior shifted by 3
6.23
6.69
8.52
8.23
10.28 11.82

4.85
6.81
11.56

4.99
7.53
21.26

7.08
6.70
9.72
10.23
15.61 21.57

4.97
7.44
19.99

7.04
6.72
9.90
10.47
15.26 20.95

4.90
7.36
19.62

6.64
6.94
10.13 10.61
15.68 19.45

Table 1D.7 Breadths of branchwise rate posteriors (i.e., the difference between the 97.5% and
2.5% quantiles of posterior samples, a measure of posterior distribution precision), averaged across
replicates for each simulated trait evolution scenario and prior settings. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the
true, simulated values of rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts
refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

2.01
2.05
2.15

3.06
2.73
2.91

3.21
3.37
3.39

µσ 2 = -4
0
4

1.98
2.04
2.14

3.00
2.72
2.89

3.18
3.33
3.37

µσ 2 = -4
0
4

1.97
2.04
2.15

2.98
2.70
2.90

3.17
3.36
3.41

σ 2
0 prior shifted by -3
3.49
2.33 3.36
3.28 3.85
2.52
3.59 4.14
3.10

σ 2
0 prior shifted by 0
2.33 3.36
3.49
3.26 3.83
2.51
3.57 4.10
3.03

σ 2
0 prior shifted by 3
2.33 3.36
3.49
3.22 3.83
2.49
3.53 4.06
3.03

2.41
2.65
4.32

3.61
3.52
4.25

3.67
4.20
5.26

2.41
2.66
4.11

3.61
3.54
4.24

3.66
4.21
5.23

2.41
2.63
4.07

3.61
3.57
4.26

3.66
4.21
5.03

Table 1D.8 Breadths of root rate posteriors (i.e., the difference between the 97.5% and 2.5%
quantiles of posterior samples, a measure of posterior distribution precision), averaged across replicates
for each simulated trait evolution scenario and prior settings. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true,
simulated values of rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to
alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
3

0

6

loose priors
3

0

6

σ 2
0 prior shifted by -3

µσ 2 = -4
0
4

2.91
3.86
4.33

3.76
4.21
4.56

3.85
4.34
4.83

µσ 2 = -4
0
4

2.92
3.71
4.17

3.65
4.10
4.43

3.84
4.20
4.78

5.10
7.34
9.50

4.03
5.94
10.91
σ 2
0 prior shifted by 0

5.54
7.55
11.60

5.22
7.21
9.38

4.03
5.98
10.40
σ 2
0 prior shifted by 3

5.51
7.37
11.22

4.13
6.52
19.65

5.69
8.47
14.21

5.86
9.18
19.87

4.17
6.61
18.51

5.52
8.51
13.81

5.80
9.39
19.42

µσ 2 = -4
0
4

3.01
3.73
4.13

3.63
4.09
4.41

3.94
4.29
4.80

4.06
5.87
10.25

5.14
7.10
9.10

5.59
7.40
10.72

4.12
6.41
18.05

5.66
8.71
14.09

5.76
9.39
17.82


Table 1D.9 Coverage of rate variance posteriors (i.e., proportion of times the true, simulated value is greater than the 2.5% posterior distribution quantile and less than the 97.5% quantile) for each simulated trait evolution scenario and prior setting. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of the rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
6
3

0

default priors
6
3
0

loose priors
3
0

6

σ 2
0 prior shifted by -3

µσ 2 = -4 — 1.00
0 — 0.90
4 — 1.00

0.70 — 1.00
0.80 — 1.00
0.90 — 1.00

0.90
1.00 — 1.00
1.00 — 1.00 1.00
1.00 — 1.00 0.90

σ 2
0 prior shifted by 0

µσ 2 = -4 — 1.00
0 — 0.90
4 — 1.00

0.70 — 1.00
0.70 — 1.00
0.80 — 1.00

1.00 — 1.00
0.90
1.00 — 1.00 1.00
0.90 — 1.00 0.90

σ 2
0 prior shifted by 3

µσ 2 = -4 — 1.00
0 — 0.90
4 — 1.00

0.60 — 1.00
0.70 — 1.00
0.80 — 1.00

1.00 — 1.00
0.90
1.00 — 1.00 1.00
1.00 — 1.00 0.90


Table 1D.10 Coverage of trend posteriors (i.e., proportion of times the true, simulated value is greater than the 2.5% posterior distribution quantile and less than the 97.5% quantile) for each simulated trait evolution scenario and prior setting. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of the rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

0.50
1.00
1.00

0.40
1.00
0.70

0.40
1.00
0.80

µσ 2 = -4
0
4

0.50
1.00
0.80

0.70
1.00
0.10

0.60
1.00
0.30

µσ 2 = -4
0
4

0.70
1.00
0.10

0.70
1.00
0.00

0.80
1.00
0.10

σ 2
0 prior shifted by -3
1.00
1.00 0.90
1.00 0.90
1.00
1.00 1.00
1.00

σ 2
0 prior shifted by 0
1.00 1.00
1.00
1.00 0.90
1.00
1.00 1.00
1.00

σ 2
0 prior shifted by 3
1.00 0.90
1.00
0.90 0.90
1.00
1.00 1.00
1.00

1.00
1.00
0.90

1.00
1.00
1.00

1.00
0.90
0.90

1.00
1.00
1.00

1.00
0.90
1.00

1.00
0.90
0.90

1.00
1.00
1.00

1.00
0.90
1.00

1.00
0.90
0.90


Table 1D.11 Coverage of branchwise rate posteriors (i.e., proportion of times the true, simulated value is greater than the 2.5% posterior distribution quantile and less than the 97.5% quantile) for each simulated trait evolution scenario and prior setting. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of the rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

0.91
0.99
1.00

0.89
0.93
0.91

0.86
0.91
0.92

µσ 2 = -4
0
4

0.94
1.00
0.97

0.91
0.93
0.87

0.87
0.90
0.89

µσ 2 = -4
0
4

0.97
1.00
0.89

0.93
0.94
0.83

0.88
0.89
0.86

σ 2
0 prior shifted by -3
0.94
0.98 0.95
0.96 0.94
1.00
0.96 0.96
0.99

σ 2
0 prior shifted by 0
0.98 0.95
0.94
0.96 0.93
1.00
0.96 0.95
0.99

σ 2
0 prior shifted by 3
0.98 0.95
0.94
0.97 0.94
1.00
0.96 0.95
0.99

0.98
1.00
0.97

0.96
0.97
0.97

0.95
0.94
0.95

0.98
1.00
0.98

0.96
0.97
0.97

0.95
0.95
0.96

0.98
1.00
0.98

0.96
0.97
0.97

0.95
0.95
0.96


Table 1D.12 Coverage of root rate posteriors (i.e., proportion of times the true, simulated value is greater than the 2.5% posterior distribution quantile and less than the 97.5% quantile) for each simulated trait evolution scenario and prior setting. $\sigma^2_{\sigma^2}$ and $\mu_{\sigma^2}$ indicate the true, simulated values of the rate variance and trend parameters, respectively, while $\sigma^2_0$ prior shifts refer to alteration of the root rate prior location.

σ 2
σ 2 =

tight priors
3

6

0

default priors
6
3
0

loose priors
3
0

6

µσ 2 = -4
0
4

0.40
0.90
1.00

0.30
0.90
1.00

0.60
1.00
0.60

µσ 2 = -4
0
4

0.60
1.00
0.80

0.60
1.00
0.40

0.80
1.00
0.60

µσ 2 = -4
0
4

0.90
1.00
0.10

0.80
1.00
0.10

0.90
0.60
0.10

σ 2
0 prior shifted by -3
1.00
1.00 0.80
1.00 1.00
1.00
1.00 1.00
1.00

σ 2
0 prior shifted by 0
1.00 0.80
1.00
1.00 1.00
1.00
1.00 1.00
1.00

σ 2
0 prior shifted by 3
1.00 0.80
1.00
1.00 1.00
1.00
1.00 1.00
1.00

0.90
1.00
1.00

0.90
1.00
1.00

1.00
0.90
1.00

1.00
1.00
1.00

1.00
1.00
1.00

1.00
1.00
1.00

1.00
1.00
1.00

0.80
1.00
1.00

1.00
1.00
1.00


Figure 1D.6 Power and error rates for the trend parameter ($\mu_{\sigma^2}$). Lines depict changes in the proportion of model fits that correctly showed evidence for trends significantly less and greater than 0 (i.e., power, in black) and incorrectly showed evidence (i.e., error, in light red) as a function of prior settings, with tight priors being the most informative and loose priors the least. Results are also shown for fits with the location of the root rate ($\sigma^2_0$) prior shifted by -3 (solid lines), 0 (dashed lines), and 3 (dotted lines) from the default setting.

Figure 1D.7 Power and error rates for branchwise rate parameters ($\ln \sigma^2$) under different prior settings. Lines depict changes in proportions of branchwise rates considered anomalously slow (in dark blue) or fast (in light red) as a function of simulated rate deviations ($\ln \sigma^2_{dev}$). These results combine all fits to simulated data that detected rate variance ($\sigma^2_{\sigma^2}$) significantly greater than 0. The proportions are equivalent to power when the detected rate deviation is of the same sign as the true, simulated deviation (left of 0 for anomalously slow rates in dark blue and right for anomalously fast rates in light red), and to error rate when the detected and true rate deviations are of opposite signs. Here, significant rate deviations for simulated rate deviations that are exactly 0 are considered errors regardless of sign.


[Figure 1D.6 panels: proportion of fits with decreasing trend ($\mu_{\sigma^2} < 0$) and with increasing trend ($\mu_{\sigma^2} > 0$) against prior settings (tight, default, loose), for simulated rate variance of none ($\sigma^2_{\sigma^2} = 0$), moderate ($\sigma^2_{\sigma^2} = 3$), and high ($\sigma^2_{\sigma^2} = 6$); legend: power, error, and root rate ($\sigma^2_0$) prior shifted by -3, 0, or 3.]

[Figure 1D.7 panels: proportion of significant rate deviations ($\ln \sigma^2_{dev} \neq 0$) against simulated rate deviation ($\ln \sigma^2_{dev}$, from -8 to 8) for tight, default, and loose prior settings; legend: $\sigma^2_{dev} < 0$, $\sigma^2_{dev} > 0$.]

CHAPTER 2

STOCHASTIC CHARACTER MAPPING OF CONTINUOUS TRAITS ON
PHYLOGENIES

2.1 Introduction

A central challenge in macroevolutionary biology is inferring how phenotypes evolve from limited samples of living and/or fossilized organisms. Accordingly, evolutionary biologists have long practiced “ancestral state reconstruction” (ASR), that is, estimating the unobserved phenotypes of (typically extinct) lineages based on observed phenotypes in their relatives (Dobzhansky and Sturtevant, 1938; Sanger et al., 1955; Pauling et al., 1963; Witmer, 1995; Schultz et al., 1996; Sumrall and Brochu, 2008). With the development of phylogenetic comparative methods that provide a rigorous statistical framework for performing ASR under various trait evolution models, ASR has become ubiquitous in macroevolutionary research (Schluter et al., 1997; Groussin et al., 2016; Joy et al., 2016). One technique for performing ASR, called stochastic character mapping (or “simmapping”), is particularly popular, allowing researchers to randomly sample evolutionary histories (often termed “simmaps”) of a character on a phylogeny according to the probability of a given reconstruction under some trait evolution model (Nielsen, 2002; Huelsenbeck et al., 2003; Bollback, 2006). By sampling hundreds or thousands of simmaps, researchers can generate distributions of possible evolutionary histories which may be used in various macroevolutionary analyses, for example, determining the timing/frequency of evolutionary events (e.g., Baliga and Law, 2016; Tornabene et al., 2016; Freyman and Höhna, 2019; Hughes et al., 2021; Landis et al., 2021; Siqueira et al., 2023) or how past life history/environmental factors affected evolution (e.g., de Alencar et al., 2017; Borstein et al., 2019; Burns and Bloom, 2020; Fabre et al., 2020; Rincon-Sandoval et al., 2020; Drury et al., 2021; Nations et al., 2021; Friedman and Muñoz, 2023). In this way, simmaps allow researchers to flexibly conduct analyses over a range of possible histories and effectively account for the inherently incomplete and uncertain knowledge of the evolutionary past in most macroevolutionary studies.

Unfortunately, while simmaps have revolutionized statistical pipelines for studying macroevolution, current simmapping implementations are limited to sampling histories of discrete variables. The lack of simmapping methods for continuous variables constrains approaches for analyzing macroevolutionary data and leaves a conspicuous methodological gap in the field. For example, testing whether the tempo and/or mode of trait evolution varies according to some “explanatory factor” like habitat or diet is generally straightforward provided the factor’s entire evolutionary history is explicitly known (e.g., Revell, 2013; Clavel et al., 2015; Beaulieu and O’Meara, 2023). When a factor’s history is unknown (as is generally the case), researchers are currently unable to generate simmaps of continuous factor histories to use in their analyses like they would for discrete factors. Consequently, testing how continuously-varying factors like body size (e.g., Friedman et al., 2019), generation time (e.g., Gingerich, 2001), or climatic niche (e.g., Tribble et al., 2023) affect trait evolution processes is much more difficult, often requiring the development of novel methods and/or analysis pipelines tailored for testing specific hypotheses (Cooper and Purvis, 2009; Hansen et al., 2008; Felsenstein, 2012; Baker et al., 2015; Weir and Lawson, 2015; Clavel and Morlon, 2017; Hansen et al., 2022; Boyko et al., 2023a; Tribble et al., 2023; Uyeda et al., 2021).

Here, we introduce an efficient method for simmapping continuous variables under Brownian motion models of evolution. These continuous simmaps, or “contsimmaps” for short, may be used to directly visualize and analyze the dynamics of phenotypic evolution inferred under trait evolution models, determine the timing and phylogenetic locations of major evolutionary transitions in continuous phenotypes while incorporating uncertainty, and explore how continuous factors may have affected evolutionary processes. First, we present an algorithm for generating contsimmaps by sampling values of continuous variables at arbitrary points on a phylogeny conditional on observed values under a given Brownian motion model. Second, to showcase potential uses of contsimmaps, we outline a general approach for inferring relationships between aspects of continuous trait evolution processes and continuous factors based on contsimmapped factor histories. Using an extensive simulation study, we verify the proposed approach’s ability to accurately infer relationships between continuous factors and rates of trait evolution. We go on to apply this pipeline to an empirical case study, asking whether divergence in plant height is associated with variation in rates of leaf and flower trait evolution in a clade of Eucalyptus trees that range from ∼1 to 100 meters tall (Brooker et al., 2015; Thornhill et al., 2019; Falster et al., 2021).

2.2 Materials and Methods

We designed contsimmapping to be a flexible framework for sampling evolutionary histories of continuous variables on a phylogeny under Brownian motion models, analogous to conventional discrete simmapping methods that sample evolutionary histories of discrete variables under continuous-time Markov chain models. An important difference between contsimmaps and conventional simmaps is how evolutionary histories are represented. Conventional simmaps assign parts of a phylogeny to regimes representing different values of the discrete variable, providing reconstructed evolutionary histories in continuous time. This is impractical for continuous variables evolving under Brownian motion models, which have infinitely many possible values and constantly change over time. Accordingly, contsimmaps instead sample values of continuous variables at a finite number of time points evenly distributed across a phylogeny, with the number of time points controlled by a user-specified “resolution”. We implemented this framework in an R package called contsimmap, which supports contsimmapping under a flexible Brownian motion modeling framework capable of accommodating multiple correlated variables, multiple measurements per tip and/or internal node with associated intraspecific variation/measurement error, missing measurements, and multiple evolutionary trends/rates that differ according to regimes mapped onto a phylogeny. The R package additionally provides tools for transforming, summarizing, and visualizing contsimmaps (Fig. 2.1), as well as fitting Brownian motion models with parameters dependent on contsimmapped variables.

2.2.1 Generating Contsimmaps

Here we present a general algorithm for sampling phenotypic values across a dense set of time points on a phylogeny given measurements associated with its nodes, “node error” variances, and a Brownian motion model with potentially regime-dependent evolutionary trends and rates (notably, such regimes must be mapped onto the phylogeny a priori). Assume we have $e$ edges in our phylogeny (including a 0-length root edge indexed 0) with $s$ mapped regimes and data on $m$ continuous phenotypic variables. Let $\tau$ be an $e$-length vector of edge lengths assumed to be in units of time. To outline our algorithm, it is useful to focus on the quantities associated with an individual edge denoted $i$. First, we define two sets of time points along edge $i$: $n_i$ “interpolant” points at equally-spaced times $t_i$, excluding times corresponding to $i$’s ancestral/descendant nodes, and $n^*_i$ “critical” points at (strictly increasing) times $t^*_i$, including times corresponding to $i$’s descendant node (the ancestral node is excluded because it is equivalent to the descendant node of the edge ancestral to $i$) as well as any regime shifts along $i$. The number of interpolant points, $n_i$, is chosen by rounding $\tau_i \xi / T - 1$ to the next largest integer, where $T$ represents the total height of the phylogeny and $\xi$ a user-specified resolution. Specifically, $\xi$ corresponds to the approximate number of time points across the height of a phylogeny by defining a “target” time interval for all edges given by $T / \xi$. Now let $X_i$ and $X^*_i$ be $n_i \times m$ and $n^*_i \times m$ matrices, respectively, of phenotypic values at each time point along $i$. Further, let $r^*_i$ be an $n^*_i$-length vector of regimes associated with the preceding critical time interval for each entry of $t^*_i$. Edge $i$ may also have measurements associated with its descendant node: let $Y_i$ be an $o_i \times m$ matrix of these measurements (in the case that there are no measurements, $o_i = 0$).

Figure 2.1 Phylogram and phenogram-based visualizations of contsimmaps, colored according to reconstructed phenotypic values. Left: conventional ancestral state reconstructions of a continuously-varying phenotype under a Brownian motion model. The top phylogram is colored according to mean phenotypic estimates, which are also depicted in the phylogram below along with error bars representing 95% confidence intervals on the estimates at each node. Middle: a single contsimmap generated using the same data and Brownian motion model. Right: a sample of 25 contsimmaps representing the overall distribution of generated contsimmaps. The bottom middle and right phylograms also include conventional ancestral state reconstructions for reference.
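The rule for placing interpolant points can be illustrated with a minimal Python sketch. The function name `interpolant_times` and its arguments are hypothetical (they are not part of the contsimmap package), and the sketch assumes that "rounding to the next largest integer" means taking the ceiling and that a 0-length edge receives no interpolant points.

```python
import math

def interpolant_times(t_start, tau, tree_height, resolution):
    """Equally spaced interpolant times strictly inside one edge.

    t_start: time of the edge's ancestral node; tau: edge length (tau_i);
    tree_height: total height of the phylogeny (T);
    resolution: the user-specified resolution (xi).
    """
    # n_i: round tau_i * xi / T - 1 up to the next integer (never below 0).
    n = max(0, math.ceil(tau * resolution / tree_height - 1))
    # Splitting the edge into n + 1 equal intervals keeps each step <= T / xi.
    step = tau / (n + 1)
    return [t_start + step * (k + 1) for k in range(n)]
```

Under these assumptions, an edge of length 1 on a tree of height 4 with a resolution of 10 receives two interpolant points, giving steps of 1/3, just under the target interval of 0.4.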

Our algorithm samples values of $X$ and $X^*$ given the phenotypic measurements/regime mappings for each edge described above ($Y$ and $r$, respectively), $e$ $m \times m$ variance-covariance matrices describing error in measurements at descendant nodes for each edge ($\Gamma$, commonly termed “node error”), $s$ $m$-length vectors describing deterministic changes in phenotypes per unit time for each regime ($\mu$, commonly termed “evolutionary trends”), and $s$ $m \times m$ matrices describing stochastic changes in phenotypes per unit time for each regime ($\Sigma$, commonly termed “evolutionary rate matrices”). To do this, we additionally keep track of the $n^*_i$ variance-covariance matrices for each edge $i$, denoted $V^*_i$, which describe the uncertainty of phenotypic estimates at critical time points. Below, we adopt a general notation of using subscripts to denote edge/regime indices first, followed by time points (if applicable), and lastly specific phenotypic variables. For example, $X^*_i$ refers to the matrix of phenotypic values at critical time points along the $i$th edge, $X^*_{i,j}$ to its $j$th row (corresponding to the $j$th entry of $t^*_i$ or time $t^*_{i,j}$), and $X^*_{i,j,k}$ to the $k$th value in this row (corresponding to the $k$th phenotypic variable). Similarly, $\mu_i$ would refer to trends in evolution for the $i$th regime and $\mu_{i,k}$ to the trend for the $k$th phenotypic variable specifically.

Our algorithm, similarly to other rapid algorithms for ASR under Brownian motion models (e.g., Goolsby, 2017; Hassler et al., 2022), consists of three main steps: 1) traversing the phylogeny from tips to root (i.e., in postorder), 2) handling phenotypic values at the root, and 3) traversing the phylogeny from root to tips (i.e., in preorder). At a high level, the first step calculates initial estimates of mean phenotypic values and associated uncertainty at critical time points along all edges (i.e., $X^*$ and $V^*$) based only on the descendants of each edge. Notably, because the root of the phylogeny only has descendants by definition, this first step already returns final estimates of mean phenotypic values at the root and associated uncertainty (i.e., $X^*_{0,1}$ and $V^*_{0,1}$). Thus, the second step simply consists of sampling phenotypic values at the root based on the multivariate normal distribution defined by $X^*_{0,1}$ and $V^*_{0,1}$. The most complex step is the third and final one, which updates $X^*$ and $V^*$ based on the non-descendants of each edge, uses the updated $X^*$ and $V^*$ to sample phenotypic values at critical time points, and finally samples values at interpolant time points conditional on the sampled values at critical time points. We describe these steps in more detail below:

1) Complete a postorder traversal over all edges. For each edge denoted $i$ with $p_i$ immediate descendant edges:

1a) Initialize calculations for edge $i$ by defining $Z$ as an $(o_i + p_i) \times m$ matrix and setting the first $o_i$ rows of $Z$ to $Y_i$. Similarly, define $W$ as a set of $o_i + p_i$ $m \times m$ matrices and set the first $o_i$ matrices to $\Gamma_i$. Any missing entries in $Z$ are set to 0, and corresponding diagonal and off-diagonal entries of the associated $W$ matrices are set to $\infty$ and 0, respectively (Hassler et al., 2022). Accordingly, if edge $i$ is a tip with no associated measurements (i.e., $o_i + p_i = 0$), create a “dummy observation” by setting $o_i$ to 1 and initializing $Z$ as a $1 \times m$ matrix of 0s and $W$ as a single $m \times m$ diagonal matrix with $\infty$ along the diagonal. Otherwise, if edge $i$ has immediate descendant edges (i.e., it is not a tip and $p_i > 0$), for each descendant edge indexed $l$ from 1 to $p_i$:

$$Z_{o_i+l} = X^*_{d_l,1} - \mu_{r^*_{d_l,1}} \left( t^*_{d_l,1} - t^*_{i,n^*_i} \right) \tag{1}$$

$$W_{o_i+l} = V^*_{d_l,1} + \Sigma_{r^*_{d_l,1}} \left( t^*_{d_l,1} - t^*_{i,n^*_i} \right) \tag{2}$$

where $Z_l$ denotes the $l$th row of $Z$, $W_l$ denotes the $l$th matrix of $W$, and $d$ is a $p_i$-length vector of indices corresponding to the edges immediately descending from $i$.

1b) Calculate the uncertainty and mean of the phenotypic values at $i$’s descendant node:

$$V^*_{i,n^*_i} = \left( \sum_{l=1}^{o_i+p_i} W_l^{-1} \right)^{-1} \tag{3}$$

$$X^*_{i,n^*_i} = V^*_{i,n^*_i} \left( \sum_{l=1}^{o_i+p_i} Z_l W_l^{-1} \right) \tag{4}$$

where $W_l^{-1}$ specifically denotes the “pseudo-inverse” of $W_l$ (Hassler et al., 2022). Notably, Eq. (4) may be undefined when one or more measurements are missing or assumed to be exact, which can cause entries of $V^*_{i,n^*_i}$ and $W^{-1}$, respectively, to be $\infty$. The former case can be solved by simply defining $\infty \times 0 = 0$ (Hassler et al., 2022). For the latter case, we can generalize Eq. (4) by thinking of the expression as a weighted average of phenotypic value vectors $Z_l$ with weights given by $V^*_{i,n^*_i} W_l^{-1}$. From this perspective, infinite entries along the diagonals of $W^{-1}$ correspond to “infinitely large” weights in this average. Accordingly, we define a “normalization” procedure for matrices in $W^{-1}$: let $W^{-1}_{\cdot,k,k}$ refer to the diagonal entries in the $k$th row and column across all $W^{-1}$ matrices. Then, for each phenotypic variable $k$, check whether $W^{-1}_{\cdot,k,k}$ contains $\infty$. If it does, set finite and infinite elements of $W^{-1}_{\cdot,k,k}$ to 0 and 1, respectively, and change corresponding off-diagonal entries to 0. If we denote these normalized matrices $\tilde{W}^{-1}$, we can write a more “robust” version of Eq. (4) which can handle exact trait measurements:

$$X^*_{i,n^*_i} = \left( \sum_{l=1}^{o_i+p_i} \tilde{W}_l^{-1} \right)^{-1} \left( \sum_{l=1}^{o_i+p_i} Z_l \tilde{W}_l^{-1} \right) \tag{5}$$

Intuitively, this means that inexact measurements are “superseded” by exact measurements, which themselves are simply averaged together. We note that this solution is more pragmatic than rigorous when there are two or more exact measurements: paradoxically, averages of multiple exact measurements are still considered exact under this framework. While conceptually problematic, situations where multiple measurements are assumed to be exact may arise in practice due to estimation of 0-length edges during phylogenetic inference and/or simplifying comparative analyses by assuming no measurement error. In any case, the above expression allows our algorithm to yield predictable, defined results in such cases.

1c) If edge $i$ includes any regime shifts, calculate expected phenotypic values and uncertainty at each preceding critical time point. For each critical point $j$ from $n^*_i - 1$ to 1:

$$V^*_{i,j} = V^*_{i,j+1} + \Sigma_{r^*_{i,j+1}} \left( t^*_{i,j+1} - t^*_{i,j} \right) \tag{6}$$

$$X^*_{i,j} = X^*_{i,j+1} - \mu_{r^*_{i,j+1}} \left( t^*_{i,j+1} - t^*_{i,j} \right) \tag{7}$$

2) Simulate phenotypic values at the root of the phylogeny by sampling new values for $X^*_{0,1}$ from a multivariate normal distribution with mean $X^*_{0,1}$ and variance-covariance matrix $V^*_{0,1}$.

3) Complete the preorder traversal over all edges except the root ($i = 0$). For each critical time point $j$ along edge $i$:

3a) Initialize calculations by defining $Z$ as an $m$-length row vector and $W$ as a single $m \times m$ matrix. $Z$ and $W$ represent counterparts to $X^*_{i,j}$ and $V^*_{i,j}$. While $X^*_{i,j}$ and $V^*_{i,j}$ define the distribution of phenotypic values at the $j$th critical time point along edge $i$ based only on all descendants of edge $i$, $Z$ and $W$ are based only on all non-descendants of $i$. Because this is a preorder traversal, phenotypic values at the immediately previous critical time point have already been simulated and $Z$ and $W$ are given by:

$$Z = X^*_{i,j-1} + \mu_{r^*_{i,j}} \left( t^*_{i,j} - t^*_{i,j-1} \right) \tag{8}$$

$$W = \Sigma_{r^*_{i,j}} \left( t^*_{i,j} - t^*_{i,j-1} \right) \tag{9}$$

In the case $j - 1 = 0$, we must instead use the trait values and time associated with the last critical time point along the edge directly ancestral to $i$, denoted $a$. Specifically, $X^*_{i,j-1}$ and $t^*_{i,j-1}$ are substituted with $X^*_{a,n^*_a}$ and $t^*_{a,n^*_a}$, respectively, in the expressions above.

3b) Update the uncertainty and mean of the phenotypic values at the $j$th critical time point along edge $i$:

$$V^*_{i,j} = \left( W^{-1} + V^{*\,-1}_{i,j} \right)^{-1} \tag{10}$$

$$X^*_{i,j} = \left( \tilde{W}^{-1} + \tilde{V}^{*\,-1}_{i,j} \right)^{-1} \left( Z \tilde{W}^{-1} + X^*_{i,j} \tilde{V}^{*\,-1}_{i,j} \right) \tag{11}$$

where $\tilde{W}^{-1}$ and $\tilde{V}^{*\,-1}_{i,j}$ again denote normalized versions of these matrices. This normalization procedure is identical to the one used for Eq. (5), with $W^{-1}$ and $V^{*\,-1}_{i,j}$ treated as a set of two matrices, with one key exception. Notably, $W^{-1}$ will only contain $\infty$ when one or more phenotypic variables evolve with a rate of 0 (i.e., at least one diagonal entry of $\Sigma_{r^*_{i,j}}$ is 0). To keep resulting contsimmaps consistent with this condition, in the case that $W^{-1}_{k,k}$ and $(V^{*\,-1}_{i,j})_{k,k}$ are both $\infty$, only $W^{-1}_{k,k}$ is set to 1 and $(V^{*\,-1}_{i,j})_{k,k}$ is instead set to 0. This critically ensures that phenotypic variables with an evolutionary rate of 0 only exhibit changes over time due to trends defined by $\mu$.

3c) Simulate phenotypic values at the $j$th critical time point along edge $i$ by replacing $X^*_{i,j}$ with new values sampled from a multivariate normal distribution with mean $X^*_{i,j}$ and variance-covariance matrix $V^*_{i,j}$.

3d) Simulate phenotypic values at all interpolant time points along edge $i$ lying between critical time points $j - 1$ and $j$. For each interpolant time point $l$ for which $t^*_{i,j-1} < t_{i,l} < t^*_{i,j}$ (if $j - 1 = 0$, swap $t^*_{i,j-1}$ for $t^*_{a,n^*_a}$ as above), sample $X_{i,l}$ from a multivariate normal distribution with mean $Z$ and variance-covariance matrix $W$ given by:

$$Z = \frac{\left( X^*_{i,j} - X_{i,l-1} \right) \left( t_{i,l} - t_{i,l-1} \right)}{t^*_{i,j} - t_{i,l-1}} + X_{i,l-1} \tag{12}$$

$$W = \Sigma_{r^*_{i,j}} \frac{\left( t^*_{i,j} - t_{i,l} \right) \left( t_{i,l} - t_{i,l-1} \right)}{t^*_{i,j} - t_{i,l-1}} \tag{13}$$

In the case $l - 1 = 0$, we must instead use the phenotypic values and time associated with the immediately previous critical time point $j - 1$. Specifically, $X_{i,l-1}$ and $t_{i,l-1}$ are swapped with $X^*_{i,j-1}$ and $t^*_{i,j-1}$, respectively, in the expressions above (if $j - 1 = 0$, these are in turn swapped with $X^*_{a,n^*_a}$ and $t^*_{a,n^*_a}$, respectively, as above).
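To make the building blocks of the three steps concrete, the following Python sketch works through the single-trait ($m = 1$) case, where every matrix reduces to a scalar variance. It is an illustration under simplifying assumptions, not the contsimmap implementation: the function names are hypothetical, missing values are encoded as infinite variances, and exact values as variances of 0.

```python
import math
import random

def propagate_rootward(x, v, trend, rate, dt):
    """Eqs. (1)-(2) and (6)-(7): move an estimate back in time by dt."""
    return x - trend * dt, v + rate * dt

def combine(values, variances):
    """Scalar analogue of Eqs. (3)-(5): pool estimates for one node.

    Returns (mean, variance). A variance of math.inf marks a missing value
    (zero weight); a variance of 0 marks an exact value that supersedes
    every inexact one.
    """
    exact = [z for z, w in zip(values, variances) if w == 0]
    if exact:
        # "Normalized" weights: exact values get weight 1, all others 0.
        return sum(exact) / len(exact), 0.0
    precisions = [0.0 if math.isinf(w) else 1.0 / w for w in variances]
    total = sum(precisions)
    if total == 0:
        # Nothing informative: behave like a flat "dummy observation".
        return 0.0, math.inf
    mean = sum(z * p for z, p in zip(values, precisions)) / total
    return mean, 1.0 / total

def bridge_moments(x_prev, t_prev, x_next, t_next, t, rate):
    """Mean and variance from Eqs. (12)-(13) for a single trait."""
    span = t_next - t_prev
    mean = (x_next - x_prev) * (t - t_prev) / span + x_prev
    variance = rate * (t_next - t) * (t - t_prev) / span
    return mean, variance

def sample_interpolants(x_prev, t_prev, x_next, t_next, times, rate, rng):
    """Step 3d: sample values at increasing times between two known points."""
    sampled = []
    for t in times:
        mean, variance = bridge_moments(x_prev, t_prev, x_next, t_next, t, rate)
        # Each new draw becomes the "previous" point for the next one.
        x_prev, t_prev = rng.gauss(mean, math.sqrt(variance)), t
        sampled.append(x_prev)
    return sampled
```

For example, `sample_interpolants(0.0, 0.0, 2.0, 4.0, [1.0, 2.0, 3.0], 2.0, random.Random(1))` draws one path between a value of 0 at time 0 and a value of 2 at time 4. Setting the rate to 0 collapses the draws onto the straight line joining the two endpoints.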

2.2.2 Using Contsimmaps to Analyze Factor-Dependent Trait Evolution

Simmaps are particularly useful for “sequential inference” pipelines, whereby the impact of some “explanatory factor” on an evolutionary process is inferred by first generating simmaps of factor histories, then fitting factor-dependent evolutionary models conditional on the simmaps. For conventional discrete simmaps, arbitrary evolutionary models can be rendered factor-dependent by simply allowing the parameters of the evolutionary model (such as evolutionary rates or trends in the case of Brownian motion models) to vary among lineages in different discrete states (e.g., lineages in mountain versus lowland habitats, annual versus perennial plant lineages; see Revell, 2013 for further description of this approach). For contsimmaps, with an infinite continuum of states, models must instead use “parameter functions” to map factor values to associated parameter values (see FitzJohn, 2010 for an example of this approach). For example, a researcher could test whether increased temperatures are associated with accelerated body size evolution by fitting a Brownian motion model assuming an exponential relationship, $\sigma^2 = e^{\beta_0 + \beta_1 T}$, between contsimmapped thermal niche values ($T$) and rates of change in body size over time ($\sigma^2$). Here, $\beta_0$ and $\beta_1$ are free parameters inferred by fitting the model to data, with $\beta_1$ directly quantifying the relationship between temperature and rates.
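As a rough illustration of how such a parameter function could be applied to a contsimmapped factor history, the Python sketch below converts factor values sampled along one edge into rates and accumulates the trait variance expected over that edge. The function names are hypothetical, and evaluating the rate at the average factor value of each interval is a simplifying assumption made here, not necessarily how the contsimmap package discretizes rates.

```python
import math

def exp_rate(factor_value, beta0, beta1):
    """Parameter function sigma^2 = exp(beta0 + beta1 * T)."""
    return math.exp(beta0 + beta1 * factor_value)

def edge_variance(times, factor_values, beta0, beta1):
    """Approximate trait variance accumulated along one mapped edge.

    times and factor_values give the contsimmapped factor at successive
    time points; each interval uses the rate at the average of its two
    endpoint factor values (an assumption of this sketch).
    """
    total = 0.0
    for k in range(1, len(times)):
        midpoint = 0.5 * (factor_values[k - 1] + factor_values[k])
        total += exp_rate(midpoint, beta0, beta1) * (times[k] - times[k - 1])
    return total
```

With $\beta_1 = 0$ the rate is constant and the accumulated variance is simply $e^{\beta_0}$ times the edge length. With $\beta_1 > 0$, intervals with higher factor values contribute more variance per unit time.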

We implemented tools in our R package for fitting factor-dependent Brownian motion models to trait data conditional on samples of contsimmapped factor histories. Specifically, our implementation constructs a likelihood function for a given trait dataset based on a sample of $n$ contsimmaps and user-defined parameter functions that map contsimmapped values to evolutionary rates, trends, and/or node error variances. Given estimates for the free parameters in all parameter functions, the outputted likelihood function automatically transforms contsimmapped values into parameter values and uses the pruning algorithm outlined by Hassler et al. (2022) to calculate likelihoods conditional on each contsimmap. To derive a single overall likelihood, $\tilde{L}$, from the $n$ conditional likelihoods, $L$, the likelihood function marginalizes over the contsimmaps by assuming either a “flat” or “nuisance” prior. Under the flat prior, each contsimmap is assumed equally likely, with the overall likelihood given by the average of all conditional likelihoods:

$$\tilde{L} = \frac{1}{n} \sum_{i=1}^{n} L_i \tag{14}$$

Under the nuisance prior, the conditional likelihoods for each contsimmap are weighted by the probability they gave rise to the trait data among all other contsimmaps and summed (often termed the “FitzJohn root prior” in the context of marginalizing over root states; see FitzJohn et al., 2009):

$$\tilde{L} = \sum_{i=1}^{n} L_i^2 \, \frac{1}{\sum_{i=1}^{n} L_i} \tag{15}$$

Intuitively, the nuisance prior allows the trait data and model to influence which contsimmapped factor histories are considered most likely by taking a weighted average of the conditional likelihoods, with higher weights assigned to contsimmaps that explain the observed data relatively well under a given model. The rough contribution of each contsimmap’s conditional likelihood to the overall likelihood, which we term “normalized conditional likelihoods”, can be quantified as $L_i / \sum_{j=1}^{n} L_j$ and $L_i^2 / \sum_{j=1}^{n} L_j^2$ in the cases of flat and nuisance priors, respectively.
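The two marginalization schemes in Eqs. (14) and (15) can be sketched in Python as follows. Working with log-likelihoods and the log-sum-exp trick is a numerical choice made here for stability; the function names are hypothetical and the sketch is not taken from the contsimmap package.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    peak = max(xs)
    return peak + math.log(sum(math.exp(x - peak) for x in xs))

def marginal_log_lik(cond_log_liks, prior="flat"):
    """Overall log-likelihood from per-contsimmap conditional log-likelihoods."""
    if prior == "flat":
        # Eq. (14): log of the mean conditional likelihood.
        return log_sum_exp(cond_log_liks) - math.log(len(cond_log_liks))
    if prior == "nuisance":
        # Eq. (15): log(sum of L_i^2) - log(sum of L_i).
        squared = [2.0 * x for x in cond_log_liks]
        return log_sum_exp(squared) - log_sum_exp(cond_log_liks)
    raise ValueError("prior must be 'flat' or 'nuisance'")
```

For conditional likelihoods of 0.1, 0.2, and 0.7, the flat prior gives an overall likelihood of 1/3, whereas the nuisance prior gives 0.54, reflecting the extra weight placed on the best-fitting contsimmap.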

Notably, the conditional likelihoods under each contsimmap also depend on the trait values at the root of the phylogeny, which are often inferred as additional free parameters in other phylogenetic comparative methods for modeling continuous trait evolution (e.g., Revell, 2012; Pennell et al., 2014; Boucher et al., 2018). However, because conditional likelihoods under any single root trait value will vary drastically across different contsimmaps (e.g., see Figure 4 in Boyko et al., 2023a), we calculate conditional likelihoods for each contsimmap while marginalizing over root trait values by assuming either a flat or nuisance prior over the root trait values as well. Note that the likelihood of observed data under any (potentially multivariate) Brownian motion model conditional on a vector of root trait values $x$ is given by $r\Phi(x; \hat{x}, V)$, where $r$ is a proportionality constant (called the “remainder” in Hassler et al., 2022), $\hat{x}$ and $V$ denote the expected root trait values and associated uncertainty (in the form of a variance-covariance matrix), and $\Phi(x; \mu, \Sigma)$ represents the probability density function of a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$ evaluated at $x$. Because the integral of any probability density function is 1 by definition, the integral of this expression is $r$; thus, the overall likelihood for a given contsimmap is $r$ under a flat prior. Under a nuisance prior, the overall likelihood is equal to the proportionality constant $r$ multiplied by the integral of the squared multivariate normal probability density function ($r / \sqrt{|V| (4\pi)^k}$, where $k$ denotes the number of traits/dimensions of the multivariate normal distribution). Assuming a nuisance prior will deflate overall likelihoods conditional on any given contsimmap when root trait values are highly uncertain (i.e., $|V|$ is large) and vice versa.
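The nuisance-prior factor quoted above follows from a standard Gaussian identity. The short derivation below is a worked sketch in the notation of this section, assuming the nuisance prior weights each root value by its own likelihood (the continuous analogue of Eq. (15)):

```latex
\begin{aligned}
\tilde{L}_{\text{nuisance}}
  &= \frac{\int \left[ r \, \Phi(x; \hat{x}, V) \right]^2 dx}{\int r \, \Phi(x; \hat{x}, V) \, dx}
   = r \int \Phi(x; \hat{x}, V)^2 \, dx, \\
\int \Phi(x; \hat{x}, V)^2 \, dx
  &= \frac{1}{(2\pi)^{k} |V|} \int \exp\!\left( -(x - \hat{x})^{\top} V^{-1} (x - \hat{x}) \right) dx \\
  &= \frac{(2\pi)^{k/2} \, |V/2|^{1/2}}{(2\pi)^{k} |V|}
   = \frac{1}{\sqrt{|V| \, (4\pi)^{k}}}.
\end{aligned}
```

The middle step uses the fact that the integrand is an unnormalized multivariate normal density with variance-covariance matrix $V/2$, and that $|V/2| = 2^{-k} |V|$.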

To fit these factor-dependent Brownian motion models to data, optimization or Bayesian infer-

ence algorithms may be used to infer values of the free parameters defining parameter functions.

Our R package currently provides tools for using the C++ library NLOPT (Johnson, 2021), inter-

faced through the R package nloptr (Ypma et al., 2022), to find free parameter values that maximize

the likelihood of an outputted likelihood function. In general, we found that likelihood surfaces

under continuous factor-dependent Brownian motion models may be quite complex, exhibiting

multiple optima and/or relatively flat “ridges” that present challenges for numerical optimization.

While our implementation allows users to apply any algorithm available in the NLOPT library and

customize optimizer settings as they see fit, we developed a convenient default optimization pro-

cedure, consisting of three phases that leverage the complementary strengths/weaknesses of three

distinct algorithms. The initial “warmup” phase uses NLOPT’s gradient-based truncated New-

ton algorithm (Dembo and Steihaug, 1983) with preconditioning and random restarts (gradients

are stochastically approximated using finite differences; see APPENDIX 2B for details), which

rapidly improves model fit but often gets stuck on suboptimal peaks of the likelihood surface.

The subsequent “exploratory” phase uses NLOPT’s sbplx (based on subplex; see Rowan, 1990) to

search for higher-likelihood regions in the vicinity of this initial estimate. However, because the

sbplx algorithm tends to terminate in relatively flat areas and/or saddle points of the likelihood sur-

face rather than true maxima, a final “polish” phase uses NLOPT’s principal axis algorithm (Brent,

2013) to find a local maximum either within or close to the high-likelihood region. Ultimately,

this procedure appears to offer a good compromise between robustness and speed, finding and

converging on peaks associated with relatively high likelihoods in a practical amount of time. No-

tably, we found that the algorithms used in the warmup and polish phases do not work well in (and

are largely unnecessary for) searching low-dimensional parameter spaces, so our implementation

skips these phases by default when fitting models with two or fewer parameters.

2.2.3 Simulation Study

To assess the performance of our approach for inferring relationships between trait evolution

processes and continuous factors, we tested whether our method could reliably detect and quantify

factor-dependent rates of continuous trait evolution from simulated data. Broadly speaking, we

simulated the evolution of a single continuous trait, Y , with rates depending on a simulated con-

tinuous factor, either an “observed” factor Xo or “unobserved” factor Xh, and applied our approach

to the simulated data to infer relationships between rates of Y evolution and Xo. In an empirical

context, Xh represents an unobserved or “hidden” factor that nonetheless affected rates of trait evo-

lution and may thus mislead hypothesis testing for factor-dependent rates (May and Moore, 2020;

see also Beaulieu and O’Meara, 2016; Boyko and Beaulieu, 2023; Boyko et al., 2023b; Tribble

et al., 2023). After outlining our simulation procedure below, we outline our analysis procedure,

which includes a pragmatic technique for constructing additional null models that help account for

hidden factors and thereby mitigate their potential effects on hypothesis testing.

We allowed rates of Y evolution, $\sigma^2$, to vary with factor values (denoted X below) according

to one of four parameter functions: 1) a “simple” function whereby rates exponentially increase/decrease,

2) a “threshold” function whereby rates shift between some minimum and maximum

value, 3) a “sweetspot” function whereby rates peak or dip around a particular factor value,

and 4) a “null” function where rates stay constant. We define the simple function as:

$$\sigma^2 = e^{\beta_0 + \beta_1 X} \tag{16}$$

Where β0 and β1 represent the intercept and slope of the factor-rate relationship on a natural

logarithmic scale. This function provides a simple means to test hypotheses that only claim rates

tend to increase or decrease in association with some factor like temperature (e.g., Clavel and

Morlon, 2017; Slater et al., 2017) or generation time (e.g., Gingerich, 2001). However, it also

allows rates of trait evolution to become arbitrarily close to 0 and grow without bound, motivating

the threshold function, a logistic curve bounded between strictly positive minimum and maximum

rate values:

$$\sigma^2 = e^{\alpha}\left(\tanh\delta\left(2\left(1 + e^{-2\pi\sqrt{3}\,\frac{X-\theta}{e^{\omega}}}\right)^{-1} - 1\right) + 1\right) \tag{17}$$

Here, α alters the overall scale of rates, specifically corresponding to the natural log of the rate

halfway between the minimum and maximum possible rates or “mid rate”, while θ determines

the factor value or “location” at which rates shift (i.e., the value of X at which $\sigma^2 = e^{\alpha}$). We

denote δ, which controls the direction and magnitude of the rate shift, the “rate deviation”. The

fold-difference between minimum and maximum rates is explicitly given by $e^{2|\delta|}$, and positive and

negative values of δ yield upward and downward shifts with increasing factor values, respectively.

Lastly, ω adjusts the “width” of the shift, with rates roughly reaching their minimum/maximum

values at factor values of $\theta \pm e^{\omega}/2$. Notably, both the simple and threshold functions only allow

rates to monotonically increase/decrease with increasing factor values, yet some empirical evidence

suggests evolutionary rates may exhibit more complex modal relationships with factors like

body size, peaking or dipping at intermediate factor values (Cooper and Purvis, 2009; FitzJohn

et al., 2009; Feldman et al., 2016; Amado et al., 2021). Thus, we defined the sweetspot function,

whereby factor-rate relationships follow a Gaussian curve of arbitrary height and orientation which

takes on strictly positive values:

$$\sigma^2 = e^{\alpha}\left(\tanh\delta\left(2e^{-18\left(\frac{X-\theta}{e^{\omega}}\right)^{2}} - 1\right) + 1\right) \tag{18}$$

Where α, θ , δ , and ω largely have the same effects and interpretations as they do for the

threshold function. Now, however, if δ is positive, rates will peak to their maximum at θ and

roughly reach their minimum at factor values of $\theta \pm e^{\omega}/2$. Conversely, if δ is negative, rates will

instead dip to their minimum at θ and roughly reach their maximum at $\theta \pm e^{\omega}/2$. We use these

unconventional parameterizations of logistic and Gaussian curves to limit interdependence among

parameters in controlling the overall shape of parameter functions, which improves the behavior of

numerical optimization routines during model fitting, while also conveniently allowing parameters

to take on any value and still form valid factor-rate relationships whereby rates never take on

negative values (in practice, we still found it necessary to impose boundaries on parameters to

improve model fitting behavior; see below). Lastly, we defined the null parameter function as

$\sigma^2 = e^{\alpha}$ for consistency with threshold and sweetspot functions.
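
The four parameter functions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, and "tanh δ" in Equations 17 and 18 is read as tanh(δ) multiplying the bracketed term, which makes the curves exactly logistic and Gaussian. The logistic term uses the identity 2/(1 + e^(-2a)) - 1 = tanh(a) to avoid overflow for extreme factor values.

```python
import math


def simple_rate(x, beta0, beta1):
    """Equation 16: rates change exponentially with the factor."""
    return math.exp(beta0 + beta1 * x)


def threshold_rate(x, alpha, theta, delta, omega):
    """Equation 17: logistic shift between two strictly positive rates."""
    z = (x - theta) / math.exp(omega)
    # 2 / (1 + exp(-2*pi*sqrt(3)*z)) - 1 equals tanh(pi*sqrt(3)*z).
    shifted_logistic = math.tanh(math.pi * math.sqrt(3.0) * z)
    return math.exp(alpha) * (math.tanh(delta) * shifted_logistic + 1.0)


def sweetspot_rate(x, alpha, theta, delta, omega):
    """Equation 18: Gaussian peak (delta > 0) or dip (delta < 0) centered at theta."""
    z = (x - theta) / math.exp(omega)
    bump = math.exp(-18.0 * z * z)
    return math.exp(alpha) * (math.tanh(delta) * (2.0 * bump - 1.0) + 1.0)


def null_rate(x, alpha):
    """Constant rate, parameterized like the mid rate of the other functions."""
    return math.exp(alpha)


# The threshold function passes through the mid rate exp(alpha) at x = theta.
print(threshold_rate(0.5, alpha=1.0, theta=0.5, delta=1.5, omega=0.0))
```

Under this reading, both curves are about 99% of the way to their asymptote at a distance of half a width from θ, consistent with rates "roughly" reaching their extremes there.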

For each simulation, we used phytools (Revell, 2012) to simulate an ultrametric, pure-birth

phylogeny of height 1 with either 50, 100, or 200 tips. To simulate the continuous factors Xo and

Xh, we generated two densely-sampled continuous factor histories (with a resolution or ξ value of

500) evolving under Brownian motion processes with root trait values of 0, no trends, and constant

rates of 4. For the trait Y , we simulated an additional Brownian motion process with starting value

0 and no trends, but with rates varying according to one of 12 possible factor-rate relationships which

differed both in the overall magnitude (i.e., “strength”) of rate variation as well as how gradually

rates changed with respect to factor values (i.e., “width”). Specifically, we used one null parameter

function; six “strong” versions of the simple, threshold, or sweetspot functions depending on either

Xo or Xh whereby rates varied ∼ 20-fold between factor values of -2 and 2; three “weak” versions

of the simple, threshold, or sweetspot functions depending on Xo only whereby rates only varied

∼ 5-fold on the same interval; and two “wide” versions of the threshold or sweetspot functions

depending on Xo only whereby rates again varied ∼ 20-fold but between factor values of -4 and 4.

See Table 2A.1 for the specific parameter values used.
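
For the simple function, the slope implied by a target fold change follows from the ratio of rates at two factor values, which equals e^(β1 × distance). The Python sketch below is illustrative only; the helper name is hypothetical, and the resulting slopes need not match the values actually used in Table 2A.1.

```python
import math


def simple_slope(fold_change, x_low, x_high):
    """Slope beta1 giving the requested fold change in rate between x_low and x_high."""
    return math.log(fold_change) / (x_high - x_low)


strong_slope = simple_slope(20.0, -2.0, 2.0)  # ~20-fold change between -2 and 2
weak_slope = simple_slope(5.0, -2.0, 2.0)     # ~5-fold change on the same interval
print(round(strong_slope, 3), round(weak_slope, 3))
```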

Notably, our approach assumes a given factor perfectly predicts rates of trait evolution, which

is unlikely for empirical data. To test the robustness of our approach to imperfect correspondences

between factors and rates, we additionally modified some simulations by adding “noise” to factor-

rate relationships. We added noise by multiplying rate values at each time point with a random

variable sampled from a gamma distribution with shape and rate both equal to t/ν, where t is the length of the

time interval preceding a time point. These multipliers represent the average value of a white noise

gamma process with mean 1 and unit variance ν over a time interval of length t. By multiplying the

rates over each time increment with these random variables, the simulated trait evolution process is

transformed from Brownian motion to variance-gamma (a type of Lévy or “pulsed” trait evolution

process explored in some prior work; see Landis et al., 2013; Landis and Schraiber, 2017). This

procedure ensures that random noise around rates tends to “average out” over sufficiently long

periods of time, such that rates converge to what would be expected under the simulated factor-

rate relationship given enough data. Thus, this kind of rate noise only weakens the factor-rate

relationship rather than completely altering it. For our simulations, we set ν to 0.05, corresponding

to rates ranging between roughly 10%-300% their expected value over a time interval spanning a

tenth of the phylogeny’s height, 40%-200% for a third of the height, and 60%-150% for the entire

height.
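
The noise multipliers can be illustrated with Python's standard library. This is a sketch, not the simulation code used in the study; the function names are hypothetical. Note that `random.gammavariate` takes a shape and a scale, so a rate of t/ν corresponds to a scale of ν/t.

```python
import random


def noise_multiplier(t, nu, rng):
    """Draw one rate multiplier for an interval of length t (mean 1, variance nu / t)."""
    shape = t / nu
    # gammavariate(shape, scale): scale = 1 / rate = nu / t.
    return rng.gammavariate(shape, nu / t)


def central_range(t, nu, n_draws=20000, seed=1):
    """Monte Carlo estimate of the 2.5% and 97.5% quantiles of the multiplier."""
    rng = random.Random(seed)
    draws = sorted(noise_multiplier(t, nu, rng) for _ in range(n_draws))
    return draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]


for t in (0.1, 1.0 / 3.0, 1.0):
    low, high = central_range(t, nu=0.05)
    print(f"t = {t:.2f}: multipliers mostly between {low:.2f} and {high:.2f}")
```

Longer intervals give multipliers concentrated more tightly around 1, which is the "averaging out" behavior described above.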

Ultimately, the three phylogeny sizes (50, 100, or 200 tips), 12 possible factor-rate relation-

ships, and presence/absence of noise around factor-rate relationships yield 3 × 12 × 2 = 72 sim-

ulation conditions. We repeated the simulation process 100 times for each condition, yielding a

grand total of 7,200 simulations. For the analysis procedure, we retained the phylogeny and tip

factor/trait values of Xo and Y for each simulation (assuming no node error in Xo/Y values to ren-

der the simulation study more manageable). We analyzed the factor/trait datasets by first fitting

Brownian motion models to the Xo data (assuming no node error or evolutionary trends) and using

the fitted model to generate 100 contsimmaps of Xo with a resolution or ξ value of 100. We then

fit four Brownian motion models to the Y data conditioned on these contsimmaps (again assuming

no node error or evolutionary trends)–three non-null models assuming rates depend on mapped Xo

values through either a simple, threshold, or sweetspot function with unknown free parameters,

plus a null model assuming an unknown, constant rate (given by the mid rate/α parameter). Trait

data simulated with rates varying due to noise and/or the hidden factor, Xh, are likely to exhibit a

poor fit to the null model and provide spurious support for non-null models that at least allow rates

to vary across the phylogeny (May and Moore, 2020; see also Beaulieu and O’Meara, 2016; Boyko

and Beaulieu, 2023; Boyko et al., 2023b; Tribble et al., 2023). Thus, we fit three additional null

Brownian motion models to the Y data assuming rates depend on a mapped “dummy factor” D,

which is simulated under the Brownian motion model fitted to the Xo data, through either a simple,

threshold, or sweetspot relationship. Because D exhibits the same evolutionary dynamics as Xo but

with random tip values, support for D-dependent over Xo-dependent models provides evidence that

rates of Y evolution vary, but in a way not necessarily related to the observed factor Xo (see also

Tribble et al., 2023, which employs a similar approach in testing for relationships between discrete

trait evolution and continuous factors).
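
The simulation grid and candidate model set described above can be enumerated as follows. The labels are hypothetical shorthand chosen for this Python sketch; only the counts come from the text.

```python
from itertools import product

SHAPES = ("simple", "threshold", "sweetspot")

tree_sizes = (50, 100, 200)
relationships = (
    [("constant", None, None)]
    # Strong relationships depend on either the observed (Xo) or hidden (Xh) factor.
    + [(shape, "strong", factor) for shape in SHAPES for factor in ("Xo", "Xh")]
    + [(shape, "weak", "Xo") for shape in SHAPES]
    + [(shape, "wide", "Xo") for shape in ("threshold", "sweetspot")]
)
noise_levels = (False, True)

conditions = list(product(tree_sizes, relationships, noise_levels))
replicates = 100

# Seven models are fit per simulation: one constant-rate null, three Xo-dependent
# models, and three dummy factor (D) dependent null models.
candidate_models = [("constant", None)] + [
    (shape, factor) for factor in ("Xo", "D") for shape in SHAPES
]

print(len(relationships), len(conditions), len(conditions) * replicates, len(candidate_models))
```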

We fit all Brownian motion models to simulated data by running the default optimization pro-

cedure (see previous section) 10 times from initial parameter values sampled from a uniform dis-

tribution spanning from -5 to 5 and retaining the inferred parameter values from whichever op-

timization run found the highest likelihood. To save on the time needed to run the simulation

study, we only allowed the warmup and exploratory/polish phases to run for a maximum of 1,000

and 10,000 iterations, respectively. We always assumed flat priors over root trait values. On the

other hand, we assumed nuisance priors over contsimmaps for D-dependent models and flat priors

for all other models. Intuitively, this forces non-null models to “fairly” integrate over probable

histories of observed factors while allowing null models to assign higher weights to simulated

dummy factor contsimmaps which are particularly likely to explain the trait data. Preliminary tests

of our approach showed that this “mixed prior” technique renders null models more competitive

with non-null models and greatly reduces false positive errors at a modest cost to statistical power.

Preliminary tests also demonstrated that the default optimization procedure sometimes gets stuck

in likelihood ridges when fitting threshold and sweetspot function-based models, which may col-

lapse to effectively constant rate models as the location (θ ) and/or width (ω) parameters become

arbitrarily small/large. To prevent optimization algorithms from spending too much time explor-

ing these regions of parameter space, we imposed boundaries on these parameters by defining a

“factor interval” spanning from min X − (max X − min X)/2 to max X + (max X − min X)/2, where

X represents whatever factor a given parameter function depends on (either the observed factor

Xo or dummy factor D depending on the model). We constrained the location of inferred shifts

and peaks/dips to lie within the factor interval and the corresponding widths (given by $e^{\omega}$) to be

between 1/100th and 5 times the range of the factor interval. We also limited the rate deviation

(δ ) parameter to be between -10 and 10, as rate deviations with an absolute value of 10 or greater

cause minimum and maximum rates to effectively equal 0 and $2e^{\alpha}$ (i.e., the maximum allowable

rate for a given mid rate or α value), respectively.
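
The parameter boundaries described above can be computed directly from the mapped factor values. The Python sketch below is illustrative; the function name and the dictionary layout are assumptions, not the package's interface.

```python
import math


def parameter_bounds(factor_values):
    """Bounds on location (theta), log width (omega), and rate deviation (delta)."""
    lo, hi = min(factor_values), max(factor_values)
    half_range = (hi - lo) / 2.0
    # Factor interval: the observed range extended by half its length on each side.
    interval = (lo - half_range, hi + half_range)
    interval_range = interval[1] - interval[0]
    return {
        "theta": interval,
        # Widths exp(omega) lie between 1/100th and 5 times the factor interval's range.
        "omega": (math.log(interval_range / 100.0), math.log(5.0 * interval_range)),
        "delta": (-10.0, 10.0),
    }


print(parameter_bounds([-2.0, 0.5, 2.0]))
```

With factor values spanning -2 to 2, the factor interval runs from -4 to 4 and widths are limited to between 0.08 and 40. The δ limits are harmless in practice because 1 - tanh(10) is on the order of 1e-9.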

For each simulation, we selected the model with lowest Bayesian Information Criterion (BIC;

see Schwarz, 1978; Dziak et al., 2020) as the best-fitting model. We chose to use BIC over the more

widely used small sample size corrected Akaike Information Criterion (AICc) after preliminary in-

vestigations demonstrated that AICc-based model selection exhibits somewhat elevated error rates

among simulations with noisy rates on large phylogenies (Tables 2A.2 and 2A.3). BIC penalizes

model complexity more harshly than AICc–particularly for large sample sizes–and substantially

reduced the error rates of our method at little cost to overall power and accuracy. We calculated

error (“false positive”) and power (“true positive”) rates as the percent of simulations for which

Xo-dependent models were selected as the best-fitting model for simulations with constant/Xh-

dependent rates and Xo-dependent rates, respectively. Among the simulations yielding true positive

results, we additionally calculated “differentiation rates” as the percent for which the best-fitting

model assumed the same Xo-rate relationship used to simulate the data. To further explore how

accurately our approach can estimate the relationship between Xo and rates of Y evolution, we also

calculated BIC weights and generated model-averaged rate estimates across values of Xo for all

simulations with either constant or Xo-dependent rates, ignoring simulations with Xh-dependent

rates because they lack a straightforward expected Xo-rate relationship to compare with rate estimates.

We measured the overall accuracy, bias, and precision of model-averaged rate estimates

across values of Xo by calculating the inverse of the median absolute fold difference between es-

timated and simulated rates (i.e., the median absolute difference between estimated and simulated

rates on the log scale, negated such that higher values correspond to increased accuracy), the per-

cent of estimated rates greater than simulated rates, and the fold difference between the 2.5% and

97.5% quantiles of estimated rates, respectively.
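
The model selection and performance measures above reduce to a few short formulas, sketched here in Python. The function names are hypothetical, and what counts as the sample size in the BIC penalty is left to the caller because it is not stated in this section.

```python
import math
from statistics import median


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate better-supported models."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def bic_weights(bics):
    """Relative support for each model, rescaled to sum to 1."""
    best = min(bics)
    relative = [math.exp(-0.5 * (value - best)) for value in bics]
    total = sum(relative)
    return [value / total for value in relative]


def accuracy(estimated, simulated):
    """Negated median absolute log difference; values closer to 0 are more accurate."""
    return -median(abs(math.log(e) - math.log(s)) for e, s in zip(estimated, simulated))


def bias(estimated, simulated):
    """Percent of estimated rates greater than the simulated rates."""
    over = sum(e > s for e, s in zip(estimated, simulated))
    return 100.0 * over / len(estimated)


print(bic_weights([100.0, 102.0, 110.0]))
```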

2.2.4 Empirical Example

We applied our approach for inferring continuous factor-dependent rates of trait evolution to

test whether rates of phenotypic evolution are associated with size in the Eucalyptus subgenus

Eucalyptus, sometimes called “Monocalypts”, a clade of ∼124 Australian woody plant species

ranging from shrubby “mallees” only reaching a few meters in height (e.g., E. acies Brooker, E.

cunninghamii Sweet, E. erectifolia Brooker & Hopper) to gargantuan trees growing to nearly 100

meters in height (e.g., E. obliqua L’Hér, E. regnans F.Muell.) (Ladiges et al., 2010; Brooker et al.,

2015; Thornhill et al., 2019; Falster et al., 2021; Nicolle, 2022). Body size tends to be associ-

ated with many aspects of life history, as larger organisms generally live in smaller populations

with slower generational turnover (Niklas and Enquist, 2001; White et al., 2007; Sibly and Brown,

2007; Adler et al., 2014; Salguero-Gómez et al., 2016; Bakewell et al., 2020), leading to the hy-

pothesis that evolutionary rates are slower among lineages of larger organisms compared to their

smaller relatives. Nonetheless, it remains unclear how rates of phenotypic evolution generally

scale with size in many groups, particularly plants (Cooper and Purvis, 2009; Baker et al., 2015;

Chira et al., 2018; Friedman et al., 2019; Zimova et al., 2023; see also Lanfear et al., 2013). We

obtained a phylogeny of 108 Monocalypt species by pruning the clade out from a recently inferred,

time-calibrated phylogeny of over 700 species in Eucalyptus and related genera (the “Maximum

Likelihood 1” tree in Thornhill et al., 2019). We used the online Eucalyptus encyclopedia EU-

CLID (Brooker et al., 2015) and the AusTraits database (Falster et al., 2021) to gather data on

10 continuous traits for these species, which we categorized into four modules: 1) juvenile leaves

consisting of juvenile leaf widths/lengths, 2) adult leaves consisting of petiole lengths and adult

leaf widths/lengths, 3) inflorescences consisting of peduncle and inflorescence pedicel lengths, and 4)

flowers consisting of bud, fruit, and seed lengths. We used the same sources to also aggregate data

on height for these species, which acts as a proxy for size and plays the role of a factor in our anal-

yses. All of these traits exhibit substantial within-species variation. Accordingly, trait values for a

given species were generally reported as ranges rather than average values, though only maximum

values were reported for some 0-29 species depending on the trait (e.g., “Grows up to 8 meters in

height”). The only average trait values reported were bud lengths for 13 species. Ultimately, for

each trait, we obtained data for between 102 and 105 out of the 108 species in the phylogeny, with

full value ranges (i.e., both minimum and maximum values) known for 73-105 species.

We assumed trait values were log-normally distributed within species with provided minimum

and maximum trait values corresponding to the 2.5% and 97.5% quantiles of this distribution, such

that averages and associated intraspecific variances (on the natural log scale) are given by the midpoints

and $1/4^2$ (i.e., 1/16) of the squared differences, respectively, of log-transformed minimum and maximum

trait values. Notably, some value ranges for petiole, peduncle, and pedicel lengths had maximum

and/or minimum values of 0 in some species, which is undefined on a log scale. Because these

traits were measured to a precision of 1 mm (Brooker et al., 2015), we rounded up maximum trait

values of 0 to 0.05 mm (i.e., the average assuming a flat prior over measurements from 0 to 1 mm).

For minimum trait values of 0, we either rounded them up to 0.05 mm if paired with a non-zero

maximum value or treated them as missing if paired with a maximum value also equal to 0. For all

cases where only the maximum trait value for a given species was known, we developed a crude

but simple procedure for imputing minimum trait values to mitigate bias resulting from comparing

average trait values in some species to maximum trait values in others. Specifically, for each trait,

we regressed log-transformed minimum trait values on maximum values for all species with fully-

known value ranges, and used the fitted relationship to predict minimum trait values for species

with known maximum values only. For species with only average bud length measurements available,

we simply log-transformed the given averages and treated associated intraspecific variance as

an unknown parameter to be estimated during model fitting.
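
The conversion from a reported trait range to a log-scale mean and intraspecific variance can be sketched as follows. This Python example is illustrative and the function name is hypothetical; it treats the range as spanning four standard deviations, the usual approximation to the 2.5% and 97.5% quantiles of a normal distribution.

```python
import math


def lognormal_summary(min_value, max_value):
    """Log-scale mean and variance implied by a minimum-maximum trait range."""
    log_min, log_max = math.log(min_value), math.log(max_value)
    mean = (log_min + log_max) / 2.0
    # The 2.5%-97.5% range covers roughly 4 standard deviations, so sd = range / 4.
    variance = (log_max - log_min) ** 2 / 4.0 ** 2
    return mean, variance


# Hypothetical example: a leaf length reported as 8 to 15 (arbitrary units).
print(lognormal_summary(8.0, 15.0))
```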

Ultimately, we generated 349 contsimmaps (resolution/ξ = 100) of Monocalypt height under an

evolving rates or “evorates” model whereby rates of Monocalypt height evolution were allowed to

gradually change over time and across lineages (Martin et al., 2023; see APPENDIX 2C for further

details). We elected to use this relatively complicated model of height evolution as a constant-rate

Brownian motion model exhibited a rather poor fit to our Monocalypt height data by comparison,

reflecting substantial variation in rates of height evolution among Monocalypts. We then generated

349 corresponding dummy factor contsimmaps by simulating trait evolution with starting values

and rates identical to those of the height contsimmaps. Using these contsimmaps, we fit seven mul-

tivariate Brownian motion models to each trait module–six models assuming rates depend either

on height or the dummy factor through a simple, threshold, or sweetspot relationship, plus a null

model assuming constant rates. All models allowed each trait to have independent mid rate (α)

or intercept (β0) parameters to account for overall differences in rates among traits within a given

module, though we assumed all other free parameters were constant across traits for statistical

tractability (i.e., a single parameter function “shape” for each module which is rescaled among dif-

ferent traits). We fixed node error variances based on calculated intraspecific variances (excepting

bud lengths in some species for which only average trait values were known). Because our data

were largely derived from minimum-maximum ranges within species, we lacked any meaningful

signal of trait correlations within species and thus set all intraspecific correlations among traits

to 0 in our models, though we did estimate evolutionary correlations among traits. We used the

correlation matrix transform outlined in Lewandowski et al., 2009 and Stan Development Team,

2019 to avoid issues with exploring correlation parameters that form invalid correlation matrices.

To fit these models, we used the same optimization procedure and parameter boundaries which

were used in the simulation study, but ran the optimization procedure 20 rather than 10 times.

2.3 Results

2.3.1 Simulation Study

To illustrate potential uses of contsimmaps, we performed an extensive simulation study to

investigate whether contsimmaps of continuous variables can be used to infer relationships be-

tween rates of continuous trait evolution and continuously-varying factors, in analogy with popu-

lar approaches for inferring relationships between rates and discretely-varying factors (e.g., Revell,

2013). Overall, our contsimmap-based pipeline exhibited appropriate error and modest power rates

when applied to simulated data (Tables 2.1 and 2.2; Fig. 2.2). Across all simulation conditions,

error rates remained consistent and conservative at around 2-4%, only exceeding 5% in 2 out of

the 24 null simulation conditions. On the other hand, power rates varied dramatically across sim-

ulation conditions, ranging from only ∼5-25% for simulations exhibiting subtler patterns of rate

variation (i.e., weak and wide factor-rate relationships) on phylogenies with 50 tips, to ∼75-90%

and ∼90-100% for those exhibiting strong rate variation on phylogenies with 100 and 200 tips,

respectively. Unsurprisingly, higher power rates were generally associated with more accurate, un-

biased, and precise model-averaged estimates of factor-rate relationships, while lower power rates

were associated with increased bias towards inference of overly conservative and “flat” relation-

ships (Fig. 2.3; Figs. 2A.1–2A.3). Differentiation rates–that is, the ability to distinguish among

different kinds of factor-rate relationships–largely depended on the kind of factor-rate relationship

used to simulate data. Specifically, our method could easily distinguish sweetspot (i.e., modal) and

especially simple (i.e., exponential) factor-rate relationships from other kinds of relationships, yet

often mistook threshold (i.e., logistic) relationships for either simple or sweetspot ones.

Table 2.1 Proportions of times the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution selected
a model assuming a given kind of factor-rate relationship (i.e., simple, threshold, or sweetspot) as the best-fitting one (based on having
the lowest Bayesian Information Criterion) across all simulation conditions without random variation around factor-rate relationships
(“noise”). All models assuming either constant or dummy factor-dependent rates were considered “null” models.

simulated:  constant  hidden factor-dependent       simple          threshold             sweetspot
selected:             simple  threshold  sweetspot  strong  weak    strong  weak  wide    strong  weak  wide

50 tips
null        0.98      1.00    0.99       0.98       0.51    0.80    0.60    0.83  0.68    0.74    0.94  0.90
simple      0.00      0.00    0.01       0.00       0.48    0.18    0.34    0.15  0.29    0.06    0.02  0.06
threshold   0.02      0.00    0.00       0.00       0.01    0.01    0.02    0.01  0.02    0.02    0.00  0.00
sweetspot   0.00      0.00    0.00       0.02       0.00    0.01    0.04    0.01  0.01    0.18    0.04  0.04

100 tips
null        0.98      0.98    1.00       0.96       0.11    0.56    0.19    0.60  0.43    0.10    0.68  0.60
simple      0.02      0.01    0.00       0.00       0.87    0.44    0.49    0.34  0.51    0.04    0.10  0.13
threshold   0.00      0.00    0.00       0.01       0.02    0.00    0.13    0.02  0.01    0.09    0.00  0.04
sweetspot   0.00      0.01    0.00       0.03       0.00    0.00    0.19    0.04  0.05    0.77    0.22  0.23

200 tips
null        0.98      0.96    0.98       0.96       0.00    0.13    0.07    0.16  0.10    0.00    0.29  0.21
simple      0.02      0.01    0.01       0.01       0.99    0.87    0.17    0.57  0.65    0.00    0.07  0.08
threshold   0.00      0.00    0.00       0.01       0.00    0.00    0.39    0.09  0.13    0.04    0.01  0.08
sweetspot   0.00      0.03    0.01       0.02       0.01    0.00    0.37    0.18  0.12    0.96    0.63  0.63

Table 2.2 Proportions of times the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution selected a
model assuming a given kind of factor-rate relationship (i.e., simple, threshold, or sweetspot) as the best-fitting one (based on having the
lowest Bayesian Information Criterion) across all simulation conditions with random variation around factor-rate relationships (“noise”).
All models assuming either constant or dummy factor-dependent rates were considered “null” models.

simulated:  constant  hidden factor-dependent       simple          threshold             sweetspot
selected:             simple  threshold  sweetspot  strong  weak    strong  weak  wide    strong  weak  wide

50 tips
null        0.99      0.99    0.97       0.99       0.71    0.87    0.62    0.92  0.83    0.76    0.95  0.82
simple      0.00      0.00    0.01       0.00       0.26    0.11    0.24    0.07  0.12    0.05    0.01  0.08
threshold   0.01      0.01    0.02       0.01       0.02    0.01    0.06    0.00  0.01    0.02    0.01  0.01
sweetspot   0.00      0.00    0.00       0.00       0.01    0.01    0.08    0.01  0.04    0.17    0.03  0.09

100 tips
null        0.96      0.96    0.98       0.94       0.24    0.64    0.33    0.69  0.48    0.25    0.73  0.67
simple      0.00      0.00    0.00       0.01       0.71    0.27    0.24    0.15  0.38    0.00    0.05  0.07
threshold   0.02      0.02    0.01       0.02       0.00    0.03    0.21    0.09  0.06    0.07    0.03  0.06
sweetspot   0.02      0.02    0.01       0.03       0.05    0.06    0.22    0.07  0.08    0.68    0.19  0.20

200 tips
null        0.93      0.97    0.96       0.98       0.07    0.37    0.06    0.34  0.23    0.02    0.44  0.22
simple      0.00      0.00    0.02       0.00       0.86    0.55    0.17    0.36  0.51    0.00    0.04  0.05
threshold   0.04      0.02    0.00       0.00       0.05    0.04    0.44    0.11  0.14    0.07    0.02  0.13
sweetspot   0.03      0.01    0.02       0.02       0.02    0.04    0.33    0.19  0.12    0.91    0.50  0.60

Figure 2.2 Error, power, and differentiation rates of the contsimmap-based pipeline for detecting
continuous factor-dependent rates of trait evolution. Different colors correspond to simulations
with different factor-rate relationships; different symbols to simulations with differing relationship
strength and, in the case of threshold and sweetspot models, width; and dashed versus solid lines
to simulations with versus without random variation in rates (“noise”) around simulated factor-
rate relationships. Top left: percent of simulations with either constant or hidden factor-dependent
rates for which the best-fitting model (i.e., lowest Bayesian Information Criterion) was an observed
factor-dependent model (i.e., error rates). Bottom left: percent of simulations with observed factor-
dependent rates for which the best-fitting model was also an observed factor-dependent model
(i.e., power rates). Bottom right: percent of observed factor-dependent simulations for which the
best-fitting model assumed the very same kind of factor-rate relationship (i.e., simple, threshold, or
sweetspot) used to simulate the data (i.e., differentiation rates).

Figure 2.3 Mean model-averaged factor-rate relationships inferred using the contsimmap-based
pipeline for detecting continuous factor-dependent rates of trait evolution in comparison to simulated
factor-rate relationships under all simulation conditions with either constant or observed
factor-dependent rates. Different colors correspond to different phylogeny sizes (i.e., number of
tips) and dashed versus solid lines to simulation conditions with versus without random variation
in rates (“noise”) around simulated factor-rate relationships. Simulated factor-rate relationships
are represented by thick gray lines for reference.

Random variation around simulated factor-rate relationships, or “noise”, had minor yet noticeable

effects on our method’s performance. Specifically, noise was associated with slightly

increased error rates, albeit inconsistently, as well as ∼5-15% decreases to power rates across all

simulation conditions. In terms of differentiation rates, noise appeared to increase support for more

complex models assuming threshold or sweetspot factor-rate relationships at the expense of those

assuming simple relationships. Overall, however, phylogeny size and/or the factor-rate relationship

used to simulate data had a much stronger effect on both power and differentiation rates compared

to noise. Additionally, low error rates across the board indicate that our method is fairly robust to

random rate variation due to noise and/or unobserved factors. Interestingly, noise generally had the

least severe effects on our method’s performance for simulations on both small and large phyloge-

nies. This pattern is particularly apparent in the estimated factor-rate relationships for simulations

with strong relationships in Fig. 2.3. While noise had virtually no effect on estimated relationships

for simulations on phylogenies with 50 and 200 tips, it tended to conservatively bias estimated

relationships for simulations with 100 tips. Roughly speaking, we hypothesize that simulations on

smaller phylogenies often failed to yield enough signal of factor-dependent rates to support factor-

dependent models regardless of noise, while simulations on larger phylogenies yielded enough

signal to “overcome” the effect of noise on inference. Indeed, for simulation conditions with weak

or wide factor-rate relationships–which generated weaker signals of factor-dependent rates–noise’s

effect on estimated relationships was most pronounced for phylogenies with 200 rather than 100

tips, presumably reflecting the greater amount of data needed to reliably infer subtler factor-rate relationships.

2.3.2 Empirical Example

In addition to the simulation study, we also applied our contsimmap-based pipeline for inferring

factor-dependent rates of continuous trait evolution to test whether rates of leaf and flower trait evo-

lution are associated with height variation in the Eucalyptus subgenus Eucalyptus, or Monocalypts.

Ultimately, we found no evidence of height-rate associations for any of the trait modules investi-

gated, instead finding overwhelming support for associations between rates and simulated dummy

factors. These results strongly suggest rates of leaf and flower trait evolution in Monocalypts are

variable but not closely related to height (Fig. 2.4). To further investigate these results, we calcu-

lated marginal rate estimates for each trait module across the Monocalypt phylogeny by first tak-

ing the weighted average of estimated rates across all contsimmaps under each fitted model, with

weights given by the normalized conditional likelihoods for each contsimmap (i.e., prior-weighted

conditional likelihoods for each contsimmap rescaled to sum to 1), then model-averaging

the resulting rate estimates across all models using BIC weights. Notably, marginal rate varia-

tion patterns for each trait module appear similar to those for height evolution as inferred under

an evorates model (see APPENDIX 2C for further details on this model)–particularly for juvenile

leaves and flowers–suggesting that overall rates of phenotypic evolution among Monocalypts vary

according to some common yet currently unknown factor.
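
The two-step averaging described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the contsimmap package's implementation: the function names (normalize_log_weights, bic_weights, marginal_rates) and the nested-list inputs are hypothetical, rates are assumed to be averaged arithmetically on their natural scale, and the per-contsimmap log-likelihoods are assumed to already include the log prior weight.

```python
import math

def normalize_log_weights(log_w):
    """Rescale log-weights so the corresponding weights sum to 1.

    Subtracting the maximum first avoids numerical underflow when
    log-likelihoods are very negative.
    """
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

def bic_weights(bics):
    """BIC weights: w_i proportional to exp(-0.5 * (BIC_i - min BIC))."""
    best = min(bics)
    rel = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(rel)
    return [r / total for r in rel]

def marginal_rates(rates, log_liks, bics):
    """Model-averaged rate for each edge of a phylogeny.

    rates[m][k][e]: estimated rate on edge e under model m and contsimmap k
                    (hypothetical layout chosen for this sketch).
    log_liks[m][k]: prior-weighted conditional log-likelihood of contsimmap k
                    under model m.
    bics[m]:        BIC of model m.
    """
    model_w = bic_weights(bics)
    n_edges = len(rates[0][0])
    out = [0.0] * n_edges
    for m, w_m in enumerate(model_w):
        # Step 1: weight contsimmaps within a model by normalized likelihoods.
        map_w = normalize_log_weights(log_liks[m])
        for k, w_k in enumerate(map_w):
            for e in range(n_edges):
                # Step 2: weight each model's average by its BIC weight.
                out[e] += w_m * w_k * rates[m][k][e]
    return out
```

Working with log-likelihoods and subtracting the maximum before exponentiating keeps the weights well-defined even when individual likelihoods are too small to represent directly.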

Figure 2.4 Variation in rates of leaf and flower trait evolution in the Eucalyptus subgenus Euca-
lyptus (“Monocalypts”). Top left: bar graph displaying Bayesian Information Criterion weights
for the contsimmap-based models fit to each trait module. Different colors correspond to different
factor-rate relationships, while solid and hatched bars correspond to models assuming rates depend
on height and simulated dummy factors, respectively. Bottom left: Phylogram colored according
to the average of 349 rate contsimmaps sampled from the posterior of an evolving rates model fit
to the Monocalypt height data, with dark blue and light red corresponding to slow and fast rates,
respectively. Right: phylograms depicting marginal rates estimated under contsimmap-based mod-
els for each trait module, with dark blue and light red once again corresponding to slow and fast
rates, respectively.

2.4 Discussion

Here, we introduced contsimmapping, a flexible and efficient method for sampling histories

of continuous variables on phylogenies under Brownian motion models of trait evolution. Con-

tsimmapping provides an intuitive framework for reconstructing and analyzing the evolution of

continuous phenotypes while accounting for our inherently incomplete knowledge of the evolu-

tionary past. Further, like conventional simmapping for discrete variables, contsimmapping pro-

vides a way to generate distributions of probable evolutionary histories for some variable, enabling

straightforward development of rigorous statistical pipelines for determining how evolutionary

processes are affected by continuous variables like body size (e.g., Friedman et al., 2019), gener-

ation time (e.g., Gingerich, 2001), or climatic niche (e.g., Tribble et al., 2023). To this end, we

implemented a contsimmap-based pipeline capable of robustly detecting and accurately quanti-

fying relationships between continuously-varying factors and rates of trait evolution. Using this

novel approach, we show that rates of leaf and flower trait evolution are generally unrelated to

height in a clade of Eucalyptus trees spanning nearly two orders of magnitude in height variation

(Brooker et al., 2015; Thornhill et al., 2019; Falster et al., 2021). Nonetheless, our analysis uncov-

ered striking similarities in patterns of evolutionary rate variation across different Eucalyptus traits

(Figure 2.4), demonstrating how contsimmaps provide a powerful toolkit for flexibly interrogating

the evolutionary dynamics of continuous traits.
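
To make the idea of sampling continuous trait histories concrete, the sketch below draws one possible history along a single branch under Brownian motion, conditional on the trait values at both ends of the branch (a Brownian bridge). It is a simplified illustration rather than the algorithm implemented in the contsimmap package: it assumes the node values have already been sampled, a single trait, a constant rate sigma2 along the branch, and evenly spaced time points, and the function name sample_branch_history is hypothetical.

```python
import math
import random

def sample_branch_history(x_start, x_end, length, sigma2, n_steps, rng=random):
    """Sample trait values at n_steps + 1 evenly spaced time points along a branch.

    The path follows Brownian motion with rate sigma2, conditioned on starting
    at x_start and ending at x_end after `length` time units.
    """
    dt = length / n_steps
    values = [x_start]
    for i in range(1, n_steps):
        # Time remaining between the previous point and the end of the branch.
        remaining = length - (i - 1) * dt
        prev = values[-1]
        # Conditional (bridge) distribution of the next point given the
        # previous point and the fixed end value.
        mean = prev + (x_end - prev) * dt / remaining
        var = sigma2 * dt * (remaining - dt) / remaining
        values.append(rng.gauss(mean, math.sqrt(var)))
    values.append(x_end)
    return values

# Example: one sampled history on a branch of length 2 with rate 0.5.
history = sample_branch_history(0.0, 1.0, 2.0, 0.5, 10, random.Random(1))
```

Repeating such draws many times yields a distribution of plausible histories, which is what allows downstream analyses to account for uncertainty in the reconstructed values.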

Our implementation of contsimmapping is already quite comprehensive, allowing for con-

tsimmapping of multiple correlated continuous variables while incorporating within-species varia-

tion/measurement error and even regime-dependent rates and trends. Additionally, the contsimmap

R package provides a variety of tools for transforming, summarizing, and visualizing contsimmaps,

which may be used to conveniently explore patterns in phylogenetic comparative data on contin-

uous traits while accounting for uncertainty as well as produce publication-quality figures (e.g.,

Figs. 2.1,2.4; see also Figs. 2C.3 and 2C.5). While we largely focused on using contsimmaps

to infer associations between rates of continuous trait evolution and continuously-varying factors

in the current study, our pipeline could be easily adapted to infer factor-dependent evolutionary

trends and node error variances using tools already available in our R package. Looking forward,

contsimmapping could be extended to sample evolutionary histories under other popular models

of continuous trait evolution such as Ornstein-Uhlenbeck and Lévy processes, which are generally

interpreted as models of adaptive and pulsed evolution, respectively. Another promising future di-

rection would be the development of methods for fitting Ornstein-Uhlenbeck models, discrete trait

evolution models, or lineage diversification models with factor-dependent parameters conditional

on contsimmaps. Such methods would further enhance macroevolutionary biologists’ ability to

explore and answer questions regarding the interplay of evolutionary processes with continuous

variables.

2.4.1 Using contsimmaps to infer associations between continuous variables and rates of

trait evolution

Overall, our contsimmap-based pipeline provides an effective and practical way to detect and

quantify relationships among rates of trait evolution and continuous variables–at least given a suf-

ficiently large effect size and phylogeny. Our pipeline is not the first statistical method developed

for inferring relationships between rates of trait evolution and continuous factors, though it is cur-

rently the most flexible and thoroughly-tested one to our knowledge. Indeed, many previous meth-

ods were developed for particular empirical studies to circumvent the lack of a standard, general-

purpose approach. Accordingly, the general statistical performance of these methods often remains

largely unknown (e.g., Cooper and Purvis, 2009; Uyeda et al., 2021). The evorag method devel-

oped by Weir and Lawson, 2015 constitutes a notable exception as a well-studied method suitable

for a variety of purposes, though it only uses sister pair contrasts to infer factor-rate relationships.

By using entire phylogenies, our method theoretically more fully utilizes the information available

in phylogenetic comparative data to infer factor-rate relationships. More recently, Hansen et al.,

2022 developed an elegant regression-based approach for inferring correlations between rates and

continuous variables, though this method is only capable of inferring linear or exponential rela-

tionships between a single continuous factor and rates for a single continuous trait (or node error

variances, which are interpreted as “microevolutionary rates” by the authors). In comparison to this

method, our contsimmap-based pipeline sacrifices computational simplicity for flexibility, allow-

ing researchers to incorporate explicit reconstructions of discrete and multiple continuous factors

into analyses, fit multivariate models of continuous trait evolution, and/or simultaneously link mul-

tiple trait evolution process parameters (i.e., rates, node error variances, and/or evolutionary trends)

with reconstructed factors via arbitrary parameter functions (computational/statistical tractability

notwithstanding). Overall, our new pipeline is a substantial generalization of previous methods,

greatly expanding the hypothesis-testing capabilities of macroevolutionary biologists and allow-

ing researchers to shed light on many famous long-standing questions regarding how phenotypic

evolutionary processes are affected by continuous variables.

While our pipeline for inferring associations between rates of trait evolution and continuous

factors is generally quite accurate and robust, it notably struggles to detect relatively weak factor-rate

relationships from smaller phylogenetic comparative datasets. Based on the simulation conditions

examined here, our method generally seems to require phylogenies with around 100 tips or more

to reliably detect ∼20-fold differences in rates associated with a continuous factor, and at least

200 tips for 5-fold differences. Nonetheless, our approach is still able to reliably detect and re-

ject factor-rate associations even in the face of rate variation due to unobserved factors and/or

general “noise” around relationships. Thus, empirical applications of this approach may fail to

detect weak factor-rate associations in some cases yet are importantly unlikely to infer spurious

associations (though, as with any correlative statistical method, associations may in fact reflect

unobserved factors which are correlated/confounded with observed factors). Interestingly, while the

error rates of our method remained low across all simulation conditions, we do note subtly elevated

error among simulations on larger phylogenies involving noise and/or sweetspot (i.e., modal) re-

lationships between rates and unobserved factors. We hypothesize these patterns ultimately reflect

“poorly replicated bursts” of trait evolution (sensu Maddison and FitzJohn, 2015; Uyeda et al.,

2018), whereby apparent yet transient rate fluctuations occur in a few subclades that happen to ex-

hibit similar observed factor values by chance. In practice, one could investigate marginalized rate

estimates (see Figure 2.4) to detect such scenarios and/or incorporate node error into analyses to re-

duce the impact of recent, transient rate fluctuations on the inference of trait evolutionary processes

(Landis and Schraiber, 2017). Notably, support for constant rate models was rather low across all

simulation conditions, including those with truly constant rates unaffected by noise and/or unob-

served factors (Tables 2A.4 and 2A.5). We suspect the highly stochastic nature of trait divergence

under Brownian motion processes (i.e., subclades exhibiting somewhat anomalous levels of trait

disparity by chance), along with the phylogenetic autocorrelation of factor data, may lead to phy-

logenetic pseudoreplication of spurious factor-rate associations with surprising frequency. Such

phenomena would be naturally exacerbated by actual rate variation driven by unobserved factors

and/or within-species variation/measurement error creating the illusion of unexpectedly large dis-

parity among recently-diverged lineages (Felsenstein, 2008; Landis and Schraiber, 2017). In any

case, dummy factor-dependent models were absolutely critical for controlling our method’s error

rates by providing more “competitive” null models that effectively account for such spurious cor-

relations generated either by chance or unobserved factors. Our work adds to a growing body of

literature demonstrating that accounting for “background” or “residual” heterogeneity in evolution-

ary processes is absolutely critical for robust phylogenetic comparative inference and hypothesis

testing (Maddison and FitzJohn, 2015; Beaulieu and O’Meara, 2016; Uyeda et al., 2018; May and

Moore, 2020; Boyko and Beaulieu, 2023; Boyko et al., 2023b; Tribble et al., 2023).

Beyond the magnitude of rate variation and phylogeny size, differences in the shapes of factor-

rate relationships had a profound influence on our method’s power and accuracy. Perhaps unsur-

prisingly, “wider” factor-rate relationships that featured more gradual changes in rates over factor

values were generally harder to both detect and accurately infer compared to narrower relationships

with the same effect size. Thus, our method’s ability to infer factor-rate relationships importantly

depends not only on the overall magnitude of rate differences, but also how frequently apparent

rate shifts tend to occur across a phylogeny, which is primarily determined by how abruptly rates

change with respect to factor values. Additionally, sweetspot factor-rate relationships, whereby

rates peak or dip at intermediate factor values, appear to require particularly strong effect sizes

and/or large phylogenies to reliably detect, likely due to difficulties in correlating the transient rate

fluctuations that occur under such relationships with factor values only observed at the tips of a

phylogeny. By comparison, simple (i.e., exponential) and threshold (i.e., logistic) factor-rate

relationships, whereby rates strictly decrease or increase with respect to factor values, appear to generate

more persistent rate shifts and thus yield stronger signals of factor-rate associations. However,

similarly to methods for detecting relationships between continuous factors and lineage diver-

sification rates (FitzJohn, 2010), despite consistently detecting factor-dependent rates from data

simulated under more complex threshold relationships, our approach largely failed to distinguish

such relationships from sweetspot and especially simple relationships–even with large phylogenies

including 200 tips.
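
For readers who want a concrete picture of the three relationship shapes discussed above, the following sketch gives one possible parameterization of each. These functional forms are assumptions made purely for illustration (an exponential curve, a logistic curve on the log-rate scale, and a Gaussian-shaped bump on the log-rate scale); the function and parameter names are hypothetical and need not match the parameterizations used in our simulations or R package.

```python
import math

def simple_rate(x, base, slope):
    """'Simple' relationship: rate changes exponentially with factor value x."""
    return base * math.exp(slope * x)

def threshold_rate(x, low, high, midpoint, steepness):
    """'Threshold' relationship: log rate follows a logistic curve,
    moving between two plateaus (low and high) around a midpoint."""
    p = 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))
    return math.exp(math.log(low) + (math.log(high) - math.log(low)) * p)

def sweetspot_rate(x, base, peak, center, width):
    """'Sweetspot' relationship: log rate peaks (or dips, if peak < base)
    at an intermediate factor value and returns to base far from it."""
    bump = math.exp(-0.5 * ((x - center) / width) ** 2)
    return math.exp(math.log(base) + (math.log(peak) - math.log(base)) * bump)

# Rates along a grid of factor values for each shape (illustrative values only).
grid = [-4, -2, 0, 2, 4]
simple = [simple_rate(x, 1.0, 0.35) for x in grid]
threshold = [threshold_rate(x, 0.25, 4.0, 0.0, 2.0) for x in grid]
sweetspot = [sweetspot_rate(x, 0.25, 4.0, 0.0, 1.0) for x in grid]
```

Under these forms, a sweetspot relationship produces elevated (or depressed) rates only while a lineage's factor value is near the center, which is consistent with the transient rate fluctuations described above, whereas the simple and threshold forms are monotonic in the factor.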

Notably, some recent work suggests detecting correlations between rates and factors from phy-

logenetic comparative data is generally quite challenging (Hansen et al., 2022). Thus, the low

power documented in our simulation study may well reflect fundamental limits to phylogenetic

comparative inference of factor-rate relationships rather than shortcomings particular to our ap-

proach. Shrinkage of estimated rates, whereby high and low rates tend to be under- and over-

estimated, respectively, has also been found in previous studies investigating methods for infer-

ring variation in rates of trait evolution (Revell, 2013; Martin et al., 2023). Notably, the shrink-

age of factor-rate relationships inferred from larger phylogenies was generally mild compared to

the shrinkage of relationships inferred from smaller phylogenies–particularly when the simulated

factor-rate relationship was weak–suggesting the shrinkage observed in our simulation study pri-

marily arises from our method’s limited power with small effect and sample sizes. Nonetheless,

as noted by Revell, 2013, uncertainty in factor histories may cause factor values associated with

low rates to be inferred in lineages evolving at relatively high rates and vice versa, weakening

apparent factor-rate associations and further exacerbating shrinkage of estimated rates. A joint in-

ference approach, whereby the factor history and its effect on trait evolution processes are inferred

simultaneously rather than sequentially, could help mitigate these effects and enable more accu-

rate inference of factor-rate associations. However, such methods entail significant mathematical,

statistical, and computational challenges due to their complexity (e.g., FitzJohn, 2010; May and

Moore, 2020; Boyko et al., 2023b).

2.4.2 Size is unrelated to rates of trait evolution in Monocalypts

Using our contsimmap-based pipeline for inferring associations between rates of trait evo-

lution and continuous factors, we show that rates and size appear unrelated in the Eucalyptus

subgenus Eucalyptus (i.e., Monocalypts), despite numerous theoretical predictions and empirical

findings suggesting body size is intertwined with many aspects of life history and thus evolutionary

rates (Niklas and Enquist, 2001; White et al., 2007; Sibly and Brown, 2007; Adler et al., 2014;

Salguero-Gómez et al., 2016; Bakewell et al., 2020). Broadly, lineages of larger organisms are

thought to accumulate genetic and phenotypic variation more slowly due to their relatively small

population sizes and long generation times. Indeed, larger animals and plants do generally ex-

hibit slower rates of molecular evolution (Gillooly et al., 2005; Fontanillas et al., 2007; Bromham,

2011; Lanfear et al., 2013; Weber et al., 2014; but see Thomas et al., 2006; Wright et al., 2011;

Lourenço et al., 2013). However, smaller populations can also evolve rapidly due to genetic drift,

and empirical relationships between body size and rates of phenotypic evolution/lineage diversi-

fication have notably proven inconsistent compared to relationships between size and molecular

rates. In line with our results, a recent analysis suggests size and lineage diversification rates are

unrelated within the genus Eucalyptus (Vasconcelos et al., 2022), agreeing with patterns found in

several animal groups (Cardillo et al., 2003; Feldman et al., 2016; Rainford et al., 2016; but see

Amado et al., 2021) but contrasting with a broader pattern of slower lineage diversification among

larger plants (Boucher et al., 2017; Igea et al., 2017; see also Wollenberg et al., 2011; Tedesco

et al., 2017). Previous work on phenotypic rates predominantly focuses on vertebrates and has also

yielded mixed results, suggesting that body shape evolution is slower among larger fish and birds

(Friedman et al., 2019; Zimova et al., 2023) while body size and cranial shape evolution is faster

among larger mammals (Cooper and Purvis, 2009; Baker et al., 2015).

Ultimately, the lack of general scaling relationships between body size and rates of phenotypic

evolution (or lineage diversification rates) likely reflects idiosyncratic adaptations in some clades

weakening or completely altering expected relationships among body size, life history traits, and

evolutionary rates. For example, Eucalyptus species may exhibit an inverse relationship between

size and lifespan, as smaller, shrubby species, termed “mallees”, tend to be highly fire-tolerant and

live for many centuries through countless cycles of burning followed by resprouting. On the other

hand, the tallest Eucalyptus species generally live in more mesic environments and rely on banks

of serotinous seed capsules to sprout and replace the previous generation following intense, mature

tree-killing fires (Nicolle, 2006). These peculiarities of Eucalyptus biology may account for dif-

ferences in size-lineage diversification rate correlations between Eucalyptus and other plant clades

(Boucher et al., 2017; Igea et al., 2017; Vasconcelos et al., 2022), though further investigations

of size-phenotypic rate correlations are needed to gauge whether the results of our study are truly

anomalous or reflect broader patterns across plants.

While rates of phenotypic evolution among Monocalypts showed no relationship to size, they

are highly variable based on our analysis’ strong support for models assuming rates are associ-

ated with simulated dummy factors. By further interrogating the patterns of rate variation inferred

under the dummy factor-dependent models, we show that variation in rates of phenotypic evolution

across the Monocalypt phylogeny is relatively consistent across traits. We hypothesize

these patterns reflect some common as-of-yet unknown factor modulating the apparent pace of

phenotypic evolution among Monocalypts. Given that the highest rates consistently occur in the

subclade roughly corresponding to the section Eucalyptus–a relatively young radiation concen-

trated in southeast Australia (Ladiges et al., 2010; Nicolle, 2022)–such rate variation may reflect

a genuine shift in evolutionary dynamics among this subclade, an overall acceleration in rates of

phenotypic evolution over time, or phenotypic evolution among Monocalypts generally following

Ornstein-Uhlenbeck-like processes (Blomberg et al., 2003). Alternatively, such results may also

reflect inaccurate estimation of phylogenetic topology and/or branch lengths, as accurate phyloge-

netic inference is challenging for Eucalyptus due to frequent hybridization and incomplete lineage

sorting (Rutherford et al., 2016; Thornhill et al., 2019; McLay et al., 2023). In any case, con-

tsimmapping provides an effective tool for more thoroughly exploring the interplay of body size

and rates of phenotypic evolution, potentially helping researchers develop more cohesive, general-

ized theories explaining how the rates of different evolutionary processes interrelate to one another

and various life history attributes across the tree of life.

2.4.3 Conclusion

Here, we developed a novel method for sampling the histories of continuous variables on

phylogenies under Brownian motion models of trait evolution, generalizing conventional discrete

stochastic character mapping or “simmapping” methods to work with continuous variables. We

further show how these continuous stochastic character maps or “contsimmaps” may be used to ac-

curately and robustly infer relationships between evolutionary processes and continuously-varying

factors such as body size, generation time, or climatic niche. In the process, we notably developed

pragmatic techniques to account for the influence of unobserved factors on evolutionary processes

in macroevolutionary hypothesis testing. Lastly, we used contsimmaps to test whether height is

associated with rates of leaf and flower trait evolution in a clade of Eucalyptus trees. Despite finding

no evidence for height-rate associations, our empirical case study nonetheless demonstrates

the practical utility of contsimmaps in characterizing general patterns of variation in evolutionary

processes across clades. Ultimately, contsimmapping will empower researchers with new and in-

novative strategies for analyzing the evolutionary dynamics of continuous phenotypes and testing

macroevolutionary hypotheses that require knowing how continuous variables have changed over

evolutionary time.

BIBLIOGRAPHY

Adler P.B., Salguero-Gómez R., Compagnoni A., Hsu J.S., Ray-Mukherjee J., Mbeau-Ache C.,
and Franco M. 2014. Functional traits explain variation in plant life history strategies. Proc Natl
Acad Sci USA 111:740–745.

Amado T.F., Martinez P.A., Pincheira-Donoso D., and Olalla-Tárraga M.Á. 2021. Body size dis-
tributions of anurans are explained by diversification rates and the environment. Glob Ecol Bio-
geogr 30:154–164.

Baker J., Meade A., Pagel M., and Venditti C. 2015. Adaptive evolution toward larger size in

mammals. Proc Natl Acad Sci USA 112:5093–5098.

Bakewell A.T., Davis K.E., Freckleton R.P., Isaac N.J.B., and Mayhew P.J. 2020. Comparing life
histories across taxonomic groups in multiple dimensions: How mammal-like are insects? Am
Nat 195:70–81.

Baliga V.B. and Law C.J. 2016. Cleaners among wrasses: Phylogenetics and evolutionary patterns

of cleaning behavior within Labridae. Mol Phylogenet Evol 94:424–435.

Beaulieu J. and O’Meara B. 2023. OUwie: Analysis of Evolutionary Rates in an OU Framework.

R package version 2.10.

Beaulieu J.M. and O’Meara B.C. 2016. Detecting hidden diversification shifts in models of trait-

dependent speciation and extinction. Syst Biol 65:583–601.

Beaulieu J.M., O’Meara B.C., and Donoghue M.J. 2013. Identifying hidden rate changes in the
evolution of a binary morphological character: The evolution of plant habit in campanulid an-
giosperms. Syst Biol 62:725–737.

Betancourt M. and Girolami M. 2019. Hamiltonian Monte Carlo for hierarchical models. Pages 79–
97 in Current Trends in Bayesian Methodology with Applications (S. K. Upadhyay, U. Singh,
D. K. Dey, and A. Loganathan, eds.). Chapman and Hall/CRC Press, Boca Raton, FL.

Blomberg S.P., Garland T. Jr, and Ives A.R. 2003. Testing for phylogenetic signal in comparative

data: Behavioral traits are more labile. Evolution 57:717–745.

Bollback J.P. 2006. SIMMAP: Stochastic character mapping of discrete traits on phylogenies.

BMC Bioinformatics 7:88.

Borstein S.R., Fordyce J.A., O’Meara B.C., Wainwright P.C., and McGee M.D. 2019. Reef fish

functional traits evolve fastest at trophic extremes. Nat Ecol Evol 3:191–199.

Bottou L., Curtis F.E., and Nocedal J. 2018. Optimization methods for large-scale machine learn-

ing. SIAM Rev 60:223–311.

Boucher F.C., Démery V., Conti E., Harmon L.J., and Uyeda J. 2018. A general model for estimat-

ing macroevolutionary landscapes. Syst Biol 67:304–319.

Boucher F.C., Verboom G.A., Musker S., and Ellis A.G. 2017. Plant size: A key determinant of

diversification? New Phytol 216:24–31.

Boyko J.D. and Beaulieu J.M. 2023. Reducing the biases in false correlations between discrete

characters. Syst Biol 72:476–488.

Boyko J.D., Hagen E.R., Beaulieu J.M., and Vasconcelos T. 2023a. The evolutionary responses of
life-history strategies to climatic variability in flowering plants. New Phytol 240:1587–1600.

Boyko J.D., O’Meara B.C., and Beaulieu J.M. 2023b. A novel method for jointly modeling the

evolution of discrete and continuous traits. Evolution 77:836–851.

Brent R.P. 2013. Algorithms for Minimization Without Derivatives. Dover Publications, Mineola,

NY.

Bromham L. 2011. The genome as a life-history character: Why rate of molecular evolution varies

between mammal species. Philos Trans R Soc B 366:2503–2513.

Brooker I., Slee A., Connor J., Duffy S., and West J. 2015. EUCLID Eucalypts of Australia. 4 ed.

Identic Pty Ltd, Brisbane, QLD.

Burns M.D. and Bloom D.D. 2020. Migratory lineages rapidly evolve larger body sizes than non-

migratory relatives in ray-finned fishes. Proc R Soc B 287:20192615.

Cardillo M., Huxtable J.S., and Bromham L. 2003. Geographic range size, life history and rates of

diversification in Australian mammals. J Evol Biol 16:282–288.

Chira A.M., Cooney C.R., Bright J.A., Capp E.J.R., Hughes E.C., Moody C.J.A., Nouri L.O.,
Varley Z.K., and Thomas G.H. 2018. Correlates of rate heterogeneity in avian ecomorphological
traits. Ecol Lett 21:1505–1514.

Clavel J., Escarguel G., and Merceron G. 2015. mvMORPH: An R package for fitting multivariate

evolutionary models to morphometric data. Methods Ecol Evol 6:1311–1319.

Clavel J. and Morlon H. 2017. Accelerated body size evolution during cold climatic periods in the

Cenozoic. Proc Natl Acad Sci USA 114:4183–4188.

Cooper N. and Purvis A. 2009. What factors shape rates of phenotypic evolution? A comparative

study of cranial morphology of four mammalian clades. J Evol Biol 22:1024–1035.

de Alencar L.R.V., Martins M., Burin G., and Quental T.B. 2017. Arboreality constrains morpho-

logical evolution but not species diversification in vipers. Proc R Soc B 284:20171775.

Dembo R.S. and Steihaug T. 1983. Truncated-Newton algorithms for large-scale unconstrained

optimization. Math Program 26:190–212.

Dobzhansky T. and Sturtevant A.H. 1938. Inversions in the chromosomes of Drosophila pseudoob-

scura. Genetics 23:28–64.

Drury J.P., Clavel J., Tobias J.A., Rolland J., Sheard C., and Morlon H. 2021. Tempo and mode of

morphological evolution are decoupled from latitude in birds. PLoS Biol 19:e3001270.

Dufresne D. 2004. The log-normal approximation in financial and other computations. Adv Appl

Probab 36:747–773.

Dziak J.J., Coffman D.L., Lanza S.T., Li R., and Jermiin L.S. 2020. Sensitivity and specificity of

information criteria. Brief Bioinform 21:553–565.

Fabre A.C., Bardua C., Bon M., Clavel J., Felice R.N., Streicher J.W., Bonnel J., Stanley E.L.,
Blackburn D.C., and Goswami A. 2020. Metamorphosis shapes cranial diversity and rate of
evolution in salamanders. Nat Ecol Evol 4:1129–1140.

Falster D., Gallagher R., Wenk E.H., Wright I.J., Indiarto D., Andrew S.C., Baxter C., Lawson
J., Allen S., Fuchs A., Monro A., Kar F., Adams M.A., Ahrens C.W., Alfonzetti M., Angevin
T., Apgaua D.M.G., Arndt S., Atkin O.K., Atkinson J., Auld T., Baker A., von Balthazar M.,
Bean A., Blackman C.J., Bloomfield K., Bowman D.M.J.S., Bragg J., Brodribb T.J., Buckton
G., Burrows G., Caldwell E., Camac J., Carpenter R., Catford J.A., Cawthray G.R., Cernusak
L.A., Chandler G., Chapman A.R., Cheal D., Cheesman A.W., Chen S.C., Choat B., Clinton
B., Clode P.L., Coleman H., Cornwell W.K., Cosgrove M., Crisp M., Cross E., Crous K.Y.,
Cunningham S., Curran T., Curtis E., Daws M.I., DeGabriel J.L., Denton M.D., Dong N., Du P.,
Duan H., Duncan D.H., Duncan R.P., Duretto M., Dwyer J.M., Edwards C., Esperon-Rodriguez
M., Evans J.R., Everingham S.E., Farrell C., Firn J., Fonseca C.R., French B.J., Frood D., Funk
J.L., Geange S.R., Ghannoum O., Gleason S.M., Gosper C.R., Gray E., Groom P.K., Grootemaat
S., Gross C., Guerin G., Guja L., Hahs A.K., Harrison M.T., Hayes P.E., Henery M., Hochuli
D., Howell J., Huang G., Hughes L., Huisman J., Ilic J., Jagdish A., Jin D., Jordan G., Jurado E.,
Kanowski J., Kasel S., Kellermann J., Kenny B., Kohout M., Kooyman R.M., Kotowska M.M.,
Lai H.R., Laliberté E., Lambers H., Lamont B.B., Lanfear R., van Langevelde F., Laughlin
D.C., Laugier-Kitchener B.A., Laurance S., Lehmann C.E.R., Leigh A., Leishman M.R., Lenz
T., Lepschi B., Lewis J.D., Lim F., Liu U., Lord J., Lusk C.H., Macinnis-Ng C., McPherson H.,
Magallón S., Manea A., López-Martinez A., Mayfield M., McCarthy J.K., Meers T., van der
Merwe M., Metcalfe D.J., Milberg P., Mokany K., Moles A.T., Moore B.D., Moore N., Morgan
J.W., Morris W., Muir A., Munroe S., Nicholson Á., Nicolle D., Nicotra A.B., Niinemets Ü.,
North T., O’Reilly-Nugent A., O’Sullivan O.S., Oberle B., Onoda Y., Ooi M.K.J., Osborne
C.P., Paczkowska G., Pekin B., Guilherme Pereira C., Pickering C., Pickup M., Pollock L.J.,
Poot P., Powell J.R., Power S.A., Prentice I.C., Prior L., Prober S.M., Read J., Reynolds V.,
Richards A.E., Richardson B., Roderick M.L., Rosell J.A., Rossetto M., Rye B., Rymer P.D.,
Sams M.A., Sanson G., Sauquet H., Schmidt S., Schönenberger J., Schulze E.D., Sendall K.,
Sinclair S., Smith B., Smith R., Soper F., Sparrow B., Standish R.J., Staples T.L., Stephens R.,

Szota C., Taseski G., Tasker E., Thomas F., Tissue D.T., Tjoelker M.G., Tng D.Y.P., de Tombeur
F., Tomlinson K., Turner N.C., Veneklaas E.J., Venn S., Vesk P., Vlasveld C., Vorontsova M.S.,
Warren C.A., Warwick N., Weerasinghe L.K., Wells J., Westoby M., White M., Williams N.S.G.,
Wills J., Wilson P.G., Yates C., Zanne A.E., Zemunik G., and Ziemińska K. 2021. AusTraits, a
curated plant trait database for the Australian flora. Sci Data 8:254.

Feldman A., Sabath N., Pyron R.A., Mayrose I., and Meiri S. 2016. Body sizes and diversification

rates of lizards, snakes, amphisbaenians and the tuatara. Glob Ecol Biogeogr 25:187–197.

Felsenstein J. 2008. Comparative methods with sampling error and within-species variation: Con-

trasts revisited and revised. Am Nat 171:713–725.

Felsenstein J. 2012. A comparative method for both discrete and continuous characters using the

threshold model. Am Nat 179:145–156.

FitzJohn R.G. 2010. Quantitative traits and diversification. Syst Biol 59:619–633.

FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and ex-

tinction rates from incompletely resolved phylogenies. Syst Biol 58:595–611.

Fontanillas E., Welch J.J., Thomas J.A., and Bromham L. 2007. The influence of body size and net
diversification rate on molecular evolution during the radiation of animal phyla. BMC Evol Biol
7:95.

Freyman W.A. and Höhna S. 2019. Stochastic character mapping of state-dependent diversification
reveals the tempo of evolutionary decline in self-compatible Onagraceae lineages. Syst Biol
68:505–519.

Friedman S.T., Martinez C.M., Price S.A., and Wainwright P.C. 2019. The influence of size on

body shape diversification across Indo-Pacific shore fishes. Evolution 73:1873–1884.

Friedman S.T. and Muñoz M.M. 2023. A latitudinal gradient of deep-sea invasions for marine

fishes. Nat Commun 14:773.

Geyer C. 2011. Introduction to Markov chain Monte Carlo. In Handbook of Markov Chain Monte
Carlo (S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, eds.). Chapman and Hall/CRC, Boca
Raton, FL.

Gillooly J.F., Allen A.P., West G.B., and Brown J.H. 2005. The rate of DNA evolution: Effects of

body size and temperature on the molecular clock. Proc Natl Acad Sci USA 102:140–145.

Gingerich P.D. 2001. Rates of evolution on the time scale of the evolutionary process. Genetica

112-113:127–144.

Goolsby E.W. 2017. Rapid maximum likelihood ancestral state reconstruction of continuous characters:
A rerooting-free algorithm. Ecol Evol 7:2791–2797.

Groussin M., Daubin V., Gouy M., and Tannier E. 2016. Ancestral reconstruction: Theory and
practice. Pages 70–77 in Encyclopedia of Evolutionary Biology (R. M. Kliman, ed.). Academic
Press, Waltham, MA.

Hansen T.F., Bolstad G.H., and Tsuboi M. 2022. Analyzing disparity and rates of morphological

evolution with model-based phylogenetic comparative methods. Syst Biol 71:1054–1072.

Hansen T.F., Pienaar J., and Orzack S.H. 2008. A comparative method for studying adaptation to

a randomly evolving environment. Evolution 62:1965–1977.

Hassler G., Tolkoff M.R., Allen W.L., Ho L.S.T., Lemey P., and Suchard M.A. 2022. Inferring
phenotypic trait evolution on large trees with many incomplete measurements. J Am Stat Assoc
117:678–692.

Huelsenbeck J.P., Nielsen R., and Bollback J.P. 2003. Stochastic mapping of morphological char-

acters. Syst Biol 52:131–158.

Hughes J.J., Berv J.S., Chester S.G.B., Sargis E.J., and Field D.J. 2021. Ecological selectivity
and the evolution of mammalian substrate preference across the K-Pg boundary. Ecol Evol
11:14540–14554.

Igea J., Miller E.F., Papadopulos A.S.T., and Tanentzap A.J. 2017. Seed size and its rate of evolu-

tion correlate with species diversification across angiosperms. PLoS Biol 15:e2002792.

Johnson S.G. 2021. The NLopt nonlinear-optimization package. Version 2.7.1.

Joy J.B., Liang R.H., McCloskey R.M., Nguyen T., and Poon A.F.Y. 2016. Ancestral reconstruc-

tion. PLoS Comput Biol 12:e1004763.

Ladiges P.Y., Bayly M.J., and Nelson G.J. 2010. East-west continental vicariance in Eucalyptus
subgenus Eucalyptus. Pages 267–302 in Beyond Cladistics: The Branching of a Paradigm
(D. M. Williams and S. Knapp, eds.). University of California Press, Oakland, CA.

Landis M.J., Eaton D.A.R., Clement W.L., Park B., Spriggs E.L., Sweeney P.W., Edwards E.J.,
and Donoghue M.J. 2021. Joint phylogenetic estimation of geographic movements and biome
shifts during the global diversification of Viburnum. Syst Biol 70:67–85.

Landis M.J. and Schraiber J.G. 2017. Pulsed evolution shaped modern vertebrate body sizes. Proc

Natl Acad Sci USA 114:13224–13229.

Landis M.J., Schraiber J.G., and Liang M. 2013. Phylogenetic analysis using Lévy processes:

finding jumps in the evolution of continuous traits. Syst Biol 62:193–204.

Lanfear R., Ho S.Y.W., Jonathan Davies T., Moles A.T., Aarssen L., Swenson N.G., Warman L.,
Zanne A.E., and Allen A.P. 2013. Taller plants have lower rates of molecular evolution. Nat
Commun 4:1879.

Lepage T., Bryant D., Philippe H., and Lartillot N. 2007. A general comparison of relaxed molec-

ular clock models. Mol Biol Evol 24:2669–2680.

Lewandowski D., Kurowicka D., and Joe H. 2009. Generating random correlation matrices based

on vines and extended onion method. J Multivar Anal 100:1989–2001.

Lourenço J.M., Glémin S., Chiari Y., and Galtier N. 2013. The determinants of the molecular

substitution process in turtles. J Evol Biol 26:38–50.

Maddison W.P. and FitzJohn R.G. 2015. The unsolved challenge to phylogenetic correlation tests

for categorical characters. Syst Biol 64:127–136.

Martin B.S., Bradburd G.S., Harmon L.J., and Weber M.G. 2023. Modeling the evolution of rates

of continuous trait evolution. Syst Biol 72:590–605.

May M.R. and Moore B.R. 2020. A Bayesian approach for inferring the impact of a discrete char-
acter on rates of continuous-character evolution in the presence of background-rate variation.
Syst Biol 69:530–544.

McLay T.G.B., Fowler R.M., Fahey P.S., Murphy D.J., Udovicic F., Cantrill D.J., and Bayly M.J.
2023. Phylogenomics reveals extreme gene tree discordance in a lineage of dominant trees:
Hybridization, introgression, and incomplete lineage sorting blur deep evolutionary relation-
ships despite clear species groupings in Eucalyptus subgenus Eudesmia. Mol Phylogenet Evol
187:107869.

Nations J.A., Mount G.G., Morere S.M., Achmadi A.S., Rowe K.C., and Esselstyn J.A. 2021.
Locomotory mode transitions alter phenotypic evolution and lineage diversification in an eco-
logically rich clade of mammals. Evolution 75:376–393.

Nelder J.A. and Mead R. 1965. A simplex method for function minimization. Comput J 7:308–

313.

Nicolle D. 2006. A classification and census of regenerative strategies in the eucalypts (Angophora,
Corymbia and Eucalyptus–Myrtaceae), with special reference to the obligate seeders. Aust J Bot
54:391–407.

Nicolle D. 2022. Classification of the eucalypts (Angophora, Corymbia and Eucalyptus) version 6.

https://www.dn.com.au/Classification-Of-The-Eucalypts.pdf accessed: 2023-11-29.

Nielsen R. 2002. Mapping mutations on phylogenies. Syst Biol 51:729–739.

Niklas K.J. and Enquist B.J. 2001. Invariant scaling relationships for interspecific plant biomass

production rates and body size. Proc Natl Acad Sci USA 98:2922–2927.

Pauling L., Zuckerkandl E., Henriksen T., and Lövstad R. 1963. Chemical paleogenetics: Molecu-

lar “restoration studies” of extinct forms of life. Acta Chem Scand 17:9–16.

Pennell M.W., Eastman J.M., Slater G.J., Brown J.W., Uyeda J.C., FitzJohn R.G., Alfaro M.E.,
and Harmon L.J. 2014. geiger v2.0: An expanded suite of methods for fitting macroevolutionary
models to phylogenetic trees. Bioinformatics 30:2216–2218.

Rainford J.L., Hofreiter M., and Mayhew P.J. 2016. Phylogenetic analyses suggest that diversifi-

cation and body size evolution are independent in insects. BMC Evol Biol 16:8.

Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things).

Methods Ecol Evol 3:217–223.

Revell L.J. 2013. A comment on the use of stochastic character maps to estimate evolutionary rate

variation in a continuously valued trait. Syst Biol 62:339–345.

Rincon-Sandoval M., Duarte-Ribeiro E., Davis A.M., Santaquiteria A., Hughes L.C., Baldwin
C.C., Soto-Torres L., Acero P A., Walker H.J. Jr, Carpenter K.E., Sheaves M., Ortí G., Arcila
D., and Betancur-R R. 2020. Evolutionary determinism and convergence associated with water-
column transitions in marine fishes. Proc Natl Acad Sci USA 117:33396–33403.

Rowan T.H. 1990. Functional stability analysis of numerical algorithms. Ph.D. thesis, Department

of Computer Science, University of Texas at Austin, TX.

Rutherford S., Wilson P.G., Rossetto M., and Bonser S.P. 2016. Phylogenomics of the green ash eu-
calypts (Myrtaceae): A tale of reticulate evolution and misidentification. Aust Syst Bot 28:326–
354.

Salguero-Gómez R., Jones O.R., Jongejans E., Blomberg S.P., Hodgson D.J., Mbeau-Ache C.,
Zuidema P.A., de Kroon H., and Buckley Y.M. 2016. Fast-slow continuum and reproductive
strategies structure plant life-history variation worldwide. Proc Natl Acad Sci USA 113:230–
235.

Sanger F., Thompson E.O., and Kitai R. 1955. The amide groups of insulin. Biochem J 59:509–

518.

Sauer T. 2013. Numerical Analysis: Pearson New International Edition. Pearson Education Lim-

ited, Harlow, UK.

Schluter D., Price T., Mooers A.Ø., and Ludwig D. 1997. Likelihood of ancestor states in adaptive

radiation. Evolution 51:1699–1711.

Schultz T.R., Cocroft R.B., and Churchill G.A. 1996. The reconstruction of ancestral character

states. Evolution 50:504–511.

Schwarz G. 1978. Estimating the dimension of a model. Ann Stat 6:461–464.

Sibly R.M. and Brown J.H. 2007. Effects of body size and lifestyle on evolution of mammal life

histories. Proc Natl Acad Sci USA 104:17707–17712.

Siqueira A.C., Muruga P., and Bellwood D.R. 2023. On the evolution of fish-coral interactions.

Ecol Lett 26:1348–1358.

Slater G.J., Goldbogen J.A., and Pyenson N.D. 2017. Independent evolution of baleen whale gi-

gantism linked to Plio-Pleistocene ocean dynamics. Proc R Soc B 284:20170546.

Stan Development Team. 2019. Stan Modeling Language Users Guide and Reference Manual.

Version 2.21.0.

Sumrall C.D. and Brochu C.A. 2008. Viewing paleobiology through the lens of phylogeny. Pale-

ontol Soc Papers 14:165–183.

Tedesco P.A., Paradis E., Lévêque C., and Hugueny B. 2017. Explaining global-scale diversifica-

tion patterns in actinopterygian fishes. J Biogeogr 44:773–783.

Thomas J.A., Welch J.J., Woolfit M., and Bromham L. 2006. There is no universal molecular
clock for invertebrates, but rate variation does not scale with body size. Proc Natl Acad Sci USA
103:7366–7371.

Thornhill A.H., Crisp M.D., Külheim C., Lam K.E., Nelson L.A., Yeates D.K., and Miller J.T.
2019. A dated molecular perspective of eucalypt taxonomy, evolution and diversification. Aust
Syst Bot 32:29–48.

Tornabene L., Van Tassell J.L., Robertson D.R., and Baldwin C.C. 2016. Repeated invasions into
the twilight zone: Evolutionary origins of a novel assemblage of fishes from deep Caribbean
reefs. Mol Ecol 25:3662–3682.

Tribble C.M., May M.R., Jackson-Gain A., Zenil-Ferguson R., Specht C.D., and Rothfels C.J.
2023. Unearthing modes of climatic adaptation in underground storage organs across Liliales.
Syst Biol 72:198–212.

Uyeda J.C., Bone N., McHugh S., Rolland J., and Pennell M.W. 2021. How should functional rela-
tionships be evaluated using phylogenetic comparative methods? A case study using metabolic
rate and body temperature. Evolution 75:1097–1105.

Uyeda J.C., Zenil-Ferguson R., and Pennell M.W. 2018. Rethinking phylogenetic comparative

methods. Syst Biol 67:1091–1109.

Vasconcelos T., O’Meara B.C., and Beaulieu J.M. 2022. A flexible method for estimating tip diver-
sification rates across a range of speciation and extinction scenarios. Evolution 76:1420–1433.

Weber C.C., Nabholz B., Romiguier J., and Ellegren H. 2014. Kr/Kc but not dN/dS correlates
positively with body mass in birds, raising implications for inferring lineage-specific selection.
Genome Biol 15:542.

Weir J.T. and Lawson A. 2015. Evolutionary rates across gradients. Methods Ecol Evol 6:1278–

1286.

Welch J.J. and Waxman D. 2008. Calculating independent contrasts for the comparative study of

substitution rates. J Theor Biol 251:667–678.

White E.P., Ernest S.K.M., Kerkhoff A.J., and Enquist B.J. 2007. Relationships between body size

and abundance in ecology. Trends Ecol Evol 22:323–330.

Witmer L. 1995. The extant phylogenetic bracket and the importance of reconstructing soft tissues
in fossils. Pages 19–33 in Functional Morphology in Vertebrate Paleontology (J. J. Thomason,
ed.). Cambridge University Press, New York, NY.

Wollenberg K.C., Vieites D.R., Glaw F., and Vences M. 2011. Speciation in little: The role of
range and body size in the diversification of Malagasy mantellid frogs. BMC Evol Biol 11:217.

Wright S.D., Ross H.A., Jeanette Keeling D., McBride P., and Gillman L.N. 2011. Thermal energy

and the rate of genetic evolution in marine fishes. Evol Ecol 25:525–530.

Ypma J., Johnson S.G., Stamm A., Borchers H.W., Eddelbuettel D., Ripley B., Hornik K., Chiquet
J., Adler A., Dai X., and Ooms J. 2022. nloptr: R Interface to NLopt. R package version 2.0.3.

Zimova M., Weeks B.C., Willard D.E., Giery S.T., Jirinec V., Burner R.C., and Winger B.M. 2023.
Body size predicts the rate of contemporary morphological change in birds. Proc Natl Acad Sci
USA 120:e2206971120.

APPENDIX 2A

SUPPLEMENTAL TABLES AND FIGURES

Figure 2A.1 Accuracy of model-averaged factor-rate relationships inferred using the contsimmap-
based method for detecting continuous factor-dependent rates of trait evolution for all simulations
with either constant or observed factor-dependent rates (i.e., median absolute differences between
estimated and simulated relationships on log scale, negated such that higher values correspond to
greater accuracy). Different colors correspond to different sample sizes (i.e., number of tips in
simulated phylogeny) and solid versus dashed lines to simulations without versus with random
variation in rates (“noise”) around inferred relationships.

[Figure 2A.1 plot not recoverable from extraction. Panel labels: constant; strong, weak, wide; simple, threshold, sweetspot. x-axis: factor (Xo), −4 to 4. y-axis: accuracy (median |fold difference|^−1), 1/16 to 1. Legend: 50 tips, 100 tips, 200 tips; no noise, noise.]

Figure 2A.2 Bias of model-averaged factor-rate relationships inferred using the contsimmap-based
method for detecting continuous factor-dependent rates of trait evolution for all simulations with
either constant or observed factor-dependent rates (i.e., percent of overestimated model-averaged
rates). Different colors correspond to different sample sizes (i.e., number of tips in simulated
phylogeny) and solid versus dashed lines to simulations without versus with random variation in
rates (“noise”) around inferred relationships. Position of unbiased estimation depicted with thick
gray line.

[Figure 2A.2 plot not recoverable from extraction. Panel labels: constant; strong, weak, wide; simple, threshold, sweetspot. x-axis: factor (Xo), −4 to 4. y-axis: bias (percent overestimated), 0 to 100. Legend: 50 tips, 100 tips, 200 tips; no noise, noise; unbiased estimate.]

Figure 2A.3 Precision of model-averaged factor-rate relationships inferred using the contsimmap-
based method for detecting continuous factor-dependent rates of trait evolution for all simulations
with either constant or observed factor-dependent rates (i.e., the lower bound of 95% interval of
model-averaged rates divided by the corresponding upper bound). Different colors correspond to
different sample sizes (i.e., number of tips in simulated phylogeny) and solid versus dashed lines to
simulations without versus with random variation in rates (“noise”) around inferred relationships.

[Figure 2A.3 plot not recoverable from extraction. Panel labels: constant; strong, weak, wide; simple, threshold, sweetspot. x-axis: factor (Xo), −4 to 4. y-axis: precision (95% interval fold difference), 10^−3 to 1. Legend: 50 tips, 100 tips, 200 tips; no noise, noise.]

Table 2A.1 Table of parameter values for each version of simple, threshold, sweetspot, and null parameter functions used to generate
data for the simulation study of the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution. In
general, we consider the “strong” versions of each function the default, which are standardized to yield an overall rate around 4 that
varies ∼20-fold as factor values range from -2 to 2. “Weak” functions are largely identical to strong functions but instead only cause
rates to vary ∼5-fold over the -2 to 2 interval. On the other hand, “wide” functions still cause rates to vary ∼20-fold but over a wider
range of factor values from -4 to 4 (note that there is no wide version of the simple function). Intercept and slope parameters were chosen
to ensure simple functions reach the minimum/maximum values of corresponding threshold/sweetspot functions at factor values of -2.5
and 2.5.

                  parameter
function version  intercept (β0)             slope (β1)  mid-rate (α)  location (θ)  rate deviation (δ)  width (ω)
strong            ln 4 − ln cosh((ln 20)/2)  (ln 20)/5   ln 4          0             (ln 20)/2           ln 4
weak              ln 4 − ln cosh((ln 5)/2)   (ln 5)/5    ln 4          0             (ln 5)/2            ln 4
wide              —                          —           ln 4          0             (ln 20)/2           ln 8

Table 2A.2 Proportions of times the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution selected
a model assuming a given factor-rate relationship as the best-fitting one (based on having the lowest sample size corrected Akaike
Information Criterion) across all simulation conditions without random variation around factor-rate relationships (“noise”). Models
assuming either constant or dummy factor-dependent rates were considered “null” models.

constant

hidden factor-dependent

simple

threshold

sweetspot

simple
strong weak

threshold
strong weak wide

sweetspot
strong weak wide

null
simple
threshold
sweetspot

null
simple
threshold
sweetspot

null
simple
threshold
sweetspot

0.96
0.02
0.02
0.00

0.96
0.02
0.01
0.01

0.93
0.03
0.00
0.04

0.98
0.00
0.01
0.01

0.98
0.00
0.00
0.02

0.95
0.00
0.00
0.05

0.99
0.01
0.00
0.00

0.99
0.00
0.00
0.01

0.97
0.00
0.00
0.03

50 tips

0.53
0.38
0.04
0.05

0.83
0.11
0.04
0.02

100 tips

0.12
0.74
0.07
0.07

0.62
0.30
0.02
0.06

200 tips

0.00
0.92
0.01
0.07

0.18
0.57
0.10
0.15

0.97
0.00
0.01
0.02

0.95
0.00
0.01
0.04

0.96
0.01
0.01
0.02

0.59
0.21
0.06
0.14

0.17
0.15
0.34
0.34

0.04
0.03
0.44
0.49

0.87
0.07
0.02
0.04

0.58
0.18
0.10
0.14

0.19
0.11
0.28
0.42

0.67
0.19
0.04
0.10

0.45
0.25
0.13
0.17

0.09
0.25
0.24
0.42

0.65
0.01
0.07
0.27

0.07
0.00
0.13
0.80

0.00
0.00
0.04
0.96

0.91
0.01
0.02
0.06

0.64
0.03
0.00
0.33

0.19
0.01
0.04
0.76

0.82
0.06
0.03
0.09

0.42
0.05
0.10
0.43

0.13
0.01
0.11
0.75

Table 2A.3 Proportions of times the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution selected
a model assuming a given factor-rate relationship as the best-fitting one (based on having the lowest sample size corrected Akaike
Information Criterion) across all simulation conditions with random variation around factor-rate relationships (“noise”).
Models assuming either constant or dummy factor-dependent rates were considered “null” models.

constant

hidden factor-dependent

simple

threshold

sweetspot

simple
strong weak

threshold
strong weak wide

sweetspot
strong weak wide

null
simple
threshold
sweetspot

null
simple
threshold
sweetspot

null
simple
threshold
sweetspot

0.98
0.00
0.01
0.01

0.93
0.00
0.04
0.03

0.87
0.00
0.04
0.09

0.98
0.00
0.01
0.01

0.95
0.00
0.02
0.03

0.95
0.00
0.02
0.03

0.96
0.01
0.03
0.00

0.97
0.00
0.01
0.02

0.92
0.00
0.01
0.07

50 tips

0.74
0.19
0.02
0.05

0.86
0.07
0.02
0.05

100 tips

0.32
0.39
0.04
0.25

0.68
0.13
0.05
0.14

200 tips

0.07
0.64
0.12
0.17

0.52
0.28
0.07
0.13

0.94
0.00
0.02
0.04

0.92
0.01
0.04
0.03

0.97
0.00
0.00
0.03

0.64
0.17
0.08
0.11

0.29
0.10
0.32
0.29

0.04
0.05
0.52
0.39

0.90
0.04
0.01
0.05

0.66
0.04
0.12
0.18

0.31
0.08
0.29
0.32

0.82
0.09
0.05
0.04

0.47
0.18
0.13
0.22

0.23
0.16
0.25
0.36

0.70
0.01
0.03
0.26

0.20
0.00
0.07
0.73

0.02
0.00
0.07
0.91

0.90
0.00
0.01
0.09

0.64
0.03
0.06
0.27

0.37
0.01
0.06
0.56

0.81
0.04
0.02
0.13

0.65
0.03
0.08
0.24

0.17
0.02
0.16
0.65

Table 2A.4 Proportions of times the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution selected
a model assuming a given factor-rate relationship as the best-fitting one (based on having the lowest Bayesian Information Criterion; d.
denotes models assuming relationships with simulated “dummy” factors, while o. denotes models assuming relationships with observed
factors) across all simulation conditions without random variation around factor-rate relationships (“noise”).

constant

hidden factor-dependent

simple

threshold

sweetspot

simple
strong weak

threshold
strong weak wide

sweetspot
strong weak wide

50 tips

100 tips

200 tips

constant
d. simple
d. threshold
d. sweetspot
o. simple
o. threshold
o. sweetspot

constant
d. simple
d. threshold
d. sweetspot
o. simple
o. threshold
o. sweetspot

constant
d. simple
d. threshold
d. sweetspot
o. simple
o. threshold
o. sweetspot

0.38
0.49
0.08
0.03
0.00
0.02
0.00

0.35
0.49
0.07
0.07
0.02
0.00
0.00

0.40
0.54
0.01
0.03
0.02
0.00
0.00

0.05
0.78
0.12
0.05
0.00
0.00
0.00

0.01
0.71
0.12
0.14
0.01
0.00
0.01

0.00
0.65
0.21
0.10
0.01
0.00
0.03

0.02
0.80
0.08
0.09
0.01
0.00
0.00

0.01
0.67
0.13
0.19
0.00
0.00
0.00

0.00
0.69
0.12
0.17
0.01
0.00
0.01

0.01
0.42
0.04
0.04
0.48
0.01
0.00

0.00
0.10
0.01
0.00
0.87
0.02
0.00

0.00
0.00
0.00
0.00
0.99
0.00
0.01

0.07
0.67
0.04
0.02
0.18
0.01
0.01

0.05
0.45
0.05
0.01
0.44
0.00
0.00

0.00
0.12
0.01
0.00
0.87
0.00
0.00

0.03
0.46
0.03
0.08
0.34
0.02
0.04

0.01
0.14
0.02
0.02
0.49
0.13
0.19

0.01
0.05
0.00
0.01
0.17
0.39
0.37

0.10
0.64
0.06
0.03
0.15
0.01
0.01

0.04
0.51
0.02
0.03
0.34
0.02
0.04

0.02
0.13
0.00
0.01
0.57
0.09
0.18

0.09
0.44
0.09
0.06
0.29
0.02
0.01

0.03
0.37
0.01
0.02
0.51
0.01
0.05

0.01
0.06
0.02
0.01
0.65
0.13
0.12

0.03
0.54
0.10
0.07
0.06
0.02
0.18

0.01
0.08
0.01
0.00
0.04
0.09
0.77

0.00
0.00
0.00
0.00
0.00
0.04
0.96

0.13
0.75
0.04
0.02
0.02
0.00
0.04

0.00
0.55
0.05
0.08
0.10
0.00
0.22

0.00
0.26
0.01
0.02
0.07
0.01
0.63

0.10
0.67
0.08
0.05
0.06
0.00
0.04

0.04
0.44
0.05
0.07
0.13
0.04
0.23

0.00
0.18
0.01
0.02
0.08
0.08
0.63

0.02
0.69
0.12
0.15
0.00
0.00
0.02

0.00
0.70
0.06
0.20
0.00
0.01
0.03

0.01
0.51
0.21
0.23
0.01
0.01
0.02

Table 2A.5 Proportions of times the contsimmap-based pipeline for detecting continuous factor-dependent rates of trait evolution selected
a model assuming a given factor-rate relationship as the best-fitting one (based on having the lowest Bayesian Information Criterion; d.
denotes models assuming relationships with simulated “dummy” factors, while o. denotes models assuming relationships with observed
factors) across all simulation conditions with random variation around factor-rate relationships (“noise”).

constant

hidden factor-dependent

simple

threshold

sweetspot

simple
strong weak

threshold
strong weak wide

sweetspot
strong weak wide

50 tips

100 tips

200 tips

constant
d. simple
d. threshold
d. sweetspot
o. simple
o. threshold
o. sweetspot

constant
d. simple
d. threshold
d. sweetspot
o. simple
o. threshold
o. sweetspot

constant
d. simple
d. threshold
d. sweetspot
o. simple
o. threshold
o. sweetspot

0.14
0.57
0.24
0.04
0.00
0.01
0.00

0.09
0.46
0.26
0.15
0.00
0.02
0.02

0.08
0.54
0.19
0.12
0.00
0.04
0.03

0.01
0.67
0.19
0.12
0.00
0.01
0.00

0.00
0.64
0.09
0.23
0.00
0.02
0.02

0.00
0.62
0.23
0.12
0.00
0.02
0.01

0.01
0.59
0.14
0.23
0.01
0.02
0.00

0.00
0.58
0.19
0.21
0.00
0.01
0.01

0.00
0.45
0.32
0.19
0.02
0.00
0.02

0.00
0.42
0.12
0.17
0.26
0.02
0.01

0.00
0.14
0.06
0.04
0.71
0.00
0.05

0.00
0.02
0.02
0.03
0.86
0.05
0.02

0.02
0.53
0.20
0.12
0.11
0.01
0.01

0.01
0.39
0.12
0.12
0.27
0.03
0.06

0.00
0.24
0.06
0.07
0.55
0.04
0.04

0.01
0.33
0.14
0.14
0.24
0.06
0.08

0.00
0.21
0.04
0.08
0.24
0.21
0.22

0.00
0.04
0.02
0.00
0.17
0.44
0.33

0.00
0.59
0.18
0.15
0.07
0.00
0.01

0.02
0.37
0.12
0.18
0.15
0.09
0.07

0.00
0.27
0.05
0.02
0.36
0.11
0.19

0.01
0.54
0.17
0.11
0.12
0.01
0.04

0.00
0.26
0.13
0.09
0.38
0.06
0.08

0.00
0.13
0.02
0.08
0.51
0.14
0.12

0.03
0.46
0.14
0.13
0.05
0.02
0.17

0.00
0.10
0.09
0.06
0.00
0.07
0.68

0.00
0.00
0.00
0.02
0.00
0.07
0.91

0.01
0.56
0.24
0.14
0.01
0.01
0.03

0.01
0.42
0.17
0.13
0.05
0.03
0.19

0.01
0.23
0.07
0.13
0.04
0.02
0.50

0.03
0.49
0.16
0.14
0.08
0.01
0.09

0.01
0.35
0.17
0.14
0.07
0.06
0.20

0.00
0.12
0.06
0.04
0.05
0.13
0.60

0.03
0.59
0.15
0.22
0.00
0.01
0.00

0.00
0.43
0.23
0.28
0.01
0.02
0.03

0.00
0.49
0.24
0.25
0.00
0.00
0.02

APPENDIX 2B

STOCHASTIC APPROXIMATION OF LIKELIHOOD FUNCTION GRADIENTS

As discussed in subsection 2.2.2, likelihood functions for continuous factor-dependent Brown-

ian motion models (at least when conditioned on contsimmapped–as opposed to jointly inferred–

factor histories) appear to often feature multiple local optima and relatively flat ridges, which can

pose severe challenges for numerical optimization routines. Accordingly, while simplex-based op-

timization algorithms like Nelder-Mead (Nelder and Mead, 1965) and Subplex (Rowan, 1990) are

commonly used to fit models to phylogenetic comparative data (e.g., Beaulieu et al., 2013), initial

tests revealed that simplex-based algorithms tend to be very sensitive to initial parameter values

in the case of our method. Specifically, given poor initial estimates of parameter values, simplex-

based algorithms tended to terminate in suboptimal peaks or ridges of the likelihood surface and/or

take impractically long amounts of time to converge. Thus, we sought to use gradient-based opti-

mization algorithms, which leverage information on how likelihoods change around a given point

on the likelihood surface (i.e., gradients) to more rapidly and reliably converge on high-likelihood

regions of parameter space given an arbitrary starting point. Unfortunately, there is no simple,

closed-form solution for computing gradients of factor-dependent Brownian motion likelihood

functions (barring the full-blown implementation of an automatic differentiation engine), so we

instead focused on approximating gradients using finite differences.

Let x denote an n-length vector of parameter values, f the corresponding likelihood function,

and g(x) the (unknown) gradient of f about x. Note that g(x), unlike f (x), does not represent a

single number but instead an n-length vector of “slopes” of the likelihood surface ( f ) along each

of its input dimensions, corresponding to each of the n different parameters. Thus, the ith entry of

g(x) can be approximated by calculating a central finite difference about xi:

\[
g(x)_i \approx \frac{f(x + h_i \mathbf{1}_i) - f(x - h_i \mathbf{1}_i)}{2 h_i}
\tag{1}
\]

where hi is a small step size to take along the dimension corresponding to the ith parameter

and 1i an n-length indicator vector with 1 in its ith entry and 0s in all other entries. In practice, hi

should be small enough to accurately approximate the instantaneous slope/partial derivative of f

with respect to the ith parameter value, but large enough to avoid numerical errors stemming from

rounding during floating point arithmetic operations (Sauer, 2013). To this end, in our implementation,

we define hi as:

\[
h_i =
\begin{cases}
\sqrt[3]{\varepsilon/2} & \text{if } |x_i| \le 1 \\
|x_i| \sqrt[3]{\varepsilon/2} & \text{if } |x_i| > 1
\end{cases}
\tag{2}
\]

where ε denotes machine epsilon (∼2.2e−16 for double-precision arithmetic in R on a typical computer).
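The approximation described by Eqs. (1) and (2) can be sketched in a few lines. The following is a minimal Python illustration, not the R implementation described here; it assumes the cube root in Eq. (2) covers ε/2, and `toy_loglik`, `step_size`, and `central_gradient` are hypothetical names introduced only for this example.

```python
import sys

EPS = sys.float_info.epsilon  # double-precision machine epsilon, about 2.2e-16


def step_size(x_i):
    """Step size in the spirit of Eq. (2), assuming the cube root covers eps/2."""
    base = (EPS / 2.0) ** (1.0 / 3.0)
    return base if abs(x_i) <= 1.0 else abs(x_i) * base


def central_gradient(f, x):
    """Approximate each entry of the gradient of f at x via Eq. (1)."""
    grad = []
    for i, x_i in enumerate(x):
        h = step_size(x_i)
        forward = list(x)
        backward = list(x)
        forward[i] += h   # x + h_i * 1_i
        backward[i] -= h  # x - h_i * 1_i
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def toy_loglik(x):
    # Hypothetical stand-in for a log-likelihood surface (a simple quadratic).
    return -(x[0] - 1.0) ** 2 - 3.0 * (x[1] + 2.0) ** 2


print(central_gradient(toy_loglik, [0.5, 4.0]))
```

Each gradient costs 2n evaluations of `f`, one forward and one backward step per parameter, and the step grows with |xi| once a parameter exceeds 1 in magnitude so that the perturbation is not lost to rounding.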

Unfortunately, Eq. (1) breaks down when f (x + hi1i) and/or f (x − hi1i) are undefined, which

occurs frequently under some circumstances due to the wide variety of parameter functions and

boundaries that may be specified using our implementation. Notably, undefined values of f can also

result from the fact that our pruning algorithm rounds rates/variances to 0 when they fall below a

certain threshold (√ε by default), as we found “pseudoinversion” of matrices including 0s (Hassler

et al., 2022) generally produced more numerically stable likelihood calculations compared

to direct inversion of matrices with arbitrarily small values. In rare cases, such rounding implies

distinct trait measurements must be exactly identical, yielding undefined likelihoods due to un-

expected contradictions in observed trait data. Fortunately, we only use g(x) to help numerical

optimization algorithms explore parameter spaces efficiently, rather than precisely identify peaks

in the likelihood surface (somewhat analogously to stochastic gradient descent algorithms com-

monly employed in machine learning; Bottou et al., 2018). Thus, we implemented a crude but

pragmatic procedure to roughly approximate g(x) in cases involving undefined evaluations of f :

If L = f (x) is undefined, randomly sample entries of g(x) by drawing either −0.1 or 0.1 with

equal probability. Otherwise, for each parameter i from 1 to n:

1) Complete the “forward” likelihood evaluation, L+ = f (x + hi1i), as well as the “backward” evaluation, L− = f (x − hi1i).

2) If both L+ and L− are undefined, randomly sample g(x)i by drawing either −0.1 or 0.1 with equal probability.

3) If only L+ is defined, set g(x)i to the forward difference, (L+ − L)/hi. If only L− is defined, set g(x)i to the backward difference, (L − L−)/hi.

4) If both L+ and L− are defined, set g(x)i to the central difference, (L+ − L−)/(2hi).
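The fallback procedure above can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the implementation used here: an "undefined" likelihood is taken to mean a raised arithmetic error or a non-finite value, step sizes are passed in directly, and all function names (`safe_eval`, `robust_gradient`, `toy_loglik`) are hypothetical.

```python
import math
import random


def is_defined(value):
    """Treat None, NaN, and infinite values as undefined likelihoods."""
    return value is not None and math.isfinite(value)


def safe_eval(f, x):
    """Evaluate f at x, mapping arithmetic errors to None (undefined)."""
    try:
        return f(x)
    except (ValueError, ZeroDivisionError, OverflowError):
        return None


def random_slope(rng):
    """Draw -0.1 or 0.1 with equal probability."""
    return rng.choice((-0.1, 0.1))


def robust_gradient(f, x, steps, rng=random):
    """Approximate the gradient of f at x, tolerating undefined evaluations."""
    L = safe_eval(f, x)
    if not is_defined(L):
        return [random_slope(rng) for _ in x]
    grad = []
    for i, h in enumerate(steps):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        L_plus = safe_eval(f, forward)    # step 1: forward evaluation
        L_minus = safe_eval(f, backward)  # step 1: backward evaluation
        plus_ok, minus_ok = is_defined(L_plus), is_defined(L_minus)
        if plus_ok and minus_ok:
            grad.append((L_plus - L_minus) / (2.0 * h))  # step 4: central
        elif plus_ok:
            grad.append((L_plus - L) / h)                # step 3: forward
        elif minus_ok:
            grad.append((L - L_minus) / h)               # step 3: backward
        else:
            grad.append(random_slope(rng))               # step 2: random slope
    return grad


def toy_loglik(x):
    # Hypothetical surface that is undefined (math.log raises) when x[0] <= 0.
    return math.log(x[0]) - x[1] ** 2


print(robust_gradient(toy_loglik, [0.0005, 1.0], steps=[0.001, 1e-5]))
```

In the usage line, the backward step along the first parameter crosses the boundary at 0, so that entry falls back to a forward difference while the second entry still uses a central difference.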

While switching between the central, forward, or backward finite difference approximations

based on which likelihood function evaluations are defined is quite intuitive, there is no theoretical

basis to justify sampling random slopes when finite differences are undefined. We settled on this

particular procedure through trial and error by implementing a few different strategies for handling

undefined gradients and investigating the resulting performance of NLOPT’s truncated Newton

algorithm (Dembo and Steihaug, 1983) in fitting factor-dependent Brownian motion models to

several of the simulated and empirical datasets described in subsections 2.2.3 and 2.2.4.

Generally, we found that leaving entries of g(x) undefined frequently caused errors that led to

premature termination of the algorithm, while setting undefined entries of g(x) to 0 (or sampling

g(x) from a narrow normal distribution centered at 0) resulted in the algorithm getting “stuck”

in complex boundary regions of parameter space. Accordingly, our final strategy is meant to

consistently sample slopes substantially different from 0 to prevent the algorithm from getting

stuck, while not sampling slopes so large as to make the algorithm “fly off” in some random

direction. Nonetheless, in the context of our method, the former condition proved much more vital

to the performance of gradient-based optimization than the latter. Should these sampled slopes

send the algorithm to effectively random regions of parameter space, subsequent iterations and/or

follow-up optimization algorithms (e.g., simplex-based, principal axis) will in any case continue

to improve on these estimated parameter values. Ultimately, the specific slopes sampled to replace

undefined entries of g(x) should have very little impact on final parameter estimates as long as the

slopes substantially differ from 0, and our admittedly non-systematic investigation of model fitting

performance largely supported this conclusion.

Unfortunately, this gradient approximation strategy has one more key problem: its computa-

tional complexity scales with the number of parameters. Because the likelihood function must

be reevaluated 2n times to approximate a single gradient, fitting models with many parameters via

gradient-based optimization can become prohibitively slow under this approach, yet the goal of im-

plementing this gradient approximation procedure is precisely to more efficiently and robustly fit

complex, parameter-rich models in the first place. To mitigate this problem, we exploit the fact that

our model fitting procedure works by taking weighted averages of likelihoods conditional on each

contsimmap. In agreement with typical distributions of likelihoods under factor-dependent continuous

trait evolution models conditional on discrete simmaps (Boyko et al., 2023b), contsimmap-based

models generally yield only a few substantially high conditional likelihoods for any given

vector of parameter estimates, with the majority of conditional likelihoods being low to negligi-

ble. In other words, relatively few contsimmaps will often contribute to nearly all variation in f

about a certain point, rendering full reevaluation of the likelihood function largely unnecessary.

Instead, one may reevaluate the likelihood much more quickly based on a relatively small subsam-

ple of contsimmaps, similar in spirit to how stochastic gradient descent algorithms subsample data

before computing gradients to speed up calculations (Bottou et al., 2018).

To outline our contsimmap subsampling procedure more explicitly, let M denote the total num-

ber of contsimmaps and L an M-length vector of their associated normalized conditional likeli-

hoods (i.e., L is normalized to have a sum of 1) sorted in decreasing order. To define an appropriate

subsample size for gradient calculations, m, we find the position, j, of the first entry of L to cumula-

tively sum to greater than a user-specified quality parameter between 0 and 1, q (defaulting to 0.9),

and set m = 2 j. Then, m contsimmaps are randomly sampled without replacement in proportion to

their normalized conditional likelihoods, L, generating stochastic subsamples that consist of con-

tsimmaps with especially high conditional likelihoods. We chose to set m = 2 j because it resulted

in the subsampled normalized conditional likelihoods summing to around q or greater rather con-

sistently. Notably, there are some edge cases which may force m to be lower than 2 j: in particular,

any normalized conditional likelihoods that underflow to 0 are never subsampled, and, to prevent

excessively large subsamples, m is never allowed to exceed qM rounded to the nearest integer.

Generally speaking, this procedure yielded gradients accurate to at least a couple of decimal digits

while only using around a tenth of all the contsimmaps used in a given factor-dependent Brownian

motion model–at least when applied to fitting factor-dependent Brownian motion models to

several of the simulated and empirical datasets described in subsections 2.2.3 and 2.2.4.

Ultimately, while the crudeness of our gradient approximation procedure is likely inappropriate

for fully gradient-based optimization, it nonetheless works well enough to find high likelihood re-

gions of parameter space much more efficiently and consistently than simplex-based optimization.
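
The subsampling rule described above can be sketched as follows. This is a hedged illustration only: the function name, argument names, and the use of NumPy are assumptions, and the sketch covers only the choice of subsample, not the gradient calculation itself.

```python
import numpy as np

def subsample_contsimmaps(cond_lik, q=0.9, rng=None):
    """Choose contsimmap indices for one approximate gradient evaluation."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.asarray(cond_lik, dtype=float)
    L = L / L.sum()                       # normalize to sum to 1
    M = L.size
    # j: 1-based position of the first entry (in decreasing order)
    # at which the cumulative sum exceeds q
    cumsum = np.cumsum(np.sort(L)[::-1])
    j = min(int(np.searchsorted(cumsum, q, side="right")) + 1, M)
    m = 2 * j
    # edge cases: never exceed q * M (rounded to the nearest integer),
    # and never request more maps than have nonzero likelihood
    cap = int(np.floor(q * M + 0.5))
    n_nonzero = int(np.count_nonzero(L))
    m = max(1, min(m, cap, n_nonzero))
    # sample without replacement in proportion to normalized likelihoods
    return rng.choice(M, size=m, replace=False, p=L)
```

For example, with conditional likelihoods (0.5, 0.5, 0, 0) and q = 0.9, j is 2 and m would be 4, but only two contsimmaps have nonzero likelihood, so the subsample always consists of those two.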

APPENDIX 2C

GENERATING CONTSIMMAPS UNDER EVORATES MODELS

Empirical data on height variation across the Eucalyptus subgenus Eucalyptus (hereafter Monocalypts)

exhibit strong evidence of evolutionary rate variation. Overall, Monocalypts have accumulated

a little over 4 units of log height variation (corresponding to around a 70-fold difference)

over roughly 30 million years of evolution, yet several groups of closely-related species in the

section Eucalyptus span 1.5 to 3 units of log height variation (i.e., ∼5 to 20-fold differences) de-

spite originating less than 2 million years ago. For example, E. obliqua L’Hér is around 40 m

tall on average, yet shares a 1.5 million-year-old common ancestor with the 3-4 m tall E. gregsoniana

L.A.S.Johnson & Blaxell and E. kybeanensis Maiden & Cambage. Likewise, the most

recent common ancestor of the 40 m tall E. delegatensis R.T.Baker and the 2 m tall E. cunning-

hamii Sweet dates to a mere 650,000 years ago. While our data do suggest Monocalypts exhibit

substantial levels of within-species height variation (usually ranging between some 50 and 200%

the species mean), these disparities among recently-diverged lineages appear too extreme and fre-

quent within the section Eucalyptus to result from errors in estimating mean species heights alone.

Such rate variation violates the assumptions of a constant-rate Brownian motion model, resulting

in inflated estimates of rates of height evolution and highly uncertain ancestral state reconstructions.

Additionally, across-species differences in observed height data tend to be attributed to

within-species variation rather than evolutionary divergence under constant-rate models, causing

an overall “shrinkage” of inferred mean heights at the tips. To mitigate these issues, we decided

to take advantage of our contsimmapping algorithm’s ability to handle evolutionary rate variation

according to mapped regimes, fitting models that allow rates to vary to the Monocalypt height data

using the evorates package (Martin et al., 2023) and implementing tools to generate contsimmaps

under such models.

More specifically, we fit four models to the Monocalypt height data: 1) a “trend” model

whereby rates of height evolution exponentially decrease or increase over time (equivalent to an

“early/late burst” model), 2) a “rate variance” model whereby different subclades gradually diverge

in rates of height evolution, 3) a “full” model combining both the trend and rate variance

models, and 4) a “null” model with constant rates (equivalent to a constant-rate Brownian motion

model). We fit all models using four independent Hamiltonian Monte Carlo chains consisting of

2,000 iterations, discarding the first 1,000 as warmup for a total of 4,000 posterior samples. All

chains adequately converged (R̂ < 1.01) and achieved sufficient effective sample sizes (effective

sample sizes > 100 per chain) (Stan Development Team, 2019). Ultimately, we found high sup-

port for increasing rates through time under the trend model (posterior probability > 0.99) and

substantial rate heterogeneity among clades under the rate variance model (Savage-Dickey ratio

< 0.01). However, the full model yielded equivocal support for both increasing (posterior prob-

ability = 0.75) and heterogeneous rates (Savage-Dickey ratio = 0.36), suggesting that apparent

variation in rates of Monocalypt height evolution could be due to accelerating rates, differences in

rates among subclades, or both. Indeed, there was a strong negative correlation between posterior

samples of the trend and rate variance parameters, such that the full model’s posterior consisted

of both trend-like and rate variance-like models (Fig. 2C.1). This conclusion is further supported

by a cursory look at the posterior samples of likelihoods under each model, as the posterior likeli-

hoods under the full model overlap with those under both the trend and rate variance models (Fig.

2C.2). We thus chose to integrate over this uncertainty in the underlying height evolution model,

generating 4,000 contsimmaps (resolution/ξ = 100) of Monocalypt height, one for each posterior

sample under the full model. We ultimately found that generating contsimmaps of Monocalypt height

evolution under the null model yielded unrealistically wide distributions of sampled height values

while simultaneously “rounding out” observed across-species differences in height. On the other

hand, contsimmapping under the full model yielded inferred ancestral states and tip means more

concordant with the observed height data (Figs. 2C.3 and 2C.4).

While contsimmaps under the full model generally exhibited a narrower distribution of height

values, we nonetheless chose to filter out 516 (∼13%) of the 4,000 contsimmaps that included

biologically unrealistic height values less than 10 cm or greater than 150 meters (by comparison,

822 or ∼21% of contsimmaps under the null model included height values outside this range).

Figure 2C.1 Posterior samples of the trend and rate variance parameters inferred under the full,
evolving rates model fit to the Monocalypt height data. Note the negative correlation between pos-
terior samples of these parameters, indicating that Monocalypt height evolution is largely consis-
tent with either rapid accumulation of random evolutionary rate variation (i.e., high rate variance,
trend around 0) or accelerating evolutionary rates over time (i.e., low rate variance, positive trend).

Then, to render downstream analyses more manageable, we thinned the resulting set of 3,484

contsimmaps to every tenth contsimmap, which was the maximum thinning rate whereby effective

sample sizes (Geyer, 2011; Stan Development Team, 2019) of height values at each time point

remained above 100. This procedure yielded the final set of 349 contsimmaps used in our empirical

case study of Monocalypts. Conveniently, our procedure for generating contsimmaps under models

fitted via the evorates package requires contsimmapping both the focal trait/factor (height in this

case) as well as the rates at which the focal trait/factor evolved. Thus, to generate dummy factor

histories, we simply simulated the evolution of a trait with starting trait values and contsimmapped

rates identical to those in our final sample of 349 contsimmaps.
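
As a rough illustration of the filtering and thinning steps, the sketch below assumes the contsimmapped heights are stored as a maps-by-time-points array in meters and that thinning keeps the first retained contsimmap and every tenth one after it. The function name and array layout are hypothetical.

```python
import numpy as np

def filter_and_thin(heights_m, lower=0.1, upper=150.0, thin=10):
    """Return indices of contsimmaps kept after filtering and thinning.

    heights_m: array of shape (n_maps, n_points) of sampled heights in meters.
    """
    heights_m = np.asarray(heights_m, dtype=float)
    # drop any map containing a height below 10 cm or above 150 m
    realistic = np.all((heights_m >= lower) & (heights_m <= upper), axis=1)
    kept = np.flatnonzero(realistic)
    # keep every `thin`-th remaining map, starting with the first
    return kept[::thin]
```

Under these assumptions, removing 516 of 4,000 contsimmaps leaves 3,484, and keeping every tenth of those gives 349, matching the counts reported above.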

We now outline the details of our approach to generating contsimmaps based on posterior

[Figure 2C.1 axes: trend, rate variance]

Figure 2C.2 Posterior traces of likelihoods of the Monocalypt height data under the null, constant-
rate Brownian motion model; the trended, “early/late burst” model; the rate variance model
whereby rates gradually diverge among lineages over time; and the full model including both an
overall trend in rates over time and rate variance. The partially-shaded “ribbons” represent rolling
95% credible intervals of posterior likelihood samples. Different colors correspond to the different
models, while the angle of the shading in each ribbon corresponds to independent Hamiltonian
Monte Carlo chains (note that the lines representing the actual traces of posterior likelihoods for
independent chains are not distinguishable). Note that the ribbon corresponding to the full model
(in light yellow) overlaps with those for the rate variance and (to a lesser extent) trend models (in
a medium shade of green and dark purple, respectively).

[Figure 2C.2 axes: iteration (post-warmup), ln likelihood; legend: model (null, trend, rate variance, full), chain (1 to 4)]

Figure 2C.3 Phenograms depicting the overall distribution of all 4,000 contsimmaps of Monocalypt
height (i.e., prior to any filtering/thinning of samples) under both the null, constant-rate Brownian
motion model and the full, evolving rates model including both an overall trend in rates over
time and rate variance. The thick, solid lines depict the “central tendency” of the contsimmaps,
corresponding to mean height value across all contsimmaps for each lineage in the Monocalypt
phylogeny. To depict the overall range of the contsimmapped height values–represented by the
lighter, dashed lines–we first calculated 95% confidence intervals of sampled height values for
each lineage, then took the minimum and maximum interval bounds at 100 equally-spaced time
points spanning the height of the phylogeny. Note that ancestral reconstructions were generally
more precise under the full model.

[Figure 2C.3 panels: null model, full model; axes: millions of years before present, height (m)]

Figure 2C.4 Inferred versus observed mean species heights at the tips of the Monocalypt phylogeny
under both the null, constant-rate Brownian motion model and the full, evolving rates model in-
cluding both an overall trend in rates over time and rate variance. Correlation coefficients, denoted
r, for the relationship between inferred and observed heights are provided in the top left corner of
each plot. Note that inferred mean species heights under the full model better align with the dashed
line, which depicts the position of observed mean species heights along the vertical axis.

samples of parameters under arbitrary models fitted via the evorates package. Briefly, models

implemented in the evorates package generally work by assuming the rate parameter of a Brownian

motion model of trait evolution itself “evolves” according to a constant-rate geometric Brownian

motion process (i.e., the natural log of the evolutionary rates evolve according to a constant-rate

Brownian motion process). Thus, the model infers a “rate evolution process” controlled by a trend

($\mu_{\sigma^2}$) parameter determining whether rates tend to decrease or increase over time, as well as a rate

variance ($\sigma^2_{\sigma^2}$) parameter controlling how quickly random variation in rates accumulates over time.

Additionally, the model estimates both the rate at the root of the phylogeny ($\sigma^2_0$) and average rates

of evolution along each branch of the phylogeny, which we term “branchwise rates” or $\bar{\sigma}^2_e$, where

$e$ denotes the index of a particular edge. At a broad level, to generate contsimmaps under such

models, we implemented a two-step procedure whereby the inferred rate evolution process and

branchwise rates are first used to generate contsimmaps of evolutionary rates of a user-specified

resolution, which are in turn used to generate contsimmaps of trait (or factor) values of the same

resolution by treating the rate contsimmaps as high-resolution regime maps (i.e., a regime for each

time point spanning the preceding time interval). We summarize this process graphically in Fig.

2C.5.

[Figure 2C.4 panels: null model (r = 0.93), full model (r = 0.97); axes: observed mean tip heights (m), inferred mean tip heights (m)]

Unfortunately, because the time-average of a geometric Brownian motion process has no

closed-form probability distribution (Lepage et al., 2007), there is no simple and/or exact method

for sampling rate values at arbitrary time points across a phylogeny (i.e., to generate rate con-

tsimmaps) conditional on the inferred rate evolution process and branchwise rates. That being

said, we can take advantage of the crude but effective approximation of geometric Brownian mo-

tion time-averages used by the evorates package itself (see APPENDIX 1B; see also Dufresne,

2004; Welch and Waxman, 2008; Martin et al., 2023). Namely, we can assume each branchwise

rate is the sum of a trend and noise component, with the noise component assumed to follow the

distribution of geometric, rather than arithmetic, time-averages of an untrended (i.e., $\mu_{\sigma^2} = 0$) geometric

Brownian motion process. Under this assumption, we can derive straight-forward normal

distributions describing how the natural log of rates at the nodes of a phylogeny are distributed.

To start out, let $\ln \sigma^2_{\tau_1}$ and $\ln \sigma^2_{\tau_2}$ represent the natural log of the starting and ending values of

a geometric Brownian motion process over an interval of length $t$ with trend $\mu_{\sigma^2}$, rate variance

$\sigma^2_{\sigma^2}$, and time-average $\bar{\sigma}^2$. Then the distributions of $\ln \sigma^2_{\tau_1}$ and $\ln \sigma^2_{\tau_2}$ under the aforementioned

approximation are given by:

Figure 2C.5 A brief graphical summary of our procedure for generating contsimmaps under evolv-
ing rates models, using the full, evolving rates model fit to the Monocalypt height data as an
example. The phenograms on bottom depict the distributions of: 1) inferred average rates of trait
evolution along each branch (i.e., branchwise rates), which form the main input for our proce-
dure (left); 2) the contsimmapped rate values sampled conditional on the inferred rate evolution
process and branchwise rates, which are generated by the first step of our procedure (middle);
and 3) contsimmapped height values sampled conditional on the observed height data at the tips
and contsimmapped rates, which are generated by the second step of our procedure (right). Each
phenogram consists of the overall mean rates/heights for each lineage in solid black lines, as well
as a single example posterior sample/contsimmap in thinner lines, which are colored according
to their position along the vertical axis (different color gradients are used to visually distinguish
rates from heights). The phylograms on top provide an alternate view of the example posterior
samples/contsimmaps, with rate/height values represented solely by these color gradients rather
than positions along vertical axes. The lighter, dashed lines in each phenogram on bottom depict
the overall range of the rates/heights, and were derived from calculating 95% confidence intervals
along each lineage, then taking the minimum and maximum interval bounds at 100 equally-spaced
time points spanning the height of the phylogeny.

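[Figure 2C.5 panels: branchwise rates, contsimmapped rates, contsimmapped heights; axes: millions of years before present, rate of height evolution, height (m)]

\begin{align}
\ln \sigma^2_{\tau_1} &\sim N\left(\ln \bar{\sigma}^2 - \beta,\ \frac{\sigma^2_{\sigma^2} t}{3}\right) \tag{1} \\
\ln \sigma^2_{\tau_2} &\sim N\left(\ln \bar{\sigma}^2 - \beta + \mu_{\sigma^2} t,\ \frac{\sigma^2_{\sigma^2} t}{2}\right) \tag{2} \\
\beta &= \begin{cases} 0 & \text{if } \mu_{\sigma^2} = 0 \\ \ln\left|e^{\mu_{\sigma^2} t} - 1\right| - \ln\left|\mu_{\sigma^2}\right| - \ln t & \text{if } \mu_{\sigma^2} \neq 0 \end{cases} \tag{3}
\end{align}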

where $N(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$. We can generalize

these results to define normal distributions for the natural log of rates at each node in a

phylogeny given branchwise rates along each edge based on basic algebra of normal random variables.

Specifically, let $\sigma^2_e$ represent the rate at the node immediately descending from edge $e$ (not

to be confused with $\bar{\sigma}^2_e$, the branchwise rate along edge $e$), then:

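\begin{align}
\ln \sigma^2_e &\sim N\left(\alpha_e^{-1}\left(\frac{2(\ln \bar{\sigma}^2_e - \beta_e)}{t_e} + \sum_{d \in \mathrm{des}(e)} \frac{3(\ln \bar{\sigma}^2_d - \beta_d)}{t_d}\right) + \mu_{\sigma^2}\tau_e,\ \alpha_e^{-1}\sigma^2_{\sigma^2}\right) \tag{4} \\
\alpha_e &= \frac{2}{t_e} + \sum_{d \in \mathrm{des}(e)} \frac{3}{t_d} \tag{5} \\
\beta_e &= \begin{cases} 0 & \text{if } \mu_{\sigma^2} = 0 \\ \ln\left|e^{\mu_{\sigma^2}\tau_e} - e^{\mu_{\sigma^2}\tau_{\mathrm{anc}(e)}}\right| - \ln\left|\mu_{\sigma^2}\right| - \ln t_e & \text{if } \mu_{\sigma^2} \neq 0 \end{cases} \tag{6}
\end{align}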

where $t_e$ now denotes the length of edge $e$, $\tau_e$ the height (i.e., distance from root) of the node

immediately descending from edge $e$, $\mathrm{des}(e)$ the indices for all edges immediately descending

from edge $e$, and $\mathrm{anc}(e)$ the index for the edge immediately ancestral to edge $e$. Based on these

formulae, we implemented a root-to-tips or preorder traversal algorithm that generates complete

rate contsimmaps by jointly sampling rate values at the time points along each edge conditional

on each edge’s branchwise rate and sampled rate value at the edge’s immediately ancestral node.
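
The node-level distributions in Eqs. (4) to (6) can be sketched as below. The dictionary layout and function names are hypothetical and not part of the evorates package; each edge is assumed to carry the natural log of its branchwise rate along with the heights of its ancestral and descendant nodes.

```python
import math

def beta(mu, tau_start, tau_end):
    """Trend correction of Eq. (6); equals Eq. (3) when tau_start is 0."""
    if mu == 0.0:
        return 0.0
    t = tau_end - tau_start
    return (math.log(abs(math.exp(mu * tau_end) - math.exp(mu * tau_start)))
            - math.log(abs(mu)) - math.log(t))

def node_log_rate_distribution(edge, descendants, mu, rate_var):
    """Mean and variance of ln(rate) at the node below `edge` (Eqs. 4-5).

    Each edge is a dict with keys 'log_bw_rate' (ln branchwise rate),
    'tau_start' (height of ancestral node), 'tau_end' (height of this node).
    """
    t_e = edge["tau_end"] - edge["tau_start"]
    alpha = 2.0 / t_e
    weighted = 2.0 * (edge["log_bw_rate"]
                      - beta(mu, edge["tau_start"], edge["tau_end"])) / t_e
    for d in descendants:  # empty for tip edges
        t_d = d["tau_end"] - d["tau_start"]
        alpha += 3.0 / t_d
        weighted += 3.0 * (d["log_bw_rate"]
                           - beta(mu, d["tau_start"], d["tau_end"])) / t_d
    mean = weighted / alpha + mu * edge["tau_end"]
    var = rate_var / alpha
    return mean, var
```

For a tip edge with no descendants, the mean and variance returned here reduce to those of Eq. (2), which gives a quick consistency check on the reconstruction.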

Under our approximation, the joint distribution of the natural log of rate values along a given

edge follows a straight-forward multivariate normal distribution conditioned to have a particular

weighted sum given by the corresponding branchwise rate. Let $\vec{\sigma}^2_e$ and $\vec{\tau}_e$ denote $n_e$-length vectors

of the rate values and corresponding time points along edge $e$ (i.e., the $n_e$th entries correspond to

the node immediately descending from edge $e$; $\sigma^2_e = \vec{\sigma}^2_{e,n_e}$ and $\tau_e = \vec{\tau}_{e,n_e}$). Ignoring the condition

that $\vec{\sigma}^2_e$ must have a particular weighted sum, the distribution of $\vec{\sigma}^2_e$ is given by:

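\begin{align}
\ln \vec{\sigma}^2_e &\sim \mathrm{MVN}\left(\frac{(E[\ln \sigma^2_e] - \ln \sigma^2_{\mathrm{anc}(e)})(\vec{\tau}_e - \tau_{\mathrm{anc}(e)})}{t_e} + \ln \sigma^2_{\mathrm{anc}(e)},\ \Sigma_e\right) \tag{7} \\
\Sigma_{e,i,j} &= \frac{\sigma^2_{\sigma^2}(\tau_e - \max\{\vec{\tau}_{e,i}, \vec{\tau}_{e,j}\})(\min\{\vec{\tau}_{e,i}, \vec{\tau}_{e,j}\} - \tau_{\mathrm{anc}(e)}) + \mathrm{Var}(\ln \sigma^2_e)\sqrt{\vec{\tau}_{e,i}\,\vec{\tau}_{e,j}}}{t_e} \tag{8}
\end{align}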

where $\mathrm{MVN}(\vec{\mu}, \Sigma)$ denotes a multivariate normal distribution with mean vector $\vec{\mu}$ and

variance-covariance matrix $\Sigma$, $E[\ln \sigma^2_e]$ and $\mathrm{Var}(\ln \sigma^2_e)$ are given by Eq. (4), and $\Sigma_{e,i,j}$ is the entry

in the $i$th row and $j$th column of matrix $\Sigma_e$. Notably, for edges descending from the root node,

$\ln \sigma^2_{\mathrm{anc}(e)}$ corresponds to the inferred root rate parameter, $\ln \sigma^2_0$, and $\tau_{\mathrm{anc}(e)}$ to 0. We can calculate

an approximate branchwise rate for a given sample of $\vec{\sigma}^2_e$ as the weighted sum:

\begin{equation}
\hat{\sigma}^2_e = \frac{1}{t_e} \sum_{i=1}^{n_e} \vec{\sigma}^2_{e,i} (\vec{\tau}_{e,i} - \vec{\tau}_{e,i-1}) \tag{9}
\end{equation}

where $\vec{\tau}_{e,0}$ is replaced with $\tau_{\mathrm{anc}(e)}$ in the above expression (or with 0 if $e$ is an edge descending

from the root node). Here, we assume each sampled rate at a given time point is constant along

its preceding time interval, which greatly simplifies calculations and negligibly differs from more

accurate interpolations of rates along an edge given a sufficiently dense set of time points (notably,

we use the same approximation to calculate likelihoods of continuous factor-dependent Brownian

motion models).
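
The weighted sum in Eq. (9) amounts to a time-weighted average of piecewise-constant rates. A minimal sketch, with hypothetical function and argument names:

```python
import numpy as np

def approx_branchwise_rate(rates, times, tau_anc):
    """Time-weighted average of rates sampled along one edge (Eq. 9).

    rates[i] is treated as constant over the interval ending at times[i];
    the first interval starts at tau_anc (0 for edges leaving the root).
    """
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    widths = np.diff(np.concatenate(([tau_anc], times)))  # interval lengths
    t_e = times[-1] - tau_anc                             # edge length
    return float(np.sum(rates * widths) / t_e)
```

For instance, rates of 1 and 3 held over two unit-length intervals give an approximate branchwise rate of 2.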

We use Markov chain Monte Carlo to sample from Eq. (7) under a highly informative prior

conditioning the sampled branchwise rate, $\hat{\sigma}^2_e$, to approximately equal the inferred branchwise rate,

$\bar{\sigma}^2_e$. Specifically, we place a normal prior on $\ln \hat{\sigma}^2_e$ with mean $\ln \bar{\sigma}^2_e$ and a user-specified variance

or “tolerance”, which defaults to 0.001. Lower tolerance values will produce more accurate rate

samples at the cost of increased computation time (and vice versa for higher values). To simplify
this procedure, we use an “uncentered” parameterization and do not sample $\vec{\sigma}^2_e$ directly. Instead,

we sample $n_e$ independent standard normal random variables, $\vec{z}$, which are transformed to follow

the distribution given by Eq. (7) via $\Sigma_e^{1/2}\vec{z} + E[\ln \vec{\sigma}^2_e]$, where $\Sigma_e^{1/2}$ denotes the lower triangular

Cholesky decomposition of $\Sigma_e$ (Betancourt and Girolami, 2019). We propose new samples of $\vec{z}$

by simply adding $n_e$ normal random variables with mean 0 and variance 0.01 to the previously

sampled values of $\vec{z}$. For each edge, Markov chain Monte Carlo sampling of rate values is run for a

user-specified maximum number of iterations (which defaults to 100,000), but terminates early if

$(\ln \hat{\sigma}^2_e - \ln \bar{\sigma}^2_e)^2$ is less than the user-specified tolerance value (i.e., the sampled branchwise rate is

within a standard deviation of the normal prior’s mean). So far, we have found that this procedure

essentially never reaches the maximum number of iterations under our default settings, ensuring a

close match between inferred branchwise rates and the sampled branchwise rates in the resulting

rate contsimmaps.
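
The sampler described above could be sketched as a random-walk Metropolis scheme on the uncentered variables. This is an illustrative reading rather than the package's actual code: the function name and arguments are hypothetical, and the mean vector and covariance matrix of Eq. (7) are taken as inputs instead of being built from Eqs. (7) and (8).

```python
import numpy as np

def sample_edge_log_rates(mean, cov, times, tau_anc, log_target,
                          tol=0.001, max_iter=100_000, step_var=0.01, rng=None):
    """Sample ln(rates) along one edge so that the implied branchwise rate
    approximately matches exp(log_target)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    times = np.asarray(times, dtype=float)
    chol = np.linalg.cholesky(cov)  # lower-triangular factor of the covariance
    widths = np.diff(np.concatenate(([tau_anc], times)))
    t_e = times[-1] - tau_anc

    def transform(z):
        log_rates = chol @ z + mean  # uncentered parameterization
        log_bw = np.log(np.sum(np.exp(log_rates) * widths) / t_e)
        return log_rates, log_bw

    def log_post(z, log_bw):
        # standard normal density on z plus the normal "tolerance" prior
        return -0.5 * (z @ z) - 0.5 * (log_bw - log_target) ** 2 / tol

    z = rng.standard_normal(mean.size)
    log_rates, log_bw = transform(z)
    lp = log_post(z, log_bw)
    for it in range(max_iter):
        if (log_bw - log_target) ** 2 < tol:  # early termination
            return log_rates, it
        z_new = z + rng.normal(0.0, np.sqrt(step_var), size=z.size)
        lr_new, lb_new = transform(z_new)
        lp_new = log_post(z_new, lb_new)
        if np.log(rng.uniform()) < lp_new - lp:  # Metropolis acceptance
            z, log_rates, log_bw, lp = z_new, lr_new, lb_new, lp_new
    return log_rates, max_iter
```

The function returns the sampled log rates together with the number of iterations used, so a caller can check whether the maximum number of iterations was reached.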

Finally, after generating rate contsimmaps, we can again assume that the sampled rates at each

time point are constant along the preceding interval, forming a high-resolution map of regimes

corresponding to different rates for each time interval (i.e., all time points are critical time points).

Then, we can simply use the contsimmapping algorithm described in subsection 2.2.1 to sam-

ple trait/factor values at all time points conditional on the contsimmapped rates and observed

trait/factor data.

CHAPTER 3

A NEW APPROACH FOR INFERRING STATE-DEPENDENT VARIATION IN
CONTINUOUS TRAIT EVOLUTION DYNAMICS

3.1 Introduction

Both paleontological and, more recently, phylogenetic comparative studies demonstrate that

variation in the tempo and mode of macroevolutionary processes like phenotypic evolution and

lineage diversification is pervasive across the tree of life (Simpson, 1944; Gingerich, 2009; Jablon-

ski, 2017; Sauquet and Magallón, 2018; Harmon et al., 2021), resulting in the uneven distribution

of biodiversity across space and taxa observed today. Elucidating the mechanisms driving this ap-

parent “evolutionary heterogeneity”–for example, novel ecological opportunities or variation in life

history traits–is critical for understanding many patterns that have long puzzled biologists, like the

anomalous hyperdiversity of some taxa compared to their closest relatives (e.g., flowering plants,

bats, beetles; Davies et al., 2004; Crepet and Niklas, 2009; Brock Fenton and Simmons, 2015;

Stork et al., 2015) or the elevated species richness/phenotypic diversity of tropical ecosystems ver-

sus temperate ones (Hillebrand, 2004; Stevens et al., 2006; Schumm et al., 2019; Diamond and

Roy, 2023; Saupe, 2023). Accordingly, recent decades have seen the development of many robust

and powerful methods for investigating potential drivers of heterogeneity in both lineage diversi-

fication (Maddison et al., 2007; FitzJohn, 2010; Goldberg et al., 2011; FitzJohn, 2012; Beaulieu

and O’Meara, 2016; Rabosky and Goldberg, 2017; Vasconcelos et al., 2022) and discrete trait

evolution dynamics (Beaulieu et al., 2013; Goldberg and Foo, 2020; Boyko and Beaulieu, 2021,

2023). However, despite the plethora of available methods for estimating evolutionary correla-

tions between continuously-measured traits and other variables (Felsenstein, 1985; Martins and

Hansen, 1997; Hansen et al., 2008; Bartoszek et al., 2012; Felsenstein, 2012; Tolkoff et al., 2018;

Hassler et al., 2022b), approaches for inferring heterogeneity in continuous trait evolution dynam-

ics (e.g., rate shifts, changes in the frequency/magnitude of evolutionary “pulses” sensu Landis

and Schraiber, 2017) continue to lag behind those for lineage diversification and discrete traits in

several key ways.

Most available methods for estimating how some variable or “factor” (e.g., habitat, repro-

ductive strategy) is associated with heterogeneity in continuous trait evolution dynamics rely on

“sequential” approximations consisting of two main steps: 1) inferring the evolutionary history of

the factor via ancestral state reconstruction, and 2) fitting factor-dependent continuous trait evolu-

tion models based on these reconstructions. Such approaches are cumbersome and exhibit notable

biases under certain conditions (Revell, 2013; May and Moore, 2020; Boyko et al., 2023b). In

particular, by completely ignoring continuous trait data in the first step, these methods tend to

rely on rather improbable reconstructions of factor histories (Caetano et al., 2018; Boyko et al.,

2023b). This is because the continuous trait data reflect past evolutionary processes which, as-

suming a statistically well-supported relationship between the factor and continuous trait evolu-

tion dynamics, provide substantial information regarding the factor’s evolutionary history beyond

the factor data itself (which is nearly always limited to “phylogenetically sparse” measurements

among extant/fossilized taxa). Furthermore, methods based on sequential approximations tend

to simplistically assume all heterogeneity in continuous trait evolution is caused by only one to

a few explicitly-measured factors. In reality, however, continuous trait evolution dynamics are

presumably affected by a tangled web of countless interconnected and context-dependent factors,

inevitably generating “residual” or “background” heterogeneity in evolutionary dynamics on top of

any heterogeneity associated with the particular factors a given study happens to focus on (Cooper

and Purvis, 2009; May and Moore, 2020; Boyko et al., 2023b; Tribble et al., 2023; see also

Donoghue and Sanderson, 2015). In turn, these methods often misattribute residual heterogene-

ity caused by unconsidered and/or unobserved factors to measured factors instead, thereby yield-

ing spurious support for researchers’ a priori hypotheses (May and Moore, 2020; Boyko et al.,

2023a,b).

Critically, inferring evolutionary heterogeneity driven by unobserved factors is not as imprac-

tical as it may first seem. In fact, phylogenetic comparative methods for analyzing heterogeneity

in evolutionary dynamics are often described as either “hypothesis-driven” approaches designed

to test whether heterogeneity is associated with particular factors of interest, or “data-driven” ap-

proaches meant to detect and quantify heterogeneity based on the provided comparative data alone,

independently of any a priori hypothesis. In other words, data-driven approaches are precisely

designed to infer general evolutionary heterogeneity driven by unobserved factors. Accordingly,

methods that integrate both hypothesis- and data-driven approaches have become increasingly pop-

ular in recent years, as they are able to account for evolutionary heterogeneity driven by both ob-

served and unobserved factors (Beaulieu and O’Meara, 2016; Uyeda et al., 2018; May and Moore,

2020; Boyko and Beaulieu, 2023; Boyko et al., 2023b). Hidden Markov model-based (HMM)

frameworks are particularly effective in this regard, allowing researchers to directly infer unob-

served discrete variables or “hidden states” (presumably representing simplified summaries of var-

ious interconnected factors; see discussion in Boyko et al., 2023b) based on their apparent impact

on lineage diversification and/or discrete trait evolution dynamics (Beaulieu et al., 2013; Beaulieu

and O’Meara, 2016; Caetano et al., 2018; Goldberg and Foo, 2020; Vasconcelos et al., 2022; Boyko

and Beaulieu, 2023). Ancestral state reconstruction can even be used to map the occurrence of hid-

den states across a phylogeny, enabling powerful explorations of how evolutionary dynamics have

changed throughout the history of a clade (e.g., Beaulieu and O’Meara, 2016; Vasconcelos et al.,

2022). Unfortunately, existing approaches for inferring heterogeneity in continuous trait evolution

dynamics are not directly compatible with HMM frameworks, instead employing sequential ap-

proximations to infer general/residual heterogeneity by iteratively sampling possible distributions

of heterogeneity (implicitly driven by unobserved factors) across a phylogeny (Eastman et al.,

2011; Thomas and Freckleton, 2012; Rabosky et al., 2014; May and Moore, 2020; Pagel et al.,

2022; Boyko et al., 2023b; Martin et al., 2023; Tribble et al., 2023). While such methods are

certainly powerful tools for exploring variation in continuous trait evolution dynamics, they are

typically computationally-expensive and often incompatible with hypothesis-driven approaches

(but see May and Moore, 2020; Boyko et al., 2023b; Tribble et al., 2023).

Ultimately, to make accurate and robust conclusions regarding what mechanisms drive hetero-

geneity in continuous trait evolution dynamics across the tree of life, researchers need methods

capable of “jointly” inferring heterogeneous continuous trait evolutionary processes along with the

history of factors–both observed and unobserved–potentially driving such heterogeneity. Notably,

currently available state-dependent speciation and extinction (SSE) models meet all these criteria

with regard to inferring links between discretely-measured factors or “states” (but see FitzJohn,

2010) and lineage diversification dynamics (Maddison et al., 2007; Beaulieu and O’Meara, 2016).

Accordingly, the SSE modeling framework has become extremely popular among empirical researchers

and method developers alike (Helmstetter et al., 2023). The publication introducing the original

SSE model has been cited over 1,000 times (Maddison et al., 2007), and the model has since been elaborated

into a diverse array of more sophisticated models, themselves often associated with highly-cited

publications (FitzJohn et al., 2009; FitzJohn, 2010; Goldberg et al., 2011; FitzJohn, 2012; Goldberg

and Igić, 2012; Magnuson-Ford and Otto, 2012; Beaulieu and O’Meara, 2016; Caetano et al.,

2018; Freyman and Höhna, 2018; Herrera-Alsina et al., 2019; Nakov et al., 2019; Verboom et al.,

2020; Vasconcelos et al., 2022). Thus, an analogous framework for State-dependent Continuous

trait Evolution or “SCE” models would likely constitute a tremendously useful tool for empirical

macroevolutionary research, not to mention continued development of more realistic continuous

trait evolution models.

Here, we begin to address this methodological gap by developing a new phylogenetic com-

parative method for inferring variation in continuous trait evolution dynamics driven by observed

and/or unobserved states (i.e., discretely-varying factors), implemented in an R package named

sce (pronounced “ski”). Critically, our approach employs a novel “pruning algorithm” (Felsen-

stein, 1973) that efficiently accounts for all possible state histories given the observed state and

continuous trait data, thereby avoiding explicit reconstruction of state histories as in similar meth-

ods based on sequential approximations. Through an extensive simulation study, we verify that this

SCE modeling framework can be used to reliably detect and estimate variation in rates of continu-

ous trait evolution from phylogenetic comparative data. Furthermore, we use SCE models to show

that tropical environments are associated with higher rates of flower size evolution in sages (Lami-

aceae: Salvia L.), providing insights into the possible processes underlying latitudinal gradients of

increasing phenotypic diversity (and perhaps species richness; see Schemske, 2001) towards the

equator (Stevens et al., 2006; Schumm et al., 2019; Diamond and Roy, 2023). Overall, SCE mod-

els provide a powerful and flexible new framework for studying heterogeneity in continuous trait

evolution dynamics, ultimately allowing researchers to more confidently and accurately elucidate

how various factors drive shifts in phenotypic evolutionary processes across the tree of life.

3.2 Materials and Methods

The sce package provides tools for inferring state-dependent variation in the evolution-

ary dynamics of a single (i.e., univariate) continuous trait given a phylogeny and comparative

state/continuous trait data associated with its tips. Currently, our implementation only allows for

up to one state and continuous trait measurement per tip (note that missing state/trait measure-

ments are allowed), though we plan to extend the method to handle multiple measurements per tip

in the future. Using the sce package, researchers can build, fit, and compare various SCE models

corresponding to particular hypotheses–for example, one might compare models assuming rates

of flower size evolution either are constant, differ among habitats, vary according to some unob-

served factor/hidden state, or are affected by both habitat and hidden states. Notably, the package

additionally allows researchers to map probable states, traits, and rates onto a phylogeny under a

fitted model via marginal ancestral state reconstruction (Yang, 2007; Hiscott et al., 2016).

3.2.1 Model and Implementation

Our new framework for inferring SCE models largely depends on a novel pruning algorithm for

calculating the joint likelihood of continuous and discrete phylogenetic comparative data evolving

under “Markov-modulated Lévy processes” (MMLPs). Briefly, MMLPs model the evolution of a

continuous trait as a continuous-time random walk (of which Brownian motion or BM is a special

case) whose parameters depend on discrete states, which themselves evolve via a continuous-time

Markov chain (CTMC). To achieve this, we effectively treat states and continuous traits as a unified

compound trait, and use a modified version of Felsenstein’s pruning algorithm for discrete traits

(Felsenstein, 1973) to recursively calculate the likelihood of observed data conditional on the state

and continuous trait value at the root of a phylogeny. To effectively clarify the details of our

new pruning algorithm, we first outline some key notations and concepts by providing a general

introduction to pruning algorithms below.

Assume we have a rooted phylogeny with m branches, including n terminal “tip branches”

with associated phenotypic data. Let τ be an m-length vector of branch lengths (typically mea-

sured in units of time) and φe(xl) denote the likelihood that the node immediately descending from

branch e exhibits phenotype xl given the observed phenotypes of all its descendants. Hereafter,

we represent these so-called “partial likelihoods” (sensu Hassler et al., 2022a) of all possible phe-

notypes for a given branch e as φe(x). Notably, the partial likelihoods for each tip branch are

directly derived from observed phenotypic data. Generally speaking, pruning algorithms aim to

recursively calculate partial likelihoods for all remaining (i.e., non-tip) branches in the phylogeny

to compute the full likelihood of all observed phenotypic data under a given model of phenotypic

evolution. Specifically, for non-tip branch e, assuming phenotypic evolution is conditionally inde-

pendent among sister branches (an assumption common to nearly all trait evolution models), φe(xl)

is given by:

$$\phi_e(x_l) = \prod_{d \in \mathrm{des}(e)} \int \psi(y; x_l, \tau_d, \theta)\, \phi_d(y)\, dy \tag{1}$$

Where des(e) is a function that returns the indices of all branches immediately descending from

branch e and ψ denotes the “transition probability function” under the given model of phenotypic

evolution. More precisely, ψ(y; xl, τd, θ ) represents the probability of observing phenotype y after

τd units of time given a starting phenotype of xl and evolutionary model parameters θ. The expression
∫ ψ(y; xl, τd, θ)φd(y)dy represents the likelihood that the node immediately descending from

branch e exhibits phenotype xl given the observed phenotypic data associated with all descendants

of branch d only, and is sometimes called the “branch-inflated” partial likelihood for branch d

(Hassler et al., 2022a), here denoted φ*d(xl). Because phenotypic evolution along sister branches

is assumed to be conditionally independent, taking the product of φ*d(xl) across all descendants

of branch e thus yields the partial likelihood of phenotype xl for branch e. To calculate the over-

all likelihood given all observed data, the phylogeny is traversed in “postorder” (i.e., from tips to

root), calculating φ (x) for all branches according to Eq. 1, including an implicit “root branch”

indexed m + 1. The final likelihood L is computed by integrating the partial likelihoods for the

root branch multiplied by some root prior, π(x). For this work, we use “FitzJohn’s root prior”,

marginalizing over all possible root phenotypes according to the relative probability they gave rise

to the observed data (FitzJohn et al., 2009):

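$$
\begin{aligned}
L &= \int \phi_{m+1}(x)\, \pi(x)\, dx && (2) \\
  &= \int \phi_{m+1}(x)\, \frac{\phi_{m+1}(x)}{\int \phi_{m+1}(z)\, dz}\, dx && (3) \\
  &= \frac{\int \phi_{m+1}(x)^2\, dx}{\int \phi_{m+1}(x)\, dx} && (4)
\end{aligned}
$$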

Note that the use of integrals in the equations above implies that the phenotype x in this example varies continuously.

In the case of discretely-valued phenotypes, these integrals are instead replaced by sums over all s

possible states.
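
To make the recursion concrete, the following minimal Python sketch applies Eq. 1 and Eq. 4 to a discretely-valued phenotype, with sums replacing the integrals. It is only an illustration and not the sce implementation: the symmetric two-state transition function, the nested-dictionary tree format, and all function names are assumptions introduced here.

```python
import math

def transition_matrix(t, q):
    """Closed-form transition probabilities for a symmetric two-state CTMC."""
    same = 0.5 + 0.5 * math.exp(-2.0 * q * t)
    return [[same, 1.0 - same], [1.0 - same, same]]

def branch_inflate(phi_d, t, q):
    """Discrete analogue of the integral in Eq. 1: sum over end states y."""
    p = transition_matrix(t, q)
    return [sum(p[x][y] * phi_d[y] for y in range(2)) for x in range(2)]

def prune(node, q):
    """Postorder recursion returning the partial likelihoods for `node`."""
    if "state" in node:  # tip: all likelihood sits on the observed state
        return [1.0 if x == node["state"] else 0.0 for x in range(2)]
    phi = [1.0, 1.0]
    for child in node["children"]:
        inflated = branch_inflate(prune(child, q), child["length"], q)
        phi = [phi[x] * inflated[x] for x in range(2)]  # product over descendants
    return phi

def fitzjohn_likelihood(phi_root):
    """Eq. 4 with sums in place of integrals."""
    return sum(p * p for p in phi_root) / sum(phi_root)

# Hypothetical three-tip tree: one tip in state 0 sister to a clade with states 0 and 1.
tree = {"children": [
    {"length": 1.0, "state": 0},
    {"length": 0.5, "children": [
        {"length": 0.5, "state": 0},
        {"length": 0.5, "state": 1},
    ]},
]}
phi_root = prune(tree, q=1.0)
likelihood = fitzjohn_likelihood(phi_root)
print(phi_root, likelihood)
```

Because two of the three tips exhibit state 0, the root partial likelihood for state 0 exceeds that for state 1 in this toy example.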

Unfortunately, outside of special cases, MMLPs do not admit simple closed-form expressions

for partial likelihood or transition probability functions over continuous trait values. Thus, like

other pruning algorithms for calculating the likelihood of complex continuous trait evolution mod-

els from phylogenetic comparative data, our implementation discretizes a specified range of con-

tinuous trait values into an equally-spaced grid of c points and tracks the partial likelihoods at

each point to flexibly approximate arbitrary functions over continuous trait values (FitzJohn, 2010;

Boucher and Démery, 2016; Hiscott et al., 2016; Boucher et al., 2018). We specifically represent

the partial likelihoods for each branch e as an s × c matrix Φe. More concretely, if we let x represent

a c-length vector of trait values corresponding to each grid point, the partial likelihood of

exhibiting a continuous trait value of xl and state j is thus given by the entry in the jth row and lth

column of Φe denoted Φe, j,l.

Unfortunately, even with this flexible representation of partial likelihoods over both discrete

states and continuous trait values, calculating transition probabilities under MMLPs (and branch-

inflated partial likelihoods by extension) remains challenging. To simplify these calculations,

we represent transition probability functions under Lévy processes, ψ(x; 0,t, θ ), via their Fourier

transform or characteristic function, ˆψ(ξ ; 0,t, θ ). This representation is particularly convenient

because, as continuous-time random walks, Lévy process transition probability functions are “in-

finitely divisible probability distributions” (i.e., distributions describing the sum of an arbitrary

number of independent and identically distributed random variables; Sato, 2013). According to

the convolution theorem, the characteristic function describing the sum of random variables is

given by the element-wise product of each variable’s individual characteristic function, and Lévy

process characteristic functions thus always correspond to “infinitely divisible products” of the

form:

$$\hat{\psi}(\xi; 0, t, \theta) = \exp\left[\zeta(\xi, \theta)\, t\right] \tag{5}$$

Where t denotes elapsed time and ζ (ξ , θ ) is the so-called “characteristic exponent” of a given

Lévy process with parameters θ evaluated at ξ (note that the domain of the characteristic function,

ξ , is distinct from that of its corresponding probability distribution function, x). Ultimately, all

Lévy processes admit a characteristic function representation of their transition probability func-

tion that consists of (relatively) simple exponential functions with respect to time for any particu-

lar value of ξ (stated more formally by the Lévy-Khintchine formula for characteristic functions

of Lévy processes; Sato, 2013). The convolution theorem also greatly simplifies calculation of

branch-inflated partial likelihoods for a branch d, φ*d(x), which is equivalent to the distribution

of the sum of two random variables–one distributed according to the transition probability func-

tion given a starting state of x = 0 and another to the partial likelihoods for branch d. Thus, we

can Fourier transform the partial likelihoods for branch d to yield its characteristic function φ̂d(ξ),

multiply φ̂d(ξ) by ψ̂(ξ; 0, τd, θ), and take the inverse Fourier transform of the result to finally yield

φ*d(x).
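
The transform-multiply-invert recipe above can be sketched for plain Brownian motion with zero trend. The Python example below is an illustrative assumption rather than the sce implementation: it uses a naive discrete Fourier transform in place of an FFT library, the standard DFT frequency convention, and hypothetical function names and grid settings.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(c^2) discrete Fourier transform, standing in for an FFT library."""
    c = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for l in range(c):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * l / c)
                    for k, v in enumerate(values))
        out.append(total / c if inverse else total)
    return out

def bm_branch_inflate(phi_d, dx, sigma2, t):
    """Convolve partial likelihoods with a zero-trend BM kernel in Fourier space."""
    c = len(phi_d)
    phi_hat = dft(phi_d)
    inflated_hat = []
    for l, value in enumerate(phi_hat):
        freq = l if l <= c // 2 else l - c            # standard DFT frequency ordering
        xi = 2.0 * math.pi * freq / (c * dx)
        psi_hat = math.exp(-0.5 * sigma2 * xi * xi * t)  # exp[zeta(xi) * t]
        inflated_hat.append(value * psi_hat)
    return [v.real for v in dft(inflated_hat, inverse=True)]

# Hypothetical example: a normal partial likelihood on a 64-point grid.
c, lo, hi = 64, -8.0, 8.0
dx = (hi - lo) / (c - 1)
x = [lo + k * dx for k in range(c)]
mean, var = 0.5, 0.25
phi_d = [math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
         for v in x]
phi_star = bm_branch_inflate(phi_d, dx, sigma2=1.0, t=1.0)

# Summaries of the branch-inflated curve, treated as a density on the grid.
mass = sum(phi_star) * dx
new_mean = sum(v * p for v, p in zip(x, phi_star)) * dx / mass
new_var = sum((v - new_mean) ** 2 * p for v, p in zip(x, phi_star)) * dx / mass
print(mass, new_mean, new_var)
```

Under these settings the branch-inflated curve should keep its mean while its variance grows by roughly σ²t, as expected when a BM increment is added to a normally distributed value.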

Representing Lévy process transition probability functions as exponential functions with re-

spect to time allows us to form an important bridge between Lévy process and CTMC transition

probability functions, ultimately allowing us to efficiently compute transition probabilities under

MMLPs. Briefly, under a CTMC, lineages switch between s states according to an s × s matrix,

Q, where q j,k denotes the instantaneous rate of transition from state j to state k. Exponentiating

tQ yields a new matrix, P, where p j,k now denotes the probability of a lineage starting in state j

ending up in state k after t time units (Pagel, 1994). Importantly, branch-inflated partial likelihoods

for branch d under a CTMC process are given by the matrix-vector product exp [τdQ]φd(x), where

φd(x) now represents an s-length vector of partial likelihoods for each state (Felsenstein, 1973). In

any case, transition probability functions under CTMCs, similarly to characteristic function

representations of Lévy process transition probability functions, are ultimately defined by a kind of

“characteristic exponent”–in this case, the matrix Q. Importantly, because characteristic functions

are linear transformations of their corresponding probability distribution functions (i.e., the Fourier

transform of Aψ1(x) + Bψ2(x) is equal to Aψ̂1(ξ) + Bψ̂2(ξ); Pinsky, 2009), characteristic function

representations of MMLP transition probability functions for any ξ value can be calculated

by exponentiating the Q matrix describing the CTMC evolution of states with modified diagonal

entries. Under normal CTMCs, the diagonal entries of Q are given by:

$$q_{j,j} = -\sum_{\substack{k=1 \\ k \neq j}}^{s} q_{j,k} \tag{6}$$

Which ensures that probabilities are conserved under the CTMC process by balancing the

overall transition rate into state j with the transition rate out of state j (Pagel, 1994).

In the case of MMLPs, we must modify these diagonals for a given ξ value by adding the corresponding

characteristic exponent for the Lévy process governing continuous trait evolution in state j,

denoted ζj(ξ, θ) (note that the exponents are added because exp[ζj(ξ, θ)t] exp[qj,jt] =

exp[(ζj(ξ, θ) + qj,j)t]).
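
As a small illustration of the two ideas above, the Python sketch below builds a rate matrix with the diagonals of Eq. 6, exponentiates it, and then repeats the exponentiation after adding state-specific characteristic exponents to the diagonal. Everything here is an assumption made for illustration: the scaling-and-squaring Taylor routine stands in for a library matrix exponential, the exponents are those of zero-trend BM, and the rates and function names are hypothetical.

```python
import math

def build_q(rates):
    """Copy off-diagonal rates and fill each diagonal as in Eq. 6."""
    s = len(rates)
    q = [[complex(rates[j][k]) if j != k else 0j for k in range(s)] for j in range(s)]
    for j in range(s):
        q[j][j] = -sum(q[j][k] for k in range(s) if k != j)
    return q

def matmul(a, b):
    s = len(a)
    return [[sum(a[i][m] * b[m][k] for m in range(s)) for k in range(s)]
            for i in range(s)]

def expm(a, t, squarings=8, terms=16):
    """exp(t * a) by scaling and squaring with a truncated Taylor series."""
    s = len(a)
    scale = t / (2 ** squarings)
    small = [[a[i][k] * scale for k in range(s)] for i in range(s)]
    result = [[1.0 + 0j if i == k else 0j for k in range(s)] for i in range(s)]
    term = [row[:] for row in result]
    for n in range(1, terms + 1):
        term = [[v / n for v in row] for row in matmul(term, small)]
        result = [[result[i][k] + term[i][k] for k in range(s)] for i in range(s)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def modulated_exponent(q, zetas):
    """Add state-specific characteristic exponents to the diagonal of Q."""
    s = len(q)
    return [[q[j][k] + (zetas[j] if j == k else 0.0) for k in range(s)]
            for j in range(s)]

# Hypothetical two-state example with BM rates 1 and 8 and zero trend.
q = build_q([[0.0, 0.5], [0.5, 0.0]])
p = expm(q, t=1.0)                                   # ordinary CTMC transition matrix
xi = 0.7
zetas = [-0.5 * sigma2 * xi ** 2 for sigma2 in (1.0, 8.0)]
psi_hat = expm(modulated_exponent(q, zetas), t=1.0)  # MMLP analogue at this xi
print(p, psi_hat)
```

At ξ = 0 every exponent vanishes and the modified matrix reduces to Q, so the ordinary CTMC transition probabilities are recovered as a special case.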

Because we represent the partial likelihoods for each branch with s × c matrices Φ, we do not

treat ξ as a continuous variable but, like x, as a c-length vector of grid points. More specifically,

the lth entry of ξ is given by:

$$\xi_l = -2\pi\, \frac{l - 1 - c \left\lfloor \dfrac{l + c/2 - 2}{c} \right\rfloor}{x_c - x_1} \tag{7}$$

Where ⌊a⌋ denotes the greatest integer less than or equal to a. Now, let R represent a “rate

array” of c s × s matrices, with the lth matrix encoding the characteristic function representation of

the MMLP transition probability function at ξl. The off-diagonal elements of all matrices in R are

equal and given by the Q matrix describing the CTMC evolution of discrete states. On the other

hand, the diagonal entries for the lth matrix are equal to:

$$r_{j,j,l} = \zeta_j(\xi_l, \theta) - \sum_{\substack{k=1 \\ k \neq j}}^{s} q_{j,k} \tag{8}$$

To calculate branch-inflated partial likelihoods for a given branch d under MMLPs, we exponentiate

each matrix in R multiplied by the branch length τd, resulting in a new array denoted Ψ̂d. Next,

we use the highly efficient fast Fourier transform algorithm (FFT) to compute the discrete Fourier

transform of each row in the partial likelihood matrix Φd to yield the characteristic function

representation of the partial likelihood matrix, denoted Φ̂d. Then the lth matrix of Ψ̂d is

multiplied by the lth column of Φ̂d to compute the characteristic function representation of the

branch-inflated partial likelihood matrix, Φ̂*d. Next, we use the inverse FFT to convert each row

of Φ̂*d back to its normal representation, yielding the branch-inflated partial likelihood matrix Φ*d.

Lastly, the partial likelihood matrix for a given branch e is simply the element-wise product of Φ*d

for all branches d immediately descending from e. This finally allows us to compute the necessary

partial likelihoods to carry out a pruning algorithm.
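
The steps in this paragraph can be sketched end to end for a two-state model with zero-trend BM in each state. The Python example below is a simplified illustration under stated assumptions, not the sce code: it uses a naive DFT instead of FFTW3, a Taylor-series matrix exponential instead of Armadillo, the standard DFT frequency convention rather than Eq. 7, and hypothetical names and parameter values throughout.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(c^2) discrete Fourier transform, standing in for an FFT library."""
    c = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for l in range(c):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * l / c)
                    for k, v in enumerate(values))
        out.append(total / c if inverse else total)
    return out

def matmul(a, b):
    s = len(a)
    return [[sum(a[i][m] * b[m][k] for m in range(s)) for k in range(s)]
            for i in range(s)]

def expm(a, squarings=12, terms=12):
    """exp(a) by scaling and squaring with a truncated Taylor series."""
    s = len(a)
    small = [[v / (2 ** squarings) for v in row] for row in a]
    result = [[1.0 + 0j if i == k else 0j for k in range(s)] for i in range(s)]
    term = [row[:] for row in result]
    for n in range(1, terms + 1):
        term = [[v / n for v in row] for row in matmul(term, small)]
        result = [[result[i][k] + term[i][k] for k in range(s)] for i in range(s)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def mmbm_branch_inflate(phi, dx, q, sigma2, tau):
    """phi: s x c partial likelihoods; q: s x s rate matrix with Eq. 6 diagonals."""
    s, c = len(phi), len(phi[0])
    phi_hat = [dft(row) for row in phi]              # transform each row
    star_hat = [[0j] * c for _ in range(s)]
    for l in range(c):
        freq = l if l <= c // 2 else l - c
        xi = 2.0 * math.pi * freq / (c * dx)
        # lth slice of the rate array: Q plus zero-trend BM exponents on the diagonal
        r = [[q[j][k] - (0.5 * sigma2[j] * xi * xi if j == k else 0.0)
              for k in range(s)] for j in range(s)]
        psi_hat = expm([[v * tau for v in row] for row in r])
        for j in range(s):                           # matrix times lth column
            star_hat[j][l] = sum(psi_hat[j][k] * phi_hat[k][l] for k in range(s))
    return [[v.real for v in dft(row, inverse=True)] for row in star_hat]

# Hypothetical tip observed in state 0 with a normal trait likelihood (variance 0.25).
c, lo, hi = 64, -8.0, 8.0
dx = (hi - lo) / (c - 1)
x = [lo + k * dx for k in range(c)]
density = [math.exp(-0.5 * v * v / 0.25) / math.sqrt(2 * math.pi * 0.25) for v in x]
phi = [density, [0.0] * c]
q = [[-0.5, 0.5], [0.5, -0.5]]
phi_star = mmbm_branch_inflate(phi, dx, q, sigma2=[1.0, 4.0], tau=0.5)
print([sum(row) * dx for row in phi_star])
```

A useful sanity check is that the total mass in each row of the result matches the corresponding CTMC transition probability into the observed tip state, since the slice at ξ = 0 is simply exp[τQ].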

Intriguingly, representing partial likelihoods in matrix form theoretically allows for incredibly

flexible initialization of our pruning algorithm–that is, partial likelihood matrices for tip branches

may be arbitrarily specified in accordance with observed phenotypic data. For example, provided

sufficient within-tip sampling, one could even initialize the partial likelihood matrices for each tip

with kernel density estimates of continuous trait distributions for any given discrete state. Nonethe-

less, for simplicity, we assume here that continuous traits for each tip e are normally distributed

with mean ηe and variance εe² across all discrete states, while the probability that a given tip

exhibits discrete state j is given by the jth entry of an s-length vector ρe, denoted ρe, j. Under

these assumptions, the characteristic function representation for the partial likelihood matrix for

tip branch e, ˆΦe, is given by (DasGupta, 2011):

$$\hat{\Phi}_{e,j,l} = \rho_{e,j} \exp\left[ i \left( \eta_e - \frac{x_c + x_1}{2} \right) \xi_l - \frac{\varepsilon_e^2\, \xi_l^2}{2} \right] \tag{9}$$

Where, critically, i is not a parameter or variable and actually denotes the imaginary unit (i.e., √−1).
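
For readers who prefer code to notation, the following Python sketch transcribes Eq. 7 and Eq. 9 directly. It assumes an even number of grid points c, and the function names, grid bounds, and tip values are hypothetical choices made for illustration only.

```python
import cmath
import math

def xi_grid(x_first, x_last, c):
    """Frequency grid from Eq. 7 (1-based index l; even c assumed)."""
    return [-2.0 * math.pi * (l - 1 - c * ((l + c // 2 - 2) // c)) / (x_last - x_first)
            for l in range(1, c + 1)]

def tip_characteristic_matrix(eta, eps2, rho, x_first, x_last, c):
    """Eq. 9: s x c characteristic function representation for a single tip."""
    mid = 0.5 * (x_last + x_first)
    xi = xi_grid(x_first, x_last, c)
    return [[rho_j * cmath.exp(1j * (eta - mid) * v - 0.5 * eps2 * v * v) for v in xi]
            for rho_j in rho]

# Hypothetical tip: mean 1.2, variance 0.01, observed in state 0 of 2 states.
xi = xi_grid(-4.0, 4.0, 8)
phi_hat = tip_characteristic_matrix(eta=1.2, eps2=0.01, rho=[1.0, 0.0],
                                    x_first=-4.0, x_last=4.0, c=8)
print(xi)
print(phi_hat[0])
```

The first grid point corresponds to ξ = 0, where each entry of the tip matrix reduces to the state probability ρe,j; elsewhere the entries are complex with modulus no greater than ρe,j.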

Notably, our method for efficiently calculating partial likelihoods, branch-inflated partial

likelihoods, and transition probabilities assumes continuous trait evolution exhibits “periodic

boundary conditions”–a rather technical yet important caveat (Bowman and Roberts, 2011). Practically

speaking, this means that continuous trait evolution “wraps around” on itself when it hits the

boundaries of the specified grid (i.e., x1 to the left and xc to the right), such that low trait values

are unrealistically “teleported” over to high trait values and vice versa. Fortunately, this problem is

quite easily and effectively managed by simply extending the grid of trait values well beyond the

range of observed data. In our current implementation, given an n-length vector of mean trait values

for each tip η, we derive a “primary grid” of c/2 points spanning from min η − ω(max η − min η)

to max η + ω(max η − min η). By default, we set ω to 0.5 to expand the bounds of observed con-

tinuous trait data by 50% of the overall range of trait values on either side. The remaining c/2 grid

points come from symmetrically “zero-padding” partial likelihoods–that is, appending c/4-length

vectors of 0s to either side of the partial likelihood vectors (Bowman and Roberts, 2011). By forc-

ing these extreme trait values to be associated with partial likelihoods of 0 prior to each Fourier

transformation, the risk of periodic boundary conditions influencing model inference is rendered

effectively negligible, though we recommend against setting ω to small values below ∼0.05-0.1.
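
The grid construction described above can be sketched in a few lines of Python. This is only an illustration under the stated defaults: the function names are hypothetical, c is assumed to be divisible by 4, and the example tip means are made up.

```python
def primary_grid(tip_means, c, omega=0.5):
    """c/2 equally spaced points spanning the data range expanded by omega per side."""
    lo, hi = min(tip_means), max(tip_means)
    span = hi - lo
    start, stop = lo - omega * span, hi + omega * span
    n = c // 2
    step = (stop - start) / (n - 1)
    return [start + k * step for k in range(n)]

def zero_pad(partial_likelihoods, c):
    """Append c/4 zeros to each side so the padded vector has length c."""
    pad = [0.0] * (c // 4)
    return pad + list(partial_likelihoods) + pad

# Hypothetical tip means and a very coarse grid of c = 16 points.
tip_means = [2.0, 3.5, 5.0]
c = 16
grid = primary_grid(tip_means, c)
padded = zero_pad([1.0] * len(grid), c)
print(grid)
print(padded)
```

With ω = 0.5 and observed means spanning 2 to 5, the primary grid runs from 0.5 to 6.5, and the padded vector carries four zeros on either side of the eight primary entries.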

In the current work, we focus on the simplest MMLP–namely, Markov-Modulated Brownian

motion (MMBM). The characteristic exponent of a BM process is given by (DasGupta, 2011):

$$\zeta(\xi_l, \sigma^2, \mu) = i \mu \xi_l - \sigma^2 \xi_l^2 / 2 \tag{10}$$

Where σ 2 and µ correspond to the evolutionary rate and trend, respectively, of the BM pro-

cess (notably, because the BM process must be propagated “backwards” in time for the pruning

algorithm, the trend parameter µ is multiplied by −1 for pruning algorithm calculations). While

evolutionary trends under homogeneous BM models of trait evolution are generally unidentifiable from

ultrametric phylogenetic comparative data (i.e., phylogenies with tips corresponding to contem-

poraneous taxa only, which the current work focuses on), preliminary tests of our approach with

simulated data suggest that state-dependent differences among evolutionary trends are identifiable

under MMBM models. Nonetheless, for simplicity, we assume µ = 0 throughout the simulation

and empirical case studies presented here, leaving investigation of state-dependent evolutionary

trend inference for future work.

We implemented our pruning algorithm using the C/C++ libraries FFTW3 (Frigo and Johnson,

2005) and Armadillo (Sanderson and Curtin, 2016, 2019), interfaced via the R packages Rcpp

(Eddelbuettel and Francois, 2011) and RcppArmadillo (Eddelbuettel and Sanderson, 2014). See

APPENDIX 3B for further details on how we practically manage the efficiency and numerical

stability of our pruning algorithm.

3.2.2 Hidden States and Hypothesis Testing

By calculating the likelihood of both state and continuous trait data under a joint evolutionary

process, our new SCE modeling framework enables straightforward inference of unobserved or

hidden states associated with heterogeneity in continuous trait evolution dynamics. Ultimately,

this is due to the fact that our approach allows continuous trait data to directly influence the likeli-

hood of state histories, enabling the inference of evolutionary histories of unobserved states based

solely on their apparent impact on continuous trait evolution dynamics. More concretely, inferring

hidden states generally requires “splitting” the observed state data into multiple observed-hidden

state combinations and allowing continuous trait evolution parameters like rates or trends to vary

based on these additional states. Importantly, each tip is initialized such that partial likelihoods

are identical across hidden states. As an example, if one were studying whether flower size

evolutionary dynamics differ between tropical and temperate lineages, inferring a model with two

hidden states would entail splitting each state into two new ones, conventionally labeled tropicalA,

tropicalB, temperateA, and temperateB. Then, for any given tip, the partial likelihoods for states

tropicalA/tropicalB and temperateA/temperateB are given by the partial likelihoods for the original

tropical and temperate states, respectively.
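
The state-splitting step in this example can be written as a short Python sketch. The function name, label format, and toy partial likelihoods below are assumptions made for illustration and do not reflect the sce interface.

```python
def split_hidden_states(tip_matrix, observed_labels, hidden_labels=("A", "B")):
    """Duplicate each observed-state row of a tip matrix once per hidden state."""
    labels, rows = [], []
    for label, row in zip(observed_labels, tip_matrix):
        for hidden in hidden_labels:
            labels.append(label + hidden)
            rows.append(list(row))      # identical partial likelihoods across hidden states
    return labels, rows

# Hypothetical tip observed as tropical, with partial likelihoods on a 3-point grid.
tip_matrix = [[0.1, 0.8, 0.1],   # tropical
              [0.0, 0.0, 0.0]]   # temperate
labels, expanded = split_hidden_states(tip_matrix, ["tropical", "temperate"])
print(labels)
print(expanded)
```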

Ultimately, incorporating hidden states into hypothesis-testing frameworks allows for more

comprehensive and realistic tests of evolutionary hypotheses. To test whether the evolution of a

continuous trait is largely homogeneous or varies according to an observed and/or unobserved dis-

crete variable, one may compare the fit of several candidate SCE models with differing constraints

on which parameters of the continuous trait evolution process (e.g., evolutionary rates, trends)

are allowed to vary across observed and/or hidden states. We demonstrate an example of this

hypothesis-testing approach through the simulation study and empirical examples outlined below.

Critically, despite the fact that hidden states alter how partial likelihoods are initialized in pruning

algorithms, the likelihoods/information criteria associated with models fit to the same phylogenetic

comparative data are directly comparable in this context whether or not they include hidden states.

3.2.3 Simulation Study

To assess the performance of our new approach for inferring SCE models, we tested whether

our method could reliably detect and quantify observed and/or hidden state-dependent rates of

continuous trait evolution from phylogenetic comparative data simulated under MMBM models.

Hereafter, for both brevity and consistency with existing terminology in the field (e.g., Beaulieu

and O’Meara, 2016; Boyko et al., 2023b), we refer to observed state-dependent rate variation as

simply “state-dependent rate variation” and to unobserved/hidden state-dependent rate variation as

“state-independent rate variation”. In general, we used the R package phytools (Revell, 2012) to

first simulate both pure-birth phylogenies (all scaled to have heights of 1) and associated discrete

state data, then used our new sce package to simulate associated continuous trait data under several

different patterns of state-dependent rate variation.

For each simulation, we simulated CTMC evolution of four discrete states labeled “0A”, “0B”,

“1A”, and “1B”, with the numeric prefixes denoting observed states 0 and 1 and letter suffixes

denoting unobserved/hidden binary states A and B. All simulated CTMCs assumed that: 1) transi-

tions between observed states do not depend on unobserved states and vice versa, 2) transitions in

both observed and unobserved states cannot occur in the same instant (i.e., q0A,1B = 0, q1A,0B = 0,

etc.), and 3) transitions among unobserved states always occur with rate 1 and are “symmetric”–

that is, transitions from A to B occur at the same rate as transitions from B to A. With the ex-

ception of the third assumption (which we made to simplify the design of our simulation study),

these are all common simplifying assumptions made to render hidden state inference more statis-

tically tractable (e.g., Beaulieu and O’Meara, 2016). Transitions among observed states 0 and 1

followed one of three alternative evolutionary dynamics: 1) “slow” symmetric transitions of rate

1, 2) “fast” symmetric transitions of rate 4, and 3) “asymmetric” transitions whereby transitions

from 0 to 1 occur with rate 2 while transitions from 1 to 0 occur with rate 0.4. For each simulation,

we sampled root states from the stationary distribution associated with a simulation’s given transi-

tion rate matrix. To prevent simulations from exhibiting too few state transitions to reliably infer

state-(in)dependent rate variation, we repeated each CTMC simulation until it met the following

criteria: 1) each state is represented by at least 10% of tips in the phylogeny and 2) each state is

transitioned into along 5 or more distinct branches (allowing for no more than 1 exception; i.e., at

least 3 states must exhibit this property).
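
The transition rate matrices implied by these assumptions can be assembled as in the Python sketch below, which also samples a root state from the stationary distribution. This is an illustrative assumption rather than the simulation code used here (which relied on phytools): the function names are hypothetical, and the closed-form stationary distribution exploits the fact that observed and hidden states evolve independently under assumption 1.

```python
import random

STATES = ["0A", "0B", "1A", "1B"]

def build_q(rate_01, rate_10, hidden_rate=1.0):
    """Four-state rate matrix with no simultaneous observed/hidden transitions."""
    q = [[0.0] * 4 for _ in range(4)]
    for j, src in enumerate(STATES):
        for k, dst in enumerate(STATES):
            if j == k:
                continue
            obs_change = src[0] != dst[0]
            hid_change = src[1] != dst[1]
            if obs_change and hid_change:
                continue                          # e.g., q(0A -> 1B) stays 0
            if obs_change:
                q[j][k] = rate_01 if src[0] == "0" else rate_10
            else:
                q[j][k] = hidden_rate             # symmetric hidden transitions
        q[j][j] = -sum(q[j])                      # rows sum to zero
    return q

def stationary(rate_01, rate_10):
    """Product of the two independent two-state stationary distributions."""
    total = rate_01 + rate_10
    pi_obs = {"0": rate_10 / total, "1": rate_01 / total}
    return [pi_obs[state[0]] * 0.5 for state in STATES]

# The three observed-state dynamics described in the text.
scenarios = {"slow": (1.0, 1.0), "fast": (4.0, 4.0), "asymmetric": (2.0, 0.4)}
q = build_q(*scenarios["asymmetric"])
pi = stationary(*scenarios["asymmetric"])
root_state = random.Random(1).choices(STATES, weights=pi)[0]
print(q, pi, root_state)
```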

We simulated continuous trait evolution according to one of five distinct patterns of state-

dependent and/or state-independent rate variation: 1) “constant” whereby rates of continuous trait

evolution are equal across all states observed or unobserved, 2) “state-independent” whereby rates

only depend on unobserved states A and B, 3) “completely state-dependent” whereby rates only de-

pend on observed states 0 and 1, 4) “strongly state-dependent” whereby rates primarily depend on

observed states while also being influenced by unobserved states, and 5) “weakly state-dependent”

whereby rates primarily depend on unobserved states while also being influenced by observed

states. Across all simulation conditions, states 1 and B were associated with higher rates than

states 0 and A, respectively, with one notable exception. To investigate how asymmetric state tran-

sitions affect inference of state-dependent rates, we swapped the rates associated with state 0 and

1 for some simulations with both asymmetric transition rates and unequal rates of continuous trait

evolution across states 0 and 1. The lowest rate in all simulations was always set to 1, while the

higher rate for state-independent and completely state-dependent simulations was set to 8. For

strongly state-dependent simulations, we set rates to 1, 4, 8, and 16 in states 0A, 0B, 1A, and 1B,

respectively, but set these rates to 1, 8, 4, and 16 instead for weakly state-dependent simulations.

Ultimately, we defined 18 simulation conditions–15 corresponding to every possible combina-

tion of the 3 state transition dynamics and 5 state-dependent/independent rate variation patterns,

plus an additional 3 conditions with asymmetric transition rates and rates of continuous trait evo-

lution for states 0 and 1 swapped as described above. To also investigate how our approach was

influenced by increasing sample size, we simulated phylogenies with either 50, 100, or 200 tips.

We simulated 20 replicates for each condition and phylogeny size, ultimately yielding 1,080 sim-

ulated phylogenies and associated comparative datasets. Note that hidden state information was

discarded prior to model fitting and analysis, such that states 0A/0B were converted to 0 and states

1A/1B to 1. To analyze each simulation, we fit 8 different models via maximum likelihood in-

ference using our sce package–four models assuming symmetric transition rates among observed

states plus four otherwise identical models instead assuming asymmetric transition rates. For sim-

plicity, all models assumed transition rates among hidden states are symmetric. The four models

in each category (symmetric versus asymmetric transition rates) consisted of: 1) a null model as-

suming constant rates across observed states without hidden states, 2) a state-independent model

assuming constant rates across observed states with two hidden states, 3) a state-dependent model

allowing rates to vary across observed states without hidden states, and 4) a full model allowing

rates to vary across observed states while also including two hidden states. Like the simulation set-

tings, all models assumed transition rates among hidden states did not depend on observed states

and vice versa, and that transitions in both observed and hidden states cannot occur simultaneously.

To reduce the time it took to analyze all these simulated datasets, we discretized continuous trait

data with a relatively coarse grid of 512 points. While 512 grid points seem to suffice for preliminary

analyses, we generally recommend using a grid of 2,048 points for final parameter estimates

and likelihood calculations (the FFT algorithm is most efficient with grid resolutions which are a

power of two; Frigo and Johnson, 2005). For the purposes of model fitting, we assumed the con-

tinuous trait at each tip followed a normal distribution centered at the simulated continuous trait

value with a small standard deviation equal to 1/100th the range of all continuous trait values for a

given simulation.

We used the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm

implemented in the C++ library NLOPT (interfaced through the R package nloptr; Johnson,

2021; Ypma et al., 2022) to fit all models. Notably, the LBFGS algorithm requires information

about the gradient of the likelihood surface, which we calculated via simple finite difference ap-

proximations (see APPENDIX 2B for an example of this approach to gradient approximation).

While other numerical optimization algorithms do not require gradient calculations, we found

these algorithms to be slower and less reliable in preliminary tests of our approach. Because tran-

sition and evolutionary rate parameters must be positive, we estimated all parameters on a natural

log scale. For each model fit, we ran the LBFGS algorithm at least 5 times from initial parame-

ter estimates uniformly sampled between -2.5 and 2.5 (∼0.1-10), taking whichever run achieved

the highest maximum likelihood estimate. Likelihood surfaces for models including hidden states

occasionally appeared to be multimodal and/or exhibit flat “ridges”, rendering numerical optimiza-

tion more challenging–thus, we ran the algorithm 10 rather than 5 times for models with hidden

states. We also bounded parameter estimates between -30 (∼1 × 10^−13) and 7 (∼1,000) to prevent

the algorithm from both getting “stuck” in likelihood ridges that may occur when parameter

estimates are effectively equal to 0 and/or running into overflow issues if parameter estimates grew

too large.
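
The general fitting strategy (log-scale parameters, finite-difference gradients, bounded estimates, and multiple random starts) can be illustrated with a toy problem in Python. The sketch below is an assumption-laden stand-in and not the procedure used here: it replaces LBFGS with a simple projected gradient ascent, and the toy likelihood, data, and function names are hypothetical.

```python
import math
import random

LOWER, UPPER = -30.0, 7.0   # bounds on log-scale parameters, as in the text

def fd_gradient(f, theta, h=1e-5):
    """Central finite-difference approximation to the gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + h] + theta[i + 1:]
        down = theta[:i] + [theta[i] - h] + theta[i + 1:]
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def clamp(theta):
    return [min(UPPER, max(LOWER, v)) for v in theta]

def ascend(f, theta, max_iter=500, tol=1e-9):
    """Projected gradient ascent with backtracking (a simple stand-in for LBFGS)."""
    value = f(theta)
    for _ in range(max_iter):
        grad = fd_gradient(f, theta)
        step = 1.0
        while step > 1e-12:
            trial = clamp([v + step * g for v, g in zip(theta, grad)])
            trial_value = f(trial)
            if trial_value > value:
                break
            step *= 0.5
        else:
            return theta, value               # no improving step was found
        improved = trial_value - value
        theta, value = trial, trial_value
        if improved < tol:
            break
    return theta, value

def multi_start(f, n_params, n_starts=5, seed=1):
    """Run the optimizer from several random starts and keep the best fit."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = [rng.uniform(-2.5, 2.5) for _ in range(n_params)]
        fit = ascend(f, start)
        if best is None or fit[1] > best[1]:
            best = fit
    return best

# Toy log-likelihood: independent normal observations with unknown variance,
# parameterized as theta = ln(sigma^2) so that the variance stays positive.
data = [0.3, -1.1, 2.0, 0.7, -0.4]

def loglik(theta):
    sigma2 = math.exp(theta[0])
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + v * v / sigma2) for v in data)

theta_hat, max_loglik = multi_start(loglik, n_params=1)
print(math.exp(theta_hat[0]), max_loglik)
```

For this toy likelihood the maximum likelihood estimate of the variance is the mean of the squared observations, which gives a simple check on the optimizer.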

We analyzed the results of our simulation study through both “model selection” and “param-

eter inference”-based approaches. For the model selection-based approach, we calculated sample

size corrected Akaike Information Criteria (AICc) for each model fit. Because simulated data of-

ten failed to exhibit overwhelming support for one particular model based on differences in AICc

(∆AICc), we instead focused on analyzing variation in the AICc weights associated with different

models across simulation conditions. In particular, we used “relative AICc weights” (hereafter

RAWs), which we define here as the average AICc weight for all models supporting some hypoth-

esis (e.g., state-dependent rate variation) divided by itself plus the average AICc weight for all

models supporting an alternative hypothesis (e.g., constant rates). RAWs thus provide normalized

measures between 0 and 1 of the support for some hypothesis (more technically, RAWs corre-

spond to the inverse logit transform of log “evidence ratios” sensu Burnham and Anderson, 2002).

We specifically calculated RAWs comparing: 1) state-independent versus null models, 2) state-

dependent/full versus null models, 3) state-dependent/full versus state-independent models, and 4)

asymmetric versus symmetric models. Note that we averaged rather than summed AICc weights

because some of these RAWs were “unbalanced” in comparing four models to two (Burnham

and Anderson, 2002; Kittle et al., 2008). For the parameter inference-based approach, we sim-

ply model-averaged parameter estimates according to (non-relative) AICc weights and compared

inferred parameter values to simulated ones. For consistency among fitted models and simulated

parameters, we assumed the inferred hidden state associated with the highest average rate corre-

sponded to the simulated unobserved state B and the other inferred hidden state to A. Note that the

transition rate among hidden states could only be model-averaged across four of the eight models

which explicitly inferred hidden states.
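
The model selection quantities defined above can be computed as in the following Python sketch. The log-likelihoods and parameter counts are invented for illustration, the number of tips is used as the sample size in the AICc correction as an assumption, and the function and model names are hypothetical.

```python
import math

def aicc(loglik, k, n):
    """Sample size corrected AIC for a model with k parameters fit to n observations."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def aicc_weights(scores):
    """Convert a dict of AICc scores into AICc weights that sum to 1."""
    best = min(scores.values())
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in scores.items()}
    total = sum(rel.values())
    return {name: value / total for name, value in rel.items()}

def relative_weight(weights, focal, alternative):
    """RAW: mean focal weight divided by itself plus the mean alternative weight."""
    mean_focal = sum(weights[m] for m in focal) / len(focal)
    mean_alt = sum(weights[m] for m in alternative) / len(alternative)
    return mean_focal / (mean_focal + mean_alt)

# Hypothetical fits: (log-likelihood, number of parameters) for eight models.
fits = {
    "sym_null": (-120.0, 2), "sym_indep": (-117.5, 4),
    "sym_dep": (-112.0, 3), "sym_full": (-110.5, 6),
    "asym_null": (-119.8, 3), "asym_indep": (-117.2, 5),
    "asym_dep": (-111.6, 4), "asym_full": (-110.1, 7),
}
n_tips = 100
weights = aicc_weights({m: aicc(ll, k, n_tips) for m, (ll, k) in fits.items()})

dependent = ["sym_dep", "sym_full", "asym_dep", "asym_full"]
null = ["sym_null", "asym_null"]
raw_dep_vs_null = relative_weight(weights, dependent, null)
print(weights, raw_dep_vs_null)
```

Averaging rather than summing the weights keeps the comparison fair when, as here, four models support one hypothesis and only two support the alternative.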

3.2.4 Empirical Example

To demonstrate potential empirical applications of our new method for inferring state-

dependent rates of continuous trait evolution, we tested whether tropical lineages of sage (Lami-

aceae: Salvia L.) exhibit higher rates of flower size evolution in accordance with the Biotic Inter-

actions Hypothesis (BIH). The BIH predicts that the comparatively stable climatic conditions of

tropical ecosystems cause biotic factors to drive more variation in evolutionary fitness relative to

abiotic factors (Dobzhansky, 1950; Schemske, 2001; Schemske et al., 2009). Thus, assuming biotic

selective pressures are typically more heterogeneous than abiotic ones, the characteristic spatiotem-

poral scale of evolutionary processes should shrink towards the tropics, resulting in more frequent

and rapid bouts of evolutionary divergence among populations (i.e., more intense/pronounced “co-

evolutionary mosaics”; see Thompson, 2005). As such, the BIH suggests that traits particularly

important in mediating biotic interactions–like flower morphology–should exhibit elevated rates of

evolution among tropical lineages.

For this analysis, we combined the recently-generated, time-calibrated maximum clade credi-

bility phylogeny of sages and associated corolla length measurements from Moein et al. (2023)

with geographic occurrence data from Kriebel et al. (2019). To more thoroughly contextu-

alize the results of our analyses comparing corolla length evolution among tropical and tem-

perate sage lineages, we used additional discrete trait data from Moein et al. (2023) to also

examine whether other key sage traits–specifically, active staminal lever presence/absence and

woodiness/herbaceousness–affect corolla length evolution. We dropped 5 tips from the phylogeny

corresponding to species outside the genus Salvia, as well as an additional 37 tips lacking any asso-

ciated data, yielding a final phylogeny consisting of 334 tips with corresponding corolla (328 tips),

lever (314 tips), and/or woodiness (302 tips) data. We coded missing discrete trait measurements

by assigning identical partial likelihoods to all possible discrete states, similarly to hidden states,

and coded missing continuous trait measurements by assigning equal partial likelihoods to all grid

points corresponding to different corolla length measurements. The geographic occurrences from

Kriebel et al. (2019) provided data for 289 of these tips, but we supplemented these data with

coordinates downloaded from the Global Biodiversity Information Facility (GBIF) for any species

lacking coordinate data. We used the R package CoordinateCleaner (Zizka et al., 2019) to remove

likely erroneous and/or duplicate records from the combined data under the package’s default set-

tings, with the exception of keeping records potentially corresponding to country centroids, which

were the only records available for a substantial number of tips in the phylogeny. Ultimately, this

yielded nearly 71,000 occurrence records for 304 tips, with a median of 10 occurrences per tip

(ranging from 1 to 25,788; 57 or ∼20% of tips were only associated with a single record).
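The missing-data coding described above can be illustrated with a minimal Python sketch. It assumes that partial likelihoods at each tip are stored as arrays over discrete states and over trait-value grid points. The function names, the grid limits, and the nearest-grid-point placement of observed measurements are illustrative assumptions, not the implementation used for these analyses.

```python
import numpy as np

def discrete_tip_likelihood(state, n_states):
    """Partial likelihoods over discrete states for one tip.

    An observed state gets likelihood 1 and all others 0; a missing
    state (None) gets identical likelihoods for every state.
    """
    lik = np.zeros(n_states)
    if state is None:
        lik[:] = 1.0
    else:
        lik[state] = 1.0
    return lik

def continuous_tip_likelihood(value, grid):
    """Partial likelihoods over trait-value grid points for one tip.

    An observed value is placed on the nearest grid point (a
    simplifying assumption); a missing value (None) gets equal
    likelihoods at every grid point.
    """
    lik = np.zeros(grid.size)
    if value is None:
        lik[:] = 1.0
    else:
        lik[np.argmin(np.abs(grid - value))] = 1.0
    return lik

# Hypothetical grid of log corolla lengths with 2,048 points.
grid = np.linspace(0.0, 5.0, 2048)
observed = continuous_tip_likelihood(np.log(16.0), grid)
missing = continuous_tip_likelihood(None, grid)
```

Because a missing tip contributes the same likelihood to every state or grid point, it carries no information about the trait but still lets the tip inform the analysis through its other measurements.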

There are, unfortunately, numerous partially conflicting strategies for classifying geographic

locations as either tropical or temperate (Feeley and Stroud, 2018). Because the BIH suggests cli-

matic stability ultimately drives increased evolutionary rates towards the tropics, we chose to use

the ratio of annual to daily temperature ranges (hereafter “isothermality”) to classify geographic occurrences as either tropical or temperate. First, we calculated isothermality based on the definition


of Feeley and Stroud (2018) for each occurrence record using 2.5 minute resolution WorldClim

rasters (Fick and Hijmans, 2017) via the R package geodata (Hijmans et al., 2023). In this case,

isothermality is a strictly positive variable with values below 1 corresponding to tropical condi-

tions whereby daily temperature fluctuations are comparable to or exceed seasonal variation in

temperature. After log-transforming isothermality such that it ranged from −∞ to ∞, we computed

isothermality ranges for each tip as the 5 and 95% quantiles of each tip’s empirical isothermality

distribution, classifying each tip as either tropical if the midpoint of the range fell below ln 1 = 0

and temperate otherwise (Fig. 3A.1). To assess how sensitive our results were to this particular

classification scheme, we also devised a more conservative coding strategy that took the extremes

of each tip’s isothermality range into account. Notably, however, the widths of isothermality ranges

tended to increase with the number of occurrence records among tips with roughly 20 or fewer

records (Fig. 3A.2). Thus, we calculated the mean range width across all tips with more than 20

records and symmetrically expanded the ranges for all tips with 20 or fewer records to at least

this width. We then classified tips as tropical only if both the upper extreme and midpoint of their

range fell below ln 1.25 ≈ 0.22 and ln 0.75 ≈ −0.29, respectively. Conversely, tips were only clas-

sified as temperate if the lower extreme and midpoint of their range exceeded ln 0.75 ≈ −0.29 and

ln 1.25 ≈ 0.22, respectively. All other tips (50 out of the 304 with occurrence records) were coded

as missing because they exhibited no strong climatic preference, either occurring under both trop-

ical and temperate conditions or exclusively occurring right around the isothermality cut-off point

of ln 1 = 0 (Fig. 3A.3).
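The two coding schemes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each tip is represented by an array of log-transformed isothermality values, `mean_width` stands for the mean range width among tips with more than 20 records, and all function names are hypothetical rather than taken from the analysis code.

```python
import numpy as np

LN_LOW, LN_HIGH = np.log(0.75), np.log(1.25)  # about -0.29 and 0.22

def isothermality_range(log_iso, min_width=None):
    """Return (lower, midpoint, upper) from the 5% and 95% quantiles.

    If min_width is given, narrower ranges are expanded symmetrically
    around their midpoint to that width.
    """
    lower, upper = np.quantile(log_iso, [0.05, 0.95])
    mid = 0.5 * (lower + upper)
    if min_width is not None and (upper - lower) < min_width:
        lower, upper = mid - 0.5 * min_width, mid + 0.5 * min_width
    return lower, mid, upper

def classify_strict(log_iso):
    """Tropical if the range midpoint falls below ln 1 = 0."""
    _, mid, _ = isothermality_range(log_iso)
    return "tropical" if mid < 0.0 else "temperate"

def classify_conservative(log_iso, mean_width, record_cutoff=20):
    """Conservative coding; returns None for tips coded as missing."""
    # Only tips with few records have their ranges expanded.
    min_width = mean_width if len(log_iso) <= record_cutoff else None
    lower, mid, upper = isothermality_range(log_iso, min_width)
    if upper < LN_HIGH and mid < LN_LOW:
        return "tropical"
    if lower > LN_LOW and mid > LN_HIGH:
        return "temperate"
    return None

# Hypothetical tips with three occurrence records each.
print(classify_strict(np.log([0.5, 0.55, 0.6])))
print(classify_conservative(np.log([2.0, 2.5, 3.0]), mean_width=0.5))
print(classify_conservative(np.log([0.95, 1.0, 1.05]), mean_width=0.5))
```

The third hypothetical tip sits right around the cut-off, so the conservative scheme codes it as missing even though the strict scheme would assign it a state.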

For each discrete variable (“strict” and “conservative” codings of tropical/temperate states,

presence/absence of staminal levers, and woodiness), we inferred joint models of state and natural

log-transformed corolla length evolution following procedures largely identical to those used for

the simulation study, with a few key exceptions. First, in addition to the 8 models described in the

simulation study, we assessed evidence for asymmetric transition rates among hidden states via 4

additional models: state-independent and full models assuming symmetric transitions among ob-

served states but asymmetric transitions among hidden states, plus two otherwise identical models


assuming asymmetric transitions among both observed and hidden states. Second, because we lack

data on corolla length measurement errors/within-tip variation, we allowed all models to simulta-

neously estimate a “tip error” parameter (i.e., ε in Eq. 9; see Landis and Schraiber, 2017) rather

than fixing this parameter a priori based on the overall range of continuous trait data as in the sim-

ulation study. Lastly, to better ensure the accuracy of all inferences, we increased the resolution

of the continuous trait value grid from 512 to 2,048 and repeatedly fit each model 20 times rather

than 5 or 10.

We analyzed our empirical results following the procedures used for the simulation study as

well, calculating AICc weights and corresponding model-averaged estimates of transition rates

among states (again defining hidden state B as whichever inferred hidden state is associated

with the highest average rate of continuous trait evolution), rates of corolla length evolution,

and tip error in corolla length measurements. We also calculated RAWs measuring the evidence

for state-independent/state-dependent rate variation versus constant rates, state-dependent versus

state-independent rate variation, and asymmetric versus symmetric transition rates among ob-

served/hidden states. Additionally, we computed ancestral state, rate, and corolla length estimates

under each model using marginal ancestral state reconstruction, summarizing the results for each

of the four discrete variables by model-averaging estimated state probabilities, rates, and corolla

lengths at each node and tip in the phylogeny.
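The model comparison quantities used in this workflow can be sketched as follows. The AICc and AICc weight formulas are the standard ones. The relative weight function follows our reading of the RAW definition (the average AICc weight of models supporting one hypothesis, normalized against the average weight of models supporting the alternative). The function names and the choice of sample size `n` are assumptions made for illustration.

```python
import numpy as np

def aicc(loglik, k, n):
    """Sample size corrected AIC for log-likelihood, k parameters, n samples."""
    loglik, k = np.asarray(loglik, float), np.asarray(k, float)
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1.0) / (n - k - 1.0)

def aicc_weights(aicc_values):
    """Normalized AICc weights across a set of candidate models."""
    aicc_values = np.asarray(aicc_values, float)
    w = np.exp(-0.5 * (aicc_values - aicc_values.min()))
    return w / w.sum()

def relative_weight(weights, supports_a, supports_b):
    """Mean weight of models supporting A, normalized to lie in [0, 1]."""
    weights = np.asarray(weights, float)
    mean_a = weights[np.asarray(supports_a, bool)].mean()
    mean_b = weights[np.asarray(supports_b, bool)].mean()
    return mean_a / (mean_a + mean_b)

def model_average(weights, estimates):
    """AICc-weighted average of a parameter estimated under each model."""
    return float(np.sum(np.asarray(weights, float) * np.asarray(estimates, float)))

# Hypothetical three-model example: model 0 supports B, models 1-2 support A.
w = np.array([0.1, 0.3, 0.6])
raw = relative_weight(w, [False, True, True], [True, False, False])
avg = model_average(w, [1.0, 2.0, 3.0])
```

Averaging weights within each hypothesis, rather than summing them, keeps a hypothesis from gaining support merely because more candidate models happen to represent it.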

3.3 Results

3.3.1 Simulation Study

Overall, our simulation study demonstrates that our new SCE modeling framework can accu-

rately detect and quantify variation in rates of continuous trait evolution from phylogenetic com-

parative data–whether such variation is associated with observed or unobserved discrete variables.

Interestingly, sample size corrected Akaike Information Criteria (AICc) suggested data simulated

under relatively “simple” conditions (e.g., constant rates, state-independent rate variation only)

generally did not exhibit strong evidence against models implying more complex patterns of rate

variation, instead tending to yield equivocal support for a range of simple to more complex models.


Accordingly, selecting best-fitting models via common “rules of thumb” (e.g., lowest AICc, ∆AICc

< 2) resulted in incorrectly rejecting simpler models about 10-15% of the time. Fortunately, using

relative AICc weights (RAWs) to measure the evidence in favor of particular hypotheses rather than

models resulted in much better error rates (Figure 3.1). More specifically, using a RAW threshold

of >85% in favor of “more complex” hypotheses (e.g., variation in rates, state-dependent rates,

asymmetric transitions among states) yielded error rates of about 5% or lower. Critically, many

datasets simulated under state-independent rates yielded strong support for state-dependent over

constant rate models, but not state-dependent over state-independent models, directly demonstrat-

ing the importance of accounting for the possible influence of unobserved discrete variables in

testing for associations between discrete variables and rates of continuous trait evolution. Beyond

error rates, our method performed well in terms of its statistical power to detect rate variation.

While our approach sometimes struggled to detect rate variation–particularly state-independent

rate variation–from 50-tip datasets, its power generally grew rapidly with increasing sample size.

As might be expected, increasing ratios of state-independent to state-dependent rate variation (i.e.,

weakly state-dependent simulations compared to completely state-dependent ones) led to more

equivocal support for state-dependent over state-independent models. Intriguingly, faster transition

rates seemed to slightly decrease power for detecting state-dependent rate variation, consistent with previous work based on a more approximate method for inferring state-dependent rates

of continuous trait evolution (Revell, 2013). Our method’s power to detect asymmetric transition

rates (i.e., transitions from one state to another occurring more frequently than the reverse) among

observed states was noticeably weaker than its power to detect rate variation, aligning with similar

findings in the context of state-dependent SSE models (Beaulieu and O’Meara, 2016). Nonethe-

less, our approach could detect asymmetric transition rates more sensitively from data simulated

under state-dependent rates, likely because apparent variation in evolutionary rates across the phy-

logeny provided additional information on the evolutionary history of observed states (see Boyko

et al., 2023b).
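The error rates and power discussed above both reduce to the proportion of simulated datasets whose RAW exceeds the chosen threshold. A minimal sketch, using a hypothetical function name and made-up RAW values purely for illustration:

```python
import numpy as np

def support_rate(raws, threshold=0.85):
    """Fraction of simulated datasets whose RAW exceeds the threshold."""
    raws = np.asarray(raws, dtype=float)
    return float(np.mean(raws > threshold))

# Made-up RAWs for four simulated datasets under one condition.
rate = support_rate([0.90, 0.50, 0.86, 0.20])
```

When the hypothesis is false under the simulation conditions, this proportion is an error rate; when the hypothesis is true, it is the method's power.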

Overall, our method yielded rather unbiased and accurate parameter estimates which grew


Figure 3.1 Distributions of relative sample size corrected Akaike Information Criterion (AICc)
weights (i.e., average AICc weight associated with models supporting one hypothesis versus an-
other, normalized to vary between 0 and 1; hereafter RAWs) measuring the support for several
key evolutionary hypotheses across different simulation conditions based on our novel approach
to inferring state-(in)dependent continuous trait evolution models. We specifically quantified the
evidence that: 1) rates of continuous trait evolution vary according to an unobserved discrete vari-
able (indep. > const.), 2) rates vary according to the observed discrete variable (dep. > const.), 3)
rates vary according to the observed discrete variable rather than or in addition to an unobserved
variable (dep. > indep.), and 4) transition rates from observed state 0 to 1 differ from those for
1 to 0 (i.e., transition rates are asymmetric; asym. > sym.). Different plots correspond to distinct
simulation conditions, with different sample sizes (i.e., number of tips; distributions in lighter,
warmer colors correspond to larger sample sizes) and hypotheses arrayed along each plot’s x-axis.
Hatched boxes correspond to RAW values > 85%, which indicate substantial support for either
correct (in darker gray) or incorrect (in lighter red) hypotheses depending on the given simulation
conditions. Hatched boxes were omitted in cases where a particular hypothesis is irrelevant given
the simulation conditions.


[Figure 3.1 plot. x-axis: model comparison (indep. > const., dep. > const., dep. > indep., asym. > sym.); y-axis: relative AICc weight (10% to 90%). Panel labels: rate variation model (constant, state-independent, weakly/strongly/completely state-dependent) and state transition model (slow transitions, fast transitions, asymmetric transitions with highest rates in 1, asymmetric transitions with highest rates in 0). Legend: number of tips (50, 100, 200); >85% support for correct or incorrect hypothesis.]

more precise with increasing sample size (Figs. 3.2-3.3). Interestingly, in some cases, the method

inferred hidden states which were very unlikely to occur anywhere across a phylogeny, resulting

in “outlier” rate estimates (represented by the long, skinny tails of some distributions and × sym-

bols in Fig. 3.2) because the rate parameters associated with these states had virtually no effect

on model likelihoods. Fortunately, such situations are not especially problematic–ancestral state

reconstruction may be used to quickly verify whether states associated with anomalously low or

high rate estimates are actually likely to have occurred anywhere on a given phylogeny. Unsur-

prisingly, inferred state-dependent rate differences tended to be more accurate and precise than

state-independent rate differences. In fact, state-independent rate estimates often exhibited a spe-

cific pattern of bias for smaller 50-tip datasets, with differences among hidden states within a given

observed state “collapsing” to 0 (e.g., rate estimates for states 0A and 0B being biased towards the

overall average rate for state 0). Fortunately, this bias largely disappears for larger sample sizes

as state-independent rate variation could be inferred with greater confidence (see in particular rate

estimates for simulations with state-independent rates and slow transition rates in Fig. 3.2). A sim-

ilar pattern occurred for rate differences between observed states under some simulation conditions

with fast transition rates among observed states, aligning with results from previous work (Rev-

ell, 2013). Analogously, transition rate estimates for smaller datasets simulated under asymmetric

transition rates tended to exhibit some bias towards inferring more symmetric rates, consistent with the general difficulty of confidently detecting asymmetric transition rates (Fig. 3.1). Notably, inferred parameters for transition rates among hidden states varied from fairly precise to exceptionally imprecise depending on the strength of state-independent rate variation, as inference of hidden states is directly tied to how apparent their effects on rates of continuous trait evolution are. For example, hidden transition rate estimates for data simulated under completely state-dependent rates

varied over six orders of magnitude regardless of sample size. Thus, unsurprisingly, simulations

generally must have sufficiently strong state-independent relative to state-dependent rate variation

for accurate inference of hidden state transition rates.


Figure 3.2 Distributions of model-averaged rates of continuous trait evolution in each state (based
on sample size corrected Akaike Information Criterion–or AICc–weights) across all simulation
conditions estimated under our novel approach to inferring state-dependent continuous trait evo-
lution models. Different plots correspond to distinct simulation conditions, with different sample
sizes (i.e., number of tips; distributions in lighter, warmer colors correspond to larger sample sizes)
and states arrayed along each plot’s x-axis. Horizontal lines depict the values of simulated rate
parameters under different simulation conditions. We used × symbols at the bottom of plots to indicate
four cases where rate estimates fell below $10^{-2}$ (ranging from about $4 \times 10^{-5}$ to $5 \times 10^{-3}$).
Outside of these cases, estimated rates never exceeded the plot boundaries.


[Figure 3.2 plot. x-axis: state (0A, 0B, 1A, 1B); y-axis: estimated rate (log scale, $10^{-2}$ to $10^{2}$). Panel labels: rate variation model (constant, state-independent, weakly/strongly/completely state-dependent) and state transition model (slow transitions, fast transitions, asymmetric transitions with highest rates in 1, asymmetric transitions with highest rates in 0). Legend: number of tips (50, 100, 200); simulated value; estimate ≈ 0.]

Figure 3.3 Distributions of model-averaged transition rates among states (based on sample size
corrected Akaike Information Criterion–or AICc–weights) across all simulation conditions estimated
under our novel approach to inferring state-dependent continuous trait evolution models. Differ-
ent plots correspond to distinct simulation conditions, with different sample sizes (i.e., number of
tips; distributions in lighter, warmer colors correspond to larger sample sizes) and state transitions
arrayed along each plot’s x-axis. Horizontal lines depict the values of simulated transition rate
parameters under different simulation conditions. We used × symbols at the bottom of plots to
indicate five cases where transition rate estimates fell below $10^{-3}$ (ranging from about
$3 \times 10^{-5}$ to $4 \times 10^{-4}$). Outside of these cases, estimated transition rates never exceeded
the plot boundaries, though some came quite close (note that we set an upper bound of
$\exp[7] \approx 10^{3}$ on parameter estimates), causing some distributions to appear “cut off” at their extremes.


[Figure 3.3 plot. x-axis: state transition (0→1, 0←1, A↔B); y-axis: estimated transition rate (log scale, $10^{-3}$ to $10^{3}$). Panel labels: rate variation model (constant, state-independent, weakly/strongly/completely state-dependent) and state transition model (slow transitions, fast transitions, asymmetric transitions with highest rates in 1, asymmetric transitions with highest rates in 0). Legend: number of tips (50, 100, 200); simulated value; estimate ≈ 0.]

3.3.2 Empirical Example

We applied our new approach for inferring state-dependent variation in rates of continuous

trait evolution to assess whether tropical lineages of sage exhibit higher rates of flower size (mea-

sured via natural log-transformed corolla length) evolution than temperate lineages as predicted

by the BIH. Based on RAWs, we found substantial support for elevated rates of corolla length

evolution among tropical sages (i.e., relevant RAWs exceeding 85%; see Table 3.1; see also Table

3A.1 for associated AICc weights), with estimated rates among tropical lineages around double

to triple those for temperate lineages (Table 3.2). This was true regardless of whether we used a

strict (Fig. 3.4) or conservative (Fig. 3A.4) scheme for classifying extant sages as tropical versus

temperate, and stands in stark contrast to the results for other discrete traits (i.e., staminal levers

and woodiness; Figs. 3A.5 and 3A.6), which do not exhibit any apparent association with rates

of corolla length evolution. On the other hand, evidence for state-independent rate variation was

decidedly equivocal with a RAW of around 56%. Reassuringly, such results were consistent across

models of different discrete traits, as would be expected because discrete and continuous trait evolutionary processes are completely independent under models with completely state-independent rate variation. Indeed, excepting transition rates among observed states, parameter estimates under these models were extremely consistent across different discrete traits (compare rows 3, 7, 9, and 11 within/across Tables 3A.2–3A.5).

Beyond variation in rates of corolla length evolution, the only discrete trait to yield notable evidence for asymmetric transition rates was woodiness, with transitions to herbaceousness occurring

at nearly twice the rate of transitions to woodiness among sages, agreeing with previous results

(Moein et al., 2023). With regards to hidden states, while our raw parameter estimates suggest

transitions to the “fast” hidden state B (i.e., higher rates of corolla length evolution) occur at a rate

some three times higher than that for transitions into the “slow” hidden state A, evidence for asymmetric transition rates among hidden states based on RAWs never exceeded 50%, presumably due

to the general lack of evidence for state-independent rate variation in the first place. Accounting

for the association between tropicality and rates of corolla length evolution eroded any support


for asymmetry in hidden state transition rates even further. Inferred tip error in log corolla length

measurements (i.e., variation in log corolla length due to measurement error/within-tip variation)

remained rather consistent across all models at ∼0.3, roughly corresponding to an error of about

±30% around raw corolla length measurements. However, intriguingly, inferred tip errors were

slightly yet consistently lower under models accounting for associations between rates and tropical-

ity, suggesting that other models misattributed some signals of this rate heterogeneity to increased

corolla length measurement error instead (see tip error/ε estimates in Tables 3A.2–3A.5).
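The correspondence between a tip error of about 0.3 on the natural log scale and an error of roughly ±30% on the raw scale can be checked by back-transforming one standard deviation in either direction. This is a simple arithmetic sketch; the variable names are illustrative.

```python
import math

eps = 0.3  # approximate inferred tip error (standard deviation of log corolla length)

# Multiplicative factors one standard deviation below and above a measurement.
lower_factor = math.exp(-eps)
upper_factor = math.exp(eps)

# Expressed as percent deviations from the raw measurement.
lower_pct = 100.0 * (lower_factor - 1.0)
upper_pct = 100.0 * (upper_factor - 1.0)
print(f"{lower_pct:.0f}% to +{upper_pct:.0f}%")
```

The interval is asymmetric on the raw scale (about 26% below to 35% above), which is why ±30% is only a rough summary.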

Table 3.1 Relative sample size corrected Akaike Information Criterion (AICc) weights (i.e., aver-
age AICc weight associated with models supporting one hypothesis versus another, normalized to
vary between 0 and 1; hereafter RAWs) measuring the support for evolutionary hypotheses based
on joint models of corolla length and discrete trait evolution among sages (Lamiaceae: Salvia L.).
Each column corresponds to one of the four discrete traits we analyzed: 1-2) alternative “strict” and
“conservative” codings (refer to subsection 3.2.4 for details) of tropicality versus temperateness
(strict/conserv. trop.), 3) the presence/absence of staminal levers (lever pres.), and 4) woodiness
versus herbaceousness (woodiness). Each row corresponds to a particular hypothesis we measured
support for: 1) rates of corolla length evolution vary according to an unobserved discrete variable
(indep. > const.), 2) rates vary according to the observed discrete trait (dep. > const.), 3) rates
vary according to the observed discrete trait rather than or in addition to an unobserved variable
(dep. > indep.), 4) transition rates from observed state 0 (i.e., temperateness, staminal lever ab-
sence, herbaceousness) to 1 (i.e., tropicality, lever presence, woodiness) differ from those for 1 to
0 (i.e., transition rates are asymmetric; obs. asym. > sym.), and 5) transition rates from the “slow”
(i.e., lower rate of corolla length evolution) hidden state A to the “fast” state B are asymmetric
(hid. asym. > sym.). A RAW of ∼85-90% or more indicates strong evidence in favor of a given
hypothesis.

                     discrete trait
hypothesis           strict trop.   conserv. trop.   lever pres.   woodiness
indep. > const.      56%            56%              56%           55%
dep. > const.        91%            96%              23%           31%
dep. > indep.        89%            95%              19%           26%
obs. asym. > sym.    35%            31%              26%           88%
hid. asym. > sym.    37%            35%              49%           48%


Figure 3.4 Phylogram depicting model-averaged marginal ancestral rate and state estimates (based
on sample size corrected Akaike Information Criterion–or AICc–weights) based on our joint analy-
sis of corolla length evolution and temperate-tropical transitions (strictly-coded; refer to subsection
3.2.4 for details) among sages (Lamiaceae: Salvia L.). The colors of branches correspond to inferred
rates of corolla length evolution, with darker, cooler and lighter, warmer colors denoting relatively
slow and fast rates, respectively. Pie charts at select nodes depict the probability that a given node
tended to occur in either temperate (light blue) or tropical (dark green) environments. Because we
lacked data for some tips and accounted for uncertainty in tropicality and corolla lengths, we also
depict inferred tropicality probabilities (via colored boxes; light gray indicates even chances of be-
ing tropical or temperate) and 95% confidence intervals on corolla lengths (via gray bars) arrayed
along the tips.


[Figure 3.4 plot legends. Branch colors: rate of log corolla length evolution (variance per million years; scale from 0.004 to 0.030). Tip bars: corolla length (mm; scale marks at 4, 16, and 64). State: temperate, ambiguous, tropical.]

Table 3.2 Model-averaged parameter estimates (based on sample size corrected Akaike Information
Criterion–or AICc–weights) based on joint models of corolla length and discrete trait evolution
among sages (Lamiaceae: Salvia L.). Each column corresponds to one of the four discrete
traits we analyzed: 1-2) alternative “strict” and “conservative” codings (refer to subsection 3.2.4
for details) of tropicality versus temperateness (strict/conserv. trop.), 3) the presence/absence of
staminal levers (lever pres.), and 4) woodiness versus herbaceousness (woodiness). Each row
corresponds to a particular parameter. Parameters denoted $q_{x,y}$ represent transition rates from state x
to state y per million years. States 0 and 1 refer to observed states (i.e., 0 = temperateness/staminal
lever absence/herbaceousness, 1 = tropicality/lever presence/woodiness) while states A and B
refer to “slow” (i.e., lower rate of corolla length evolution) and “fast” hidden states, respectively.
Parameters denoted $\sigma^2_{xw}$ refer to rates of log-transformed corolla length evolution (i.e., increase in
variance per million years) in observed state x and hidden state w. Lastly, ε denotes “tip error”–the
inferred standard deviation of (presumably normally-distributed) log-transformed corolla length
measurements across all tips due to measurement error and/or within-tip phenotypic variation.

                   discrete trait
parameter          strict trop.   conserv. trop.   lever pres.   woodiness
$q_{0,1}$          0.018          0.008            0.024         0.040
$q_{1,0}$          0.016          0.007            0.024         0.076
$q_{A,B}$          0.018          0.014            0.034         0.032
$q_{B,A}$          0.013          0.015            0.011         0.010
$\sigma^2_{0A}$    0.009          0.009            0.006         0.007
$\sigma^2_{0B}$    0.011          0.011            0.019         0.020
$\sigma^2_{1A}$    0.022          0.026            0.007         0.006
$\sigma^2_{1B}$    0.025          0.027            0.020         0.021
ε                  0.292          0.287            0.303         0.303
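As a quick check on how the estimates in Table 3.2 relate to the comparisons reported in the text, the following sketch computes the relevant ratios directly from the tabulated values. The dictionary layout and variable names are illustrative only.

```python
# Model-averaged rates of log corolla length evolution from Table 3.2
# (observed state 0 = temperate, 1 = tropical; hidden states A and B).
rates = {
    "strict":  {"0A": 0.009, "0B": 0.011, "1A": 0.022, "1B": 0.025},
    "conserv": {"0A": 0.009, "0B": 0.011, "1A": 0.026, "1B": 0.027},
}

# Tropical-to-temperate rate ratios within each hidden state.
ratios = {
    coding: {h: r["1" + h] / r["0" + h] for h in ("A", "B")}
    for coding, r in rates.items()
}

# Woodiness transition rates from Table 3.2 (0 = herbaceous, 1 = woody).
q01_woody, q10_woody = 0.040, 0.076
herbaceous_bias = q10_woody / q01_woody

for coding, by_hidden in ratios.items():
    print(coding, {h: round(v, 2) for h, v in by_hidden.items()})
print("to herbaceous / to woody:", round(herbaceous_bias, 2))
```

All four tropical-to-temperate ratios fall between roughly 2.3 and 2.9, and transitions to herbaceousness are estimated at 1.9 times the rate of transitions to woodiness.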

3.4 Discussion

Here, we outlined and demonstrated the capabilities of a new approach for inferring how dis-

crete variables affect continuous trait evolution dynamics based on a novel pruning algorithm for

directly calculating the likelihood of phylogenetic comparative data under a joint evolutionary pro-

cess of both discrete state and continuous trait evolution. Unlike other methods for inferring these

SCE models, our approach avoids relying on explicit reconstructions of state histories, rendering

the method relatively efficient and convenient to use. Overall, our simulation study verifies that our

new framework for fitting SCE models not only yields largely accurate parameter estimates, but

also exhibits both acceptable error rates and relatively high power for detecting variation in rates of

continuous trait evolution. A particular strength of our framework is its ability to account for resid-

ual heterogeneity in continuous trait evolution dynamics caused by unobserved discrete variables


or hidden states, enabling more accurate and robust parameter inference and evolutionary hypothe-

sis testing (May and Moore, 2020; Boyko et al., 2023b; Tribble et al., 2023). Our simulation study

concretely demonstrates this fact, showing that residual heterogeneity in rates of continuous trait evolution is frequently mistaken for state-dependent heterogeneity when only constant-rate and

completely state-dependent models are considered (Fig. 2.2).

3.4.1 Increased rates of flower size evolution among tropical sages

Using our new SCE modeling framework, we found that tropical sage lineages exhibit higher

rates of flower size evolution than temperate lineages, consistent with predictions of the BIH. In

fact, the two main subclades predominantly consisting of tropical sage taxa in our analyses (Fig.

3.4) are rather unusual among sage lineages for exhibiting multiple evolutionary shifts from bee

to bird pollination (Kriebel et al., 2019), and bird pollination has been explicitly linked to larger

flower sizes in sages (Wester et al., 2020; Moein et al., 2023). While more research is needed

to confidently determine what mechanisms underlie elevated rates of flower size evolution among

tropical sages, this association between tropical environments and elevated rates of both pollinator

interaction and flower size evolution among sages is rather striking and certainly seems to agree

with some of the key predictions of the BIH. Alternatively, such elevated rates of flower size evolu-

tion among tropical sage lineages may result from higher lineage diversification rates in the tropics

driving increased incomplete lineage sorting and/or hybridization among sages. A single species

tree cannot fully describe the complex interrelationships of clades exhibiting high rates of introgression and hybridization, distorting expected patterns of phenotypic similarity across species

and inflating evolutionary rate estimates (Mendes et al., 2018; Hibbins and Hahn, 2021; Hibbins

et al., 2023). While our new method cannot explicitly account for these “reticulate” evolutionary

processes, one can approximately model them by integrating comparative analyses over a sample

of “gene trees” (Hibbins et al., 2023). To this end, our new approach to inferring SCE models the-

oretically makes integrating comparative analyses over multiple possible tree topologies easier by

avoiding repeated sampling of discrete state histories over different topologies. This should make

sampling parameters that fit the observed data under a variety of different topologies much easier


and computationally feasible, as the most likely parameters of a continuous trait evolution model

can vary widely depending on the assumed state history, an issue presumably only exaggerated

by different overall tree topologies (e.g., Caetano and Harmon, 2017; Boyko et al., 2023b). In

any case, future ecological and microevolutionary work comparing pollination interactions, selec-

tion on floral traits, and/or population genetic patterns between tropical and temperate sage taxa

may help more precisely elucidate the mechanisms driving increased rates of flower size evolution

among tropical sages.

While much macroevolutionary research has investigated whether speciation and/or extinction

rates differ among temperate and tropical lineages, comparatively little research has examined

whether rates of phenotypic evolution are elevated among tropical lineages. Studies that have investigated this question have yielded mixed results, from those broadly in agreement with the BIH

(Schumm et al., 2019; Chartier et al., 2021), to those finding no consistent differences in rates

among temperate/tropical lineages (Drury et al., 2021), to even those finding opposing patterns

of increased rates of trait evolution in temperate ecosystems (Hipsley et al., 2014). Our results

here notably agree with a previous study demonstrating that heathers and allies (order Ericales)

exhibit greater floral morphological diversity towards the tropics (Chartier et al., 2021). However,

Chartier et al. (2021) did not conduct any phylogenetic comparative analyses, and it remains un-

clear what evolutionary processes generated these apparent patterns. To our knowledge, this study

is the first to explicitly compare rates of floral morphology evolution across tropical and temperate

environments. Notably, while sages and heathers diverged from one another quite some time ago

(molecular and fossil evidence roughly suggest the mid to early Cretaceous, some 90-120 million

years ago; Zhang et al., 2020), they are both members of the larger Asterid clade. To determine

whether these results reflect more general patterns, future studies should investigate whether rates

of floral trait evolution and/or floral morphological diversity increase towards the tropics not only

in other Asterid clades but also more broadly across the Angiosperm phylogeny (e.g., Rosids,

Monocots, Magnoliids).


3.4.2 Relationship to previous methods and possible extensions

This is not the first approach to joint inference of state histories and continuous trait evolution

processes from phylogenetic comparative data. Theoretically, sequential approximations converge

to a truly joint modeling approach if the likelihood of a continuous trait evolution model is averaged

over a sufficiently representative sample of state histories. Generating such a sample, however, is

not trivial, as most sampling methods (e.g., simmapping; Nielsen, 2002; Bollback, 2006; Revell,

2013) generate state histories associated with extremely low likelihoods that barely contribute to

the overall average (Boyko et al., 2023b). Nonetheless, previous researchers have developed both

effective Bayesian approaches (Caetano et al., 2018; May and Moore, 2020; Quintero and Landis,

2020) and clever greedy algorithms (Boyko et al., 2023b) for achieving just this. While our new

SCE framework generally offers greater computational efficiency and, in some respects, flexibility

compared to previous approaches, these alternative methods still offer some important strengths

worth considering.
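The difficulty with averaging likelihoods over sampled state histories can be illustrated with a small numerical sketch. It implements only a plain Monte Carlo average on the log scale, not the algorithm of any method cited above, and the log-likelihood values are made up for illustration.

```python
import numpy as np

def log_mean_likelihood(logliks):
    """Log of the average likelihood across sampled state histories."""
    logliks = np.asarray(logliks, dtype=float)
    m = logliks.max()  # subtract the maximum to avoid numerical underflow
    return float(m + np.log(np.mean(np.exp(logliks - m))))

def contributions(logliks):
    """Share of the average likelihood contributed by each history."""
    logliks = np.asarray(logliks, dtype=float)
    w = np.exp(logliks - logliks.max())
    return w / w.sum()

# Made-up log-likelihoods of the continuous trait data under four histories.
logliks = [-250.0, -262.0, -300.0, -251.0]
print(log_mean_likelihood(logliks))
print(contributions(logliks))
```

In this example two of the four histories account for essentially all of the average, so the effective sample of histories is much smaller than the nominal one.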

While our approach could be extended to multivariate continuous traits quite easily via the use

of multidimensional FFTs, the computational complexity of our pruning algorithm would unfor-

tunately scale poorly. In particular, each additional trait increases the number of grid points to

consider by a factor equal to the specified grid resolution–for example, two traits each coarsely

discretized into just 256 grid points would together still require 256 × 256 = 65,536 grid points.

We believe sparse/adaptive grids (Brumm and Scheidegger, 2017) offer a possible workaround to

this issue, but implementing them is far beyond the scope of the current paper. However, methods like ratematrix

(Caetano et al., 2018) and MuSSCRat (May and Moore, 2020) are already capable of accommo-

dating several or more continuous traits. Additionally, while our SCE framework can model a

wide array of possible evolutionary dynamics via Lévy processes, it is not easily extendable to

more “adaptive” models of trait evolution like Ornstein-Uhlenbeck (OU; Butler and King, 2004)

and Fokker-Planck-Kolmogorov processes (Boucher et al., 2018). Unlike Lévy processes, these

processes do not admit a characteristic exponent representation and are thus not directly compati-

ble with our current approach. Fortunately, the recently-developed hOUwie is capable of inferring

state-dependent OU processes via a clever stochastic algorithm for approximating the likelihood

of the model rather efficiently despite being based on a sequential approximation.
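Returning to the grid-size example earlier in this paragraph, the short Python sketch below tabulates how the number of grid points grows with the number of traits. The operation count uses the textbook N log2 N cost of a single FFT over the full grid and ignores constant factors; it is a rough illustration under that assumption, not a benchmark of our implementation, and the function names are ours.

```python
import math

def grid_points(resolution, n_traits):
    """Total grid points when each trait is discretized into `resolution` points."""
    return resolution ** n_traits

def fft_cost(resolution, n_traits):
    """Rough operation count, N * log2(N), for one FFT over the full grid."""
    n = grid_points(resolution, n_traits)
    return n * math.log2(n)

# Number of traits, grid points, and approximate FFT operations
for k in (1, 2, 3):
    print(k, grid_points(256, k), int(fft_cost(256, k)))
```

Because the pruning algorithm would need such transforms repeatedly across the tree, even a third trait at this coarse resolution would multiply the per-transform work by several hundred.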

In general, we believe our method offers the most straightforward and computationally

efficient modeling approach in the case of univariate continuous traits evolving under more “drift-like”

processes (e.g., Brownian motion, “pulsed” evolution; see Landis et al., 2013; Landis and

Schraiber, 2017). More broadly, however, we believe our novel algorithm establishes a helpful

mathematical link between popular discrete and continuous trait evolution models. Future re-

search could build off this framework to develop additional SCE models tailored to a variety of

applications–for example, joint inference of geographic history and its effect on trait evolution

(e.g., Goldberg et al., 2011; Caetano et al., 2018) or even jointly modeling the influence of a con-

tinuous variable on continuous trait evolution (e.g., FitzJohn, 2010). Another interesting–though

perhaps more challenging–avenue for development would be uniting continuous, discrete, and lin-

eage diversification models under a single framework, which would be possible by combining our

algorithm with those used for SSE models–particularly the quantitative SSE model (i.e., quasse;

FitzJohn, 2010). One potentially useful application of such a framework would be modeling more

dynamic interactions between speciation, extinction, and continuous trait evolution using insights

from cladogenetic SSE models (e.g., Goldberg and Igić, 2012; see also Bokma, 2008).
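As a concrete reminder of what a characteristic exponent looks like for the “drift-like” processes mentioned at the start of this paragraph, the Python sketch below evaluates the standard exponents of Brownian motion (without a trend) and of a compound Poisson process with mean-zero normal jumps, a simple stand-in for “pulsed” evolution. The parameter values are hypothetical and the function names are ours; this is a sketch of the general mathematical idea rather than of our software.

```python
import cmath
import math

def bm_exponent(theta, sigma2):
    """Characteristic exponent of Brownian motion with rate sigma2 and no trend."""
    return -0.5 * sigma2 * theta ** 2

def pulsed_exponent(theta, jump_rate, jump_var):
    """Characteristic exponent of a compound Poisson process whose jumps are
    normally distributed with mean zero and variance jump_var."""
    return jump_rate * (math.exp(-0.5 * jump_var * theta ** 2) - 1.0)

def char_fun(exponent, t):
    """Characteristic function of the trait change accumulated over time t."""
    return cmath.exp(t * exponent)

theta = 0.8
# Exponents of independent Levy processes add (hypothetical parameter values).
psi = bm_exponent(theta, sigma2=0.02) + pulsed_exponent(theta, jump_rate=0.1, jump_var=0.5)

# Increments over disjoint intervals are independent, so the characteristic
# functions for times 3 and 4 multiply to give the one for time 7.
print(abs(char_fun(psi, 3.0) * char_fun(psi, 4.0) - char_fun(psi, 7.0)) < 1e-12)
```

The time-homogeneous form exp(t × exponent) is what makes Fourier-based calculations convenient for Lévy processes, and it is this form that OU-type processes lack.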

3.4.3 Conclusion

Macroevolutionary researchers remain limited in their ability to rigorously detect and quantify

variation in continuous trait evolution dynamics, despite the ubiquity of evolutionary heterogeneity

across the tree of life. Our new SCE modeling framework allows for joint inference of

discrete state histories and their influence on the evolution of a continuous trait. Such states could

represent different habitats, reproductive strategies, or even entirely unobserved variables/hidden

states used to model generic background and/or residual heterogeneity in continuous trait evolution

dynamics. By using this method to fit and compare several candidate models, researchers can eas-

ily and robustly test a potentially wide variety of hypotheses regarding what factors are associated

with shifts in the tempo or mode of continuous trait evolution. Furthermore, the mathematical basis

of our approach presents numerous opportunities for further elaboration, extension, and connection

with other methods going forward. Ultimately, we believe our method fills an important gap among

phylogenetic comparative methods and will benefit both method developers and

empirical macroevolutionary researchers.

BIBLIOGRAPHY

Bartoszek K., Pienaar J., Mostad P., Andersson S., and Hansen T.F. 2012. A phylogenetic compar-

ative method for studying multivariate adaptation. J Theor Biol 314:204–215.

Beaulieu J.M. and O’Meara B.C. 2016. Detecting hidden diversification shifts in models of trait-

dependent speciation and extinction. Syst Biol 65:583–601.

Beaulieu J.M., O’Meara B.C., and Donoghue M.J. 2013. Identifying hidden rate changes in the
evolution of a binary morphological character: The evolution of plant habit in campanulid an-
giosperms. Syst Biol 62:725–737.

Bokma F. 2008. Detection of “punctuated equilibrium” by Bayesian estimation of speciation and
extinction rates, ancestral character states, and rates of anagenetic and cladogenetic evolution on
a molecular phylogeny. Evolution 62:2718–2726.

Bollback J.P. 2006. SIMMAP: Stochastic character mapping of discrete traits on phylogenies.

BMC Bioinformatics 7:88.

Boucher F.C. and Démery V. 2016. Inferring bounded evolution in phenotypic characters from

phylogenetic comparative data. Syst Biol 65:651–661.

Boucher F.C., Démery V., Conti E., Harmon L.J., and Uyeda J. 2018. A general model for estimat-

ing macroevolutionary landscapes. Syst Biol 67:304–319.

Bowman J.C. and Roberts M. 2011. Efficient dealiased convolutions without padding. SIAM J

Sci Comput 33:386–406.

Boyko J.D. and Beaulieu J.M. 2021. Generalized hidden Markov models for phylogenetic compar-

ative datasets. Methods Ecol Evol 12:468–478.

Boyko J.D. and Beaulieu J.M. 2023. Reducing the biases in false correlations between discrete

characters. Syst Biol 72:476–488.

Boyko J.D., Hagen E.R., Beaulieu J.M., and Vasconcelos T. 2023a. The evolutionary responses of
life-history strategies to climatic variability in flowering plants. New Phytol 240:1587–1600.

Boyko J.D., O’Meara B.C., and Beaulieu J.M. 2023b. A novel method for jointly modeling the

evolution of discrete and continuous traits. Evolution 77:836–851.

Brock Fenton M. and Simmons N.B. 2015. Bats. University of Chicago Press, Chicago, IL.

Brumm J. and Scheidegger S. 2017. Using adaptive sparse grids to solve high-dimensional dy-

namic models. Econometrica 85:1575–1612.

Burnham K.P. and Anderson D.R. 2002. Information and likelihood theory: A basis for model
selection and inference. Pages 49–97 in Model Selection and Multimodel Inference: A Practical
Information-Theoretic Approach. Springer New York, New York, NY.

Butler M.A. and King A.A. 2004. Phylogenetic comparative analysis: A modeling approach for

adaptive evolution. Am Nat 164:683–695.

Caetano D.S. and Harmon L.J. 2017. ratematrix: An R package for studying evolutionary integra-

tion among several traits on phylogenetic trees. Methods Ecol Evol 8:1920–1927.

Caetano D.S., O’Meara B.C., and Beaulieu J.M. 2018. Hidden state models improve state-
dependent diversification approaches, including biogeographical models. Evolution 72:2308–
2324.

Chartier M., von Balthazar M., Sontag S., Löfstrand S., Palme T., Jabbour F., Sauquet H., and
Schönenberger J. 2021. Global patterns and a latitudinal gradient of flower disparity: Perspec-
tives from the angiosperm order Ericales. New Phytol 230:821–831.

Cooper N. and Purvis A. 2009. What factors shape rates of phenotypic evolution? A comparative

study of cranial morphology of four mammalian clades. J Evol Biol 22:1024–1035.

Crepet W.L. and Niklas K.J. 2009. Darwin’s second ‘abominable mystery’: Why are there so many

angiosperm species? Am J Bot 96:366–381.

DasGupta A. 2011. Characteristic functions and applications. Pages 293–322 in Probability for
Statistics and Machine Learning: Fundamentals and Advanced Topics. Springer New York, New
York, NY.

Davies T.J., Barraclough T.G., Chase M.W., Soltis P.S., Soltis D.E., and Savolainen V. 2004. Dar-
win’s abominable mystery: Insights from a supertree of the angiosperms. Proc Natl Acad Sci
USA 101:1904–1909.

Diamond J. and Roy D. 2023. Patterns of functional diversity along latitudinal gradients of species

richness in eleven fish families. Glob Ecol Biogeogr 32:450–465.

Dobzhansky T. 1950. Evolution in the tropics. Am Sci 38:209–221.

Donoghue M.J. and Sanderson M.J. 2015. Confluence, synnovation, and depauperons in plant

diversification. New Phytol 207:260–274.

Drury J.P., Clavel J., Tobias J.A., Rolland J., Sheard C., and Morlon H. 2021. Tempo and mode of

morphological evolution are decoupled from latitude in birds. PLoS Biol 19:e3001270.

Eastman J.M., Alfaro M.E., Joyce P., Hipp A.L., and Harmon L.J. 2011. A novel comparative
method for identifying shifts in the rate of character evolution on trees. Evolution 65:3578–

3589.

Eddelbuettel D. and Francois R. 2011. Rcpp: Seamless R and C++ integration. J Stat Softw 40:1–

18.

Eddelbuettel D. and Sanderson C. 2014. RcppArmadillo: Accelerating R with high-performance

C++ linear algebra. Comput Stat Data Anal 71:1054–1063.

Feeley K.J. and Stroud J.T. 2018. Where on Earth are the “tropics”? Front Biogeogr 10:e38649.

Felsenstein J. 1973. Maximum likelihood and minimum-steps methods for estimating evolutionary

trees from data on discrete characters. Syst Zool 22:240–249.

Felsenstein J. 1985. Phylogenies and the comparative method. Am Nat 125:1–15.

Felsenstein J. 2012. A comparative method for both discrete and continuous characters using the

threshold model. Am Nat 179:145–156.

Fick S.E. and Hijmans R.J. 2017. WorldClim 2: New 1-km spatial resolution climate surfaces for

global land areas. Int J Climatol 37:4302–4315.

FitzJohn R.G. 2010. Quantitative traits and diversification. Syst Biol 59:619–633.

FitzJohn R.G. 2012. Diversitree: Comparative phylogenetic analyses of diversification in R. Methods

Ecol Evol 3:1084–1092.

FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and ex-

tinction rates from incompletely resolved phylogenies. Syst Biol 58:595–611.

Freyman W.A. and Höhna S. 2018. Cladogenetic and anagenetic models of chromosome number

evolution: A Bayesian model averaging approach. Syst Biol 67:195–215.

Frigo M. and Johnson S.G. 2005. The design and implementation of FFTW3. Proc IEEE Inst Electr

Electron Eng 93:216–231.

Gingerich P.D. 2009. Rates of evolution. Annu Rev Ecol Evol Syst 40:657–675.

Goldberg E.E. and Foo J. 2020. Memory in trait macroevolution. Am Nat 195:300–314.

Goldberg E.E. and Igić B. 2012. Tempo and mode in plant breeding system evolution. Evolution

66:3701–3709.

Goldberg E.E., Lancaster L.T., and Ree R.H. 2011. Phylogenetic inference of reciprocal effects

between geographic range evolution and diversification. Syst Biol 60:451–465.

Hansen T.F., Pienaar J., and Orzack S.H. 2008. A comparative method for studying adaptation to

a randomly evolving environment. Evolution 62:1965–1977.

Harmon L.J., Pennell M.W., Francisco Henao-Diaz L., Rolland J., Sipley B.N., and Uyeda J.C.
2021. Causes and consequences of apparent timescaling across all estimated evolutionary rates.
Annu Rev Ecol Evol Syst 52:587–609.

Hassler G., Tolkoff M.R., Allen W.L., Ho L.S.T., Lemey P., and Suchard M.A. 2022a. Inferring
phenotypic trait evolution on large trees with many incomplete measurements. J Am Stat Assoc
117:678–692.

Hassler G.W., Gallone B., Aristide L., Allen W.L., Tolkoff M.R., Holbrook A.J., Baele G., Lemey
P., and Suchard M.A. 2022b. Principled, practical, flexible, fast: A new approach to phylogenetic
factor analysis. Methods Ecol Evol 13:2181–2197.

Helmstetter A.J., Zenil-Ferguson R., Sauquet H., Otto S.P., Méndez M., Vallejo-Marin M., Schö-
nenberger J., Burgarella C., Anderson B., de Boer H., Glémin S., and Käfer J. 2023. Trait-
dependent diversification in angiosperms: Patterns, models and data. Ecol Lett 26:640–657.

Herrera-Alsina L., van Els P., and Etienne R.S. 2019. Detecting the dependence of diversification

on multiple traits from phylogenetic trees and trait data. Syst Biol 68:317–328.

Hibbins M.S., Breithaupt L.C., and Hahn M.W. 2023. Phylogenomic comparative methods: Accu-
rate evolutionary inferences in the presence of gene tree discordance. Proc Natl Acad Sci USA
120:e2220389120.

Hibbins M.S. and Hahn M.W. 2021. The effects of introgression across thousands of quantitative

traits revealed by gene expression in wild tomatoes. PLoS Genet 17:e1009892.

Hijmans R.J., Barbosa M., Ghosh A., and Mandel A. 2023. geodata: download geographic data. R

package version 0.5-9.

Hillebrand H. 2004. On the generality of the latitudinal diversity gradient. Am Nat 163:192–211.

Hipsley C.A., Miles D.B., and Müller J. 2014. Morphological disparity opposes latitudinal diver-

sity gradient in lacertid lizards. Biol Lett 10:20140101.

Hiscott G., Fox C., Parry M., and Bryant D. 2016. Efficient recycled algorithms for quantitative

trait models on phylogenies. Genome Biol Evol 8:1338–1350.

Jablonski D. 2017. Approaches to macroevolution: 2. Sorting of variation, some overarching is-

sues, and general conclusions. Evol Biol 44:451–475.

Johnson S.G. 2021. The NLopt nonlinear-optimization package. Version 2.7.1.

Kittle A.M., Fryxell J.M., Desy G.E., and Hamr J. 2008. The scale-dependent impact of wolf
predation risk on resource selection by three sympatric ungulates. Oecologia 157:163–175.

Kriebel R., Drew B.T., Drummond C.P., González-Gallegos J.G., Celep F., Mahdjoub M.M., Rose
J.P., Xiang C.L., Hu G.X., Walker J.B., Lemmon E.M., Lemmon A.R., and Sytsma K.J. 2019.
Tracking temporal shifts in area, biomes, and pollinators in the radiation of Salvia (sages)
across continents: Leveraging anchored hybrid enrichment and targeted sequence data. Am J
Bot 106:573–597.

Landis M.J., Matzke N.J., Moore B.R., and Huelsenbeck J.P. 2013. Bayesian analysis of biogeog-

raphy when the number of areas is large. Syst Biol 62:789–804.

Landis M.J. and Schraiber J.G. 2017. Pulsed evolution shaped modern vertebrate body sizes. Proc

Natl Acad Sci USA 114:13224–13229.

Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character’s effect on speci-

ation and extinction. Syst Biol 56:701–710.

Magnuson-Ford K. and Otto S.P. 2012. Linking the investigations of character evolution and

species diversification. Am Nat 180:225–245.

Martin B.S., Bradburd G.S., Harmon L.J., and Weber M.G. 2023. Modeling the evolution of rates

of continuous trait evolution. Syst Biol 72:590–605.

Martins E.P. and Hansen T.F. 1997. Phylogenies and the comparative method: A general ap-
proach to incorporating phylogenetic information into the analysis of interspecific data. Am
Nat 149:646–667.

May M.R. and Moore B.R. 2020. A Bayesian approach for inferring the impact of a discrete char-
acter on rates of continuous-character evolution in the presence of background-rate variation.
Syst Biol 69:530–544.

Mendes F.K., Fuentes-González J.A., Schraiber J.G., and Hahn M.W. 2018. A multispecies coa-

lescent model for quantitative traits. Elife 7.

Moein F., Jamzad Z., Rahiminejad M., Landis J.B., Mirtadzadini M., Soltis D.E., and Soltis P.S.
2023. Towards a global perspective for Salvia L.: Phylogeny, diversification and floral evolution.
J Evol Biol 36:589–604.

Moler C. and Van Loan C. 2003. Nineteen dubious ways to compute the exponential of a matrix,

twenty-five years later. SIAM Rev Soc Ind Appl Math 45:3–49.

Nakov T., Beaulieu J.M., and Alverson A.J. 2019. Diatoms diversify and turn over faster in fresh-

water than marine environments. Evolution 73:2497–2511.

Nielsen R. 2002. Mapping mutations on phylogenies. Syst Biol 51:729–739.

Pagel M. 1994. Detecting correlated evolution on phylogenies: A general method for the compar-

ative analysis of discrete characters. Proc R Soc B 255:37–45.

Pagel M., O’Donovan C., and Meade A. 2022. General statistical model shows that macroevolu-
tionary patterns and processes are consistent with Darwinian gradualism. Nat Commun 13:1113.

Pinsky M.A. 2009. Fourier transforms on the line and space. Pages 89–167 in Introduction to
Fourier Analysis and Wavelets (D. Cox, S. G. Krantz, R. Mazzeo, and M. Scharlemann, eds.)
vol. 102 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI.

Quintero I. and Landis M.J. 2020. Interdependent phenotypic and biogeographic evolution driven

by biotic interactions. Syst Biol 69:739–755.

Rabosky D.L., Donnellan S.C., Grundler M., and Lovette I.J. 2014. Analysis and visualization of
complex macroevolutionary dynamics: An example from Australian scincid lizards. Syst Biol
63:610–627.

Rabosky D.L. and Goldberg E.E. 2017. FiSSE: A simple nonparametric test for the effects of a

binary character on lineage diversification rates. Evolution 71:1432–1442.

Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things).

Methods Ecol Evol 3:217–223.

Revell L.J. 2013. A comment on the use of stochastic character maps to estimate evolutionary rate

variation in a continuously valued trait. Syst Biol 62:339–345.

Sanderson C. and Curtin R. 2016. Armadillo: A template-based C++ library for linear algebra. J

Open Source Softw 1:26.

Sanderson C. and Curtin R. 2019. Practical sparse matrices in C++ with hybrid storage and

template-based expression optimisation. Math Comput Appl 24:70.

Sato K.I. 2013. Characterization and existence of Lévy and additive processes. Pages 31–68 in
Lévy Processes and Infinitely Divisible Distributions (B. Bollobás, W. Fulton, A. Katok, F. Kir-
wan, P. Sarnak, B. Simon, and B. Totaro, eds.) vol. 68 of Cambridge Studies in Advanced Math-
ematics. Cambridge University Press, Cambridge, UK.

Saupe E.E. 2023. Explanations for latitudinal diversity gradients must invoke rate variation. Proc

Natl Acad Sci USA 120:e2306220120.

Sauquet H. and Magallón S. 2018. Key questions and challenges in angiosperm macroevolution.

New Phytol 219:1170–1187.

Schemske D.W. 2001. Biotic interactions and speciation in the tropics. Pages 219–239 in Specia-
tion and Patterns of Diversity (R. Butlin, J. Bridle, and D. Schluter, eds.). Cambridge University
Press, Cambridge, UK.

Schemske D.W., Mittelbach G.G., Cornell H.V., Sobel J.M., and Roy K. 2009. Is there a latitudinal

gradient in the importance of biotic interactions? Annu Rev Ecol Evol Syst 40:245–269.

Schumm M., Edie S.M., Collins K.S., Gómez-Bahamón V., Supriya K., White A.E., Price T.D.,
and Jablonski D. 2019. Common latitudinal gradients in functional richness and functional even-
ness across marine and terrestrial systems. Proc R Soc B 286:20190745.

Simpson G.G. 1944. Tempo and Mode in Evolution. Columbia University Press, New York, NY.

Stan Development Team. 2019. Stan Modeling Language Users Guide and Reference Manual.

Version 2.21.0.

Stevens R.D., Willig M.R., and Strauss R.E. 2006. Latitudinal gradients in the phenetic diversity

of New World bat communities. Oikos 112:41–50.

Stork N.E., McBroom J., Gely C., and Hamilton A.J. 2015. New approaches narrow global species
estimates for beetles, insects, and terrestrial arthropods. Proc Natl Acad Sci USA 112:7519–
7523.

Thomas G.H. and Freckleton R.P. 2012. MOTMOT: Models of trait macroevolution on trees. Meth-

ods Ecol Evol 3:145–151.

Thompson J.N. 2005. The Geographic Mosaic of Coevolution. University of Chicago Press,

Chicago, IL.

Tolkoff M.R., Alfaro M.E., Baele G., Lemey P., and Suchard M.A. 2018. Phylogenetic factor

analysis. Syst Biol 67:384–399.

Tribble C.M., May M.R., Jackson-Gain A., Zenil-Ferguson R., Specht C.D., and Rothfels C.J.
2023. Unearthing modes of climatic adaptation in underground storage organs across Liliales.
Syst Biol 72:198–212.

Uyeda J.C., Zenil-Ferguson R., and Pennell M.W. 2018. Rethinking phylogenetic comparative

methods. Syst Biol 67:1091–1109.

Vasconcelos T., O’Meara B.C., and Beaulieu J.M. 2022. A flexible method for estimating tip diver-
sification rates across a range of speciation and extinction scenarios. Evolution 76:1420–1433.

Verboom G.A., Boucher F.C., Ackerly D.D., Wootton L.M., and Freyman W.A. 2020. Species

selection regime and phylogenetic tree shape. Syst Biol 69:774–794.

Wester P., Cairampoma L., Haag S., Schramme J., Neumeyer C., and Claßen-Bockhoff R. 2020.
Bee exclusion in bird-pollinated Salvia flowers: The role of flower color versus flower construc-
tion. Int J Plant Sci 181:770–786.

Yang Z. 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–

1591.

Ypma J., Johnson S.G., Stamm A., Borchers H.W., Eddelbuettel D., Ripley B., Hornik K., Chiquet
J., Adler A., Dai X., and Ooms J. 2022. nloptr: R Interface to NLopt. R package version 2.0.3.

Zhang C., Zhang T., Luebert F., Xiang Y., Huang C.H., Hu Y., Rees M., Frohlich M.W., Qi J.,
Weigend M., and Ma H. 2020. Asterid phylogenomics/phylotranscriptomics uncover morpho-
logical evolutionary histories and support phylogenetic placement for numerous whole-genome
duplications. Mol Biol Evol 37:3188–3210.

Zizka A., Silvestro D., Andermann T., Azevedo J., Duarte Ritter C., Edler D., Farooq H., Herdean
A., Ariza M., Scharn R., Svanteson S., Wengstrom N., Zizka V., and Antonelli A. 2019. Coordi-
nateCleaner: Standardized cleaning of occurrence records from biological collection databases.
Methods Ecol Evol 10:744–751.

APPENDIX 3A

SUPPLEMENTAL TABLES AND FIGURES

Figure 3A.1 The calculated isothermality (i.e., the ratio of seasonal variation in temperature to daily
temperature fluctuations; see Feeley and Stroud, 2018) ranges used to classify each tip/taxon in
the sage (Lamiaceae: Salvia L.) phylogeny for which we had geographic occurrence data as either
tropical (vertical line segments colored darker green) or temperate (lines colored lighter blue) under
our “strict” coding scheme (refer to subsection 3.2.4 for details). The lower and upper bounds
of each vertical line segment represent the empirical 5 and 95% quantiles, respectively, of the
isothermality values associated with occurrence records for a single taxon. The dashed horizontal
line depicts the position where seasonal and daily temperature variation are equal–here, taxa with
range midpoints below and above this line were considered tropical and temperate, respectively.
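The strict coding rule described in this caption can be sketched in a few lines of Python. This is a hedged illustration rather than the code used for the analysis: the quantile interpolation rule, the use of a log-scale midpoint, the treatment of a midpoint exactly equal to 1, and the example isothermality values are all assumptions of ours.

```python
import math

def quantile(sorted_vals, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def strict_state(isothermality):
    """Classify one taxon from the isothermality values of its occurrence records."""
    vals = sorted(isothermality)
    lower, upper = quantile(vals, 0.05), quantile(vals, 0.95)
    # Assumption: the midpoint is taken on the log scale, since isothermality
    # is a ratio; a midpoint of exactly 1 is arbitrarily treated as temperate.
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    return "tropical" if midpoint < 1.0 else "temperate"

# Hypothetical occurrence-record values for two taxa
print(strict_state([0.3, 0.4, 0.5, 0.6, 0.9]))
print(strict_state([1.5, 2.0, 2.5, 3.0]))
```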

[Figure 3A.1 image omitted. Panel title: strict classification; x-axis: tip; y-axis: isothermality (ticks 1/16, 1/4, 1, 4); legend (state): temperate, ambiguous, tropical.]

Figure 3A.2 The widths of calculated isothermality (i.e., the ratio of seasonal variation in temperature
to daily temperature fluctuations; see Feeley and Stroud, 2018) ranges for each tip/taxon in
the sage (Lamiaceae: Salvia L.) phylogeny for which we had geographic occurrence data, plot-
ted against the number of occurrence records associated with each tip on the x-axis. Widths are
given by the difference between the natural log-transformed empirical 5 and 95% quantiles of
the isothermality values associated with occurrence records for a single taxon. Note that, while
the range widths generally increase among taxa with more occurrence records (i.e., because the
geographic distributions of such taxa are better sampled), this trend largely levels out for taxa
associated with more than 20 records. For our “conservative” coding of tropicality versus
temperateness (refer to subsection 3.2.4 for details), range widths for taxa with 20 or fewer records
were symmetrically expanded (on the log scale) to the mean width among taxa with more than 20
records, which came out to ∼0.7 (corresponding to a roughly 2-fold range of isothermality values).
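The symmetric expansion on the log scale described in this caption can be sketched as follows. This is a minimal Python illustration under our own assumptions (hypothetical function name and input range), not the code used to prepare the data.

```python
import math

def expand_range(lower, upper, min_log_width):
    """Symmetrically widen [lower, upper] on the log scale to at least min_log_width."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    width = log_hi - log_lo
    if width >= min_log_width:
        return lower, upper  # already wide enough; leave unchanged
    pad = (min_log_width - width) / 2
    return math.exp(log_lo - pad), math.exp(log_hi + pad)

# Hypothetical narrow range for an undersampled taxon, expanded to a log width of 0.7
lo, hi = expand_range(0.9, 1.0, min_log_width=0.7)
print(round(lo, 3), round(hi, 3), round(hi / lo, 3))
```

Because the padding is added equally to both ends on the log scale, the geometric midpoint of the range is unchanged and the ratio of the upper to the lower bound becomes exp(0.7).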

[Figure 3A.2 image omitted. x-axis: number of occurrence records (ticks 1, 5, 20, 100, 500, 2000, 10000, 50000); y-axis: log isothermality range width (ticks 0.0 to 3.0).]

Figure 3A.3 The calculated isothermality (i.e., the ratio of seasonal variation in temperature to
daily temperature fluctuations; see Feeley and Stroud, 2018) ranges used to classify each tip/taxon
in the sage (Lamiaceae: Salvia L.) phylogeny for which we had geographic occurrence data as
either tropical (vertical line segments colored darker green), temperate (lines colored lighter blue),
or ambiguous (lines colored gray) under our “conservative” coding scheme (refer to subsection
3.2.4 for details). Generally, the lower and upper bounds of each vertical line segment represent the
empirical 5 and 95% quantiles, respectively, of the isothermality values associated with occurrence
records for a single taxon. However, the widths of isothermality ranges for any undersampled taxa
(i.e., 20 or fewer occurrence records) were expanded to at least the mean range width of well-
sampled taxa (i.e., more than 20 records; see Fig. 3A.2). The dashed horizontal lines depict
the positions where annual temperature variation is 25% lower and higher than daily temperature
variation. Here, taxa were only considered tropical if their isothermality range’s midpoint and
upper bound both fell below the lower and higher dashed lines, respectively. Similarly, taxa were
only considered temperate if their range’s midpoint and lower bound both exceeded the higher
and lower dashed lines, respectively. All other taxa failed to exhibit a strong preference for either
tropical or temperate environments and were thus considered ambiguous.
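The conservative decision rule in this caption can be written compactly. The Python sketch below assumes that the two dashed lines sit at isothermality values of 0.75 and 1.25 (seasonal variation 25% lower and higher than daily variation) and that the midpoint is taken on the log scale; both are our reading of the caption, and the example ranges are hypothetical.

```python
import math

LOWER_LINE = 0.75  # seasonal variation 25% lower than daily variation (assumed value)
UPPER_LINE = 1.25  # seasonal variation 25% higher than daily variation (assumed value)

def conservative_state(lower, upper):
    """Classify a taxon from the bounds of its (possibly expanded) isothermality range."""
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)  # assumed log-scale midpoint
    if midpoint < LOWER_LINE and upper < UPPER_LINE:
        return "tropical"
    if midpoint > UPPER_LINE and lower > LOWER_LINE:
        return "temperate"
    return "ambiguous"

# Hypothetical ranges
print(conservative_state(0.3, 0.9))
print(conservative_state(1.5, 3.0))
print(conservative_state(0.6, 1.6))
```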

[Figure 3A.3 image omitted. Panel title: conservative classification; x-axis: tip; y-axis: isothermality (ticks 1/16, 1/4, 1, 4); legend (state): temperate, ambiguous, tropical.]

Figure 3A.4 Phylogram depicting model-averaged marginal ancestral rate and state estimates
(based on sample size corrected Akaike Information Criterion–or AICc–weights) based on our
joint analysis of corolla length evolution and temperate-tropical transitions (conservatively-coded;
refer to subsection 3.2.4 for details) among sages (Lamiaceae: Salvia L.). The colors of branches
correspond to inferred rates of corolla length evolution, with darker, cooler and lighter, warmer
colors denoting relatively slow and fast rates, respectively. Pie charts at select nodes depict the
probability that a given node tended to occur in either temperate (light blue) or tropical (dark green)
environments. Because we lacked data for some tips and accounted for uncertainty in tropicality
and corolla lengths, we also depict inferred tropicality probabilities (via colored boxes; light gray
indicates even chances of being tropical or temperate) and 95% confidence intervals on corolla
lengths (via gray bars) arrayed along the tips.

[Figure 3A.4 image omitted. Color scale: rate of log corolla length evolution (variance per million years; ticks 0.004, 0.007, 0.011, 0.018, 0.030); tip axis: corolla length (mm; ticks 4, 16, 64); legend (state): temperate, ambiguous, tropical.]

Figure 3A.5 Phylogram depicting model-averaged marginal ancestral rate and state estimates
(based on sample size corrected Akaike Information Criterion–or AICc–weights) based on our
joint analysis of corolla length and staminal lever evolution among sages (Lamiaceae: Salvia L.).
The colors of branches correspond to inferred rates of corolla length evolution, with darker, cooler
and lighter, warmer colors denoting relatively slow and fast rates, respectively. Pie charts at select
nodes depict the probability that a given node either possessed (light yellow) or lacked (dark
purple) staminal levers. Because we lacked data for some tips and accounted for uncertainty in
staminal lever presence and corolla lengths, we also depict inferred staminal lever presence probabilities (via
colored boxes; light gray indicates even chances of having or lacking staminal levers) and 95%
confidence intervals on corolla lengths (via gray bars) arrayed along the tips.

[Figure 3A.5 image omitted. Color scale: rate of log corolla length evolution (variance per million years; ticks 0.004, 0.007, 0.011, 0.018, 0.030); tip axis: corolla length (mm; ticks 4, 16, 64); legend (state): lever absent, ambiguous, lever present.]

Figure 3A.6 Phylogram depicting model-averaged marginal ancestral rate and state estimates
(based on sample size corrected Akaike Information Criterion–or AICc–weights) based on our
joint analysis of corolla length and woodiness evolution among sages (Lamiaceae: Salvia L.). The
colors of branches correspond to inferred rates of corolla length evolution, with darker, cooler and
lighter, warmer colors denoting relatively slow and fast rates, respectively. Pie charts at select
nodes depict the probability that a given node was either woody (dark brown) or herbaceous (light
green). Because we lacked data for some tips and accounted for uncertainty in woodiness and
corolla lengths, we also depict inferred woodiness probabilities (via colored boxes; light gray indicates
even chances of being woody or herbaceous) and 95% confidence intervals on corolla lengths
(via gray bars) arrayed along the tips.

[Figure 3A.6 image omitted. Color scale: rate of log corolla length evolution (variance per million years; ticks 0.004, 0.007, 0.011, 0.018, 0.030); tip axis: corolla length (mm; ticks 4, 16, 64); legend (state): herbaceous, ambiguous, woody.]

Table 3A.1 Sample size corrected Akaike Information Criterion (AICc) weights for joint models
of corolla length and discrete trait evolution among sages (Lamiaceae: Salvia L.), demonstrating
the support for alternative models for any given discrete trait. The rows correspond to alternative
models that differed in four assumptions corresponding to the four leftmost columns: 1) whether
rates of corolla length evolution vary independently of the observed discrete trait (i.e., whether the
model included “hidden states”; labeled indep?), 2) whether rates vary according to the observed
discrete trait (dep?), 3) whether transitions among states of the observed discrete trait occur at
unequal, “asymmetric” rates (obs. asym?), and 4) whether transition rates among hidden states
are asymmetric (hid. asym?). ✓ and × symbols indicate which models made and did not make
these assumptions, respectively. The remaining four rightmost columns correspond to the different
discrete traits we analyzed: 1-2) alternative “strict” and “conservative” codings (refer to subsection
3.2.4 for details) of tropicality versus temperateness (strict/conserv. trop.), 3) the presence/absence
of staminal levers (lever pres.), and 4) woodiness versus herbaceousness (woodiness). Note that the
weights only describe the support for a given model for a given discrete trait, and are not directly
comparable across discrete traits.

model assumptions                          discrete trait

indep?  dep?  obs. asym?  hid. asym?       strict trop.  conserv. trop.  lever pres.  woodiness
×       ×     ×           —                1.9%          1.0%            16.8%        2.4%
×       ✓     ×           —                52.2%         59.4%           6.2%         0.9%
✓       ×     ×           ×                2.4%          1.2%            21.7%        3.1%
✓       ✓     ×           ×                4.5%          4.4%            4.1%         1.8%
×       ×     ✓           —                0.9%          0.4%            6.0%         18.4%
×       ✓     ✓           —                28.5%         27.1%           2.2%         6.6%
✓       ×     ✓           ×                1.2%          0.5%            7.7%         23.4%
✓       ✓     ✓           ×                2.3%          1.9%            1.4%         9.0%
✓       ×     ×           ✓                2.3%          1.2%            20.6%        2.9%
✓       ✓     ×           ✓                1.8%          1.8%            4.5%         1.2%
✓       ×     ✓           ✓                1.1%          0.5%            7.3%         22.1%
✓       ✓     ✓           ✓                0.9%          0.8%            1.6%         8.3%
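For readers unfamiliar with how such weights are obtained, the following Python sketch computes AICc values and the corresponding weights using the standard formulas (Burnham and Anderson, 2002). The log-likelihoods, parameter counts, and sample size are hypothetical and are not taken from our analyses; in particular, what counts as the sample size for a joint model is an assumption left to the user here.

```python
import math

def aicc(log_lik, n_params, n_obs):
    """Sample size corrected Akaike Information Criterion."""
    aic = -2.0 * log_lik + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)

def aicc_weights(aicc_values):
    """Relative support for each model within a single candidate set."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical (log-likelihood, number of parameters) pairs for three models
fits = [(-100.0, 3), (-97.0, 4), (-96.5, 6)]
n_obs = 150  # hypothetical sample size
weights = aicc_weights([aicc(ll, k, n_obs) for ll, k in fits])
print([round(w, 3) for w in weights])
```

Because the weights are normalized within one candidate set, they sum to one for each discrete trait separately, which is why they cannot be compared across the columns of the table.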

Table 3A.2 Parameter estimates based on our joint analysis of corolla length evolution and temperate-tropical transitions (strictly-
coded; refer to subsection 3.2.4 for details) among sages (Lamiaceae: Salvia L.). The rows correspond to alternative models that
differed in four assumptions corresponding to the four leftmost columns: 1) whether rates of corolla length evolution vary independently
of tropicality/temperateness (i.e., whether the model included “hidden states”; labeled indep?), 2) whether rates vary according to
tropicality/temperateness (dep?), 3) whether transitions into versus out of the tropics occur at unequal, “asymmetric” rates (obs. asym?),
and 4) whether transition rates among hidden states are asymmetric (hid. asym?). ✓ and × symbols indicate which models made and did
not make these assumptions, respectively. The remaining 9 rightmost columns correspond to different parameters. Parameters denoted
$q_{x,y}$ represent transition rates from state x to state y per million years. States 0 and 1 refer to temperate and tropical states, respectively,
while states A and B refer to “slow” (i.e., lower rate of corolla length evolution) and “fast” hidden states, respectively. Parameters
denoted $\sigma^2_{xw}$ refer to rates of log-transformed corolla length evolution (i.e., increase in variance per million years) in observed state x
and hidden state w. Lastly, ε denotes “tip error”, the inferred standard deviation of (presumably normally-distributed) log-transformed
corolla length measurements across all tips due to measurement error and/or within-tip phenotypic variation.

model assumptions                          parameters

indep?  dep?  obs. asym?  hid. asym?       $q_{0,1}$  $q_{1,0}$  $q_{A,B}$  $q_{B,A}$  $\sigma^2_{0A}$  $\sigma^2_{0B}$  $\sigma^2_{1A}$  $\sigma^2_{1B}$  ε
×       ×     ×           —                0.017      0.017      —          —          0.014            0.014            0.014            0.014            0.316
×       ✓     ×           —                0.017      0.017      —          —          0.010            0.010            0.026            0.026            0.290
✓       ×     ×           ×                0.017      0.017      0.021      0.021      0.004            0.023            0.004            0.023            0.300
✓       ✓     ×           ×                0.017      0.017      0.007      0.007      0.000            0.012            0.000            0.027            0.293
×       ×     ✓           —                0.019      0.014      —          —          0.014            0.014            0.014            0.014            0.316
×       ✓     ✓           —                0.020      0.013      —          —          0.010            0.010            0.026            0.026            0.290
✓       ×     ✓           ×                0.019      0.014      0.021      0.021      0.004            0.023            0.004            0.023            0.300
✓       ✓     ✓           ×                0.020      0.014      0.007      0.007      0.000            0.012            0.000            0.028            0.293
✓       ×     ×           ✓                0.017      0.017      0.048      0.000      0.003            0.022            0.003            0.022            0.295
✓       ✓     ×           ✓                0.017      0.017      0.000      0.035      0.006            0.017            0.028            0.021            0.296
✓       ×     ✓           ✓                0.019      0.014      0.048      0.000      0.003            0.022            0.003            0.022            0.295
✓       ✓     ✓           ✓                0.020      0.013      0.000      0.035      0.006            0.017            0.028            0.022            0.296

Table 3A.3 Parameter estimates based on our joint analysis of corolla length evolution and temperate-tropical transitions (conservatively-
coded; refer to subsection 3.2.4 for details) among sages (Lamiaceae: Salvia L.). The rows correspond to alternative models that
differed in four assumptions corresponding to the four leftmost columns: 1) whether rates of corolla length evolution vary independently
of tropicality/temperateness (i.e., whether the model included “hidden states”; labeled indep?), 2) whether rates vary according to
tropicality/temperateness (dep?), 3) whether transitions into versus out of the tropics occur at unequal, “asymmetric” rates (obs. asym?),
and 4) whether transition rates among hidden states are asymmetric (hid. asym?). ✓ and × symbols indicate which models made and did
not make these assumptions, respectively. The remaining 9 rightmost columns correspond to different parameters. Parameters denoted
qx,y represent transition rates from state x to state y per million years. States 0 and 1 refer to temperate and tropical states, respectively,
while states A and B refer to “slow” (i.e., lower rate of corolla length evolution) and “fast” hidden states, respectively. Parameters
denoted σ²_xw refer to rates of log-transformed corolla length evolution (i.e., increase in variance per million years) in observed state x
and hidden state w. Lastly, ε denotes “tip error”: the inferred standard deviation of (presumably normally-distributed) log-transformed
corolla length measurements across all tips due to measurement error and/or within-tip phenotypic variation.

model assumptions                     parameters
indep?  dep?  obs. asym?  hid. asym?  q0,1   q1,0   qA,B   qB,A   σ²_0A  σ²_0B  σ²_1A  σ²_1B  ε
×       ×     ×           -           0.008  0.008  -      -      0.014  0.014  0.014  0.014  0.316
×       ✓     ×           -           0.008  0.008  -      -      0.010  0.010  0.028  0.028  0.286
✓       ×     ×           ×           0.008  0.008  0.021  0.021  0.004  0.023  0.004  0.023  0.300
✓       ✓     ×           ×           0.008  0.008  0.007  0.007  0.000  0.011  0.000  0.029  0.290
×       ×     ✓           -           0.009  0.006  -      -      0.014  0.014  0.014  0.014  0.316
×       ✓     ✓           -           0.009  0.006  -      -      0.010  0.010  0.028  0.028  0.286
✓       ×     ✓           ×           0.009  0.006  0.021  0.021  0.004  0.023  0.004  0.023  0.300
✓       ✓     ✓           ×           0.009  0.006  0.010  0.010  0.000  0.012  0.027  0.028  0.288
✓       ×     ×           ✓           0.008  0.008  0.048  0.000  0.003  0.022  0.003  0.022  0.295
✓       ✓     ×           ✓           0.008  0.008  0.000  0.036  0.006  0.017  0.030  0.023  0.293
✓       ×     ✓           ✓           0.009  0.006  0.048  0.000  0.003  0.022  0.003  0.022  0.295
✓       ✓     ✓           ✓           0.009  0.006  0.000  0.036  0.006  0.017  0.030  0.023  0.293

Table 3A.4 Parameter estimates based on our joint analysis of corolla length and staminal lever evolution (strictly-coded; refer to
subsection 3.2.4 for details) among sages (Lamiaceae: Salvia L.). The rows correspond to alternative models that differed in four assumptions
corresponding to the four leftmost columns: 1) whether rates of corolla length evolution vary independently of lever presence/absence
(i.e., whether the model included “hidden states”; labeled indep?), 2) whether rates vary according to lever presence/absence (dep?), 3)
whether gains and losses of levers occur at unequal, “asymmetric” rates (obs. asym?), and 4) whether transition rates among hidden
states are asymmetric (hid. asym?). ✓ and × symbols indicate which models made and did not make these assumptions, respectively.
The remaining 9 rightmost columns correspond to different parameters. Parameters denoted qx,y represent transition rates from state x
to state y per million years. States 0 and 1 refer to the absence and presence of staminal levers, respectively, while states A and B refer
to “slow” (i.e., lower rate of corolla length evolution) and “fast” hidden states, respectively. Parameters denoted σ²_xw refer to rates of
log-transformed corolla length evolution (i.e., increase in variance per million years) in observed state x and hidden state w. Lastly, ε
denotes “tip error”: the inferred standard deviation of (presumably normally-distributed) log-transformed corolla length measurements
across all tips due to measurement error and/or within-tip phenotypic variation.

model assumptions                     parameters
indep?  dep?  obs. asym?  hid. asym?  q0,1   q1,0   qA,B   qB,A   σ²_0A  σ²_0B  σ²_1A  σ²_1B  ε
×       ×     ×           -           0.024  0.024  -      -      0.014  0.014  0.014  0.014  0.316
×       ✓     ×           -           0.024  0.024  -      -      0.013  0.013  0.015  0.015  0.315
✓       ×     ×           ×           0.024  0.024  0.021  0.021  0.004  0.023  0.004  0.023  0.300
✓       ✓     ×           ×           0.024  0.024  0.019  0.019  0.000  0.018  0.004  0.024  0.301
×       ×     ✓           -           0.024  0.024  -      -      0.014  0.014  0.014  0.014  0.316
×       ✓     ✓           -           0.025  0.024  -      -      0.013  0.013  0.015  0.015  0.315
✓       ×     ✓           ×           0.024  0.024  0.021  0.021  0.004  0.023  0.004  0.023  0.300
✓       ✓     ✓           ×           0.025  0.024  0.019  0.019  0.000  0.018  0.004  0.024  0.301
✓       ×     ×           ✓           0.024  0.024  0.048  0.000  0.003  0.022  0.003  0.022  0.295
✓       ✓     ×           ✓           0.024  0.024  0.046  0.000  0.000  0.021  0.004  0.022  0.294
✓       ×     ✓           ✓           0.024  0.024  0.048  0.000  0.003  0.022  0.003  0.022  0.295
✓       ✓     ✓           ✓           0.026  0.023  0.046  0.000  0.000  0.021  0.004  0.022  0.294

Table 3A.5 Parameter estimates based on our joint analysis of corolla length and woodiness evolution (strictly-coded; refer to subsection
3.2.4 for details) among sages (Lamiaceae: Salvia L.). The rows correspond to alternative models that differed in four assumptions
corresponding to the four leftmost columns: 1) whether rates of corolla length evolution vary independently of woodiness/herbaceousness
(i.e., whether the model included “hidden states”; labeled indep?), 2) whether rates vary according to woodiness/herbaceousness (dep?),
3) whether transitions to woody and herbaceous habits occur at unequal, “asymmetric” rates (obs. asym?), and 4) whether transition rates
among hidden states are asymmetric (hid. asym?). ✓ and × symbols indicate which models made and did not make these assumptions,
respectively. The remaining 9 rightmost columns correspond to different parameters. Parameters denoted qx,y represent transition rates
from state x to state y per million years. States 0 and 1 refer to herbaceous and woody states, respectively, while states A and B refer
to “slow” (i.e., lower rate of corolla length evolution) and “fast” hidden states, respectively. Parameters denoted σ²_xw refer to rates of
log-transformed corolla length evolution (i.e., increase in variance per million years) in observed state x and hidden state w. Lastly, ε
denotes “tip error”: the inferred standard deviation of (presumably normally-distributed) log-transformed corolla length measurements
across all tips due to measurement error and/or within-tip phenotypic variation.

model assumptions                     parameters
indep?  dep?  obs. asym?  hid. asym?  q0,1   q1,0   qA,B   qB,A   σ²_0A  σ²_0B  σ²_1A  σ²_1B  ε
×       ×     ×           -           0.051  0.051  -      -      0.014  0.014  0.014  0.014  0.316
×       ✓     ×           -           0.051  0.051  -      -      0.014  0.014  0.013  0.013  0.316
✓       ×     ×           ×           0.051  0.051  0.021  0.021  0.004  0.023  0.004  0.023  0.300
✓       ✓     ×           ×           0.051  0.051  0.017  0.017  0.007  0.019  0.000  0.030  0.301
×       ×     ✓           -           0.038  0.080  -      -      0.014  0.014  0.014  0.014  0.316
×       ✓     ✓           -           0.038  0.079  -      -      0.014  0.014  0.014  0.014  0.316
✓       ×     ✓           ×           0.038  0.080  0.021  0.021  0.004  0.023  0.004  0.023  0.300
✓       ✓     ✓           ×           0.039  0.079  0.017  0.017  0.007  0.021  0.000  0.024  0.301
✓       ×     ×           ✓           0.051  0.051  0.048  0.000  0.003  0.022  0.003  0.022  0.295
✓       ✓     ×           ✓           0.051  0.051  0.037  0.000  0.007  0.020  0.000  0.029  0.293
✓       ×     ✓           ✓           0.038  0.080  0.048  0.000  0.003  0.022  0.003  0.022  0.295
✓       ✓     ✓           ✓           0.038  0.079  0.036  0.000  0.009  0.021  0.000  0.027  0.293

APPENDIX 3B

PRUNING ALGORITHM DETAILS

Here, we briefly outline some of the more technical details of implementing the pruning algorithm
described in subsection 3.2.1, focusing on how we address three key issues. First, a naïve
implementation of our pruning algorithm would be quite computationally expensive and slow because
it requires exponentiating anywhere from several hundred to a few thousand matrices for each branch
in a phylogeny. Additionally, due to the recursive nature of our algorithm, numerical errors in
partial likelihood calculations may accumulate due to both (second) numerical artifacts associated
with inverting Fourier transforms (e.g., “aliasing” and rounding errors) and (third) general under-
and overflow (i.e., partial likelihoods becoming smaller or larger than what can be represented
using floating point arithmetic). We describe how we manage each of these problems below, starting
with reducing the computational burden of matrix exponentiation.

Unfortunately, matrix exponentiation is infamous for being a slow and computationally difficult
operation (Moler and Van Loan, 2003). Nonetheless, these challenges have at least driven the
development of many different methods for computing matrix exponentials, each with its own
strengths and weaknesses. We chose to use one of the more straightforward approaches,
diagonalization, whereby a matrix to be exponentiated, Q, is eigendecomposed into a matrix of its
eigenvectors, V, and a vector of its eigenvalues, λ. Then (Moler and Van Loan, 2003):

\[ \exp[Qt] = V \, \mathrm{diag}(\exp[\lambda t]) \, V^{-1} \qquad (1) \]

where t is a scalar, diag(exp[λt]) denotes a diagonal matrix whose diagonal entries are the
exponentials of the eigenvalues multiplied by t, and V^{-1} is the inverse of the eigenvector
matrix. Notably, several other phylogenetic comparative methods based on pruning algorithms use
diagonalization to exponentiate matrices as well (Pagel, 1994; Boucher and Démery, 2016; Boucher
et al., 2018), and for good reason: such pruning algorithms require repeated computation of exp[Qt]
for the same matrix Q but different values of t, which correspond to a phylogeny’s branch lengths
in this context. While calculating V, λ, and V^{-1} is computationally expensive, they only depend
on Q and may be pre-computed once prior to carrying out the pruning algorithm. Then, calculating
exp[Qt] for each branch in the phylogeny only requires basic (i.e., non-matrix) exponentiation and
matrix multiplication, which is generally far simpler and quicker than direct matrix exponentiation.
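As a minimal sketch of this "decompose once, reuse for every branch" idea, the Python snippet below handles a two-state transition rate matrix, for which the eigendecomposition has a simple closed form. The function names and the illustrative rates are assumptions made for this example and are not taken from our implementation, which works with larger matrices and a numerical eigendecomposition routine.

```python
import math

def eigendecompose_two_state(q01, q10):
    """Closed-form eigendecomposition of Q = [[-q01, q01], [q10, -q10]].

    Returns (V, lam, V_inv) such that Q = V diag(lam) V_inv.
    """
    total = q01 + q10
    lam = [0.0, -total]                     # eigenvalues of Q
    V = [[1.0, q01], [1.0, -q10]]           # eigenvectors stored as columns
    V_inv = [[q10 / total, q01 / total],    # inverse of V
             [1.0 / total, -1.0 / total]]
    return V, lam, V_inv

def expm_from_eigen(V, lam, V_inv, t):
    """exp(Qt) = V diag(exp(lam * t)) V_inv, using only scalar exponentials."""
    e = [math.exp(l * t) for l in lam]
    n = len(lam)
    return [[sum(V[i][k] * e[k] * V_inv[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# The expensive step is done once (illustrative rates per million years) ...
V, lam, V_inv = eigendecompose_two_state(q01=0.019, q10=0.014)

# ... and then reused for every branch length in the phylogeny.
branch_lengths = [0.5, 2.0, 10.0]
transition_probs = [expm_from_eigen(V, lam, V_inv, t) for t in branch_lengths]
```

Each entry of `transition_probs` is a 2 x 2 matrix of transition probabilities whose rows sum to 1, obtained without ever calling a general-purpose matrix exponential.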

In the case of our pruning algorithm specifically, we store V and V^{-1} for each matrix making
up the “rate array” R (see Eqs. 7 and 8) in identically-structured “eigenvector/inverse eigenvector
arrays”. Similarly, the λ vectors for each matrix in R are stored as the columns of an “eigenvalue
matrix”. This approach further improves the performance of our pruning algorithm by allowing us to
vectorize the exponentiation, addition, and multiplication operations involved in calculating Eq. 1
across the hundreds to thousands of matrices making up R. One last benefit of the diagonalization
approach is that, unlike some other available methods for computing matrix exponentials, it directly
generalizes to matrices of complex numbers (i.e., numbers of the form a + bi, where i = √−1),
which is necessary for our purposes as characteristic functions are generally complex-valued.
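The sketch below illustrates, under loudly stated assumptions, how eigendecompositions for a whole stack of complex-valued matrices might be stored once and reused. The function `build_matrix` is a hypothetical stand-in that produces one complex 2 x 2 matrix per frequency grid point; it is not the definition of R given by Eqs. 7 and 8. The 2 x 2 closed-form eigendecomposition is likewise used only to keep the example dependency-free, and a real implementation would rely on vectorized array operations rather than Python loops.

```python
import cmath

def eigendecompose_2x2(M):
    """Eigendecompose a 2x2 complex matrix, returning (V, lam, V_inv) with
    M = V diag(lam) V_inv. Assumes distinct eigenvalues and M[0][1] != 0."""
    (p, q), (r, s) = M
    half_trace = (p + s) / 2
    root = cmath.sqrt(half_trace ** 2 - (p * s - q * r))
    lam = [half_trace + root, half_trace - root]
    V = [[q, q], [lam[0] - p, lam[1] - p]]       # eigenvectors as columns
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    V_inv = [[V[1][1] / det, -V[0][1] / det],
             [-V[1][0] / det, V[0][0] / det]]
    return V, lam, V_inv

def build_matrix(xi, q01=0.019, q10=0.014, drift=(0.0, 0.01), rate=(0.004, 0.023)):
    """Hypothetical complex-valued matrix for one frequency grid point xi."""
    d = [1j * xi * drift[k] - 0.5 * rate[k] * xi ** 2 for k in range(2)]
    return [[-q01 + d[0], q01], [q10, -q10 + d[1]]]

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
decomps = [eigendecompose_2x2(build_matrix(xi)) for xi in grid]  # done once

# Identically-structured storage: one entry per matrix in the stack.
V_array = [d[0] for d in decomps]
eigenvalue_columns = [d[1] for d in decomps]
V_inv_array = [d[2] for d in decomps]

def expm_all(t):
    """Apply exp(Mt) = V diag(exp(lam * t)) V_inv to every stored matrix."""
    out = []
    for V, lam, V_inv in zip(V_array, eigenvalue_columns, V_inv_array):
        e = [cmath.exp(l * t) for l in lam]
        out.append([[sum(V[i][k] * e[k] * V_inv[k][j] for k in range(2))
                     for j in range(2)] for i in range(2)])
    return out
```

Calling `expm_all(t)` for each branch length then costs only scalar complex exponentials and small matrix products, regardless of how many matrices are in the stack.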

Moving on to managing the numerical instabilities associated with inverting Fourier transforms,
we sought to devise a procedure for “cleaning” apparent artifacts in branch-inflated partial
likelihoods (φ*_d(x); see Eq. 1 and accompanying text) resulting from two key limitations of the
FFT algorithm. First, because we evaluate characteristic functions on a grid of necessarily limited
extent and resolution, inverse FFTs may yield branch-inflated partial likelihoods exhibiting rapid,
spurious oscillations under certain conditions, a phenomenon known as “aliasing” (Bowman and
Roberts, 2011). Generally speaking, this occurs when partial likelihoods vary too rapidly across
grid points to be fully represented by their discretized characteristic function representations.
Fortunately, given the fact that partial likelihoods across grid points tend to rapidly “smooth
out” under most trait evolution models (e.g., FitzJohn, 2010), such artifacts occur rather
infrequently. Second, because we evaluate branch-inflated partial likelihoods on relatively
high-resolution grids meant to approximate continuous domains, many partial likelihood values
(particularly towards the boundaries of the grid) will be too close to 0 to be resolved accurately
using floating point arithmetic relative to the largest values on the grid (i.e., partial
likelihoods around machine epsilon or lower, ∼1e−16 on a typical computer). Accordingly, rounding
errors during Fourier transform inversion can generate effectively random “noise” around these
extremely low partial likelihood values, resulting in slightly negative or even complex-valued
partial likelihoods.
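The rounding-noise problem (though not aliasing) can be illustrated with a small, self-contained Python sketch. It uses a naive discrete Fourier transform as a stand-in for the FFT, and the grid size and Gaussian "partial likelihood" are arbitrary choices made for this example only. After a forward and inverse transform, values in the tails are no longer tiny positive numbers but are dominated by rounding noise, which may be negative and typically carries a small imaginary part.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(n^2) discrete Fourier transform (sign=-1 forward, sign=+1 inverse,
    both unnormalized). A stand-in for an FFT routine."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * m * k / n)
                for m, v in enumerate(values))
            for k in range(n)]

n = 64
# A narrow Gaussian whose tails lie far below machine epsilon.
truth = [math.exp(-0.5 * ((m - n / 2) / 1.5) ** 2) for m in range(n)]

char_fun = dft(truth, sign=-1)                        # forward transform
recovered = [v / n for v in dft(char_fun, sign=+1)]   # inverse transform

tail_truth = truth[0]          # exact value: positive but vanishingly small
tail_recovered = recovered[0]  # round-trip value: dominated by rounding noise

# The zero-frequency term of the transform equals the sum of the values.
total_from_char_fun = char_fun[0].real
```

Comparing `tail_truth` with `tail_recovered` shows why raw inverse-transformed values cannot be used directly as likelihoods near the grid boundaries.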

Through trial and error, we developed a relatively simple cleaning procedure that effectively
manages these problems while avoiding any costly computation. We refer to the input and output
of this procedure as “raw” and “cleaned” branch-inflated partial likelihoods, respectively. The
raw partial likelihoods are directly given by the inverse Fourier transform of their characteristic
function representation (φ̂_d(ξ); see text following Eq. 5). To clean the raw partial likelihoods,
we first tackle some of the rounding errors by discarding the imaginary components of any
complex-valued partial likelihoods (i.e., numbers of the form a + bi are “rounded” to a). Then, to
eliminate any rapid, likely erroneous oscillations reflecting aliasing errors, the raw partial
likelihoods are “smoothed out” by rounding partial likelihood values surrounded by negative values
on both sides to 1e−16. To ensure all partial likelihoods are positive, any remaining values below
1e−16 are also rounded to 1e−16. Lastly, to yield the final output, the partial likelihoods are
rescaled such that they add up to the sum of the original, raw partial likelihoods, which is given
by the first value of the characteristic function representation of the raw partial likelihoods
(i.e., its value at ξ = 0) and calculated quite accurately under our approach (DasGupta, 2011). To
be clear, rounding all sufficiently low partial likelihood values up to 1e−16 is an approximation,
as it causes the branch-inflated partial likelihoods to have “fatter tails” than they really should.
Nonetheless, this procedure improves the numerical accuracy of subsequent Fourier transforms while
also ensuring products of branch-inflated partial likelihoods always have a positive-valued sum.
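The four cleaning steps can be sketched in Python as follows. This is a simplified illustration rather than our actual code: the function name is hypothetical, the two grid endpoints are assumed never to count as "surrounded" because they have only one neighbor, and the target sum is passed in directly instead of being read from the characteristic function.

```python
FLOOR = 1e-16  # universal lower bound used when cleaning

def clean_partial_likelihoods(raw, target_sum):
    """Clean raw branch-inflated partial likelihoods (a sketch).

    raw: sequence of real or complex values from an inverse Fourier transform.
    target_sum: the sum the cleaned values should add up to.
    """
    # Step 1: discard imaginary components.
    real = [complex(v).real for v in raw]
    cleaned = list(real)
    # Step 2: flatten values flanked by negative values on both sides.
    for i in range(1, len(real) - 1):
        if real[i - 1] < 0 and real[i + 1] < 0:
            cleaned[i] = FLOOR
    # Step 3: raise anything still below the floor (including negatives).
    cleaned = [max(v, FLOOR) for v in cleaned]
    # Step 4: rescale so the values add up to the target sum.
    scale = target_sum / sum(cleaned)
    return [v * scale for v in cleaned]

# Toy input: a spurious oscillation (indices 1-3) next to a genuine peak.
raw = [1e-18 + 2e-19j, -2e-3, 5e-3, -1e-3, 0.1, 0.6, 0.3, -1e-17]
target_sum = sum(complex(v).real for v in raw)
cleaned = clean_partial_likelihoods(raw, target_sum)
```

In this toy input, the value 5e-3 at index 2 is treated as an aliasing artifact because both of its neighbors are negative, whereas the genuine peak at indices 4 to 6 is only rescaled.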

Lastly, as the pruning algorithm proceeds and branch-inflated partial likelihoods are repeatedly
multiplied together, partial likelihoods tend to rapidly approach values either too low or too high
to be represented using floating point arithmetic (i.e., under- and overflow, respectively). To
prevent this, after multiplying cleaned branch-inflated partial likelihoods together for each state
to yield the full partial likelihood matrix for a new edge e (Φ_e; see paragraph following
Eqs. 1-4), the resulting matrix is rescaled to have a maximum of 1. Notably, this also ensures all
branch-inflated partial likelihoods exhibit the same overall scale, rendering it more appropriate
to use 1e−16 as a universal “lower bound” when cleaning branch-inflated partial likelihoods. The
cumulative product of the rescaling factors is in turn tracked on a logarithmic scale, and the
final partial likelihoods at the root are subsequently log-transformed and divided by the final
rescaling factor (which, on the log scale, amounts to subtracting its logarithm). Additionally, we
use the log-sum-exp trick (Stan Development Team, 2019) to stably sum partial likelihoods at the
root including extremely low or high values according to Eqs. 1-4. Intriguingly, though perhaps
somewhat unsurprisingly, the rescaling factor, rather than the partial likelihoods at the root
themselves, seems to determine most of the variation in the overall likelihoods associated with
different parameter estimates. This reflects the fact that the rescaling factor in some sense
measures how well the peaks of different branch-inflated partial likelihood matrices “match up”.
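A minimal Python sketch of the rescaling and log-sum-exp steps is given below. The function names are hypothetical, the root likelihood is taken to be a plain sum over all matrix entries (the weighting prescribed by Eqs. 1-4 is omitted), and the bookkeeping convention is an assumption: here the logarithm of each maximum that was divided out is accumulated and added back at the end, which is equivalent to dividing by a cumulative rescaling factor defined as the product of the reciprocal maxima.

```python
import math

def rescale_to_unit_max(matrix):
    """Return (matrix / max, log(max)) so the largest entry becomes 1."""
    peak = max(max(row) for row in matrix)
    return [[v / peak for v in row] for row in matrix], math.log(peak)

def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) for very low or very high log values."""
    peak = max(log_values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))

def root_log_likelihood(matrices):
    """Multiply matrices elementwise, rescaling after every multiplication."""
    current = [[1.0] * len(matrices[0][0]) for _ in matrices[0]]
    log_scale = 0.0  # running log of the product of all maxima divided out
    for mat in matrices:
        current = [[c * m for c, m in zip(c_row, m_row)]
                   for c_row, m_row in zip(current, mat)]
        current, log_peak = rescale_to_unit_max(current)
        log_scale += log_peak
    logs = [math.log(v) if v > 0 else -math.inf
            for row in current for v in row]
    return log_sum_exp(logs) + log_scale

# Three very small likelihood matrices: their naive product underflows.
tiny = [[1e-200, 3e-200], [2e-200, 4e-200]]
naive = sum(v * v * v for row in tiny for v in row)   # underflows to 0.0
log_lik = root_log_likelihood([tiny, tiny, tiny])     # finite log-likelihood
```

Here the naive product is exactly zero in double precision, so its logarithm is undefined, whereas the rescaled computation returns a finite log-likelihood.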
