AN EMPIRICAL ANALYSIS OF HISTORICAL CONTINGENCY IN EVOLUTION By Austin James Ferguson A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science - Doctor of Philosophy Ecology, Evolution, and Behavior - Dual Major 2024 ABSTRACT While evolution has created a stunning diversity of complex traits in nature, isolating the details for how a particular trait evolved remains challenging. Specifically, what were the critical events in evolutionary history that made that trait more or less likely to arise? We must consider historical contingency, where even small changes, such as an apparently neutral mutation, can have substantial influence on long-term evolutionary outcomes. Evolutionary biologists have long been interested in the role that historical contingency plays in evolution, but testing hypotheses of its effects has traditionally been difficult and time consuming, if it is even possible at all. Here I leverage the speed and power of digital evolution to experimentally test the role of historical contingency in evolution. I start by observing how the evolution of phenotypic plasticity stabilizes future evolutionary dynamics. Next, I employ analytic replay experiments to empirically test which mutations in a population’s history increased the likelihood that associative learning evolves, first as case studies and then using more statistically powerful experimental approaches. I demonstrate that single mutations can drastically increase the odds of learning appearing, shifting it from a rare possibility to a near inevitability, and I find that these “potentiating” mutations exist in all studied lineages. Finally, I use potentiating mutations to develop an intuitive view into how adaptive momentum increases evolutionary exploration in populations experiencing disequilibrium. We are only beginning to scratch the surface of how historical contingency influences evolution, but digital evolution systems can expedite this process by testing these hypotheses and further refining these techniques for use in natural organisms. This work, and those like it, are pivotal in understanding how populations previously evolved, how their accumulated history currently affects them, and how they might evolve far into the future. Copyright by AUSTIN JAMES FERGUSON 2024 In memory of my grandmother, Mammaw Debbie. Oh, how I wish you were here to celebrate. iv ACKNOWLEDGEMENTS When writing a dissertation about historical contingency, it is impossible to avoid thinking about the events in my own life, both big and small, that have brought me to this point. Among all of the decisions, failures, and chance events that potentiated this path, the most important factors are the plethora of amazing people whose paths have crossed my own. Here I wish to thank them. Fully encapsulating my gratitude within this text is impossible, but I will try my best. To those who seemingly never tire: Barbara Bloemers, Vinny Mattison, Dr. Elise Zipkin, Dr. Sandeep Kulkarni, and Dr. Eric Torng, you each deserve sainthood for your demonstrations of patience and tenacity in pushing for us graduate students. To those most likely to read this dissertation: Dr. Chris Adami, Dr. Wolfgang Banzhaf, and Dr. Rob Pennock, your patience and guidance have helped shape me into a scientist. To those who led the way: Dr. Emily Dolson, Dr. Josh Nahum, Dr. Alex Lalejini, Alexa Lalejini, Dr. Matthew Moreno, Dr. Jose Hernandez, Dr. Anselmo Pontes, Dr. Anya Vostinar, Dr. Mike Wiser, Dr. Vinny Ragusa, Dr. Jory Schossau, Santiago Rodriguez Papa, Keron Greene, and Dr. Nkrumah Grant, you welcomed me with open arms and offered friendship, guidance, and much needed reassurance every step of the way. To those who kept me sane: Dr. Acacia Ackles and Kate Skocelas, without the two of you I would not have made it through the isolation of a global pandemic. To those who carried me through the finish: Sydney Leither, Max Foreback, Shakiba Shahban- degan, and Karen Suzue, the countless nights of laughter have made these last couple years some of my best. Your unwavering support and wisdom beyond your years keep me in grateful awe. To the one who is uncategorizable: Cliff Bohm, who always has an idea to debate, a paper to collaborate on, and a warm dinner for good company. To that good company: Chris Reeves, Kira Chan, Lucky Voutour, Ariel Cooper, and the rest of the crew, your comradery was always a highlight of my week. To those who made Monday mornings tolerable: the EEB coworking crew, there is no one I would rather work in silence around. You are truly the most supportive group of people I have ever met. To those who let me learn how to teach: my undergraduate mentees Daniel Junghans, Nick Lloyd, Caroline Gormerly, Dylan Rainbow, Lanea Rohan, Aria Killebrew-Bruehl, Valdine Pegy v Tchinda, and dozen of students across classes and workshops, thank you for your patience and in teaching me more than I taught you. To those who are always there: Dr. Ronnie Emmons, Logan Boggs, Dariel Marine, Dr. Gabe Howard, and Jason Montavon, your companionship and endless laughs are truly appreciated. To the one who pushed me: Andrew Polanco, you single-handedly convinced me to aim for what I really wanted and refused to let me falter. To those who demonstrated the role: my instructors and colleagues at Shawnee, it was your passion for educating and your kind hearts that led me to believe I could maybe pull this off. To the friends who have truly become family: Alex Bellomy, who continuously raises the bar for what it means to be someone’s best friend; and Ashlyn Bellomy, who has demonstrated that true friends do not require a lifetime of history. To the one who took a chance on me: Dr. Charles Ofria, my advisor who foolishly accepted a clueless new graduate student six years ago. Through a wild journey, you have shaped me into a scientist and educator, never showing the slightest hesitation to help me or just anyone around you. While I am sad to see this journey end, I am so happy to call you a friend at the end of it. To those that gave me unwavering support: Martha, Jim, and Marty Ferguson and Jeff Adkins, you have never forgotten to say “I’m proud of you” when I needed it most. To the one taken too soon: Debbie Adkins, you taught me the most important lessons: not to take life too seriously and to always love with all your heart. To the one who changed my life: Ethan Ferguson, your big heart and determination continu- ously fill me with pride. You may be the younger brother, but I am the one learning from you. To the ones who made it possible: Donnie and Alisha Ferguson, you have always pushed to give me the opportunities you did not have. You taught me the true value in “we are here to support you”. You also taught me how to work hard, laugh harder, and make the most of life. Everything I have done and will do in this life are thanks to the sacrifices you have made. To those mentioned and those not: from the bottom of my heart, thank you. vi TABLE OF CONTENTS Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Chapter 2 Adaptive phenotypic plasticity stabilizes evolution in fluctuating environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 3 Potentiating mutations facilitate the evolution of associative learning in digital organisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 4 A large-scale digital evolution study of historical contingencies in associative learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 5 Predicting the unpredictable: Using replay experiments to disentangle how evolutionary outcomes are altered by adaptive momentum . . . . . . . . . . . 12 40 56 77 Chapter 6 Conclusions and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 vii Chapter 1 Introduction I feel obliged to start this dissertation with the standard observation: there is seemingly limitless diversity to the organisms that have evolved in nature. From microbes to megafauna, evolutionary processes have created a stunning array of traits and behaviors in organisms. Were each of these features mere evolutionary flukes, or was their eventual evolution inevitable? Are there at least general evolutionary trends that we can abstract from these examples? Questions like these, of course, are grand challenges of evolutionary biology. Here, I focus on one such question: how do the events of the past influence future evolution? When looking at evolution in nature, we often encounter beneficial traits that were uniquely evolved in one type of organism, while organisms in the broader taxonomic unit found different survival strategies. For example, most birds and many insects exhibit self-powered flight, but only one branch of mammals does: bats (Gunnell and Simmons, 2005). Similarly, while many snakes have both venom and the means to deliver it via a bite, venomous bites are only found in one clade of lizards (Fry et al., 2006). Why are these adaptations so rare, and what conditions led to their evolution in only a single clade each? In nature, there is only so much we can do; unfortunately we cannot travel back in time and “replay the tape of life” (as evoked by Gould (1990)) to observe if flight consistently evolved in bats thanks to that mammalian clade’s history, or if it was an uncommon stroke of luck. While replaying the tape of life as we know it is impossible, experimental evolution studies allow us to ask similar questions. As I describe below, evolutionary biologists have successfully employed experimental approaches to better understand the role that accumulated history has played in the evolution of microbial traits. In this work I continue this endeavor; I use in silico evolution experi- ments to study the role of history in evolving traits, both simple and complex, in digital organisms. I first ask how the evolution of one trait, a phenotypically plastic response to the environment, affects downstream evolutionary dynamics. I then pivot, switching from observations of how a known trait altered evolution, to retroactively identifying mutations that drastically shifted the long-term fate of the population. Finally, I demonstrate how we can leverage these techniques to better understand fundamental evolutionary dynamics. Though still in their infancy, these techniques and ideas show amazing potential for understanding the nuanced role that history plays in evolution. 1 1.1 Analyzing evolution: selection, chance, and history In thinking about the evolution of a particular trait, I adopt a mindset modified from Travisano et al. (1995) and consider the contributions of three different factors: selection, chance, and history. New or modified traits that provide a net benefit to the survival and reproduction of an organism (i.e., adaptations) are more likely to increase in frequency, thus illustrating the role that selection plays in shaping evolutionary outcomes. Here I split from Travisano et al. (1995), focusing on selection as a process, which results in these adaptations becoming more prevalent in the population. Of course, selection can only act upon genetic sequences that are available in the population. The stochastic nature of random mutations means that some genetic sequences will arise, while others will never even appear for selection to consider in the first place. The random appearance of sequences – or disappearance as misfortune can remove otherwise fit genotypes (i.e., genetic drift) – highlights some of the roles of chance in evolution. The influence of chance is constrained, though, by the starting genotypes that mutations act upon; traits can appear or be modified only if such changes are available in the local genetic neighborhood. This distribution of genetic sequences in the population at the point under investigation in an evolutionary study can be described as the product of its history. It is the interplay of these three aspects that produce the complex dynamics that we observe in evolving populations. Of course, what we call history is just a matter of temporal perspective. From the vantage point of a population in a given state, all of the dynamics, including selection and chance, that brought the population to that state are now consolidated under the label of history. Looking forward, however, both selection and chance will be at play, with different mutations or combinations of mutations occurring at different probabilities, and the resulting combinations having differing survival potential. As time advances, these changes are again relegated to history, while new outlooks now exist for the population based on its new composition. As such, for any study where we examine the balance among these three factors, we need to be clear about the starting point from which our perspective will be based. In this work I focus on the role of history in evolution, which is to say that I investigate how the balance of chance and selection shifts over the course of the population’s history. Since evolution is a stochastic process, we must always discuss possible evolutionary outcomes in terms 2 of probabilities. From one point along a lineage, the evolution of a given trait may be incredibly unlikely, dependent on a rare double mutation or a deleterious mutation that is not immediately purified. From a later point along that lineage, however, that same trait’s evolution may shift to being a near certainty, with strong selective pressure for that trait, or its preceding parts, more in control of the population’s fate. Is there some way for us to predict these shifts in influence between selection and chance? Can we distinguish between a population that is being driven to a specific outcome from one that is simply adrift? And how do these shifts occur? Does chance give way to selection in small increments, or can an individual mutation dramatically alter the balance? 1.2 The role of history and historical contingency in evolution While the ideas of evolution and natural selection have been around for well over one hundred and fifty years (Darwin, 1859), evolutionary biologists continue to argue about, test, and expand upon the different factors that contribute to evolution. While selection was the initial frontrunner, researchers argued for the importance of chance (Kimura, 1968; King and Jukes, 1969; Mayr, 1983) and later history (Gould, 1990; Gould and Lewontin, 1979) in evolution. It may appear obvious that history plays an important role in evolution, as the set of genetic sequences that could feasibly appear in the population relies on what sequences currently exist. Here, however, I mainly focus on the idea of “historical contingency” – the idea that small, often initially inconsequential changes can have a drastic effect on what ultimately evolves. As an idealized example, consider a set of three mutations, A, B, and C, that together give rise to a highly beneficial trait. All three mutations are equally beneficial in isolation, while AB is slightly more beneficial but AC and BC are both detrimental to fitness. In this scenario, populations that have fixed either A or B in isolation have a beneficial pathway to the combined trait, ABC. If instead a population has fixed C by itself, the ABC trait becomes much harder to evolve, as both intermediate steps are deleterious; a double mutation is then needed to reach the trait without losing fitness. Even in this simple example, the initial fixing of C has no penalty when it occurs, but it shifts the possibilities of what is likely to evolve in the future. For a thorough review of the ideas and complications of historical contingency, as well as empirical investigations into its role in evolution, see (Blount et al., 2018). While work has been done to study the role of historical contingency in the evolution of natural populations (e.g., (Keller and Taylor, 2008; Losos et al., 1998; Oke et al., 2019)), here I base my work 3 on empirical studies of historical contingency in experimental evolution. Early work in Escherichia coli produced two drastically different results. Researchers found no influence of the initial value of fitness reflected in the populations’ final evolved fitness, but in the same experiment, they found that the final evolved cell size of a population was highly contingent in the initial cell size of that replicate (Travisano et al., 1995). Building off this framework, Flores-Moya et al. (2012) found evidence that history plays a key role in the evolution of growth rate and toxin cell quota in algae. Santos-Lopez et al. (2021) found that even shallow differences in history can influence the evolution of antimicrobial resistance. Recently, Smith et al. (2022) have shown that, while history does play a role in the evolution of E. coli, the interactions between selection, chance, and history can heavily depend on traits under investigation and the environment being studied. By leveraging clever experimental evolution studies, these researchers have shown that it is possible to disentangle the influence of selection, chance, and history on evolution in a particular system. Further work has investigated historical contingency in evolution without explicitly quantifying the contributions of these three basal factors. Many studies can be categorized under this label, but here I highlight a few representatives. Multiple teams have investigated how historical contingency often creates genotypic variance while selection is able to derive similar phenotypic outcomes (Bed- homme et al., 2013; Simões et al., 2017). In antibiotic resistance, historical contingency can alter both evolvability and the particular genetic pathways evolution takes (Card et al., 2019, 2021). While historical contingency can be caused by any genetic change, recent work has highlighted contingency caused by cryptic genetic variation (Zheng et al., 2019) and synonymous mutations (McGrath et al., 2024). There is also a long and rich history of using digital systems for studying historical contingency’s role in evolution (Taylor and Hallam, 1998). We can even consider studies that were not initially thought of as investigations of historical contingency, such as how to choose the starting population in evolutionary computation (Burke et al., 1998; Schmidt and Lipson, 2009). Specifically analyzing contingency, the seminal work of Travisano et al. (1995) on the roles of selection, chance, and history over time was also replicated and expanded in the digital evolution system Avida, with far more statistical power thanks to the perfect record keeping of digital systems (Wagenaar and Adami, 2004). Braught and Dean (2007) again expanded this work by using neural networks and demonstrated an interaction between selection, chance, history, and the Baldwin effect; they found 4 that learning can influence the impact of the three factors. By comparing normal evolutionary replicates to those where deleterious mutations were automatically reverted, Covert III et al. (2013) found that initially-deleterious mutations can increase the probability of complex traits evolving. Yedid et al. (2008) used a controlled extinction event and found that the pre-extinction presence of a complex trait factored into the re-evolution of the trait after the extinction event. More recently, Bundy et al. (2021) leveraged the speed of digital evolution to test how the depth of history affects future evolution, dealing with generation counts well beyond what is currently feasible in microbial systems. These studies show that not only are these techniques viable in digital systems, but that digital systems can expand upon them to conduct research that would otherwise be impossible. 1.3 Genetic potentiation While this dissertation investigates the role of history in evolution, most chapters narrow that focus to the “potentiation” of a focal trait. The term “potentiation” has been used in many different ways, but here I define potentiation as the likelihood that the focal trait evolves from a given initial genotype or population (adapted from Blount et al. (2008)). If a change (e.g., a mutation or an environmental shift) increases the likelihood that the focal trait evolves, that is a potentiating change, as indicated by the increase in the trait’s potentiation value. Here my emphasis is on genetic potentiation, where changes in potentiation are due solely to accumulated genetic changes in the organisms, as opposed to environmental or anatomical potentiation (Heyduk et al., 2019). Critically, the idea of potentiation is to separate the actualizing mutations that confer the focal trait from the potentiating mutations that made the evolution of that trait more likely. Indeed, a potentiating mutation may have no (or even negative) consequences on fitness in the short term, until additional mutations build upon it. Early studies of potentiation often follow a variation of the question: If mutation X arises, how does that affect the evolution of trait Y? As an example, we can consider the work by Gifford et al. (2018). The authors identify mutations in the transcription factor ampR as present among strains of Pseudomonas capable of quickly evolving resistance to the antibiotic ceftazidime. Importantly, the authors show that mutations to ampR alone do not constitute antibiotic resistance; they only prime the evolution of antibiotic resistance. That is, the mutations in ampR potentiated the evolution of antibiotic resistance. While it is easy to conflate the notion of potentiating mutations with positive (or synergistic) 5 epistatic interactions, it should be noted that the two are technically distinct ideas. Here we con- sider epistasis as originally outlined by Fisher (1919), where epistatic interactions occur when the appearance of two genes create a trait value that differs from the predicted combination of their effects in isolation (typically with an additive or multiplicative expectation). Epistatic interactions can be potentiating, and in fact, many examples of potentiating mutations in the literature demon- strate epistasis between the potentiating mutation and the focal trait (Douglas et al., 2017; Gifford et al., 2018). This relationship, however, does not need to be direct; we can imagine scenarios where the potentiating mutation is a “gateway” to the desired region of the fitness landscape, or where the potentiating mutation epistatically interacts with a third mutation that then increases the fitness of the actualizing step. To consider a mutation potentiating, it only needs to improve the likelihood that the focal trait evolves. How do we find these potentiating mutations? A common approach is to isolate lineages that have evolved the focal trait from an outgroup that have not (either via experimental evolution or collection from the wild) and then analyze genomes from both groups. Mutations commonly found in the actualized group but rarely in the outgroup are likely to be vital to the evolution of the focal trait, either potentiating or actualizing mutations, and further experiments can disentangle the relationship of the mutations (Cummins et al., 2021; Gifford et al., 2018). Here, however, I employ replay experiments, an empirical technique to retroactively identify potentiating mutations within a given evolved lineage. In the pioneering work on replay experiments, Blount et al. (2008) empirically tested whether the novel citrate metabolism in one of Richard Lenski’s Long-Term Evolution Experiment (LTEE) populations (Lenski et al., 1991) was due to a fluke mutation or the accumulation of a potentiated genetic background. To do so, they revived frozen samples from the LTEE to found multiple “replay” populations from various points along the lineage that originally evolved to metabolize citrate. Conceptually, if the fraction of replay populations that evolved citrate metabolism increased between generation N and generation N +K, we know that one or more potentiating mutations occurred in those K generations. Indeed, this result is what they found; the metabolization of citrate was more likely to evolve from samples further along the lineage, providing support that genetic potentiation was a key factor. Essentially, this framework is applying the analysis of Travisano et al. (1995) along steps of a lineage. By starting evolutionary replicates from various points in the population’s history, we are 6 experimentally manipulating the shared history among replicates. Differences in what evolves in these replicates can then identify changes that occurred in the population’s history that shifted the contributions of selection and chance in the evolution of the focal trait. Increases in potentiation indicate an increased contribution of selection, as the trait is now more likely to evolve. Such an increase could happen if the population has moved to a position in genetic space where a more adaptive (or less maladaptive) pathway to the focal trait now exists. Decreases in potentiation, on the other hand, indicate a stronger reliance on chance and could, for example, be the result of convergence to a local optimum in the fitness landscape. The selective pressure of this optimum could leave the population reliant on fluke mutations or genetic drift to escape and potentially find the focal trait. Ultimately, a difference in potentiation between two points on a lineage indicates that the genetic changes between them, which are considered history in the context of the later point, are important in whether the focal trait ultimately evolves. Identifying a substantial change in potentiation from one time point to another opens the possibility of examining what mutations fall in this window, their immediate effect on the organism, and if their appearance in the lineage was due to selection or chance. Additionally, the size of the replay window is arbitrary; we can skip thousands of generations between checks for potentiating events or, in the extreme case, test every single generation. Since the initial study (Blount et al., 2008), researchers have conducted similar experiments (now called analytic replay experiments, (Blount et al., 2018)) in multiple systems and looking at various traits. These include clade extinction and evolvability in E. coli (Turner et al., 2015; Woods et al., 2011), novel receptor usage in Phage λ (Gupta et al., 2022; Meyer et al., 2012), colistin resistance in Pseudomonas aeruginosa (Jochumsen et al., 2016), and, recently, the epistatic interactions in yeast (Vignogna et al., 2021). Across these systems, the researchers showed that the accumulated genetic background is profoundly important in the eventual evolution of the focal trait. While these techniques are relatively new, they offer valuable insight into how the interplay of selection, chance, and history can influence subsequent evolution. These studies look backward, empirically testing what changes led to the final evolved behaviors, but the resulting potentiation dynamics are deeply intertwined with concepts such as predictability in evolution. As such, I expect research in the near future to begin weaving these findings into the broader tapestry of evolutionary dynamics. 7 1.4 Why digital evolution? Researchers addressing these questions about the role of history in evolution initially started where many evolutionary biology questions started: in the fossil record. Indeed, some of the most influential thoughts on the role of history come from fossils in the Burgess Shale (Gould, 1990). More recent techniques allow researchers to supplement these ideas with genomic analysis and phylogenetic reconstruction, powerful tools for determining the history of life on earth. Looking to history, however, has many limitations: we are provided with only a single instance of evolution, the data we do have is incomplete, and we are not able to go back in time to conduct controlled experiments. In recent decades there has been a surge in experimental approaches to studying evolution (Kawecki et al., 2012). If we evolve populations in controlled laboratory conditions, we are able to evolve many populations in parallel, observe almost everything that occurs, and build a range of experimental conditions – thus addressing each of the problems above. Evolutionary biologists have leveraged experimental evolution to empirically test a multitude of hypotheses, such as possible selection pressures for the transition to multicellularity (Ratcliff et al., 2012) and the feedback cycle between phage- and antibiotic- resistance and its clinical implications (Chan et al., 2016; Kortright et al., 2019). I adopt this experimental mindset throughout this dissertation in an attempt to understand the role of historical contingency in evolution. I will use the possibilities revealed from counterfactual experiments to understand “life as it could have been” in our study systems. These techniques will help me separate the flukes from the inevitabilties in the dynamics that shaped the course of evolution as it was originally realized in these studies. However, even the control and speed of in vitro experimental evolution is sometimes not enough. Specifically, we must be able to isolate all of the individual mutations along a lineage and replay evolution using each step as a new starting point. Furthermore, for each of these starting points, we need to be able to conduct enough replays to generate statistically powerful conclusions about evolutionary outcomes. While experimental evolution techniques are powerful (and getting more powerful with time), performing experiments at this scale would still take decades of dedicated work in the laboratory. To overcome these hurdles, I leverage the power of computational evolution systems. These systems maintain populations of self-replicating computer programs that mutate and evolve in the 8 same way as natural organisms. In fact, compelling arguments have been made that digital evo- lution is not a simulation of evolution, but rather an instantiation of it (Pennock, 2007). These digital evolution systems are not a departure from experimental evolution, but rather a choice of experimental system. These computational systems give many benefits over in vitro approaches, including greater speed, automatic high resolution data collection, and the ability to start an exper- iment with the exact conditions of our choosing. Of course, digital evolution also has its drawbacks. For example, there is a much lower limit to the complexity of the organisms and what we have been able to evolve in silico, and open-ended evolution remains elusive (Taylor et al., 2016). Further, due to technological constraints, digital evolution is typically limited to scales of at most tens of thousands of organisms, while natural populations can be vastly larger (Moreno et al., 2024). As a digital evolution researcher, I am often asked how my computational models tell us any- thing about how life evolved in the natural world. While other computational researchers may ask such questions, I do not claim to directly explain life as it was or will be. Instead, my questions are theoretical; studying evolution as a process using life as it could be. My goal is to conduct ex- periments that are infeasible or impossible to conduct in laboratory studies of evolution to further refine our understanding of basic evolutionary dynamics. For example, we can show evidence of the minimal set of requirements for the evolution of a particular trait or behavior (Lalejini and Ofria, 2016; Pontes et al., 2020). We can also boil questions down to their most basic scenarios, removing all possible confounding factors in a way that is not possible in natural organisms and allowing for strong claims about the impacts of the remaining dynamics (Bohm et al., 2024; Wilke et al., 2001). Finally, in the case of history in evolution, we can show examples of how these dynamics unfold, though we cannot claim that is how these dynamics will always unfold. By improving our theoreti- cal understanding and fine-tuning experimental methodologies, we enable wet-lab or natural-system evolutionary biologists to better understand their study system, and, more generally, the evolution of life on earth. 1.5 Thesis Statement The core thesis of this dissertation is two-part: 1) advancement of evolutionary theory requires us to understand how past events result in particular evolutionary outcomes, and 2) computational systems make large-scale replay experiments feasible and allow us to empirically test the role that history played in evolved populations. This rapid experimentation in computational systems allows 9 us to both improve our theoretical understanding of the role of historical contingency in evolution and to refine the methodologies of these experiments, which can then be leveraged in future wet-lab studies. 1.6 Contributions Overall, this dissertation advances our knowledge of the role that history plays in evolution. This is not done by inventing new methodologies nor discovering new phenomena. Instead, the main contributions of this work lie in the intersections; by applying existing techniques to new areas. By leveraging the power of digital evolution, I am able to experimentally test ideas of historical contingency at scales that were previously impossible. Moving to specifics, in Chapter 2 I ask a simple question: What happens after phenotypic plasticity evolves? Here, phenotypic plasticity, where different phenotypes can arise from a single genotype due to environmental variation, is studied because it is a precursor to learning (a theme in later chapters, and what initially brought me to study digital evolution). However, this question is often difficult; while we can readily find species capable of phenotypic plasticity, what would we compare them to? I thus use digital tools to evolve both plastic and non-plastic populations, and then test how their evolutionary dynamics change as a consequence. Across all experiments, I find that the evolution of phenotypic plasticity stabilizes future evolution in a cyclic environment that consistently alternates between two states. Plastic populations experience less evolutionary change, maintain more novel beneficial traits, and accumulate fewer deleterious traits than their non-plastic counterparts. Indeed, evidence suggests that the evolutionary dynamics of plastic populations more closely align to those of populations evolving in a static environment than non-plastic populations in the same environment. For the rest of this dissertation, my focus shifts from effect to cause; from identifying the consequences that result from a known change in history to retroactively identifying the changes that shifted the overall evolutionary fate. This starts with Chapter 3, where I conduct case studies to find and characterize “potentiating mutations” in the evolution of associative learning. By conducting analytic replay experiments (Blount et al., 2018), I identify a single parent-to-offspring reproduction event that substantially increased the likelihood that learning ultimately evolved in each of four case study lineages. This initial foray demonstrates that even single mutations can drastically alter the fate of an evolving population. 10 While the results of Chapter 3 enhance our understanding of potentiation and potentiating mutations, the low count of case studies prevents us from being able to make any statistical claims. Thus we are left with the question: do most of the populations that evolve learning in Avida do so via large potentiating mutations, or were those four replicates unlikely? To answer this question, I replay 50 additional learning replicates in Chapter 4. I find that the four original lineages were not flukes. All 50 lineages not only see a significant increase in potentiation across 50 lineages steps, they all see a significant increase in a single step. Additionally, we see that most of these potentiating lineage steps either occur with a single mutation, or only one mutation had a significant effect on potentiation. This work is by far the largest study of potentiation to date, and it allows for comparison against both previous studies and those in the future. Finally, Chapter 5 takes a different approach. I leverage analytic replay experiments again, but instead of studying potentiation, I use it as a tool for viewing a newly identified evolutionary dynamic: adaptive momentum (Bohm et al., 2024), which broadly states that populations in dis- equilibrium see more evolutionary exploration. While adaptive momentum has been shown in the aggregate, here I demonstrate the effect within the history of an evolving population by analyzing the potentiation of crossing a fitness valley. I show that when a spatial population is experiencing a selective sweep (an example of disequilibrium), mutations near the leading edge of the sweep have a drastic effect on the overall evolution of the population. This contrasts with populations in equilibrium, which effectively rely on pure chance to cross the fitness valley. Further, I demonstrate an example where the structure of the population matters, as the potentiation of the population diminishes if we shuffle the order of the organisms. 11 Chapter 2 Adaptive phenotypic plasticity stabilizes evolution in fluctuating environments Authors: Alexander Lalejini, Austin J. Ferguson, Nkrumah A. Grant, and Charles Ofria This chapter is adapted from (Lalejini et al., 2021), which was peer-reviewed and published in Frontiers in Ecology and Evolution. In this work we use digital evolution to empirically test how the evolution of phenotypic plastic- ity alters future evolutionary dynamics. In all three three cases we tested, we find that phenotypic plasticity stabilizes these dynamics. Specifically, compared to non-plastic populations, plasticity 1) slows evolutionary change, 2) maintains the rate of discovering novel beneficial traits while de- creasing the probability they are lost, and 3) reduces the accumulation of deleterious traits. In the language of the later chapters, we can say that phenotypic plasticity in this system potentiates the evolution of the novel beneficial traits and decreases the probability of accumulating deleterious traits. 2.1 Introduction Natural organisms employ a wide range of evolved strategies for coping with environmental change, such as periodic migration (Winger et al., 2019), bet-hedging (Beaumont et al., 2009), adaptive tracking (Barrett and Schluter, 2008), and phenotypic plasticity (Ghalambor et al., 2007). The particular mechanisms that evolve in response to fluctuating environments will also shift the course of subsequent evolution (Schaum and Collins, 2014; Wennersten and Forsman, 2012). As such, if we are to understand or predict evolutionary outcomes, we must be able to identify which mechanisms are most likely to evolve and what constraints and opportunities they impart on sub- sequent evolution. In this work, we focus on phenotypic plasticity, which can be defined as the capacity for a single genotype to alter phenotypic expression in response to a change in its environment (West- Eberhard, 2003). Phenotypic plasticity is controlled by genes whose expression is coupled to one or more environmental signals, which may be either biotic or abiotic. For example, the sex ratio of the crustacean Gammarus duebeni is modulated by changes in photoperiod and temperature (Dunn et al., 2005), and the reproductive output of some invertebrate species is heightened when infected with parasites to compensate for offspring loss (Chadwick and Little, 2005). 12 Evolutionary biologists have long been interested in how evolutionary change is influenced by phenotypic plasticity because of its role in generating phenotypic variance (Gibert et al., 2019). The effects of phenotypic plasticity on adaptive evolution have been disputed, as few studies have been able to observe both the initial patterns of plasticity and the subsequent divergence of traits in natural populations (Forsman, 2015; Ghalambor et al., 2015, 2007; Hendry, 2016; Wund, 2012). In changing environments, adaptive phenotypic plasticity provides a mechanism for organisms to regulate trait expression within their lifetime, which can stabilize populations through those changes (Gibert et al., 2019). In this context, the stabilizing effect of adaptive plasticity has been hypoth- esized to constrain the rate of adaptive evolution (Ancel, 2000; Gupta and Lewontin, 1982; Huey et al., 2003; Paenke et al., 2007; Price et al., 2003). That is, directional selection may be weak if environmentally-induced phenotypes are close to the optimum; as such, adaptively plastic popu- lations may evolve slowly (relative to non-plastic populations) unless there is a substantial fitness cost to plasticity. Figure 2.1: Hypothetical reaction norms for populations comprising genotypes placed in different environments. A reaction norm describes phenotypic change (or lack thereof) induced by environ- mental variation (West-Eberhard, 2008). In all panels, two environmental conditions (denoted E1 and E2) are shown on the x-axis. The y-axis indicates the phenotype expressed in each environment with OE1 and OE2 designating the optimal phenotype for E1 and E2, respectively. Each pair of points connected by a solid black line denotes a genotype, with the points themselves representing its hypothetical phenotypes in each environment. We present three scenarios for how populations (a) A non-plastic population where phenotypes do could respond to a change from E1 to E2. not change with environmental shifts. (b) An adaptively plastic population where phenotypes dy- namically adjust to the new optimum. (c) A population exhibiting non-adaptive plasticity where environmental change induces phenotypes further away from the optimum. Phenotypic plasticity allows for the accumulation of genetic variation in genomic regions that 13 PhenotypeEnvironmental conditionE1E2(b)Adaptive plasticityE1E2(a)Non-plasticOE1OE2E1E2(c)Non-adaptive plasticity are unexpressed under current environmental conditions. Such cryptic (“hidden”) genetic variation can serve as a source of diversity in the population, upon which selection can act when the environ- ment changes (Levis and Pfennig, 2016; Schlichting, 2008). It remains unclear to what extent and under what circumstances this cryptic variation caches adaptive potential or merely accumulates deleterious alleles (Gibson and Dworkin, 2004; Paaby and Rockman, 2014; Zheng et al., 2019). The “genes as followers” hypothesis (also known as the “plasticity first” hypothesis) predicts that phenotypic plasticity may facilitate adaptive evolutionary change by producing variants with en- hanced fitness under stressful or novel conditions (Levis and Pfennig, 2016; Schwander and Leimar, 2011; West-Eberhard, 2003). Environmentally-induced trait changes can be refined through selec- tion over time (i.e., genetic accommodation). Further, selection may drive plastic phenotypes to lose their environmental dependence over time in a process known as genetic assimilation (Crispo, 2007; Levis and Pfennig, 2016; Pigliucci, 2006; Schlichting and Wund, 2014; West-Eberhard, 2005). In this way, environmentally-induced phenotypic changes can precede an evolutionary response. Phenotypic plasticity may also “rescue” populations from extinction under changing environ- mental conditions by buffering populations against novel stressors. This buffer promotes stability and persistence and grants populations time to further adapt to rapidly changing environmental conditions (Chevin and Lande, 2010; West-Eberhard, 2003). Disparate predictions about how phenotypic plasticity may shift the course of subsequent evo- lution are not necessarily mutually exclusive. Genetic and environmental contexts determine if, and to what extent, phenotypic plasticity promotes or constrains subsequent evolution. Figure 2.1 overviews how we might expect different forms of phenotypic plasticity to result in different evo- lutionary responses after an environmental change. In Figure 2.1a, we would expect non-plastic populations to experience strong directional selection toward the new optimum (OE2) after the environment changes. We would expect an adaptively plastic population (Figure 2.1b) to remain relatively stable after the environment changes, as plasticity shifts organisms’ phenotypes to the new optimum. In Figure 2.1c, we would expect the non-adaptively plastic population to experience strong directional selection on their response to the new environmental conditions; indeed, such maladaptive plasticity may even put the population at risk of extinction in the absence of beneficial mutations. Experimental studies investigating the relationship between phenotypic plasticity and evolu- 14 tionary outcomes can be challenging to conduct in natural systems. Such experiments would require the ability to irreversibly toggle plasticity followed by long periods of evolution during which de- tailed phenotypic data would need to be collected. Digital evolution experiments have emerged as a powerful research framework from which evolution can be studied. In digital evolution, self- replicating computer programs (digital organisms) compete for resources, mutate, and evolve follow- ing Darwinian dynamics (Wilke and Adami, 2002). Digital evolution studies balance the speed and transparency of mathematical and computational simulations with the open-ended realism of labo- ratory experiments. Modern computers allow us to observe many generations of digital evolution at tractable time scales; thousands of generations can take mere minutes as opposed to months, years, or millennia. Digital evolution systems also allow for perfect, non-invasive data tracking. Such transparency permits the tracking of complete evolutionary histories within an experiment, which circumvents the historical problem of drawing evolutionary inferences using incomplete records (from frozen samples or fossils) and extant genetic sequences. Additionally, digital evolution systems allow for experimental manipulations and analyses that go beyond what is possible in wet-lab experiments. Such analyses have included exhaustive knockouts of every site in a genome to identify the function- ality of each (Lenski et al., 2003), comprehensive characterization of local mutational landscapes (Canino-Koning et al., 2019; Lenski et al., 1999), and the real-time reversion of all deleterious mu- tations as they occur to isolate their long-term effects on evolutionary outcomes (Covert III et al., 2013). Furthermore, digital evolution studies allow us to directly toggle the possibility for adaptive plastic responses to evolve, which enables us to empirically test hypotheses that were previously relegated to theoretical analyses. In this study, we conduct digital evolution experiments to investigate how the evolution of adaptive phenotypic plasticity shifts the course of evolution in a cyclically changing environment. We use the Avida Digital Evolution Platform (Ofria et al., 2009). Avida is an open-source system that has been used to conduct a wide range of well-regarded studies on evolutionary dynamics, including the origins of complex features (Lenski et al., 2003), the survival of the flattest effect (Wilke et al., 2001), and the origins of reproductive division of labor (Goldsby et al., 2014). Our experiments build directly on previous studies in Avida that characterized the de novo evolution of adaptive phenotypic plasticity (Clune et al., 2007; Lalejini and Ofria, 2016) as well as previous work investigating the evolutionary consequences of fluctuating environments for populations of 15 non-plastic digital organisms (Canino-Koning et al., 2019; Li and Wilke, 2004). Of particular rele- vance, Clune et al. (2007) and Lalejini and Ofria (2016) experimentally demonstrated that adaptive phenotypic plasticity can evolve given the following four conditions (as described by Ghalambor et al. 2010): (1) populations experience temporal environmental variation, (2) these environments are differentiable by reliable cues, (3) each environment favors different phenotypic traits, and (4) no single phenotype exhibits high fitness across all environments. We build on this previous work, but we shift our focus from the evolutionary causes of adaptive phenotypic plasticity to investigate its evolutionary consequences in a fluctuating environment. Specifically, we examine the effects of adaptive plasticity on subsequent genomic and phenotypic change, the capacity to evolve and then maintain novel traits, and the accumulation of deleterious alleles. Each of our experiments are divided into two phases: in phase one, we precondition sets of founder organisms with differing plastic or non-plastic adaptations; in phase two, we examine the subsequent evolution of populations founded with organisms from phase one under specific environmental conditions (Figure 2.2). First, we examine the evolutionary histories of phase two populations to test whether adaptive plasticity constrained subsequent genomic and phenotypic changes. Next, we evaluate how adaptive plasticity influences how well populations produced by each type of founder can evolve and retain novel adaptive traits. Finally, we examine lineages to determine whether adaptive plasticity facilitated the accumulation of cryptic genetic variation that would prove deleterious when the environment changed. We found that the evolution of adaptive plasticity reduced subsequent rates of evolutionary change in a cyclic environment. The non-plastic populations underwent more frequent selective sweeps and accumulated many more genetic changes over time, as non-plastic populations relied on genetic variation from de novo mutations to continuously readapt to environmental changes. The evolution of adaptive phenotypic plasticity buffered populations against environmental fluctuations, whereas repeated selective sweeps in non-plastic populations drove the accumulation of deleterious mutations and the loss of secondary beneficial traits. As such, adaptively plastic populations were better able to retain novel traits than their non-plastic counterparts. In general, the evolution of adaptive phenotypic plasticity shifted evolutionary dynamics to be more similar to that of popu- lations evolving in a static environment than to non-plastic populations evolving in an identical fluctuating environment. 16 2.2 Materials and Methods 2.2.1 The Avida Digital Evolution Platform Avida is a study system wherein self-replicating computer programs (digital organisms) com- pete for space on a finite toroidal grid (Ofria et al., 2009). Each digital organism is defined by a linear sequence of program instructions (its genome) and a set of virtual hardware components used to interpret and express those instructions. Genomes are expressed sequentially except when the execution of one instruction (e.g., a “jump” instruction) deterministically changes which instruction should be executed next. Genomes are built using an instruction set that is both robust (i.e., any ordering of instructions is syntactically valid, though not necessarily meaningful) and Turing Com- plete (i.e., able to represent any computable function, though not necessarily in an efficient manner). The instruction set includes operations for basic computations, flow control (e.g., conditional logic and looping), input, output, and self-replication. Organisms in Avida reproduce asexually by copying their genome instruction-by-instruction and then dividing. However, copy operations are imperfect and can result in single-instruction substitution mutations in an offspring’s genome. For this work, we configured copy operations to err at a rate of one expected mutation for every 400 instructions copied (i.e., a per-instruction error rate of 0.0025). We held individual genomes at a fixed length of 100 instructions (the default genome size in Avida); that is, we did not include insertion and deletion mutations. We used fixed- length genomes to control for treatment-specific conditions resulting in the evolution of substantially different genome sizes (Lalejini and Ferguson, 2021b)1, which could, on its own, drive differences in evolutionary outcomes among experimental treatments. When an organism divides in Avida, its offspring is placed in a random location on the toroidal grid, replacing any previous occupant. For this work, we used the default 60 by 60 grid size, which limits the maximum population size to 3600 organisms. As such, improvements to the speed of self-replication are advantageous in the competition for space. During evolution, organism replication rates improve in two ways: by improving genome effi- ciency (e.g., using a more compact encoding) or by accelerating the rate at which the genome is expressed (their “metabolic rate”). An organism’s metabolic rate determines the speed at which it 1We repeated our experiments without genome size restrictions and observed qualitatively similar results (see supplemental material, Lalejini and Ferguson 2021b). 17 executes instructions in its genome. Initially, an organism’s metabolic rate is proportional to the length of its genome, but that rate is adjusted as it completes designated functions, such as per- forming Boolean logic functions (Ofria et al., 2009). In this way, we can reward or punish particular phenotypic traits (i.e., Boolean logic functions). Phenotypic plasticity in Avida In this work, we measure a digital organism’s phenotype as the set of Boolean logic functions that it performs (i.e. expresses) in a given environment. Sensory instructions in the Avida instruc- tion set allow organisms to detect how performing a particular function would affect their metabolic rate (see supplemental material for more details, Lalejini and Ferguson 2021b). We define a pheno- typically plastic organism as one that uses sensory information to alter which functions it performs based on the environment. Phenotypic plasticity in Avida can be adaptive or non-adaptive for a given set of environments. Adaptive plasticity shifts net function expression closer to the optimum for the given environments. Non-adaptive plasticity changes function expression in either a neutral or deleterious way. In this work, optimal plasticity toggles functions to always perfectly match the set of rewarded functions for the given set of environments. 2.2.2 Experimental design We conducted three independent experiments using Avida to investigate how the evolution of adaptive plasticity influences evolutionary outcomes in fluctuating environments. For each ex- periment, we compared the evolutionary outcomes of populations evolved under three treatments (Figure 2.2): (1) a PLASTIC treatment where the environment fluctuates, and digital organisms can use sensory instructions to differentiate between environmental states; (2) a NON-PLASTIC treatment with identical environment fluctuations, but where sensory instructions are disabled; and (3) a STATIC control where organisms evolve in a constant environment. Each experiment was divided into two phases that each lasted for 200,000 updates2 of evolution (Figure 2.2), which is equivalent to approximately 30,000 to 40,000 generations. In phase one of each experiment, we evolved plastic, non-plastic, and control organisms for use in phase two. In phase two, we founded new populations with these evolved organisms and examined their subsequent 2One update in Avida is the amount of time required for the average organism to execute 30 instructions. See (Ofria et al., 2009) for more details. 18 Figure 2.2: The three plots in panel (a) show the environmental conditions used in every experiment and whether they reward or punish each base function. The two plots in (b) show the additional functions added in phases 2B and 2C. Panel (c) shows treatment differences and experimental phases. Treatments are listed on the left, with each treatment specifying its environmental configuration and whether sensors are functional. We conducted three independent two-phase experiments, each described on the right. Phases 2B and 2C are textured to match their function definitions in panel (b). Phase one is repeated for each experiment with 100 replicate populations per treatment per experiment. For each replicate at the end of phase one, we used an organism of the most abundant genotype to found the second phase population. All STATIC and NON-PLASTIC populations move on to phase two, but PLASTIC populations only continue to the second phase if their most abundant genotype exhibits optimal plasticity. Metrics are recorded only in phase two. evolution under given combinations of treatment and experimental conditions. During phase two, we tracked and saved each population’s evolutionary history as well as saving the full final population. Phase one was for preconditioning only; all comparisons between treatments were performed on phase two data. Environments We constructed three experimental environmental conditions, abbreviated hereafter as “ENV- A”, “ENV-B”, and “ENV-ALL”. In ENV-A, organisms are rewarded for expressing the NOT, AND, and OR Boolean logic functions, and organisms are punished for expressing the NAND, AND- 19 NOTANDORNANDAND-NOTOR-NOTENV-AENV-BENV-ALLMetabolic effectx1x2x0.5NOTANDORNANDAND-NOTOR-NOTBoolean logic functionNOTANDORNANDAND-NOTOR-NOTa.c.Phase 2A:ASTATICALLPLASTICNON-PLASTICPhase 1: PreconditioningTreatmentSensors?BA......AB...ALLABA......ABTransferPhase 2B71 novel functionsPhase 2CDeleterious functionEnvironments are identical to Phase 1, but additional evolutionary history information is recordedPhase 2B:71 additional functions are all rewarded constantly. The evolution of these novel functions is recorded.Phase 2C:An explicitly deleterious function is added. All executions of this function are recorded.199,900200,0001000200UpdatesYesNoNo300AALLBA......AB...ALLABA......AB199,900200,0001000200Updates300AALLBA......AB...ALLABA......ABAALLBA......AB...ALLABA......AB...b. RewardPunishmentMetabolic effectx1x1.1x0.9 RewardPunishmentPhase-specific functions NOT, and OR-NOT functions. ENV-B is the reverse of ENV-A; that is, in ENV-B, organisms are rewarded for expressing the NAND, AND-NOT, and OR-NOT functions and are punished for expressing the NOT, AND, and OR functions. In ENV-ALL, organisms are rewarded for expressing each of the NOT, AND, OR, NAND, AND-NOT, and OR-NOT functions. In all environmental conditions (ENV-A, ENV-B, and ENV-ALL), a rewarded function expressed by an organism doubles their metabolic rate, allowing them to execute twice as many instructions in the same amount of time. A punished function halves an organism’s metabolic rate. Each Boolean logic function is a non-trivial trait to evolve, as they each require the coordination of multiple genetic instructions to express (Lenski et al., 2003). In both the PLASTIC and NON-PLASTIC treatments, the environment cycles between equal- length periods of ENV-A and ENV-B. Each of these periods persist for 100 updates (approximately 15 to 20 generations). Thus, populations experience a total of 1,000 full periods of ENV-A interlaced with 1,000 full periods of ENV-B during each experimental phase. Previous work has shown that this rate of change reliably allows for the evolution of adaptive phenotypic plasticity in Avida (Clune et al., 2007; Lalejini and Ofria, 2016). Organisms in the PLASTIC treatments differentiate between ENV-A and ENV-B by executing one of six sensory instructions, each associated with a particular Boolean logic function; these sensory instructions detect whether their associated function is currently rewarded or punished. By using sensory information in combination with execution flow-control instructions, organisms can conditionally perform different functions depending on the current environmental conditions. Experiment Phase 1 – Environment preconditioning For each treatment, we founded 100 independent populations from a common ancestral strain capable only of self-replication. At the end of phase one, we identified the most abundant (i.e., dominant) genotype and sampled an organism with that genotype from each replicate population to found a new population for phase two. For the PLASTIC treatment, we needed to ensure that our observations during the second phase of each experiment reflected the evolutionary consequences of adaptive plasticity. To do so, measured the plasticity of each PLASTIC-treatment population’s dominant genotype by in- dependently testing that genotype in each of ENV-A and ENV-B and recording the phenotype expressed in each environment. We discarded PLASTIC-treatment phase one populations if the 20 dominant genotype did not exhibit optimal plasticity (as defined in Section 2.2.1), which ensured that PLASTIC-treatment phase two populations were founded with an optimally plastic organism. Experiment Phase 2A – Evolutionary change rate We conducted experiment phase 2A to test for differences in evolutionary change—the accu- mulation of genetic and phenotypic changes—among populations evolving under each of our three treatment conditions (PLASTIC, NON-PLASTIC, and STATIC). Phase 2A continued exactly as phase one, except we tracked the rates of evolutionary change in each of the PLASTIC-, NON- PLASTIC-, and STATIC-treatment populations. Specifically, we quantified evolutionary change using four metrics (each described in Table 2.1): (1) coalescence event count, (2) mutation count, (3) phenotypic volatility, and (4) mutational robustness. While environmental conditions during phases one and 2A are identical, these phases are dis- tinct in how populations are founded: each phase one population is founded with a common ancestor capable only of self-replication, whereas each phase two population is founded with an organism that was evolved in its treatment-specific conditions. Thus, during phase two, phylogenies are rooted with an ancestor that is well-adapted to its treatment conditions, which, in turn, ensures that our observations can exclude dynamics associated with initial adaptation. Experiment Phase 2B – Novel function evolution We conducted experiment phase 2B to quantify the extent to which organisms evolved under PLASTIC-, NON-PLASTIC-, and STATIC-treatment conditions were able to acquire and retain novel functions. Phase 2B extended the conditions of phase one by adding 71 novel Boolean logic functions (Ofria et al., 2009), which were always rewarded in all treatments. The original six phase one functions (NOT, NAND, AND, OR-NOT, OR, and AND-NOT; hereafter called “base” functions) continued to be rewarded or punished according to the particular treatment conditions. An organism’s metabolic rate was increased by 10% for each novel function that it expressed (limited to one reward per function). This reward provided a selective pressure to evolve these functions, but their benefits did not overwhelm the 100% metabolic rate increase conferred by rewarded base functions. As such, populations in the PLASTIC and NON-PLASTIC treatments could not easily escape environmental fluctuations by abandoning the fluctuating base functions. During this experiment, we used three metrics to quantify novel function acquisition and reten- tion in evolving populations (each described in Table 2.1): (1) final novel function count, (2) novel 21 function discovery, and (3) novel function loss. Experiment Phase 2C – Deleterious instruction accumulation We conducted experiment phase 2C to quantify the extent to which organisms evolved under PLASTIC-, NON-PLASTIC-, and STATIC-treatment conditions acquired deleterious instructions via mutation. Phase 2C extended the instruction set of phase one with a deleterious instruction. When an organism executes a deleterious instruction, it performs a “deleterious” function, which reduces the organism’s metabolic rate (and thus reproductive success) but does not otherwise alter the organism’s behavior. We imposed a 10% penalty each time an organism performed the dele- terious function, making the deleterious instruction explicitly harmful to execute. We did not limit the number of times that an organism could perform the deleterious function, and as such, organisms could perform the deleterious function as many times as they executed the deleterious instruction. We tracked the number of times each organism along the dominant lineage performed the deleterious function. Specifically, we used two metrics (each described in Table 2.1): (1) final deleterious function count and (2) deleterious function acquisition count. 2.2.3 Experimental analyses For each of our experiments, we tracked and analyzed the phylogenetic histories of evolving populations during phase two and generated a set of summary statistics (Table 2.1). For each phy- logenetic history, we counted the number of times that the most recent common ancestor for the population shifted and used this value as the number of coalescence events. Next, at the final time point, we identified the most abundant genotype in the evolved population, and chose a represen- tative organism from that genotype for further analysis. We used the lineage from the founding organism to the representative organism to summarize the evolutionary pathway of a population. This lineage represents the majority of the evolutionary history from the given population as long as the entire population traces back to the lineage in recent history. We manually inspected evolved phylogenies and found no evidence that any of our experimental treatments supported long-term coexistence. As such, analyses of the representative lineages reflect the majority of evolutionary history for a given population. Some of our metrics (Table 2.1) required us to measure genotype-by-environment interactions. Importantly, in the fluctuating environments, we needed to differentiate phenotypic changes that 22 were caused by mutations from those that were caused by environmental changes. To accomplish this, we produced organisms with the given focal genotype, measured their phenotype in each environmental condition, and aggregated the resulting phenotypes to create a phenotypic profile. Although organisms with different genotypes may express the same set of functions across envi- ronmental conditions, their phenotypic profiles may not necessarily be the same. For example, an organism that expresses NOT in ENV-A and NAND in ENV-B has a distinct phenotypic profile from one that expresses NAND in ENV-A and NOT in ENV-B. Because phenotypic profiles en- capsulate function expression across all relevant environmental conditions (ENV-A and ENV-B), a change in phenotypic profile from parent to offspring indicates a mutationally-induced phenotypic change. While most analyses employed here are retrospective metrics applied to lineages, digital evo- lution allows precise manipulations on individual organisms and genomes. Mutational robustness uses this technique when looking at the possible mutations on a representative genotype. Genomes in Avida are linear sequences of instructions, and as such, possible mutations can be simulated by substituting other instructions at the desired site. Indeed, the mutational robustness of a genotype examines all one-step mutations (i.e., each mutation where exactly one instruction is substituted). This allows us to disentangle whether the frequency of mutationally-induced phenotypic changes observed along a lineage is a consequence of evolved genetic architectures versus the result of pop- ulation dynamics that actively skewed the distribution of organisms along the lineage; that is, are genomes organized such that mutations are more likely to induce phenotypic changes, or are or- ganisms with phenotypes different from their parents more likely to be successful and thus appear along the representative lineage? 2.2.4 Statistical analyses Across all of our experiments, we differentiated between sample distributions using non- parametric statistical tests. For each major analysis, we first performed a Kruskal-Wallis test (Kruskal and Wallis, 1952) to determine if there were significant differences in results from the PLASTIC, NON-PLASTIC, and STATIC treatments (significance level α = 0.05). If so, we applied a Wilcoxon rank-sum test (Wilcoxon, 1992) to distinguish between pairs of treatments. We applied Bonferroni corrections for multiple comparisons (Rice, 1989) where appropriate. 23 2.2.5 Software availability We conducted our experiments using a modified version of the Avida software, which is open source and freely available on GitHub (Lalejini and Ferguson, 2021b). Modifications to Avida in- cluded an improved phylogeny tracking system that enabled us to track coalescence events and the addition of custom sensory instructions specific to our experiments. We used Python for data processing, and we conducted all statistical analyses using R version 4 (R Core Team, 2021). We used the tidyverse collection of R packages (Wickham et al., 2019) to wrangle data, and we used the following R packages for analysis, graphing, and visualization: ggplot2 (Wickham et al., 2020), cowplot (Wilke, 2020), Color Brewer (Harrower and Brewer, 2003; Neuwirth, 2014), rstatix (Kas- sambara, 2021), ggsignif (Ahlmann-Eltze and Patil, 2021), scales (Wickham and Seidel, 2020), Hmisc (Harrell Jr, 2020), fmsb (Nakazawa, 2019), and boot (Canty and Ripley, 2019). We used R markdown (Allaire et al., 2020) and bookdown (Xie, 2020) to generate web-enabled supplemental material. All of the source code for our experiments and analyses, including configuration files and guides for replication, can be found in our supplemental material, which is hosted on GitHub (Lale- jini and Ferguson, 2021b). Additionally, our experimental data is available on the Open Science Framework at https://osf.io/sav2c/ (Lalejini and Ferguson, 2021a). 24 Metric Coalescence event count Mutation count Phenotypic volatility Mutational robustness Final novel function count Novel function discovery Novel function loss Final deleterious function count Deleterious function acquisition count Description Number of coalescence events that have oc- curred, which indicates the frequency of selec- tive sweeps in the population. Sum of all mutations that have occurred along a lineage. Number of instances where parent and offspring phenotypic profiles do not match along a lin- eage. Proportion of mutations (from the set of all pos- sible one-step mutations) that do not change the phenotypic profile of a focal genotype. We also measured realized mutational robustness, which is the proportion of mutated offspring along a lineage whose phenotypic profile matches that of their parent. Count of unique novel functions performed by the representative organism in a final popula- tion from experiment phase 2B. This metric can range from 0 to 71 and measures how well the fitness landscape was exploited at a given point in time. functions ever per- Number of unique novel formed along a given lineage in experimental phase 2B, even if a function is later lost. This metric can range from 0 to 71 and measures a given lineage’s level of exploration of the fitness landscape. Number of instances along a given lineage from experimental phase 2B where a novel function is performed by a parent but not its offspring. This metric measures how often a given lineage fails to retain evolved traits over time. Number of times the deleterious function is per- formed by the representative organism from a final population from experiment phase 2C. Number of instances along a given lineage where a mutation causes an offspring to perform the deleterious function more times than its parent. Table 2.1: Metric descriptions. 25 2.3 Results 2.3.1 Adaptive phenotypic plasticity slows evolutionary change in fluctuating environments Figure 2.3: Raincloud plots (Allen et al., 2019) of (a) coalescence event count, (b) mutation count, and (c) phenotypic volatility. See Table 2.1 for descriptions of each metric. Each plot is annotated with statistically significant comparisons (Bonferroni-corrected pairwise Wilcoxon rank-sum tests). Note that adaptive phenotypic plasticity evolved in 42 of 100 replicates from the PLASTIC treat- ment during phase one of this experiment; we used this more limited group to found 42 phase-two PLASTIC replicates from which we report these PLASTIC data. In experimental phase 2A, we tested whether adaptive phenotypic plasticity constrained or promoted subsequent evolutionary change in a fluctuating environment. First, we compared the total amount of evolutionary change in populations evolved under the PLASTIC, NON-PLASTIC, and STATIC treatments as measured by coalescence event count, mutation count, and phenotypic volatility (Figure 2.3). According to each of these metrics, NON-PLASTIC populations experi- enced a larger magnitude of evolutionary change than either PLASTIC or STATIC populations. We observed significantly higher coalescence event counts in NON-PLASTIC populations than in PLASTIC or STATIC populations (Figure 2.3a). NON-PLASTIC lineages had significantly higher mutation counts (Figure 2.3b) and phenotypic volatility than PLASTIC or STATIC lineages (Fig- ure 2.3c). Changing environments have been shown to increase generational turnover (i.e., how rapidly generations elapse) in Avida populations (Canino-Koning et al., 2016), which could explain why we observe a larger magnitude of evolutionary change at the end of 200,000 updates of evolution in NON-PLASTIC populations. Indeed, we found that significantly more generations of evolution elapsed in NON-PLASTIC populations (mean of 41090 ± 2702 std. dev.) than in PLASTIC (mean 26 p < 1e−04p < 1e−04010100100010000STATICNON−PLASTICPLASTICCoalescence event count (log scale)Kruskal−Wallis, p < 1e−04Coalescence events countap < 1e−04p < 1e−04p = 0.001944010100100010000STATICNON−PLASTICPLASTICMutation count (log scale)Kruskal−Wallis, p < 1e−04Mutation countbp < 1e−04p < 1e−04p < 1e−04010100100010000STATICNON−PLASTICPLASTICPhenotypic volatility (log scale)Kruskal−Wallis, p < 1e−04Phenotypic volatilityc of 31016 ± 2615 std. dev.) or STATIC (mean of 30002 ± 3011 std. dev.) populations during phase 2A (corrected Wilcoxon rank-sum tests, p < 10−4). To evaluate whether increased generational turnover explains the greater magnitude of evo- lutionary change in NON-PLASTIC populations, we examined the average number of generations between coalescence events and the realized mutational robustness of lineages (Table 2.1). A co- alescence event indicates a selective sweep, which is a hallmark of adaptive evolutionary change. Realized mutational robustness measures the frequency that mutations cause phenotypic changes along a lineage. We expect that static conditions should favor fit lineages with high realized muta- tional robustness that no longer undergo rapid adaptive change and hence do not trigger frequent coalescence events. Under fluctuating conditions, however, lineages must be composed of plastic organisms if they are to maintain both high fitness and realized mutational robustness. Without plasticity, we expect fluctuating conditions to produce lineages with low realized mutational robust- ness and frequent coalescence events as populations must continually acquire and fix mutations to readapt to the environment. Figure 2.4: Raincloud plots of (a) average number of generations between coalescence events, and (b) realized mutational robustness (Table 2.1). Each plot is annotated with statistically significant comparisons (Bonferroni-corrected pairwise Wilcoxon rank-sum tests). On average, significantly fewer generations elapsed between coalescence events in NON- PLASTIC populations than in either PLASTIC or STATIC populations (Figure 2.4a). We also found that both STATIC and PLASTIC lineages exhibited higher realized mutational robustness relative to that of NON-PLASTIC lineages (Figure 2.4b); that is, mutations observed along NON- PLASTIC lineages more often caused phenotypic changes in offspring. Overall, our results indicate 27 p < 1e−04p < 1e−040500100015002000STATICNON−PLASTICPLASTICAvg. generations between coalescence eventsKruskal−Wallis, p < 1e−04Generations between coalescence eventsap < 1e−04p < 1e−04p < 1e−040.500.751.00STATICNON−PLASTICPLASTICRealized mutational robustnessKruskal−Wallis, p < 1e−04Realized mutational robustnessb that NON-PLASTIC populations underwent more rapid (and thus a greater amount of) evolutionary change than either PLASTIC or STATIC populations. While both STATIC and PLASTIC lineages exhibited high realized mutational robustness, we found that STATIC lineages exhibited higher realized robustness than PLASTIC lineages (Figure 2.4b). Overall, there were rare instances of mutations that caused a change in phenotypic profile across all PLASTIC lineages. Of these mutations, we found that over 80% (83 out of 102) of changes to phenotypic profiles were cryptic. That is, the mutations affected traits that would not have been expressed in the environment that the organism was born into but would have been expressed had the environment changed. Figure 2.5: Raincloud plot of mutational robustness of each representative genotype (Table 2.1). The plot is annotated with statistically significant comparisons (Bonferroni-corrected pairwise Wilcoxon rank-sum tests). Given that NON-PLASTIC lineages exhibited the lowest realized mutational robustness of our three experimental treatments, we sought to determine if this effect was driven by differences in evolved genetic architectures. Specifically, did the NON-PLASTIC genetic architectures evolve such that mutations were more likely to result in phenotypic change? Such a mutational bias would trade off descendant fitness in the same environment in exchange for a chance of increasing descendant fitness in alternate environments. This strategy would be an example of diversifying bet-hedging (i.e., reducing expected mean fitness to lower variance in fitness) (Childs et al., 2010). Alternatively, the lower realized mutational robustness in NON-PLASTIC lineages could be due to survivorship bias, as we measured realized mutational robustness as the fraction of mutations observed along successful lineages that caused a phenotypic change. That is, our measure of realized mutational 28 p < 1e−04p < 1e−04p = 0.0030.000.250.500.751.00STATICNON−PLASTICPLASTICMutational robustnessKruskal−Wallis, p < 1e−04Mutational robustness robustness concentrates only on mutations that appear in the representative lineage (i.e., “survivors” of selection), ignoring mutations that did not. Thus, lower realized mutational robustness in NON- PLASTIC lineages could simply be due to phenotype-altering mutations being more frequently favored by selection. We analyzed the mutational robustness of representative genotypes by calculating the fraction of single-instruction mutations that change the phenotypic profile. We found that mutations to representative genotypes on NON-PLASTIC lineages are less likely to result in a phenotypic change than mutations to comparable genotypes on either STATIC or PLASTIC lineages (Figure 2.5). These data provide evidence against NON-PLASTIC lineages engaging in a mutation-driven bet- hedging strategy, and instead, are consistent with the hypothesis that lower realized mutational robustness in the NON-PLASTIC treatment was due to survivorship bias. In general, adaptive plasticity stabilized PLASTIC-treatment populations against environmen- tal fluctuations, and their evolutionary dynamics more closely resembled those of populations evolv- ing in a static environment. We observed no significant difference in the number and frequency of coalescence events in PLASTIC and STATIC populations. We did, however, observe small, but statistically significant, differences in each of the following metrics: elapsed generations, mutation counts, phenotypic volatility, realized mutational robustness, and mutational robustness between PLASTIC and STATIC populations. We expect that these differences are a result of plastic organ- isms needing to simultaneously maintain both function and function-regulation machinery, resulting in more genetic components that can be broken by mutations; moreover, many of these components are under relaxed selection in periods between environmental changes. Overall, these differences were not substantial enough to play an obvious role in any of the dynamics we analyzed, but could be examined further in the future. 2.3.2 Adaptively plastic populations retain more novel functions than non- plastic populations in fluctuating environments We have so far shown that adaptive plasticity constrains the rate of evolutionary change in fluctuating environments. However, it is unclear how this dynamic influences the evolution of novel functions. Based on their relative rates of evolutionary change, we might expect NON-PLASTIC- treatment populations to evolve more novel functions than PLASTIC-treatment populations. But, how much of the evolutionary change in NON-PLASTIC populations is useful for exploring novel 29 Figure 2.6: Raincloud plots of (a) final novel function count, (b) novel function discovery, and (c) novel function loss. See Table 2.1 for descriptions of each metric. Each plot is annotated with statistically significant comparisons (Bonferroni-corrected pairwise Wilcoxon rank-sum tests). Note that adaptive phenotypic plasticity evolved in 42 of 100 replicates from the PLASTIC treatment during phase one of this experiment; we used this more limited group to seed the resulting 42 phase- two PLASTIC replicates. regions of the fitness landscape versus continually rediscovering the same regions? To answer this question, we quantified the number of novel functions performed by a representa- tive organism in the final population of each replicate. We found that both PLASTIC and STATIC populations had significantly higher final function counts than NON-PLASTIC populations at the end of the experiment (Figure 2.6a). The final novel function count in PLASTIC and STATIC lin- eages could be higher than that of the NON-PLASTIC lineages for several non-mutually exclusive reasons. One possibility is that PLASTIC and STATIC lineages could be exploring a larger area of the fitness landscape when compared to NON-PLASTIC lineages. Another possibility is that the propensity of the NON-PLASTIC lineages to maintain novel traits could be significantly lower than PLASTIC or STATIC lineages. When we looked at the total sum of novel functions discovered by each of the PLASTIC, STATIC, and NON-PLASTIC lineages, we found that NON-PLASTIC lineages generally explored a larger area of the fitness landscape (Figure 2.6b). Although the NON- PLASTIC lineages discovered more novel functions, those lineages also exhibited significantly higher novel function loss when compared to PLASTIC and STATIC lineages (Figure 2.6c). A larger number of generations elapsed in NON-PLASTIC populations than in PLASTIC or STATIC populations during our experiment (Lalejini and Ferguson, 2021b). Are NON-PLASTIC lineages discovering and losing novel functions more frequently than PLASTIC or STATIC lineages, or are our observations a result of differences in generational turnover? To answer this question, we 30 p < 1e−04p < 1e−04051015STATICNON−PLASTICPLASTICFinal novel function countKruskal−Wallis, p < 1e−04Final novel function countap < 1e−04p = 0.003051015STATICNON−PLASTICPLASTICNovel function discoveryKruskal−Wallis, p < 1e−04Novel function discoverybp < 1e−04p < 1e−04p = 0.0023580101001000STATICNON−PLASTICPLASTICNovel function loss (log scale)Kruskal−Wallis, p < 1e−04Novel function lossc Figure 2.7: Raincloud plots of (a) novel function discovery frequency and (b) novel function loss frequency. Each plot is annotated with statistically significant comparisons (Bonferroni-corrected pairwise Wilcoxon rank-sum tests). converted the metrics of novel function discovery and novel function loss to rates by dividing each metric by the number of elapsed generations along the associated representative lineages. We found no significant difference in the frequency of novel function discovery between NON-PLASTIC and STATIC lineages, and we found that PLASTIC lineages had a lower frequency of novel function discovery than STATIC lineages (Figure 2.7a). Therefore, we cannot reject the possibility that the larger magnitude of function discovery in NON-PLASTIC lineages was driven by a larger number of elapsed generations. NON-PLASTIC lineages had a higher frequency of function loss than either PLASTIC or STATIC lineages, and PLASTIC lineages tended to have a lower frequency of novel function loss than STATIC lineages (Figure 2.7b). Next, we examined the frequency at which novel function loss along lineages co-occurred with the loss or gain of any of the six base functions. Across all NON-PLASTIC representative lineages, over 97% (10998 out of 11229) of instances of novel function loss co-occurred with a simultaneous change in base function profile. In contrast, across all PLASTIC and STATIC dominant lineages, we observed that approximately 20% (29 out of 142) and 2% (13 out of 631), respectively, of instances of novel function loss co-occurred with a simultaneous change in base function profile. As such, the losses of novel functions in NON-PLASTIC lineages appear to be primarily due to hitchhiking or epistatic effects where a mutation that knocks out a maladaptive base function (after the environment changes) also knocks out a beneficial novel function. 31 p = 0.0240.000000.000250.000500.000750.00100STATICNON−PLASTICPLASTICNovel function discovery frequencyKruskal−Wallis, p = 0.02806Novel function discovery frequencyap < 1e−04p < 1e−04p = 0.0011940.0000.0050.0100.0150.020STATICNON−PLASTICPLASTICNovel function loss frequencyKruskal−Wallis, p < 1e−04Novel function loss frequencyb 2.3.3 Lineages without plasticity that evolve in fluctuating environments ex- press more deleterious functions Phenotypic plasticity allows for genetic variation to accumulate in genomic regions that are unexpressed, which could lead to the fixation of deleterious instructions in PLASTIC populations. However, in NON-PLASTIC lineages, we observe a higher rate of novel function loss, indicating that they may be more susceptible to deleterious mutations (Figure 2.7b). Therefore, in experiment phase 2C, we tested whether adaptive phenotypic plasticity can in- crease the incidence of deleterious function performance. Specifically, we added an instruction that triggered an explicitly deleterious function and measured the number of times it was executed. Each execution of the deleterious instruction reduces an organism’s fitness by 10%. At the beginning of phase 2C, the deleterious instruction is not present in the population, as it was not part of the instruction set during phase one of evolution. Accordingly, if a deleterious instruction fixes in a population, it must be the result of evolutionary dynamics during phase 2C, including cryptic variation, hitchhiking, or as a result of sign epistasis where a deleterious instruction knocks out an even more maladaptive trait. Figure 2.8: Raincloud plots of (a) deleterious function acquisition, (b) deleterious function acqui- sition frequency, and (c) the proportion of mutations that increase deleterious function expression along a lineage that co-occur with a change in phenotypic profile. Each plot is annotated with statistically significant comparisons (Bonferroni-corrected pairwise Wilcoxon rank-sum tests). Note that adaptive phenotypic plasticity evolved in 43 of 100 replicates from the PLASTIC treatment during phase one of this experiment; we used this more limited group to seed the 43 phase-two PLASTIC replicates. At the end of our experiment, no representative organisms from the PLASTIC or STATIC treatments performed the deleterious function under any environmental condition; however, repre- 32 p < 1e−04p < 1e−040204060STATICNON−PLASTICPLASTICDeleterious function acquisition countKruskal−Wallis, p < 1e−04Deleterious function acquisition countap < 1e−04p < 1e−040.00000.00050.00100.0015STATICNON−PLASTICPLASTICDeleterious function acquisition frequencyKruskal−Wallis, p < 1e−04Deleterious function acquisition frequencybp < 1e−04p < 1e−040.000.250.500.751.00STATICNON−PLASTICPLASTICFraction of linked deleterious function acquisitionKruskal−Wallis, p < 1e−04Linked deleterious function acquisitionc sentative organisms in 14% of replicates of the NON-PLASTIC treatment performed the deleterious function at least once. NON-PLASTIC lineages contained significantly more mutations that con- ferred the deleterious function as compared to PLASTIC or STATIC lineages (Figure 2.8a), and these mutations occurred at a significantly higher frequency in NON-PLASTIC lineages (Figure 2.8b). Next, we measured how often mutations that increased deleterious function performance co- occurred with changes to the base function profile within representative lineages. A deleterious instruction can fix in a population by having a beneficial effect that outweighs its inherent cost (e.g., knocking out a punished function) or through linkage with a secondary beneficial mutation at another site within the genome. Across all NON-PLASTIC representative lineages, we found that approximately 49% (956 out of 1916) of mutations that increased deleterious function expression co- occurred with a change in the base function profile (Figure 2.8c). In all representative lineages from the PLASTIC treatment, only 18 mutations increased deleterious function expression, and none co- occurred with a change in base function profile (Figure 2.8c). Likewise, only 58 mutations increased deleterious function performance in all representative lineages from the STATIC treatment, and none co-occurred with a change in base function profile (Figure 2.8c). We did not find compelling evidence that the few mutations that increased deleterious function expression occurred as cryptic variation in PLASTIC lineages. We repeated this experiment with 3% and 30% metabolic rate penalties associated with the deleterious function, which produced results that were consistent with those reported here (Lalejini and Ferguson, 2021b). 2.4 Discussion In this work, we used evolving populations of digital organisms to determine how adaptive phenotypic plasticity alters subsequent evolutionary dynamics and influences evolutionary outcomes in fluctuating environments. Specifically, we compared lineages of adaptively plastic organisms in fluctuating environments to both non-plastic organisms in those same environments and other non- plastic organisms in static environments. 2.4.1 Evolutionary change We found strong evidence that adaptive plasticity slows evolutionary change in fluctuating en- vironments. Adaptively plastic populations experienced fewer coalescence events and fewer total 33 genetic changes relative to non-plastic populations evolving under identical environmental condi- tions (Figure 2.3). Whereas non-plastic populations relied on de novo mutations to adapt to each environmental fluctuation, plastic populations leveraged sensory instructions to regulate function performance. Indeed, in fluctuating environments, selection pressures toggle after each environmen- tal change. We hypothesize that in non-plastic populations such toggling would repeatedly drive the fixation of mutations that align an organism’s phenotypic profile to the new conditions. This hypothesis is supported by the increased frequency of coalescence events in these populations (Fig- ure 2.4a) as well as increased rates of genetic and phenotypic changes observed along the lineages of non-plastic organisms. Representative lineages in the non-plastic treatment experienced lower realized mutational robustness than plastic and static lineages (Figure 2.4b). We reasoned that this lower realized mutational robustness was due to non-plastic populations evolving a bet-hedging strategy where mutations are more likely to modify the phenotypic profile. However, when we switched from mea- suring the realized mutational robustness of representative lineages to measuring the mutational robustness of representative genotypes (i.e., what fraction of one-step mutants change the pheno- typic profile), we observed that non-plastic genotypes exhibited the highest mutational robustness of all three treatments (Figure 2.5). This result runs contrary to both our expectations and the results of other fluctuating environment studies in Avida (Canino-Koning et al., 2019). Canino- Koning et al. (2019) found that mutational robustness is negatively correlated with the number of function-encoding sites in the genome. In our work, most plastic and static genotypes encode all six base functions, while most non-plastic genotypes only encode functions from one environment; this results in fewer function-encoding sites, which may increase mutational robustness in non-plastic genotypes (relative to plastic and static genotypes). Regardless of the cause, this higher mutational robustness in non-plastic organisms indicates that bet-hedging is not driving the low realized mu- tational robustness observed in non-plastic lineages. Thus, we expect the lower realized mutational robustness in non-plastic lineages to be driven by survivorship bias. Because non-plastic lineages must rely on mutations to adapt to environmental changes, phenotype-altering mutations are often highly advantageous, and their selection decreases the realized mutational robustness of successful lineages. To our knowledge, this study is the first in-depth empirical investigation into how the de novo 34 evolution of adaptive plasticity shifts the course of subsequent evolution in a cyclic environment. The relative rates of evolutionary change that we observed in non-plastic populations, however, are consistent with results from previous digital evolution studies. For example, Dolson et al. (2020) showed that non-plastic populations that were evolved in cyclically changing environments exhibited higher phenotypic volatility and accumulated more mutations than that of populations evolved under static conditions. Furthermore, Lalejini and Ofria (2016) visually inspected the evolutionary histories of non-plastic organisms evolved in fluctuating environments, observing that mutations along successful lineages readily switched the set of traits expressed by offspring. Our results are also consistent with conventional evolutionary theory. A trait’s evolutionary response to selection depends on the strength of directional selection and on the amount of genetic variation for selection to act upon (Lande and Arnold, 1983; Zimmer and Emlen, 2013). In our experiments, non-plastic populations repeatedly experienced strong directional selection to toggle which functions were expressed after each environmental change. As such, retrospective analyses of successful lineages revealed rapid evolutionary responses (that is, high rates of genetic and pheno- typic changes). Evolved adaptive plasticity shielded populations from strong directional selection when the environment changed by eliminating the need for a rapid evolutionary response to toggle function expression. Indeed, both theoretical and empirical studies have shown that adaptive plas- ticity can constrain evolutionary change by weakening directional selection on evolving populations (Ghalambor et al., 2015; Paenke et al., 2007; Price et al., 2003). 2.4.2 The evolution and maintenance of novel functions In fluctuating environments, non-plastic populations explored a larger area of the fitness land- scape than adaptively plastic populations (Figure 2.6b). However, adaptively plastic populations better exploited the fitness landscape, retaining a greater number of novel functions than non-plastic populations evolving under identical environmental conditions (Figure 2.6a). In our experiment, novel functions were less important to survival than the fluctuating base functions. In non-plastic populations, when a mutation changes a base function to better align with current environmental conditions, its benefit will often outweigh the cost of losing one or more novel functions. Indeed, we found that along non-plastic representative lineages, 97% of the mutations associated with novel function loss co-occurred with phenotypic changes that helped offspring adapt to current environ- mental conditions. 35 Previous studies have shown that transitory environmental changes can improve overall fitness landscape exploration in evolving populations of non-plastic digital organisms (Nahum et al., 2017). Similarly, changing environments have been shown to increase the rate of evolutionary adaptation in simulated network models (Kashtan et al., 2007). In our system, however, we found that repeated fluctuations reduced the ability of non-plastic populations to maintain and exploit functions; that said, we did find that repeated fluctuations may improve overall function discovery by increasing generational turnover. Consistent with our findings, Canino-Koning et al. (2019) found that non- plastic populations of digital organisms evolving in a cyclic environment maintained fewer novel traits than populations evolving in static environments. Our results suggest that adaptive phenotypic plasticity can improve the potential for popula- tions to exploit novel resources by stabilizing them against stressful environmental changes. The stability that we observe may also lend some support to the hypothesis that phenotypic plasticity can rescue populations from extinction under changing environmental conditions (Chevin et al., 2010). Our data do not necessarily provide evidence for or against the genes as followers hypothesis. The genes as followers hypothesis focuses on contexts where plastic populations experience novel or abnormally stressful environmental change. However, in our system, environmental changes were cyclic (not novel), and no single environmental change was abnormally stressful. Further, the intro- duction of novel functions during the second phase of the experiment merely added static opportu- nities for fitness improvement. This addition did not change the meaning of existing environmental cues, nor did it require those cues to be used in new ways. 2.4.3 The accumulation of deleterious alleles We found that non-plastic lineages that evolved in a fluctuating environment exhibited both greater totals and higher rates of deleterious function acquisition than that of adaptively plastic lineages (Figure 2.8). There are several, non-mutually exclusive possibilities that could explain the fixation of explicitly deleterious instructions: random genetic drift, deleterious hitchhiking, epistatic effects, and cryptic variation (in plastic organisms). We find it unlikely that random genetic drift explains our observations. Each time an organism expresses a deleterious instruction, the organism incurs a 10% penalty to their replication rate, which results in strong purifying selection against mutations that cause offspring to execute deleterious instructions. 36 In asexual populations without horizontal gene transfer, all co-occurring mutations are linked. As such, deleterious mutations linked with a stronger beneficial mutation (i.e., a driver) can some- times “hitchhike” to fixation (Buskirk et al., 2017; Smith and Haigh, 1974; Van den Bergh et al., 2018). Natural selection normally prevents deleterious mutations from reaching high frequencies, as such mutants are outcompeted. However, when a beneficial mutation sweeps to fixation in a clonal population, it carries along any linked genetic material, including other beneficial, neutral, or deleterious mutations (Barton, 2000; Smith and Haigh, 1974). Therefore, deleterious genetic hitch- hiking could have contributed to deleterious instruction accumulation along non-plastic lineages in changing environments. Epistatic effects (i.e., interactions between genes) could have also contributed to deleterious instruction accumulation along non-plastic lineages. On their own, mutations that increase deleterious instruction execution are maladaptive; however, if such a mutation were to also knock out an even more harmful function, that mutation may have a net beneficial effect. As such, mu- tations that confer increased deleterious instruction execution could directly drive a selective sweep. Representative lineages from non-plastic populations in the cyclic environment exhibited higher mutation accumulation (Figure 2.3b), novel function loss (Figure 2.6c), and deleterious function ac- quisition (Figure 2.8a) than their plastic counterparts. In aggregate, we found that many (∼49%; 956 / 1916) mutations that increased deleterious instruction execution in offspring co-occurred with mutations that provided an even stronger benefit by adapting the offspring to an environ- mental change. We expect that an even larger fraction of these deleterious mutations were linked to beneficial mutations, but our analysis only counted mutations that co-occurred in the same generation. Our analyses did not distinguish between epistatic effects and deleterious hitchhiking; however, more fine-grained analyses of secondary effects of mutations that conferred deleterious instruction execution could be performed in future work to disentangle these two mechanisms. Theory predicts that under relaxed selection deleterious mutations should accumulate as cryp- tic variation in unexpressed traits (Lahti et al., 2009). Contrary to this expectation, we did not find evidence of deleterious instructions accumulating as cryptic variation in adaptively plastic lineages. One possible explanation is that the period of time between environmental changes was too brief for variants carrying unexpressed deleterious instructions to drift to high frequencies 37 before the environment changed, after which purifying selection would have removed such variants. Indeed, we would not expect drift to fix an unexpressed trait since we tuned the frequency of en- vironmental fluctuations to prevent valuable traits from being randomly eliminated during the off environment. Additionally, plastic organisms in Avida usually adjust their phenotype by toggling the expression of a minimal number of key instructions, leaving little genomic space for cryptic variation to accumulate. 2.4.4 Limitations and future directions Our work lays the groundwork for using digital evolution experiments to investigate the evolu- tionary consequences of phenotypic plasticity in a range of contexts. However, the data presented here are limited to the evolution of adaptively plastic populations. Future work might explore the evolutionary consequences of maladaptive and non-adaptive phenotypic plasticity (e.g., Leroi et al. 1994), which are known to bias evolutionary outcomes (Ghalambor et al., 2015). Additionally in our experiments, sensory instructions perfectly differentiated between ENV-A and ENV-B, and environmental fluctuations never exposed populations to entirely new conditions. These parameters have been shown to influence evolutionary outcomes (Boyer et al., 2021; Li and Wilke, 2004), which if relaxed in the context of further digital evolution experiments, may yield additional insights. Our experiments also focused on asexually reproducing digital organisms. Sex- ual reproduction has been shown to be advantageous in rapidly changing environments, such as the cyclic environments used in our study (Misevic et al., 2010). Future work could investigate how sexual reproduction affects the evolutionary consequences of adaptive plasticity. We focused our analyses on the lineages of organisms with the most abundant genotype in the final population. These successful lineages represented the majority of the evolutionary histories of populations at the end of our experiment, as populations did not exhibit long-term coexistence of different clades. Our analyses, therefore, gave us an accurate picture of what fixed in the population. We did not, however, examine the lineages of extinct clades. Future work will extend our analyses to include extinct lineages, giving us a more complete view of evolutionary history, which may allow us to better distinguish adaptively plastic populations from populations evolving in a static environment. As with any wet-lab experiment, our results are in the context of a particular model organism: “Avidian” self-replicating computer programs. Digital organisms in Avida regulate responses to en- 38 vironmental cues using a combination of sensory instructions and conditional logic instructions (if statements). The if instructions conditionally execute a single instruction depending on previous computations and the state of memory. As such, plastic organisms in Avida typically regulate phe- notypes by toggling the expression of a small number of key instructions as opposed to regulating cohorts of instructions under the control of a single regulatory sequence (Lalejini and Ferguson, 2021b). This bias may limit the accumulation of hidden genetic variation in Avida genomes. How- ever, as there are many model biological organisms, there are many model digital organisms that have different regulatory mechanisms (e.g., Lalejini and Ofria 2018) that should be used to test the generality of our results. As with most digital evolution experiments, our mutation rates were high and population sizes were small (3600 individuals) relative to experiments with microbes or conditions common in nature. As such, beneficial mutations can be generated rapidly and selective sweeps can occur quickly. Moreover, our analyses were limited to a single rate of environmental change and simple function reward structures, which likely influenced the rates of selective sweeps observed in our experiments. Future studies could address these limitations by increasing population sizes, decreasing mutation rates, investigating different function rewards and punishments, and altering the time spent in each environment. Supplemental Material The supplemental material for this article is hosted on GitHub and can be found online at https: //github.com/amlalejini/evolutionary-consequences-of-plasticity (Lalejini and Ferguson, 2021b). Data Availability Statement The datasets generated and analyzed for this study can be found on the Open Science Frame- work at https://osf.io/sav2c/ (Lalejini and Ferguson, 2021a). 39 Chapter 3 Potentiating mutations facilitate the evolution of associative learning in digital organisms Authors: Austin J. Ferguson and Charles Ofria This chapter is adapted from (Ferguson and Ofria, 2023), which was peer-reviewed and published in the proceedings of the 2023 Conference on Artificial Life. In this work we use analytic replay experiments to quantify how the probability of evolving associative learning (i.e., its potentiation) changes over the course of evolved digital lineages. Specifi- cally, we replay four case study lineages that successfully evolved learning. We find that potentiation can increase suddenly and identify “potentiating mutations” in each lineage that substantially in- crease potentiation in a single generation. We then dive into the mechanics of these mutations and how they interact with later mutations to confer associative learning. 3.1 Introduction How likely is the evolution of a particular trait? Researchers have long been interested in predicting evolutionary outcomes, but the inherent stochasticity in the process makes this goal exceptionally challenging. In order to make more accurate predictions, we would need to better understand how and why the underlying probabilities of potential outcomes change over time. Looking purely retrospectively at evolution in nature, this type of analysis is not possible (at least not without a time machine). Leveraging the flexibility and controls available in experimental evolution, however, allows us to empirically test questions that were previously only hypothetical (Kawecki et al., 2012). Here, we focus on Stephen Jay Gould’s idea of “replaying the tape of life” (Gould, 1990). The idea is simple: If we were to start life over again from the same initial conditions, would evolution follow the same pathway? Alas, Gould remarked that this experiment is unfortunately impossible. While it may be impossible to replay the entire tape of life, practitioners of experimental evolution have conducted this experiment on a smaller scale. Travisano et al. (1995), Wagenaar and Adami (2004), and Blount et al. (2008) introduced and refined methods of investigating the role of historical contingency in evolving populations: parallel and analytic replay experiments. By evolving multiple populations from the same starting organisms, researchers can identify the range and distribution of outcomes. These populations can be evolved simultaneously from a given starting 40 point (parallel replays), however many microbial and digital populations allow us to preserve a “fossil record”, opening up another possibility. Analytic replay experiments systematically revive historical populations to re-evolve them from multiple time points, allowing researchers to identify alternative possibilities after the fact (Blount et al., 2018). For a more thorough review of the story behind analytic replay experiments and an overview of several papers that have used them, see Chapter 1. In this work we use digital evolution, specifically the evolution of self-replicating computer programs in the Avida Digital Evolution Platform (Ofria and Wilke, 2004), which has previously been used to conduct replay experiments. Yedid et al. (2008) employed this technique to investigate the re-evolution of traits following an extinction episode, while Covert III et al. (2013) used analytic replay experiments to study the importance of individual deleterious mutations in the evolution of complex traits. We selected associative learning as a complex behavior to study potentiation. Associative learning is a non-trivial capability exhibited by most complex organisms. In digital evolution systems like Avida, it serves as a rare yet evolvable trait (Pontes et al., 2020). For an Avida organism to exhibit associative learning, it must be capable of sensing its environment, taking action, and storing information in memory. The evolution of associative learning has been studied via experimental evolution in both digital (McGregor et al., 2012; Pontes et al., 2020) and natural systems (Dunlap and Stephens, 2014; Mery and Kawecki, 2002), yet many questions remain about how it evolves. While more complex forms of learning are found in nature, associative learning remains an important building block for most others and insights about how it arises may be informative for understanding the broader evolution of intelligence, especially within digital contexts. In this work, we begin to analyze how the likelihood of evolving a complex trait changes along a successful lineage. Using analytic replay experiments, we identified individual mutations that cause drastic increases in the potentiation of associative learning. We then analyzed those mutations and their mutational neighborhoods to begin characterizing how a mutation is potentiating. While these replay experiments are informative and useful for exploring counterfactual evolutionary possibilities, they are also computationally intensive. As such, we start by focusing on a set of case-study lineages to develop an initial framework for understanding how potentiation can occur. Analyzing four successful lineages, we find that potentiation can increase suddenly, even due to a single mutation. Since these lineages were selected because they successfully evolved associative 41 learning, potentiation generally increases in each, though some decreases do occur. Potentiating mutations vary in initial effect, making them challenging to detect. Retrospective analysis allows us to identify them, however, and begin hypothesizing about the dynamics that allow these mutations to potentiate associative learning. This work demonstrates using analytic replay experiments for quantifying potentiation along a lineage and establishes baselines and techniques for future studies. 3.2 Methods Here we describe the digital evolution system and experiment setup used to conduct this work. 3.2.1 The Avida Digital Evolution Platform This work uses the Avida Digital Evolution Platform (Ofria and Wilke, 2004), which was described in Chapter 2. Here we use an early build of version 5.0 of Avida, cur- rently under development as part of the Modular Agent Based Evolver 2 (MABE2) framework (https://github.com/mercere99/MABE2). Avida has previously been used to study both associa- tive learning (Grabowski et al., 2010; Pontes et al., 2020) and historical contingency via replay experiments (Covert III et al., 2013; Yedid et al., 2008). Fundamentally, Avida is designed to have tools necessary to conduct work at the scale required for analytic replay experiments. Avida genomes consist of assembly-like instructions that transfer data between registers, make basic comparisons, perform mathematical operations, etc. We use an extended instruction set that includes extra flow control and environment-specific instructions (Ferguson, 2023). We used Avida populations on a 60x60 toroidal grid, resulting in a population cap of 3,600 organisms. Offspring are placed in a grid cell next to their parent, overwriting any existing organism in that cell; the parent organism is also reset. During reproduction, point mutations occur in offspring at a rate of 0.0075 per instruction, while single-instruction insertion and deletion mutations occur at a rate of 0.05 per reproduction. Organisms reproduce by executing the Repro instruction. To prevent organisms from immediately replicating, organisms must execute 1,500 instructions before the Repro instruction can be activated. Associative learning To test the evolution of associative learning, we created a simplified version of the Avida path following environment (Pontes et al., 2020). Instead of navigating 2D space, an organism exists in one of four states: forward, left, right, or the error state, backward. Organisms are given a Sense instruction, which will give them the nutrient cue of their current state. Using this nutrient cue, 42 organisms need to execute the appropriate instruction (one per state) to progress along the path. The forward and backward states have fixed cues (0 and -1, respectively), while, at birth, each organism is assigned random cue values for left and right in the range of [1, 106]. Organisms can genetically encode forward and backward, but must learn left or right in their lifetime to perfectly solve the task. Each path begins with one of four preset starting state sequences, chosen randomly for each organism at birth, followed by additional random states. The four preset paths are the “one-fixed turn” paths from (Pontes et al., 2020), where organisms are guaranteed to encounter a left state before a right. If the organism is not in an error state and executes the appropriate instruction, they are rewarded and move to the next state. If an organism executes the wrong instruction (e.g., the Left instruction in the right state), it is penalized and placed in the error state. While in the error state, the organism must execute the Backward instruction to return to the previous state and be allowed to try again; it will be penalized for any other action. A cooldown is applied, however, such that executing the Backward instruction causes the organism to wait for the equivalent of 10 additional instruction executions. Organisms are scored based on the number of valid states they successfully traversed minus the number of incorrect moves made, with a maximum score of 300. Fitness is calculated as 1.25score, so each additional correct movement grants a 25% boost in fitness regardless of the total number of correct movements. In this environment, optimal behavior requires associative learning in the form of imprinting. Since the paths are guaranteed to have a left state before a right state, the optimal behavior is to find and store the first positive cue value as the left cue. Combined with genetically-encoded forward and backward logic, storing and using the left cue is enough for organisms to identify the right cue through a process of elimination. Other possible behaviors involve error correction (assuming all turns are one direction, then correcting when wrong), bet-hedged learning (assuming more about the paths, e.g., that there are no instances of two lefts in a row), and various mixed strategies. To categorize the behavior of a genotype, we evaluate it in 100 trials to ensure we observe how it performs in all four environments with different random cues. We then classify each of the 100 trials. Trials are classified as learning if the organism correctly handles greater than 90% of the states they were in, error correction if they always successfully navigated one turn state but not the other, and “low activity” if they failed to successfully navigate at least 25 states. To be categorized 43 as learning or error correction, all 100 trials of that genotype must be of that class. If one or more trials were low activity, the genotype was categorized as “bet-hedged learning” or “bet-hedged error correction”. If a genotype displayed at least one learning trial and at least one error correction trial, they were classified as “mixed bet hedging”. Finally, all remaining genotypes were categorized as “low activity”. This categorization system was used across all three phases of this work. 3.2.2 Experiment framework To identify mutations that substantially increased the likelihood that learning evolved, we split the work into three phases (see Figure 3.1 for an overview). First, we seeded 200 initial parallel replicates in the associative learning environment with a default ancestor only capable of reproduction. Each replicate was given 250,000 updates, where one update is the time it takes for all organisms to execute 30 instructions, on average. We identified the most abundant genotype in each final population to represent the replicate and classified its behavior. We then extracted the “dominant lineage”, stretching from the ancestor to the representative genotype. Each step in the lineage corresponds to a change in genotype between parent and offspring, with clonal offspring occupying the same step as their parent. While a step is often a single mutation, it is possible that one step is composed of multiple co-occurring mutations. To begin analyzing changes in potentiation, we ran exploratory replays replicates on four lin- eages capable of learning. For each, we seeded independent replicates for every 50th step in the lineage, up to step 1,000. All replay replicates evolved in the same associative learning environ- ment, and replays were given the same number of updates as had occurred after that genotype first appeared (e.g., replays for a genotype that appeared at update 150,000 would be evolved for the remaining 100,000 updates). Potentiation was measured as the portion of replay replicates that evolved learning. Because replays were seeded with a single organism, some replay populations went extinct before ever reproducing and thus were not factored in (the minimum number of finished re- play replicates from a given lineage step was 38, while three case study lineages had a minimum of 48). While the exploratory replays provide an overview of how potentiation changed, we dug deeper by running targeted replays to further explore windows of increasing potentiation. Specifically, we found the 50- or 100-step “potentiation window” that sees the largest increase in potentiation in the exploratory results, and seeded additional replays for every step in that range. These targeted 44 replays were conducted identically to the exploratory replays, only they did not skip steps. Though computationally expensive, these replays illuminated the impact every genotypic change had on potentiation. Running 50 replay replicates per step still results in considerable noise, but we were able to identify mutations that clearly and substantially increased potentiation using these targeted replays. We hand-analyzed algorithms in all potentiating mutations, here defined as single lineage steps that result in a potentiation increase of 25 percentage points or more. Further, we assessed genotype fitness in context of their lineage to identify if potentiation mutations were beneficial, deleterious, or neutral. Finally, we characterized the local fitness landscape of each genotype (one and two mutations out), measuring the presence and fitness of nearby genotypes capable of learning. 3.2.3 Data and software availability Both the data and the software used to conduct this work are available in the supplemental material (Ferguson, 2023). Analyses were conducted in the R statistical computing language (R Core Team, 2021) using the dplyr package to summarize data (Wickham et al., 2022). Data was visualized using the ggplot2 and cowplot packages (Wickham et al., 2020; Wilke, 2020). 45 Figure 3.1: Illustration of the experimental design and hypothetical results. The top panes show the 200 initial parallel replicates seeded with the ancestral genotype and evolved for 250,000 updates. We extracted the lineage of the most abundant genotype in the evolved population (the dominant lineage), shown in red. Next, we conducted exploratory replays (middle panes) by launching replay replicates at regular intervals along focal lineages. These exploratory replays give a coarse-grained view of how potentiation changed over a lineage. We identified the window with the largest poten- tiation gain, shown as the shaded region. Finally, we ran targeted replay replicates for every step in this potentiation window. These fine-grained replay replicates show mutations that resulted in large potentiation increases (shown here with a red dot). 46 1,00095090050100150Initial parallel replicatesSetupHypothetical result…250,000 updates120019919832Exploratory replay replicatesSetupHypothetical result…x50x50x50x50x50x50Percentage of replays that evolved learningPhylogenetic step01,000350400399398397352353Targeted replay replicatesSetupHypothetical result…x50x50x50x50x50x50Percentage of replays that evolved learningPhylogenetic step350400351Dominant lineage 3.3 Results Here we discuss the generation and analysis of the initial replicate runs, followed by the more detailed results from each of the four case study lineages. Figure 3.2: Behavior classification of the final dominant genotypes from the 200 initial parallel replicates. 3.3.1 Evolution of learning in the initial replicates In the first phase of this study, we evolved 200 independent replicates for 250,000 updates (about 3600 generations) in the associative learning domain, each starting with a default ancestor. The distribution of evolved behaviors is shown in Figure 3.2. Only 16 of 200 replicates exhibited associative learning. An additional 15 replicates evolved forms of bet-hedged learning, with two of those replicates gaining and then losing associative learning along their lineage. The majority of replicates relied on some form of error correction, either as a sole strategy (140), a bet-hedged variant (2), or as a fallback due to limited learning (3). Finally, 24 replicates failed to navigate enough states to categorize them, leading us to label them as “low activity”. We analyzed all 16 replicates that evolved and maintained learning, identifying the length of their lineages from onset of evolution until learning stabilized, no longer showing further improve- 47 21514016243050100150LearningBet−hedged learningError correctionBet−hedged error correctionMixed bet hedgingLow activityClassificationNumber of replicates ment. Given the substantial computational power required for each time point studied, we performed replay experiments only on the shortest three such lineages (lineages A-C), plus the shortest lineage that exhibited error correction at some point in its evolutionary history (lineage D). Selecting these particular replicates to replay has the potential to bias the results, as discussed below. 3.3.2 Case studies of individual lineages Below we present the results of the replay experiments performed on the four focal lineages and provide a step-by-step analysis of how key mutations altered both immediate fitness and evolutionary potential (potentiation). Where possible, we explained how these mutations altered the underlying algorithms. For each lineage, potentiation across both exploratory and targeted replays can be found in Figure 3.3. Lineage A Our first case study is one of the shortest lineages and it contains a quick jump to learning (at step 537) from low activity, with only a brief time spent in bet-hedged learning. Exploratory replays for this lineage revealed a stark jump in potentiation between 400 and 500 steps from the ancestor, which we call the potentiation window. From step 400 to step 450, potentiation increased from 20% of replicates to 44%, and increased again to 84% of replicates by step 500. Before this window, potentiation fluctuated around the original 8% found starting from the default ancestor. After the selected window, potentiation increased once more and then fluctuated between 94% and 100%. With this in mind, we seeded 50 replays for each genotype in the potentiation window. Even with the noise due to a small sample size, we identified two mutations that conferred sizable increases in potentiation: steps 432 and 484. The mutation at step 432 brought potentiation above 50% for the first observed time in the lineage. Surprisingly, potentiation then decreased (on average) back to a local minimum of 20% of replicates at step 480. Finally, the mutation at step 484 substantially increased potentiation to 92%, where it stayed for all subsequent replays. Even though the largest jump in potentiation occurred at step 484, learning did not appear in the lineage until step 537. That said, only steps 516 and 525 caused any change in behavior; all other interim mutations occurred in unexecuted regions of the genome. The potentiating mutation at step 484 made a key instruction in the main loop of the genome redundant. It had no immediate effect on fitness, but later (in intermediate step 516) allowed the redundant instruction to be replaced by 48 Figure 3.3: Potentiation of associative learning for all case studies, shown as the percentage of replicates that evolve associative learning when replayed from a given genotype. For each case study, the left plot shows the results of the exploratory replays. We identified a window of potentiation gain in each lineage, indicated by the shaded region. Within that window, we conducted targeted replays for every step along the lineage, shown on the right. The color of the points corresponds to the behavior exhibited at that step of the lineage. A dotted line in the targeted replays indicates the step that conferred the most potentiation. 49 025507510002505007501000Lineage A0255075100400425450475500025507510002505007501000Lineage B02550751005075100125150025507510002505007501000Lineage C0255075100250260270280290300025507510002505007501000Lineage D0255075100500510520530540550Phylogenetic stepPercentage of replays that evolve learningClassificationLearningBet−hedged learningError correctionBet−hedged error correctionMixed bet hedgingLow activity a right turn that granted a small fitness increase as organisms could now navigate until they reached the second left turn. Step 525 further improved navigation, but used a comparison that made an unfounded assumption on whether the left or right random cue is larger. When the assumption was correct organisms were capable of learning the cues, however the assumption is only correct 50% of the time, so this genotype is categorized as bet-hedged learning. Finally, step 537 swapped that comparison with one that makes no assumptions about cue values, enabling the genotype to learn in all environments. Looking at the local mutational neighborhood, the potentiating mutation at step 484 increased the number of two-step mutations that conferred learning from 2 to 9 (of approximately 56 million). Additionally, the fitness of the learning mutations in the local neighborhood increased by three or four orders of magnitude. What about the earlier potentiation that was gained and then lost? The mutation that sub- stantially increased potentiation at step 432 introduced a comparison that had no immediate fitness effect. This comparison remained unimportant until step 525 when it became integral in introducing bet-hedged learning. Neither step 432 nor its predecessor had access to learning within a two-step mutational range. Thus, it is likely that the potentiation comes from that comparison given that we observed it being utilized for learning later on. Why then, did potentiation decrease between steps 432 and 484? At step 432 (and indeed before it), the algorithm had a section where if register B was non-zero, then B stored the cue associated with a left turn. While this information was likely to make the evolution of learning easier, it was unused at that time. As such, the mutations between steps 432 and 484 dismantled that machinery, requiring a replacement to be built before learning could evolve. Lineage B Similar to Lineage A, this lineage transitioned from low activity to learning through a brief period of bet-hedged learning. Exploratory replays on this lineage reveal that learning was potenti- ated almost immediately; by step 150 potentiation had climbed above 95%, where it stayed for the rest of the lineage. As such, the potentiation window included steps 50 through 150. Unlike Lineage A, the targeted replays reveal a general trend of increasing potentiation, with step 104 as a notable outlier. Mutations from steps 50 through 103 slowly increased potentiation from 4% to 44%, but the mutation at step 104 jumped to 80%. From there, another slow increase 50 continued to raise potentiation to a peak of 98% at step 150. Learning did not appear until step 195, over 90 steps beyond the largest potentiating mutation. Given that 34 intermediate mutations altered the encoded algorithm, the mechanistic pathway to achieve learning is more complicated than can be broken down in this work. However, the potentiating mutation at step 104 modified the execution flow of the genome, which appears to have been essential for the later evolution of learning. While two mutations occurred at step 104, only one caused a functional change: an instruction to swap data between registers was mutated to a left turn. Prior to this mutation, the genome encoded a left turn later on, after which the execution became trapped in an endless loop. The potentiating mutation was immediately beneficial; it allowed organisms to take the left turn earlier, which, in turn, allowed them to avoid the loop. As a side effect, a large portion of the genome that was previously executed was now skipped, and these instructions remained skipped when learning evolved 91 steps later. Looking at the local fitness landscape, learning was neither present in the potentiating step’s landscape nor in the step before. We hypothesize that the potentiation came from the change in execution flow, and that skipping over those instructions avoided a pitfall and freed up execution time that may have been useful in evolving learning. Lineage C Lineage C has the biggest single-mutation potentiation increase (64 percentage points), and that mutation was deleterious when it occurred at step 279 along the lineage. Learning later appeared at step 305. The potentiating step mutated a no-operation instruction into a conditional flow control in- struction. At step 278 the genotype was capable of bet-hedged learning, but the mutation at 279 knocked out all instances of learning, reclassifying the behavior as “low activity.” The next step restored some fitness, and then step 285 interacted with the mutation at 279 to not only restore fitness, but to dramatically improve it. Ultimately, the potentiating mutation at step 279 allowed the algorithm to more precisely discriminate between the left and right cues. Prior to the mutation, a less-than comparison was used, which only functioned correctly in 50% of instances, specifically those where the left cue was less than the right cue. The potentiating mutation switched to an equality comparison, which alleviated the assumption and initiated the transition from bet-hedging to learning. 51 Interestingly, the potentiating mutation lowered both the number and average fitness of learning mutations available in the local fitness landscape, Lineage D Of the four replicates we analyzed, Lineage D is the only one that evolved error correction before learning. Like the earlier lineages, the exploratory replays show that almost all potentiation comes from a single window. In this case, potentiation grew from 34% of replicates to 96% between steps 500 and 550. Targeted replays are especially noisy for this lineage, but generally show an increase in potentiation, especially in the latter half of the window. The largest jump in potentiation occurred at step 548, near the end of the window. Several prior mutations also showed notable potentiation increases, but in each case later mutations appeared to counteract them. Specifically, steps 542 and 543 appear to have higher potentiation than the points around them, but potentiation dipped back below 50% before the largest jump at 548. Out of all four lineages, D has the fewest steps between the largest potentiating step (548) and the first appearance of learning (556). At the time, the potentiating mutations at 548 caused no discernible change in fitness even though they increased potentiation by 50 percentage points. Two mutations occurred at step 548: a point mutation swapped a flow control instruction for a math instruction and an insertion mutation added a comparative conditional instruction into the main execution loop. At step 548 the genotype encoded a naive error correction algorithm: after setup, organisms could always handle right states, but always failed left states, recovered, and then continued. A mutation at step 556 swapped a sensing instruction with a math instruction, and this combined with the prior comparison instruction from step 548 to allow the organism to move left when needed and shifted the behavior to learning. The local fitness landscape supports the idea that the comparison instruction was useful to the evolution of learning, as the potentiating mutation increased the number of learning genotypes in the two-step mutational neighborhood from under 800 to over 100,000. Looking back at the apparent false start at steps 542 and 543, it is not clear what algorithmic changes these mutations conferred. These steps did, however, alter the set of learning behaviors that fell within the local mutational neighborhood. At step 541, there were only 324 two-step mutations that conferred learning, and only three of those resulted in a substantial fitness increase (a metabolic rate > 1015). After steps 542 and 543, that number rose such that over 900 two-step 52 mutations could confer learning, with over 500 resulting in a substantial fitness increase (including over 200 that reached a merit > 1025). We may be unsure of the exact effects of these mutations on the mechanics of the algorithm, but the changes in the local fitness landscape are profound. 3.4 Discussion and Conclusion 3.4.1 Potentiation can rise suddenly We have documented several cases where single mutations dramatically increased the proba- bility of associative learning later evolving. Of the four lineages analyzed, each had a single step in the lineage that resulted in a substantial increase in potentiation (ranging from 36 to 64 percent- age points). Indeed, two of the lineages had an additional potentiating mutation that resulted in an increase of over 30 percentage points. Looking only at exploratory replays, each lineage has a 50-step window that resulted in a potentiation increase of at least 40 percentage points. While four lineages are insufficient to make any strong claims, these results demonstrate that it is possible for single mutations to drastically increase potentiation, and provide compelling evidence that they may, in fact, be common. In Lineages B and D, however, we do also observe regions with smaller, incremental increases in potentiation. Further studies are clearly necessary to more fully understand the general patterns and processes by which potentiation rises across different representations and environments. 3.4.2 Potentiation can decrease along a successful lineage In two of the lineages we analyzed (A and D), we see evidence of potentiation decreasing over spans of the lineage. With only 50 replay populations per lineage step, our results are noisy and it is difficult to isolate what is occurring during these periods of potentiation decline. While we were unable to identify any “anti-potentiating” mutations with effects as large as the positive potentiation mutations, it is possible for a single step in a lineage to greatly decrease potentiation. Since we limited our analyses to runs where associative learning arose in the original replicate, we did not expect a preponderance of anti-potentiating mutations, but were intrigued to see evidence of them, even if at low effect. These same analytic replay experiments could be applied to lineages that failed to evolve the focal behavior, to see if potentiation of that behavior experiences sudden drops. Similarly, our replays targeted windows with substantial increases in potentiation; other windows would be more likely to include decreases. Finally, failed replays from starting points with otherwise high potentiation must have failed for a reason; they too could be used as a likely source 53 (albeit more artificial) of anti-potentiating mutations. 3.4.3 Potentiating mutations can appear innocuous when they first occur We analyzed the mutational step in each of the four lineages that conferred the greatest increase in potentiation. Of those four mutational events, two were neutral, one was deleterious, and one was beneficial. Even among these few replicates, there is no obvious pattern in the properties of potentiating mutations. Of the two neutral mutations, one made an instruction redundant while the other added a conditional instruction that had no effect when it was initially introduced. The deleterious and beneficial mutations both caused the execution flow to loop back earlier than it did before. Additionally, the number of mutations between the potentiating mutation and the appear- ance of learning varied wildly between lineages, ranging from 8 steps up to 91. The potentiating mutations in these four lineages are unique, and at the current time there is no pattern emerging among them. So, how are these mutations any different from other mutations? Untangling this mystery could be critical for predicting evolutionary outcomes or accelerating adaptive evolution. 3.4.4 We can identify how a mutation is potentiating There are many mechanisms by which a mutation could facilitate the evolution of associative learning. For example, the mutation could provide a building block that is helpful to perform the task. But for a mutation to be potentiating it must notably increase the probability of associative learning appearing in the future. Any change, no matter how helpful, that was already likely to occur would not be considered potentiating. Indeed, it is the earlier mutations that made that change so likely that would be potentiating. Of course, those mutations are also more challenging to identify. We have three different hypotheses for how a mutation could be potentiating: (1) It is the initial move into a genetic neighborhood with associative learning, (2) It is a shift into a genetic neighborhood with a more valuable version of learning, or (3) It is a “gateway” mutation that unlocks a beneficial pathway to learning, even though learning is not in the immediate genetic neighborhood. Across the potentiating mutations we analyzed, we have found evidence for each of these hy- potheses. The largest potentiating mutation in Lineage D supports Hypothesis 1, as it is the first time in the lineage that learning is only one mutation away. The main potentiating mutation in Lineage A and the earlier potentiation mutations in Lineage D support Hypothesis 2 as both cause drastic increases in the fitness benefit of learning mutations in the two-step neighborhood. 54 Finally, Lineages B, C and the early potentiating mutation from Lineage A all provide support for Hypothesis 3. The mutations from Lineages A and B both have zero learning mutations in their two-step neighborhoods. Interestingly, Lineage C sees a decrease in the number and fitness of learning mutations in the local neighborhood. Hypothesis 3 has many possible mechanisms by which it may work. For example, new traits may produce a single, clear, beneficial pathway of improvements to follow. Alternatively, a new building block may open a larger region with many different ways of evolving associative learning. Finally, the mutation may actually damage existing functionality or remove existing interactions that were impeding further evolution. While all three hypotheses have some support, future work can begin to uncover if a certain hypothesis is seen more often, what conditions might result in each scenario, or if additional analyses are needed to truly characterize these potentiating mutations. 3.4.5 Outlook This work is only an early step, focused on developing techniques and expectations for per- forming fine-grained analyses of replay experiments. Next, we must expand beyond four lineages, to collect broader, more systematic replay data, automating as much of the process as possible. We conducted this study on associative learning in Avida, but the underlying techniques must be examined broadly in other environments and substrates to ensure that our results are not unique to Avida or the evolution of associative learning. Within the current study system, there are many questions that remain unanswered: We focused on large increases in potentiation, but are there more obvious signals associated with decreases? How much of the noise that we see in our data is due to limiting ourselves to 50 replicates, and how much of it is do to actual shifts in potentiation with each mutation? What does potentiation look like in replicates that fail to evolve learning? Finally, it would be valuable to compare the specific evolutionary pathways the different replays take. Do they follow the same trend or do they differ? This would allow us to understand if, for example, a potentiating mutation funnels evolution in a fixed direction. Ultimately, these analytic replay techniques provide us with a tool for examining evolution in a prospective fashion, not just the retrospective approach that we are traditionally limited to. They will allow for the development of new evolutionary theory and predictive capacity that will be invaluable, both for understanding how meaningful complexity is produced in the natural world and for improving evolutionary applications. 55 Chapter 4 A large-scale digital evolution study of historical contingencies in associative learning Authors: Austin J. Ferguson and Charles Ofria This chapter is in final preparation before submission. This chapter directly extends the previous chapter. We expand from analyzing the potentiation dynamics of four case study lineages to replaying 50 lineages. With this larger dataset, we begin to conduct statistically powerful analyses of the the distribution of potentiation changes. In all 50 lineages we find a single-generation step in the lineage where the potentiation of associative learning increases significantly. Further, we verify that these increases are most often driven by a single mutation, though we do find evidence that multiple mutations can interact to increase potentiation. We find that these large potentiation increases are comparable to some, but not all, previous works on potentiation, and discuss those comparisons in detail. 4.1 Introduction In Chapter 3 I conducted four case studies and analyzed the genetic potentiation of associative learning in the Avida digital evolution system. As in the previous chapter, here I use “genetic potentiation” to refer to changes in the genetic background that promote the evolution of a particular trait or behavior, in this case associative learning (Blount et al., 2008). We found that potentiation can increase over a short period of time, including a single mutation between parent and offspring. While I conducted a deep examination of the observed potentiation dynamics in Chapter 3, I limited myself to examining only four case study lineages. Here, I expand this work by again replaying the evolution of associative learning in Avida, but shifting from detailed case studies to aggregate data by analyzing the potentiation dynamics of 50 lineages. My overall goal for this work is to provide a more comprehensive point of comparison for other studies on potentiation dynamics. While several previous studies of potentiation dynamics exist, they all analyze the dynamics of only one or at most a few lineages. By analyzing 50 lineages, here we shift from a few data points of potentiation measurements to full distributions backed by statistically powerful conclusions. With this work, I aim to accomplish two goals: 1) to allow us to examine previous studies in a new light, possibly identifying trends in potentiation dynamics across systems, and 2) to provide a benchmark of potentiation dynamics for future studies to compare 56 against. There are multiple, but not many, papers that directly or indirectly study potentiation dynamics in ways comparable to this work. We are testing the effect of accumulated genetic backgrounds on the subsequent evolution of a particular trait. Thus, we can discuss other papers that use analytic replay experiments (Blount et al., 2018) to test the potentiation of a trait across different points in an evolved lineage. Alternatively, we can leverage mutant studies, where experimental evolution is conducted on organisms with and without a particular set of mutations, but an otherwise identical genetic background. However, we can only compare against studies that look at the binary categorical outcome of “did X trait evolve?” across replicates; we cannot directly compare to similar studies of continuous phenotypes or of evolvability (Travisano et al., 1995; Wagenaar and Adami, 2004; Woods et al., 2011). In practice, you can turn a continuous variable into a categorical variable (e.g., “did the replicates evolve a size greater than 1µm2”), however such analysis decisions should be made before experimentation; post-hoc analysis can unintentionally inject our own bias, thus creating a statistical skew. We find that, at both 50 lineage steps and single-step levels, potentiation for associative learning can increase substantially. Further, single-step increases in potentiation are likely to be driven by a single mutation, though we do see both epistatic and non-epistatic interactions between multiple mutations to alter potentiation. Finally, we compare our potentiation dynamics against other works that leverage analytic replay experiments to test the effects of actual accumulated genetic history (Blount et al. (2008), Chapter 3). Additionally, we compare against works that examine constructed one-step mutants (Jochumsen et al., 2016), or varying levels of shared history across evolutionary replicates (Meyer et al., 2012). We find that several, but not all, of these other studies have observed similarly large increases in potentiation. This result provides a baseline point of comparison for future potentiation studies, both in silico and in vitro. 4.2 Methods This work extends the study described in Chapter 3. As such, most of the methods remain the same. At the core of these studies, we use the Avida digital evolution platform (Ofria and Wilke, 2004) to evolve digital organisms capable of simple associative learning in a path-following domain. Here we provide a brief outline of the methodology, highlighting areas where the methods differ from the previous chapter. All code, configuration scripts, and summarized data are available 57 online (Ferguson, 2023). 4.2.1 Initial evolutionary replicates First, we evolved populations capable of associative learning to provide lineages for us to replay. In order to accumulate 50 “learning replicates” (i.e., replicates that were capable of performing associative learning at the end of evolution), we ran batches of 500 replicates until we reached 50 learning replicates. This required 8 batches or 4,000 replicates total, for an ultimate success rate of 1.25%. The main details of these initial replicates were identical to Chapter 3. We started each replicate with a single Avida organism capable only of reproducing itself, and we capped the population size at 3,600 organisms by using a 60x60 grid for our population. We again evolved each initial replicate for 250,000 Avida updates and then identified a representative genotype – the most abundant genotype at update 250,000 – and extracted the genotypic lineage leading to it from the original ancestor. We used this lineage from each replicate to perform our replay analyses. Mutation rates were kept the same as the previous study: per-site substitution mutations occurred at a rate of 0.0075, while insertion and deletion mutations occurred independently at a rate of 0.05 per reproduction event. Organisms were again able to replicate themselves via the Repro instruction, but they were still required to execute 1,500 instructions before being allowed to reproduce. We evaluate the organisms on the same path-following task as the previous chapter. At each step in the path, we provided the organisms with an integer cue indicating which action they should take to maximize their fitness (move forward, turn left, turn right, or move backward). Organisms must interpret the current cue and execute the correct action, with each action being a single, atomic instruction that we added to the Avida instruction set. The “turn left” and “turn right” cues, however, have randomized values for each organism evaluation, and as such organisms must associate the cue values with the correct actions during their lifetimes (i.e., they must encounter the cues, empirically determine their meaning, and remember them) to perform optimally. This path-following environment is the same as in Chapter 3 with two exceptions: First, Chapter 3 used the pre-defined paths from (Pontes et al., 2020), where each organism was placed on one of four “one fixed turn” paths. This configuration guaranteed that organisms would always encounter the left cue before the right cue. For this follow-up work, however, we placed all of the organisms on completely random paths. We have thus removed the guarantee of seeing one 58 cue before the other, substantially increasing the difficulty of the learning task. Second, we have refined the definitions of metabolic rate and fitness. An organism’s score is calculated the same way as before: organisms gain a reward of +1 for each correct movement and a −1 for each incorrect movement. Metabolic rate, however, is now calculated as 2score instead of 1.25score. This new calculation means that each additional correct movement doubles metabolic rate and each incorrect movement halves metabolic rate. This change creates a steeper gradient – increasing performance will result in much higher metabolic rate (and thus fitness), but this also makes performance decreases much more deleterious. While metabolic rate is used for determining how often an organism executes instructions during the evolutionary run, here we calculate fitness specifically as metabolic rate divided by the reproduction time of the organism – thus organisms can increase their fitness by improving performance or by reproducing faster. 4.2.2 Exploratory replays As in Chapter 3, our goal is to identify how the genetic potentiation of learning changes over evolution, as evidenced by the course of the focal lineages. To accomplish this, we split our analyses into two types of analytic replay experiments (Blount et al., 2018): 1) exploratory replays that provide a coarse-grained overview of how potentiation changes over a lineage and 2) targeted replays that identify individual “potentiating” steps in a lineage. To calculate the potentiation level for associative learning in a given genotype, we simply replayed evolution from that genotype 50 times and recorded the percentage of replicates that evolved associative learning. We maintained the same stopping criterion for all replays: evolution would end after a lineage experienced 250,000 total updates (e.g., for a genotype that appeared at update 150,000, replays would evolve for the remaining 100,000 updates). In the previous work, replay replicates started from a single organism – the genotype along the lineage whose potentiation we were measuring. However, this technique resulted in a few replay replicates that went extinct for various reasons, such as being unable to reproduce if a specific cue sequence was encountered. Here, replays again consisted of 50 evolutionary replicates, but this time we started each replay replicate with a full clonal population of the genotype being tested. This change resulted in all replay replicates completing normally, and has the side effect of reducing the potential for adaptive momentum in our replay replicates (Chapter 5). We conducted exploratory replays for all 50 replicates that evolved learning. Here we began 59 our exploratory replays in the same way as before: we identified the step in the lineage that first exhibited learning and replayed that genotype. In Chapter 3, we then started at the ancestral genotype and replayed every 50th step in the lineage until we reached this first learning genotype. Here, after replaying the first learning phenotype, we then iterated backward. We would take 50 steps backward (toward the ancestor) along the lineage and run another batch of replay replicates. We repeated this process until we reached a genotype that resulted in potentiation ≤ 20%. The results of the previous chapter demonstrated that this backward traversal would likely capture the largest increases in potentiation, as all four increases were greater than 20 percentage points. Therefore, the only way that we would miss the largest increase in potentiation would be if there was substantial gain in potentiation followed by substantial loss, which we saw no support for in the previous chapter. This scheme allowed us to avoid replaying the earliest genotypes in the lineage, and since they would have taken the longest to run (more updates per replay), this change substantially reduced the computational resources needed to conduct our exploratory replays. In addition to our 50 learning replicates, we ran exploratory replays for ten replicates that did not evolve learning. We replayed two randomly-sampled replicates for each of the other five final behaviors (bet-hedged learning, error correction, bet-hedged error correction, mixed bet hedging, and low activity). As these replicates never evolved learning, we resorted to conducting exploratory replays as in Chapter 3: starting at step 50 in the lineage, we replayed every 50th step along the lineage. These non-learning replicates were replayed to give us a baseline of how potentiation might change in “unsuccessful” replicates. We recorded both the potentiation of learning and the potentiation for the other behaviors in these additional replicates. For each replicate (both learning and non-learning), we found the 50-step window with the largest gain in potentiation. We then conducted a two-tailed Fisher’s exact test between the po- tentiation before and after this window to determine if the potentiation gain of the window was significant. 4.2.3 Targeted replays As in the previous chapter, after conducting our exploratory replays we used that data to go a step further: replaying individual lineage steps. The goal of these targeted replays was to identify the “potentiating step” – the parent-offspring step in the lineage that conferred the largest increase in potentiation. 60 The exploratory replays provided insight into how potentiation changed over 50-step windows, and we used this information to construct a set of windows for targeted replays. Our goal was to capture as much potentiation gain as possible while remaining computationally feasible. We initialized the set with the 50-step window with the largest potentiation gain as identified by the exploratory replays. However, Chapter 3 showed that there are occasionally multiple windows (usually adjacent) that have similar increases in potentiation, so the potentiating step could be in one of these other windows. To catch these instances, before conducting targeted replays we recursively examined the windows adjacent to the those in the selected set. If an adjacent window had ≥ 90% of the potentiation gain of the maximum per-window gain, and the final potentiation of the window was above 10%, we then added this window to our set. Of the 50 replicates, the set contained only one window for 46 replicates, and the other four replicates contained exactly two windows. We then conducted targeted replays by running 50 replay replicates for every genotype in the selected set. The “potentiating step” was then identified as the step in the lineage with the largest absolute increase in potentiation from the previous step. Similar to the exploratory replays, we found the potentiating step for each of the 50 learning replicates and then tested the significance of the potentiation difference. We again performed a two-tailed Fisher’s exact test, this time between the potentiating step and the step before it in the lineage. 4.2.4 Post-replay analyses After conducting the targeted replays and identifying the maximum potentiating step of each lineage, we then conducted two additional analyses to further disentangle the details of each poten- tiating step. First, we re-analyzed the genotypes just before and just after the potentiating step in each lineage. For each pair of genotypes, we evaluated them on the same set of 100 paths. This allowed us to perform a paired Wilcoxon test to look for significant differences in fitness between the original genotype and the mutant. Since fitness effects (e.g., beneficial, deleterious, slightly deleterious, etc) are often loosely defined, this technique allows us to discuss these fitness differences in terms of significance and effect size, while making it possible to cleanly identify mutations that have exactly zero effect on fitness. For the second analysis we analyzed the replicates with more than one mutation in the poten- 61 tiating step. By calculating the edit distance (Wagner and Fischer, 1974) between the genotypes before and after the potentiating step, we isolated the N mutations in the potentiating step. With the individual mutations identified, we ran 50 replay replicates for each of the 2N possible combina- tions of mutations for replicates where N > 1. We then applied a logistic regression model to test the significance of the contributions for the mutations and all possible interactions. This analysis allows us to identify whether the potentiation gain was driven by one of the N mutations or by a combination of mutations. 4.3 Results and Discussion Here we detail the results of all experiments and place them in the broader landscape of his- torical contingency literature. Figure 4.1: Subplot A) Columns show the number of initial replicates whose representative genotype evolved each type of behavior. Above each column, the numbers show the number of replicates that evolved that behavior and that count as a percentage of all 4,000 replicates. Subplot B) Raincloud plots show fitness of all 4,000 initial replicates’ representative genotypes, grouped by evolved behavior. All pairwise comparisons are highly significant (p <= 0.001; Pairwise Mann- Whitney with Bonferroni correction) except for the one marked pair (bet-hedged learning and bet-hedged error correction did not have a significant difference). 62 50822121092217161.25%2.05%0.52%52.73%0.55%42.9%0500100015002000LearningBet−hedged learningError correctionBet−hedged error correctionMixed bet hedgingLow activityClassificationNumber of replicatesANSAll other p <= 0.0010.000.050.100.150.20LearningBet−hedged learningError correctionBet−hedged error correctionMixed bet hedgingLow activityClassificationMean fitnessB 4.3.1 The evolution of learning is exceedingly rare on random paths As shown in Figure 4.1A, associative learning evolved in only 50 of 4,000 replicates (1.25%). This is considerably lower than the probability of evolving associative learning in previous work, with 8% of replicates evolving learning in Chapter 3 and up to 14% in the “two fixed turns” environment of (Pontes et al., 2020). However, these higher success rates where observed in environments that guaranteed the order of one or more turns at the start of each path. In this work we have removed those guarantees entirely; the next step in the path is always uniformly chosen at random among a left turn, right turn, or a step straight ahead. Thus, we must compare these results to the “random start” experiments in (Pontes et al., 2020). In their work, zero of 50 replicates evolved learning under such conditions. This work therefore marks the first time that associative learning has evolved in Avida without any guarantees about cue ordering. It should be mentioned, however, that this work is more lenient with the Move Backward instruction, placing the organism directly back on the path instead of moving them one tile backward and requiring them to reorient on their own, as in (Pontes et al., 2020). While learning rarely evolved, error correcting behaviors evolved in the majority (53%) of replicates. Similarly, most of the other replicates (43%) were classified as “Low Activity” as, even at the end of the evolutionary replicate, they failed to correctly identify at least 25 cues in any of 100 trials. The remaining replicates evolved bet-hedged behaviors, with bet-hedged learning, bet-hedged error correction, and mixed bet hedging each present in less than 3% of final evolved populations. Finally, we see that organisms that perform learning have significantly higher fitness than organisms performing any other strategy (Figure 4.1B, all p << 0.001, pairwise Mann Whitney with Bonferroni corrections for multiple comparisons). Mixed bet hedging has the next highest fitness, falling between its two constituent parts, learning and error correction. Bet-hedged learning and bet-hedged error correction both have lower fitness, on average, than their non-bet hedged counterparts. Finally, “Low activity” has very low fitness, as expected. These fitness values show us that learning was not rare because it was a poor solution to the problem, but rather due to the rugged nature of the underlying fitness landscape. 63 Figure 4.2: Aggregate data for the exploratory replays (A,B) and targeted replays (C,D) of learning replicates. Subplots A and C) Raincloud plots showing the distribution of potentiation changes for the cumulative effects across all replayed windows (A) or the individual effects of all replayed lineage steps (C). The solid vertical line shows zero change and the dashed vertical line shows the mean of the potentiation changes. For each of the 50 replicates, a purple point shows the maximum single-window (A) or single-step (C) potentiation increase. These 50 points are then shown in isolation in (B) and (D). Subplots B and D) Raincloud plots showing the distribution of maximum single-window (B) or single-step (D) potentiation changes for each replicate (one point for each of the 50 learning replicates). The dashed vertical line shows the mean of these changes. 4.3.2 Substantial increases in potentiation are common By number of replays conducted, this work is the largest study of potentiation via analytic re- play experiments to date. Here we detail the key finding of this work: large increases in potentiation are not only possible, but are common in this system. This result is supported at multiple levels, as we detail below. For all 50 replicates, figures showing potentiation over time for both exploratory and targeted replays are available in the appendix. 64 −50050100Change in potentiationA0255075100Change in potentiationB−50050100Change in potentiationC0255075100Change in potentiationD Figure 4.3: Scatter plot showing the potentiation before and after each exploratory window (Subplot A) or lineage step (Subplot B). The color and shape of the 50 main points show significance of the potentiation change (Fisher’s exact test). Semi-transparent points show the rest of the per-window (A, 10% opacity) or per-step (B, 5% opacity) potentiation changes. The line shows y = x, or no potentiation change. Potentiation gain is the vertical distance to this line. The dashed line shows the maximum potentiation gain of each replicate, averaged across all 50 replicates. Figure 4.4: Subplot A) Distribution of the maximum per-window change in potentiation for learning (top) and most likely behavior to evolve (bottom) of each of the 10 replayed non-learning replicates. Colors for the bottom distribution indicate the most likely behavior for each replicate. Vertical dashed lines show the mean for each group. Subplot B) The maximum potentiation level of learning observed in each of the 10 non-learning replicates, sorted. All replicates have windows of significant potentiation gain Exploratory replays show us that, on average, the 50-step windows we replay show little po- tentiation gain (mean ≈ 6, median = 2 percentage points) (see Figure 4.2A). However, we see that 65 02550751000255075100Previous potentiationNew potentiationA02550751000255075100Previous potentiationNew potentiationBSignificancep <= 0.05p <= 0.01p <= 0.001Most likelybehaviorLearning0255075100Change in potentiationA0426120622464025507510012345678910Non−learning replicateMaximum potentiationof learningBLow activityError correctionBet−hedged error correctionMixed bet hedgingBet−hedged learning these replays also cover a wide range, from -44 (i.e., 44 percentage points of potentiation loss) to +98 (i.e., effectively full gain in potentiation in only 50 lineage steps). When we consider these 50 learning replicates, we can also examine the maximum per-window potentiation gain of each replicate. These maximum gains are highlighted in Figure 4.2A and shown in isolation in Figure 4.2B. When only considering these maximum per-replicate gains, we see a mean gain of approximately +56, a median gain of +49, and a range of [+32, +98] percentage points. Figure 4.3A shows all potentiation gains as a combination of their previous and new potentiation values. We see that potentiation increases significantly for all 50 learning replicates, even with only 50 replays each (p ≤ 0.01, Fisher’s exact). Indeed, 47 of these replicates are highly significant (p ≤ 0.001). Thus we have further evidence that potentiation of learning can increase considerably in a relatively short amount of time. Significant single-step increases in potentiation exist in all replicates While we see these large increases in potentiation over 50-step intervals, we can also look at smaller timescales. Indeed, the targeted, single-step replay results tell a similar story. Figure 4.2D shows the potentiation gains at all replayed steps. Here, the mean is approximately +1, the median is 0, and the range is [−38, +88] percentage points. Thus we see a similar trend to the exploratory replays: the bulk of the distribution residing near zero, but with a short tail in the negative direction and a long tail in the positive direction. Focusing on only the maximum potentiating step for each lineage (Figure 4.2D), we see a mean of approximately +42, a median of +38, and a range of [+20, +88] percentage points. Immediately we see that, in many replicates, a single step in the dominant lineage can account for 50 percentage points of the potentiation of associative learning. However, not all replicates see these huge jumps; multiple replicates have a potentiating step of less than 25 percentage points. When we examine the individual potentiating steps (Figure 4.3B), we see that all of these changes are significant. Of the 50 replicates, 32 were significant at p ≤ 0.001, nine more at p ≤ 0.01, and the final nine additional replicates at p ≤ 0.05. Non-learning replicates see windows of potentiation increase in other behaviors We also see evidence of large gains in potentiation of behaviors other than learning. Figure 4.4 details the results of our ten non-learning replicates that were replayed. Of these 10 replicates, the single-window potentiation gains for learning were relatively small, with a mean increase of 66 +7.6, median of +4, and a range of [0, +26] percentage points. These low values are not surprising, however, as these lineages did not ultimately evolve learning. The potentiation for learning was never actualized and eventually lost. It should be noted that, even so, a potentiation gain of 26 percentage points over 50 lineage steps is substantial, which is highly significant (p ≤ 0.001, Fisher’s exact). Indeed, we see that one of the ten replicates reached a maximum learning potentiation of 46% at one point in its evolutionary history, though seven replicates never crossed more than a 6% chance of evolving learning (Figure 4.4B). While the non-learning replicates did not see large increases in the potentiation of learning, they did see large increases in the potentiation of other behaviors. Figure 4.4A also shows the largest per-window potentiation increase in the most likely behavior at the end of evolution. Here we see a mean of +40.8, a median of +36, and a range of [+26, +64] percentage points. All of these differences are significant (p ≤ 0.01, Fisher’s exact), and eight of them highly significant (p ≤ 0.001). Thus, the large increases in potentiation that we have observed are not limited to the evolution of learning. Our expectation is that we would observe large single-step potentiation increases for these non-learning behaviors too if we conducted targeted replays on these lineages. Figures of how potentiation changed over time for all 10 non-learning replicates are available in the appendix. Comparison to other works In addition to a deeper investigation into the underlying patterN associated with historical contingency, the original motivation for this work was to validate the findings of Chapter 3 and provide evidence as to their generality. That is, we wanted to test if the large per-window and per-step increases we observed in that work were common or if they were mere flukes. Indeed, we can now confirm that both the per-window and single-step potentiation increases from that chapter are common in this particular system. The maximum per-step potentiation increases we observed in the previous chapter (range of [+36, +64] percentage points) fall comfortably within the range that we observe here ([+20, +88] percentage points). It is more difficult to compare these results to other works, however. While several works have leveraged replay experiments to study the potentiation of a particular trait or behavior, only one has applied replays to multiple points in a single lineage (Blount et al., 2008). Blount et al. replayed multiple time points of frozen E. coli from the Ara-3 strain to empirically test the potentiation of citrate metabolism. The largest jump in potentiation they found was ~+13.3%, shifting from 0/30 67 replicates at generation 31,500 to 4/30 replicates at generation 32,000 This is a much smaller jump in potentiation, measured over a much longer period of time compared to our results in this work as the tests were 500 generations apart. While not directly pulling from an evolving lineage, Jochumsen et al. (2016) tested the effects of individual mutations on the evolution of colistin resistance in Pseudomonas aeruginosa. Via experimental evolution, they found evidence that two individual mutations (phoQ and pmrB ) each potentiated the evolution of colistin resistance at the 16 µg ml concentration level. These two mutations each increased the potential to evolve this colistin level from a < 5% chance to a > 70% chance. These results directly align with our results – that a single mutation can drastically increase the probability of a focal behavior evolving. For other works, Turner et al. (2015) performed similar analytic replay techniques, but never saw the focal results – the extinction of the Cit− clade in the Long Term Evolution Experiment – in any of the replay replicates. Meyer et al. (2012) replayed combinations of Phage λ and E. coli. While some of their results are striking (e.g., the “all-or-nothing“ pattern of OmpF receptor targeting across certain treatments), the comparisons were across strains and thus not a direct comparison. In prior digital work, Covert III et al. (2013) showed that a single deleterious mutation could increase the evolution of a complex behavior in Avida (the EQU logic task) from 13/20 (65%) to 20/20 (100%) of evolutionary replicates, providing further evidence of substantial potentiation gain from a single mutation. Similarly, Bryson (2012) showed multiple instances of ≥ 50 percentage points of increase in short-term (always 10,000 updates) replays of EQU in Avida. 4.3.3 Increases in potentiation are often driven by a single mutation For each of the learning lineages, we have identified the potentiating step; that is, the step in the phylogeny that conferred the largest increase in potentiation. Since Avida has a per-site rate for substitution mutations, and insertion and deletion mutations that can occur simultaneously, these lineage steps can consist of more than one mutation. Even so, 60% of the potentiating steps (30 of 50) are comprised of only one mutation. Of the remaining 20 potentiating steps, 14 steps contained two mutations, three steps contained three mutations, one step contained four mutations, and two steps contained five mutations. Of course, the 30 potentiating steps that consisted of a single mutation must be driven by that particular mutation. For each of the other 20 potentiating steps (where multiple mutations 68 Figure 4.5: Selected results from the mutation split experiment. For each subplot, the rightmost column shows the potentiation immediately after the largest potentiating step in that replicate (also shown via solid horizontal line), and the leftmost column shows the potentiation immediately before that step in the lineage (also shown via dashed horizontal line). This potentiating step involved N mutations, where N is 2 for subplots A and B, 3 for subplot C, and 5 for subplot D. There are a total of 2N bars in each subplot, representing the 2N possible combinations of these mutations, as indicated by the small boxes below each bar. The left bars show potentiation with none of the N mutations presents, the right bars have all N mutations present, and the middle bars (purple) show the potentiation of every other possible combination of these mutations. Significance of each mutation combination in the logistic regression is shown under each column (* for p ≤ 0.05, ** for p ≤ 0.01, *** for p ≤ 0.001). In subplots A and D potentiation is driven by just one of the mutations; in B potentiation requires both mutations (which are individually lethal); and in C all three mutations are independently-potentiating. co-occurred), we ran replay experiments to split the mutations and determined that not all were necessary to increase potentiation. Indeed, in 14 cases we identified exactly one mutation that significantly increased potentiation on its own. The two steps involving five-mutation are included here, as a single mutation explained the potentiation gain in both cases. Data from an example of potentiation being driven by one of two co-occurring mutations can be seen in Figure 4.5A, and an 69 ***0.000.250.500.751.00PotentiationA*********0.000.250.500.751.00PotentiationB**********0.000.250.500.751.00PotentiationC*****0.000.250.500.751.00PotentiationD example of one driving mutation out of five total mutations can be seen in Figure 4.5D. As discussed above, other work has shown that single mutations can have a large impact on potentiation. This has been shown in colistin resistance in Pseudomonas aeruginosa (Jochumsen et al., 2016) and in the evolution of EQU in Avida (Covert III et al., 2013). Thus we have instances where potentiation increases drastically with single mutations in both digital and natural organisms. 4.3.4 Mutation interactions influence potentiation While we are able to confidently say that many replicates saw potentiation increase with a single mutation, we also observed instances where potentiation changes appeared to require interac- tions across multiple simultaneous mutations, or other unexpected interactions. We identified one replicate where two mutations independently increased potentiation, and combined they accounted for the full step’s potentiation gain. In one of the steps containing three mutations, all three muta- tions independently increased potentiation significantly, but the combination of mutations did not further increase potentiation (Figure 4.5C). In two replicates we see strong sign epistasis; each step consisted of two mutations, and when analyzed independently these mutations proved to be lethal (and thus potentiation is 0), but when the mutations were combined the resulting mutants were not only viable, but potentiated (Figure 4.5B). This is akin to the case study of EQU potentiation in (Covert III et al., 2013), where two deleterious mutations (one nearly lethal) combined to actualize EQU. Finally, of the 50 replicates, two were found to have no significant mutation effects, and further replicates would be required to disentangle the effects of the mutations and their interactions. 4.3.5 No real-time hallmarks of potentiating mutations were evident in an evolving population As discussed in Chapter 3, while we can retrospectively identify potentiating mutations, we have yet to find strong indicators of these mutations during evolution. Any such consistent indicators would be invaluable for problem solving with evolutionary computation or other forms of applied evolution. Our early hypothesis was that these potentiating mutations are likely to be deleterious, or at least neutral. Our reasoning was that if the mutations are beneficial, they would already be likely to be selected and thus they would already be factored into evolutionary potential. Of the four case study lineages in Chapter 3, we saw that one potentiating step was deleterious, two were neutral, and one was beneficial. 70 Figure 4.6: The fitness effect and potentiation gain of the potentiating step of all 50 learning replicates. To calculate fitness, both the potentiation step and the step before it were evaluated on the same set of 100 random seeds. We ran a paired Wilcoxon rank-sum test for these two fitness groups for each replicate, and divided and colored the results based on these significance findings. Non-significant points are split into those that are truly silent mutations (i.e., zero change in fitness across all 100 trials) and those that are not (fitness values changed, but not significantly). The fitness ratio, used for the x-axis, is calculated as the average post-mutation fitness divided by the average pre-mutation fitness. The vertical line shows a fitness ratio of one, which indicates no change in the average effect. Here, we have refined our methods to compare the potentiating step to the step before it, evaluating both genotypes on the same set of 100 random paths to conduct a pairwise comparison (Figure 4.6). Using a paired Wilcoxon test, we find that in 28 of 50 lineages the potentiating step did not have a significant effect on realized fitness. Indeed, in 15 of these replicates, the two genotypes displayed exactly the same behavior – the mutations were truly silent, while in 13 changes to fitness occurred, but not enough to meaningfully change the distribution. The truly neutral mutations may be modifying instructions that are executed but do not directly effect the phenotype, or they could be cryptic genetic variation occurring in unexpressed regions of the genotype (Chapter 2). Of the remaining 22 replicates with significant changes to fitness, we find that 21 significantly increase fitness, and only one mutation was significantly deleterious. Further, our one deleterious mutation was significant (p < 0.001) but not substantial, with less than a 1% change in fitness. The fitness effects of the significantly beneficial mutations varied wildly, with many conferring small impacts on fitness. Two mutations had a greater than 50% increase in fitness, one of which had a fitness over 250% of the pre-mutation fitness. Thus, we have strong evidence against our hypothesis that potentiating mutations are likely deleterious. We observe that potentiating mu- tations are often neutral or nearly-neutral, though they can also drastically increase fitness. In a 71 Silent, N = 15NS, N = 13p <= 0.05, N = 4p <= 0.01, N = 4p <= 0.001, N = 141.01.52.02.51.01.52.02.51.01.52.02.51.01.52.02.51.01.52.02.50.000.250.500.751.00Fitness ratio (after mutation / before mutation)Potentiation gain genetic space as complicated as Avida’s, even highly-beneficial mutations are not guaranteed to be found, let alone sweep. Given the number of alternative possibilities, some of which may be highly advantageous themselves, even beneficial mutations can be highly potentiating. Indeed, even if dele- terious mutations would open up new pathways through the genotype space, the low probability of them gaining substantial numbers in the population may eliminate the possibility of them being potentiating. One final hypothesis for identifying potentiation mutations as they occur is to look for mu- tations that alter the overall behavioral phenotype. Of the 50 potentiating lineage steps that we analyzed, we found that 43 of 50 left the underlying behavioral phenotype unchanged. Of the seven that did change the phenotype, all were originally error correction behaviors. After the potentiat- ing step, five then shifted to a mixed bet hedging strategy – they still performed error correction on some paths but were capable of learning on others. One more replicate evolved learning with the potentiating step, likely representing a low probability mutation that was actualized. The final behavior-changing mutation shifted the behavior from error correction to bet-hedged error correc- tion. While this replicate might represent an interesting case – the degradation of behavior – the majority of potentiating mutations did not change the behavior and thus behavior changes are not a reliable indicator of potentiating mutations when they occur. 4.4 Conclusion Here we have conducted a study of potentiation with the largest number of analytic replay experiments to date. By replaying 50 lineages of digital organisms that evolved associative learning, we have provided further evidence that potentiation can increase suddenly, with strong evidence that a single mutation can result in large increases in potentiation. We have also shown that potentiation can increase as a result of the co-occurrence of multiple mutations, with or without epistatic interactions. Finally, we were unable to find a signal for these potentiating mutations when they appear – as such, we are still only able to identify these mutations retrospectively. 4.4.1 Limitations and applicability to other systems In both this work and Chapter 3 we primarily investigate the potentiation of one behavior, associative learning, in the Avida digital evolution framework. As such, we cannot claim that the potentiation dynamics observed in this work are globally applicable. Indeed, we can only claim these results as evidence that large, single-mutation increases in potentiation are possible, and that, 72 in certain systems, they might be common. While this work is among the first to explore patterns in potentiation across many evolved lineages, it is far from the first to explore potentiation dynamics. Do the results of other studies match the large increases in potentiation we observed for the evolution of associated learning in Avida? In some cases it appears that short spans of evolution, and in fact even single mutations, confer substantial increases in potentiation (Covert III et al., 2013; Jochumsen et al., 2016). These increases are not universal, as other works have seen maximum increases in potentiation substantially lower than observed here (Blount et al., 2008). In conducting these comparisons, however, we must consider that the sample sizes for these other works are quite low and that may bias our comparisons. Additionally, all replay replicates in (Blount et al., 2008) evolved for the same number of generations regardless of the starting point, while we ensured that all replay replicates experienced the same number of Avida updates as the original lineage experienced after the sampled genotype appeared. We would expect the former scheme to observe substantially lower potentiation in replicates founded from earlier points in the population’s history. Finally, it is possible that the lineages in these other studies did see substantial increases in potentiation, but those particular changes were missed in those experiments. If these potentiation dynamics truly differ, what might explain these differences? While Avida has a rich and complex genetic architecture compared to many other digital evolution experiments, these digital organisms are still vastly less complex than even the most simple natural bacteria. An Avida organism in this work, with our original genome length of 100 instructions, has roughly 9,000 possible one-step mutations that can occur. With a naive assumption that every nucleotide could mutate to any other nucleotide at every site in the E. coli genome (~4.6 million base pairs, Blattner et al. (1997)), we would expect over 13 million one-step substitution mutations alone. While this is only a rough estimate, it is clear that the likelihood of a particular mutation appearing is much lower in natural organisms, which can bias the potentiation changes. This difference only compounds as we consider the likelihood of multiple necessary mutations arising and fixing. Therefore, we expect that natural organisms will typically see lower levels of potentiation for equivalent genetic distances from the focal behavior. Additionally, there is likely a difference in how the focal behavior arises and is classified. We are stringent in our system and only classify a behavior as learning if it is already well-refined, and thus 73 even the first organism in the population to be classified as having evolved learning is likely to have high fitness. In natural systems, biologists will often denote a non-trivial trait or behavior as present even if it still requires considerable refinement. For example, in the E. coli studies in the LTEE, citrate metabolism was not highly advantageous initially, but rather needed considerable refinement before it rose to prominence in the population (Blount et al., 2012). Thus, even genotypes capable of weak citrate metabolism may still have low potentiation for maintaining that trait in the longer term, as it may be lost before it is refined. Follow-up work to our study should consider analyzing the potentiation of all forms of learning, including bet-hedged learning and mixed bet hedging strategies that combine learning and error correction, to see if potentiation dynamics change. Further, learning is a terminal behavior in our system – the lack of open-endedness dictates that once learning evolves, it is unlikely to be replaced with a more effective survival strategy. Combined, these factors may create larger increases in potentiation, and should be dissected in future digital studies. Finally, our system uses small population sizes (3,600 organisms) compared to natural systems (which are often in excess of tens of millions of organisms), and our replay techniques may also bias the changes in potentiation. By using a smaller population size, we increase the effect of genetic drift on our population and increase the likelihood that neutral or even deleterious mutations can spread before being purified. In fact, by replaying populations via a full clonal population of an genotype from the dominant lineage, we are resetting all population dynamics from the potentiation measurements. During normal evolution of a population, not only does a mutation need to occur, but the organism in which it resides also needs to successfully replicate at least once. By using clonal populations for our replay experiments, we ensure that every mutation establishes, eliminating possible intermediate potentiation values. 4.4.2 Future work In this chapter we described our deep analysis of the potentiation dynamics behind the evolution of associative learning in Avida. First and foremost, future work should investigate potentiation dynamics in a range of other systems. Digital systems as simple as bitstring evolution problems could illuminate how the dynamics are affected by factors such as mutation rate and population size. Slightly more complicated systems could investigate characteristics of the genetic space such as connectedness, dimensionality, mutational operators, or crossover. Finally, more wet-lab studies of potentiation across a wider range of environments or in a more varied selection of natural organisms 74 can provide more real-world grounding for these results, though such systems are considerably more labor-intensive. As we alluded to in Chapter 3, not only are we interested in identifying mutations that substan- tially increase potentiation, but we are also interested in the mutations that significantly decrease potentiation. In this work we replayed ten replicates that did not evolve associative learning, to see how their potentiation dynamics differed from “successful” replicates. Even within those ten repli- cates we observed one replicate that reached a height of 46% potentiation for associative learning. Extending this idea, if we have 50 replicates that evolved learning, we might expect 100 replicates to have reached 50% potentiation but only half of them to have succeeded in that coin flip. This conjures the question: do we see large decreases in potentiation similar to the increases observed in this work? Are there single mutations that doom a would-be promising lineage to an “unsuccessful” fate? For example, an immediately beneficial mutation might sweep through a population trapping it on a local optima and preventing it from following a pathway that would lead a population to higher levels of fitness. Identifying these anti-potentiating mutations in real time could potentially be as valuable as identifying potentiating mutations as they could allow us to slow the evolution of pathogenic or otherwise harmful populations. While additional studies would be needed to fully delve into these dynamics, early work should dive into this dataset. When conducting replay ex- periments, we are effectively quantifying the potentiation of each possible behavior. Thus we have non-learning potentiation data, but analyzing this data is outside the scope of this chapter. Finally, additional work is needed to identify techniques that reduce the cost of performing replay experiments. This work has taken one step: iterating backward from the appearance of the focal behavior during the exploratory replay phase. Additional optimizations could include minimizing the original number of replicates per replayed time step and then running additional replicates for those that seem interesting. Finding more efficient ways of searching for potentiating steps is critical. We cannot directly perform a binary search for finding the largest potentiating step, as potentiation does not increase monotonically and thus large increases in potentiation can be hidden by decreases in the same window. However, there may be variations of binary search that will provide the maximum potentiating step within some margin of error or the maximum sustained increase in potentiation. In particular, using digital systems to find and verify these ideas might make future in vitro experiments more tractable to perform. 75 Overall, these results provide a useful benchmark for any future projects that use replay exper- iments to study historical contingency. As more studies of mutations are conducted, we can refine our ideas about how potentiation changes over the course of a population’s history. As we progress, we can leverage our understanding of potentiation dynamics to better infer what happened in the evolution of life around us, better prepare for how populations will evolve under climate change, and better leverage evolution to solve technical problems. 76 Chapter 5 Predicting the unpredictable: Using replay experiments to disentangle how evolutionary outcomes are altered by adaptive momentum Authors: Austin J. Ferguson, Charles Ofria, and Clifford Bohm This chapter is adapted from a paper that has been peer-reviewed and accepted to appear in the proceedings of the 2024 Conference on Artificial Life. In this work we use the lens of historical contingency and potentiation to provide a new vantage point in understanding the adaptive momentum effect. According to the adaptive momentum framework, when a population is in disequilibrium some advantaged organisms can buffer more deleterious mutations than usual. This increased buffering increases evolutionary exploration until equilibrium is reestablished. Here we replay the evolution of select populations evolving in an idealized environment: a one-dimensional sawtooth function. At each step we are able to quantify their potential to cross the next deleterious fitness valley. We find that populations experiencing disequilibrium, here caused by a selective sweep, see their potential to cross increase drastically with mutations near the leading edge of the sweep. Conversely, populations in equilibrium rely purely on chance to cross – only the last mutations before a valley cross confer an increase in potential. Finally, we also performed “shuffled” replays to demonstrate that the structure of the population, not just its genetic makeup, is key in increased valley-crossing potential in this system. 5.1 Introduction In Chapter 1 we discussed how researchers are beginning to use analytic replay experiments to quantify how long-term evolutionary outcomes changed over the course of a population’s history. Indeed, in Chapters 3 and 4 we employed these replay techniques to identify key “potentiating” mu- tations in the evolution of associative learning in Avida. Here, we use these same replay techniques to understand the “adaptive momentum” effect (Bohm et al., 2024). Previously, replay experiments have been employed to measure the potentiation of complex traits and behaviors, both in microbial organisms (Blount et al., 2008; Gupta et al., 2022; Jochumsen et al., 2016; Meyer et al., 2012) and in digital organisms (Chapters 3 and 4). Here we instead focus on the potentiation of a significantly simpler task: crossing deleterious fitness valleys in a one- dimensional sawtooth function. These fitness valleys require exactly six mutations to cross, which, 77 due to the one-dimensional nature of this system, must be acquired in order. Therefore, we already know that any potentiating mutations must be a subset of these six mutations, and thus we are not searching for them as we were in the previous two chapters. However, we do not know the amount of potentiation conferred by each of these six mutations. By comparing these potentiation dynamics of two categories of populations (those in disequilibrium and those in equilibrium), we can directly observe the effect that adaptive momentum has on evolutionary exploration and how these effects change over time. 5.1.1 Adaptive Momentum The adaptive momentum framework suggests that periods of disequilibrium resulting from phenomena like selective sweeps and range expansions can enhance genetic exploration (Bohm et al., 2024). We use analytic replay experiments to investigate this effect in selective sweeps. Consider the appearance of a beneficial mutation in an asexual spatial population. If this mutation establishes and is sufficiently strong, it can trigger a selective sweep where the genotypes with the beneficial mutation come to dominate the population. During the sweep, there will be a boundary between individuals with the beneficial mutation and those without (wild type). If advantaged individuals along this boundary accrue relatively small deleterious mutations, they may still have a combined fitness benefit over the wild type. Thus, individuals along the leading edge of the sweep will have an increased potential to accumulate deleterious mutations that, in turn, increase the potential to explore genetic space and facilitate genetic discovery across fitness valleys. Adaptive momentum persists until the wild type is eradicated and equilibrium is reestablished. However, if during fixation a new beneficial mutation is discovered, the state of disequilibrium will persist, thus extending the “momentum window” (the period of adaptive momentum). 1 In the work presented here, we create an ideal environment for adaptive momentum by using a one-dimensional spatial population evolving on a rising sawtooth fitness function. We use replay experiments to measure how the state of the population affects the potential to cross the next fitness valley. Replay experiments show that adaptive momentum increases the potential for populations 1Adaptive momentum describes how disequilibrium during evolution can result in periods of increased mutation buffering. Such disequilibrium can be caused by several conditions, including sweeps, range expansions, and increases in carrying capacity in both spatial and well-mixed pop- ulations. The adaptive momentum framework further considers how the increased potential for mutational buffering can also affect large scale evolutionary rates via increased genetic exploration and subsequent genetic discovery. 78 to cross fitness valleys, an effect that diminishes over time. Using simple assumptions about the structure of the leading edge of a selective sweep, we generated a predictive model of the potential for valley crossing during adaptive momentum. While this estimation is highly accurate for early steps into the fitness valley, it loses accuracy when the leading edge is deeper in the valley. The time required for a population to reach deeper mutations in the fitness valley is likely also increasing the number of stochastic events that cause differences from the assumptions of the model. Finally, by shuffling organism positions in our population snapshots we can disrupt population structure. This shuffled analysis allowed us to verify that the organization of the population, not just its genetic composition, is vital to valley-crossing potential. Overall, this work refines the framework proposed by adaptive momentum while advancing the methodology of analytic replay experiments as a tool for studying historical contingency, exposing sections of both that are ripe for further study. 5.2 Methods 5.2.1 Evolution system In order to isolate the adaptive momentum effect and keep computational costs feasible, we used a minimal agent-based evolution model where each organism’s genome consists of a single integer, with value x. Fitness is based in a repeated “sawtooth” function, similar to the one used in (Bohm et al., 2024), to create a landscape with regular fitness peaks separated by valleys. Each organism is assigned a quality score (s(x)), and then that score exponentiated to determine fitness (f (x) = 10s(x)). Score is calculated using the sawtooth function: s(x) = x − (x mod w) w G − (x mod w)D where G is the gain per peak, D is the decrease per step into the valley, and w is the valley width (number of mutational steps from one peak to the next). Effectively, the score function calculates the quality of the highest peak achieved and subtracts the cost of the current mutational step into the valley. We used G = 1.0, D = 0.05, and w = 6. This means that peaks appear every six steps; in this case at x = 6, x = 12, x = 18, and so forth. We refer to these peaks by their height, so those three peaks would be p1, p2, and p3, respectively. We refer to the steps between peaks by the prior peak and an offset (e.g., x = 17 is p2 + 5, the last step before p3). Figure 5.1 illustrates this 79 sawtooth function and the relevant peaks. Crossing a valley increases an organism’s fitness by a factor of 10, and each step into the valley reduces fitness by ~12%, relative to the previous peak. We selected these drastic fitness differences to clearly demonstrate the effect.While this degree of selective benefit may be rare in biological systems, cases such as antibiotic resistance can show fitness increases of this magnitude (Gullberg et al., 2011). Figure 5.1: The sawtooth function used in this work, with the four peaks mentioned in this work labeled. Score, s(x), and fitness are both shown, with fitness being 10s(x). We evolved populations of 512 organisms in a one-dimensional spatial population – a single line of organisms – that did not wrap. This structure maximizes fixation times and makes selective sweeps easy to track, as they could only move left or right. The population evolved via discrete, non- overlapping generations. To fill positions in the next generation, we performed rounds of spatial roulette selection, each involving three organisms: the organism in that position in the previous generation and its two immediate neighbors (unless the organism is on the edge, in which case the roulette is between only two organisms). We then copied the selected individual to produce the offspring, with a 0.0125 chance to mutate; mutations either increment or decrement the offspring’s genome value by one. Selective sweeps are thus limited to advancing one position in the population per generation, limiting the growth rate and therefore the fixation time of a beneficial mutation. 5.2.2 Experiment design We conducted this work in three stages: 1) we validated that our model demonstrates adaptive momentum; 2) we ran “benchmarking” data to create an expectation of how potentiation changes during a momentum window; and 3) we conducted analytic replay experiments to observe changes 80 p1p2p3p4012341001011021031046121824Organism genome valueScoreFitness in potentiation in evolved lineages. Here we outline these experiments in more detail. Experiment I: Model Validation To ensure that our model could produce the adaptive momentum effect, we replicated the primary experiment from (Bohm et al., 2024) comparing crossing times in populations starting from equilibrium versus those starting during the disequilibrium of a previous selective sweep. We ran 500,000 evolutionary replicates, where each replicate started with 512 organisms at p1 and ran for 5,000 generations. In replicates where p2 was discovered, we recorded the generation of discovery and extended the run duration for another 5,000 generations beyond the point of discovery. If p3 was discovered before time ran out, we recorded this time as well. This methodology produced two time distributions: time to first crossing and time between first and second crossing, both capped at 5,000 generations. Experiment II: Benchmarking The adaptive momentum framework posits that populations in disequilibrium experience an increased rate of adaptation. In spatial populations experiencing a selective sweep, the disequilib- rium should manifest near the leading edge of the sweep. Specifically, adaptive momentum allows deleterious mutations to accumulate within the advantaged subpopulation along the leading edge. This temporarily expanded mutant cloud increases genetic exploration, accounting for an observed increase in the rate of adaptation. Figure 5.2: Starting layout for experiment II populations, with three clearly-defined sections: NP “post-sweep” mutant organisms at peak p2 (left), NW “wild type” organisms at peak p1 (right), and 8 “leading edge” organisms at a treatment-specific position in the valley past p2 (middle). We created idealized scenarios to study the dynamics of adaptive momentum as they unfold. Each population had organisms on p2 sweeping across organisms on p1, with a well-defined leading edge (Fig. 5.2). Experimental treatments used all combinations of how far into the next fitness valley the leading edge started (from x = p2 to x = p2 + 5), and how far the sweep had progressed 81 …Mutant Type (NP orgs)Leading Edge (8)Wild Type (NW orgs)Sweep… across the population (from NP = 0, at the beginning of a sweep, to NP = 504 at the end.) We ran each replicate for 768 − NP total generations; the reduced number for larger NP was used to make the comparison fair, subtracting off the minimum time it could have taken to establish NP post-sweep organisms. For each condition, we measured how often replicates crossed the next valley to reach p3. We recorded the number of replicates that successfully crossed to p3 in each treatment. This measurement provided an expectation of a population’s potential to cross the valley based only on the initial state of the leading edge. Additionally, we also ran “shuffled” controls under otherwise identical conditions, but where we removed population structure (and thus the notion of a leading edge) by randomly shuffling the organisms before each evolutionary replicate. Experiment III: Analytic replays of evolved lineages Finally, we focused on the treatment from Experiment II that represented the start of a selective sweep; that is, no post-sweep organisms (NP = 0) and a leading edge that just made it to p2. We ran 500 replicates under these conditions, saving snapshots at each generation, allowing us to perfectly recreate the population at any time point. We randomly selected 10 replicates that failed to reach p3, 10 random replicates that did reach p3 (but not further), and all four replicates that crossed two valleys to reach p4. For each of these 24 replicates, we performed 1000 analytic replays at every fourth generation, restarting evolution from a given time point with different random seeds to investigate the role of chance in determining the distribution of potential evolutionary outcomes (Blount et al., 2018). Next, we selected one representative sample from each of the three categories to study at high resolution, replaying 10,000 replicates from every generation. Using these replays, we recorded the probability that a replicate would cross the valley to p3 or p4 at each time point. These data allow us to calculate how the crossing probabilities changed over time. We also ran 10,000 equilibrium replicates and replayed representative replicates in the same manner. Finally, just as in the shuffle benchmarking experiment, we performed “shuffled” replays on disequilibrium replicates that crossed a single valley. Specifically, we shuffled the population before starting each replay to measure the role of the population’s spatial organization in crossing potential. 82 5.2.3 Data and software availability All code, analyses, summarized data, and figures not included in this work are available in the supplemental material (Ferguson, 2024). The model was built using the Modular Agent Based Evolver version 2.0 (MABE2) (https://github.com/mercere99/MABE2). All analyses and plots were generated using the R statistical computing language version 4.1.2 (R Core Team, 2021) and the ggplot2 (Wickham et al., 2020), dplyr (Wickham et al., 2022), HMisc (Harrell Jr, 2020), tidyr (Wickham and Girlich, 2022) packages. 5.3 Results Figure 5.3: (A) Distribution of the number of generations required to cross valleys in the validation experiment. Relative generations refer to the elapsed time (in generations) for the first cross (pur- ple), and from the first cross to the second cross (orange). The dashed and solid vertical lines show minimum and maximum fixation times, respectively. (B) Benchmarking data showing the potential of a leading edge to cross a valley, with a range of leading edge starting positions and values (see Fig. 5.2). Each point represents 10,000 replicates. (C) Shuffled benchmarking data. Each replicate is shuffled prior to evolution, otherwise identical to center plot. 5.3.1 Validation of adaptive momentum effect First, we measured the time it took for replicates starting from a full population of p2 to cross the valley to p3 and, when relevant, from p3 to p4. Figure 5.3A shows the timing distributions of populations that crossed the first valley over the first 5,000 generations (purple) as well as the timing distributions of second crossings that occur within 5,000 generations of a first (orange). We see that the time of first crossing appear uniformly distributed across the 5,000 generations, while the second crosses are strongly skewed toward shorter time periods. Indeed, of 500,000 replicates, 6,485 crossed the first valley within 5,000 generations (~1.3%) with a mean cross time of ~2,569 83 050100150010002000300040005000Relative generationCountAValleycrossedFirstSecond02550751000100200300400500Leading edge indexPotential to crossB02550751000100200300400500Leading edge indexPotential to crossCLeading edge valuep2p2+1p2+2p2+3p2+4p2+5 generations and a median of 2,602 generations. Based on these values, we would expect roughly 84 replicates to cross twice (6, 485 × 1.3%, or 0.0169% of all 500,000 replicates), but instead we see 902 replicates (~0.18%) cross the second valley, a much higher rate than expected if the probabilities of first and second crossing were equal. In addition to a higher than expected rate of crossing, the mean cross time between first and second crossings is ~579 generations and the median cross time is 401 generations, substantially lower than the first crossing times. Finally, when we consider only those second crossing times that occurred more than 1000 generations after a first crossing, we find that the rate of these second crossings is similar to the first crossing rate (~1.23%). These results comport with the framework of adaptive momentum. They show that an initial beneficial discovery can quickly lead to additional discoveries (during the fixation period), but if the second discovery does not happen before equilibrium is reestablished, the rate of valley crossing is better predicted by the first valley crossing times. 5.3.2 Empirical benchmarks The empirical benchmark data (Fig. 5.3B) illustrate how the initial state of the population af- fects the potential to cross valleys. As expected, steps further into the valley increase the probability of crossing, regardless of where the leading edge is. On the other hand, the probability of crossing decreases as the ratio of p2 organisms (mutant type) increases relative to p1 organisms (wild type) – as the selective sweep progresses, fewer opportunities remain for additional mutations to accu- mulate. While the potential to cross varies considerably with the type of organisms in the leading edge, these data clearly show that all conditions describing early sweep conditions (i.e., having a leading edge and a significant ratio of the remaining population on p1) substantially increase the probability of crossing the valley compared to populations that are close to reaching equilibrium on p2. We use these results to provide a baseline prediction for the analytic replay experiments. 5.3.3 Analytic replay experiments Of our 500 initial replicates to find candidates for replay experiments, four replicates (0.8%) crossed two valleys in the allotted time, 83 crossed exactly one valley (16.6%), and the remaining 413 did not cross any valleys (82.6%). We randomly-sampled 10 no-cross replicates, 10 single-cross replicates, and took all four double-cross replicates to run coarse-grained replays. All plots are available in the supplement (Ferguson, 2024). 84 From these coarse-grained replays, we selected one representative replicate from each category to re-run, conducting 10,000 replay trials at every time point to give a fine-grained view. For each replayed replicate, we show the potential of crossing the next valley over time, paired with a Muller plot (Muller, 1932) of the initial replicate that was replayed. Figures 5.4, 5.5, and 5.6 show the single-cross, double-cross, and no-cross replicates, respectively. In each plot, the replay results are overlaid on an image generated from the benchmark data. To generate the background image, we treat the benchmark data as a lookup function. When we start a replay replicate, we initialize the population using the snapshot from the target generation of the initial replicate. These snapshots show that the leading edge does not perfectly advance one position every generation; there are many generations where the leading edge either fails to advance or is pushed back one position by the wild type organisms. To make this comparison fair, we find the leading edge in that particular snapshot and then look up the corresponding expectation values from the benchmark data. This adjustment ensures that we are comparing against the correct benchmark data regardless of the motion of the leading edge. Overall, the potentiation observed in all three replicates closely match the benchmark expec- tations. The potentiation occasionally increases or decreases suddenly; tracing these changes to the Muller plots typically shows that these events correlate to the gain or loss of a mutation at (or near) the leading edge at that time. For example, the two temporary peaks in potential in Figure 5.4, at roughly generations 250 and 375, can be directly traced to the leading edge temporarily dipping to p2 + 4. The potentiation at any particular step in the valley decreases over time, as the selective sweep progresses and the adaptive momentum window closes. Figure 5.6 shows that, while the replicate had substantial potentiation at times (briefly above 50%), it failed to capitalize before the window closed. This failure was exacerbated by two leading mutations from p2 + 3 to p2 + 2, the second of which dropped the potential from over 25% to under 10%, after which the population never recovers. All three replicates show periods of potentiation higher than what our benchmark data would predict given their leading edge genomic value. The benchmarking data modeled a leading edge of eight organisms, but the Muller plots show that the leading edge grows and shrinks over time. The Muller plots show that underprediction is generally associated with either an expanded leading edge or an excess of individuals with lower fitness behind the leading edge. We hypothesize that these 85 underpredictions are generally observed in deeper steps into the valley because of the historical contingency required to reach that point (i.e., to reach p2 + 5 the leading edge must have passed through p2 + 4, which may still exist behind the leading edge). For comparison, we also ran 10,000 replicates that started at equilibrium (i.e., not in a mo- mentum window). Of those, we replayed the one replicate that crossed twice, 10 randomly sampled replicates from 19 that crossed once, and 10 random replicates from the 9,980 that failed to cross. Figure 5.7 shows the replicate that crossed twice. The first crossing in this replicate occurred quite early, crossing the valley on generation 168. However, the potential before crossing is similar to all first-cross replicates: while the potential in an adaptive momentum window starts at ~17%, the potential for replicates outside momentum windows starts at ~0.2%. This low potential continues for the first 143 generations, followed by 16 generations with an average potential of ~1.8%, four generations between 15% and 20%, and then the cross. This pure chance, “all or nothing” poten- tiation is not unique to this replicate; the same dynamic can be seen in the ten other successful replicates we analyzed (Ferguson, 2024). Finally, we also show the potentiation of crossing the second valley, which in some cases is real- ized (Figs. 5.5 and 5.7), and other times is not (Figs. 5.4 and 5.6). As potentiation is probabilistic in nature, the potential to cross the second valley is the probability of crossing the first valley from the current state of the population times the probability of crossing a second valley from a naïve starting position (i.e., the elevated potentiation of a population at the beginning of a sweep). We see this dynamic early in the replays – increases in first-cross potential are reflected, at smaller scales, in second-cross potential, with a much larger increase upon successful crossing of the first valley in Figures 5.4 and 5.5. This result is consistent with the adaptive momentum framework, which posits that the discovery of a new peak will initiate a new adaptive momentum window. Strikingly, this same dynamic holds true for Figure 5.7 – while the replicate did not start in a momentum window, the first cross creates a window which increases the potential of the second cross. 5.3.4 Shuffled population experiments To test the importance of population structure for potentiation, we repeated the benchmarking and replay experiments with shuffled populations. In both cases, we kept the experiments identical except for an additional shuffle step: before starting a replicate, we shuffled the order of organisms in the population and then proceeded with evolution as normal. 86 The shuffled benchmark data in Figure 5.3C shows that only populations with a leading edge of p2 + 5 are able to maintain greater than a 10% chance to cross if the leading edge has swept more than half the population. Two differences can decrease crossing potential: 1) multiple leading edges can form, allowing faster fixation and 2) the eight leading-edge organisms are more likely to encounter higher-fitness mutant-type organisms and thus be purified faster. A representative sample from the single-cross replicates is shown in Figure 5.8. The replay data indicate that potential to cross remains relatively low, never reaching a 20% chance, until a critical mass of nearly-crossed organisms skyrockets the potential from 9.2% to 100%. The earlier spikes in potentiation always correspond to the appearance of a single organism that is one step away from crossing the valley, which was then lost in the original population. These trends are consistent across all 10 replicates that were replayed (Ferguson, 2024). 87 Figure 5.4: Top plot: Analytic replay data for a representative replicate that crossed one valley during a momentum window, overlayed on baseline data. Lines show the probability of crossing both the first valley (orange line; initially-higher) and the second valley (yellow line; initially-lower). The background shows the expected potential to cross (as shown in Fig. 5.3B); data here are shifted to align with the realized leading edge position. Bottom plot: A Muller plot of all organisms in the original 1D population over time. Dark colors show descent into valleys, with hue identifying the valley being crossed. In both plots, vertical dashed lines show initial valley crosses. Colors in the legend apply to all plots. 88 02550751000200400600Time (generations)Potential to cross01002003004005000200400600Time (generations)Index in populationp1p2p3p4 Figure 5.5: The analytic replay data for a representative replicate that crossed two valleys during a momentum window. See Figure 5.4 for details and legend. 89 02550751000200400600Time (generations)Potential to cross01002003004005000200400600Time (generations)Index in population Figure 5.6: The analytic replay data for a representative replicate that failed to cross a valley during a momentum window. See Figure 5.4 for details and legend. 90 02550751000200400600Time (generations)Potential to cross01002003004005000200400600Time (generations)Index in population Figure 5.7: Top plot: The analytic replay data for the sole replicate that did not start in a momentum window, but still managed to cross a valley, and indeed crossed twice. The red line shows the potential to cross the first valley, while the orange line shows the potential to cross the second valley, exhibiting the hallmarks of adaptive momentum due to the first cross. Bottom plot: a Muller plot of the original population, like in Figure 5.4. 91 02550751000200400600Time (generations)Potential to cross01002003004005000200400600Time (generations)Index in population Figure 5.8: The black line (bottom) shows the potential for a shuffled population to cross a valley. The orange line (top) shows the standard potential for the population to cross (same data as Fig. 5.4). The shuffled line consists of 1,000 samples every 4 generations. 92 02550751000200400600Time (generations)Potential to cross 5.4 Discussion and Conclusion 5.4.1 Potentiation exhibits adaptive momentum In this work, we have corroborated adaptive momentum’s benefit to valley crossing (as outlined in (Bohm et al., 2024)) and expanded our understanding of the dynamic. Our initial experiments demonstrated that our system can undergo adaptive momentum, and that disequilibrium is the key driver. However, our main goal in this paper is to provide an alternate vantage point from which to view the dynamics of adaptive momentum. While the original paper validated the effect via aggregated data, here we analyze the underlying dynamics in action on individual populations by quantifying potentiation via replay experiments. We have shown that populations outside of momentum windows must rely on chance alone to cross a valley. During these “equilibrium” periods, early mutations into the valley, although required for a valley crossing, have no substantial impact on the population’s probability of crossing. Conversely, populations in momentum windows immediately see a drastically higher chance to cross, and every mutation in the leading edge is either potentiating or anti-potentiating, depending on the direction. This result is highlighted in Figure 5.7, where the first cross was pure chance while the second cross was driven by adaptive momentum. This work is only a beginning; future work should apply these techniques to examine the role of adaptive momentum in more complex and realistic fitness landscapes. 5.4.2 The leading edge in a spatial selective sweep determines potentiation The adaptive momentum framework posits that disequilibrium in a population can reduce the selection pressure on the advantaged subpopulation (Bohm et al., 2024). In spatial populations, disequilibrium is focused at the leading edge of selective sweeps, where advantaged mutants encroach on the wild type. Indeed, we observe that the genotype of the organism at the leading edge of the sweep is a strong predictor of potentiation. Moreover, as we can see in the Muller plots, there are often mutated organisms lagging the leading edge. When the leading edge is further into the valley, those lagging organisms have a non-negligible chance to accumulate sufficient mutations to finish crossing the valley, thus increasing the potential. We aimed to create the simplest system for studying valley crossing in spatial populations. In these one-dimensional populations, our artificially-started sweeps have exactly one leading edge. These edges become more complex in two-dimensional digital systems and only increase in com- 93 plexity moving toward more natural systems. While the identification and measurements of the leading edge may become more difficult, we expect similar dynamics to hold. It is critical that we continue to improve our understanding of adaptive momentum in these simple systems to build a solid theoretical foundation. 5.4.3 Population heterogeneity and structure affect potentiation Previous experiments have conducted analytic replays starting from clonal populations (Blount et al., 2008; Ferguson and Ofria, 2023). Here, however, we used perfect population snapshots that record every organism in every generation. Our technique provides a more fine-grained look into how potentiation changes. Furthermore, previous work has often used the most abundant genotype at a time point to seed replay experiments. Across all of our experiments, the most abundant genotype was always on one of the peaks. Replays looking at the potential of the first cross would see effectively zero probability for p1 and p2 and 100% probability for p3 and p4, which have already crossed, missing all nuance and change in this potential. The nuance gained by replaying population snapshots provides important insight into using analytic replay experiments, and puts the large jumps in potentiation seen in previous work into question (Ferguson and Ofria, 2023). While those large jumps in potentiation are valid, they are likely missing important intermediate genotypes or population dynamics. In effect, they show genetic effects isolated from effects of population structure. Future work looking to leverage replay experiments should be aware that population structure and composition can affect evolutionary outcomes, and should thus carefully consider how replays are initialized. Further, by shuffling the population snapshots we have shown that it is not just the portion of the population with each genotype that matters, but also the structural relationships and interac- tions among the organisms (Figure 5.8). Other computational studies will likely be needed to tease apart when and how this organization matters as perfectly preserving structure and composition of natural populations for so many replicates is currently impossible. As such, we should leverage com- putational models to develop techniques possible in both digital and natural populations, possibly finding a middle ground between a single clonal sample and full population snapshots. 5.4.4 Outlook This work not only provides evidence to support our understanding of adaptive momentum, but further clarifies the underlying mechanisms. By quantifying the potentiation of valley crosses and 94 relating this measure to population composition and structure, we have provided insights into the historical contingencies and long-term trajectories of populations experiencing adaptive momentum. Further, we advance the methodology of analytic replay experiments by conducting them at a greater scale than previously seen, demonstrating a new use case, and leveraging perfect population snapshots to seed the replay experiments. These advances combine to show that replay experiments can extend genetic potentiation to include the effects of population dynamics. All together, this work constitutes both a step toward better understanding adaptive momentum and a methodological refinement of replay experiments for understanding historical contingency. 95 Chapter 6 Conclusions and outlook This dissertation constitutes an important step in understanding how historical contingency can shape long-term evolutionary outcomes. Using digital evolution, I have shown that the evolution of phenotypic plasticity can act as an evolutionary stabilizer, negating the effects of environmental variation on future evolutionary dynamics. By leveraging analytic replay experiments, I have shown that a single mutation can shift the likelihood of an evolutionary outcome from incredibly unlikely to almost certain. Further, I have demonstrated that the same replay techniques can be used to clearly illustrate basic evolutionary dynamics via illuminating the long-term impacts of short-term changes. There remains an enormous amount work to fully understand the role of history in evolution. Thankfully, however, this work has given me some insight into how we think about historical contin- gency, potentiation, and replay experiments, as well as the future work that should be done to further our understanding. This chapter is dedicated to recapitulating the lessons learned throughout this dissertation and discussing how we can build upon them to continue enhancing our understanding of historical contingency in evolution. 6.1 Contributions Here I summarize the contributions of each research chapter. In Chapter 2 I experimentally measured the impact of evolved phenotypic plasticity on fur- ther evolution. I used the digital evolution software Avida to evolve both plastic and non-plastic populations in a cyclically changing environment. Once evolved, I moved populations to three novel environments to observe the difference that plasticity made in each. First, I found that phenotypic plasticity slows the rate of evolutionary change compared to the non-plastic populations that must continuously adapt to environmental changes. Second, I showed that plastic populations accumu- lated novel beneficial traits at the same rate as non-plastic populations, but plastic populations were able to retain the traits while non-plastic populations continuously lost them. Finally, I examined how, in their continuous struggle to adapt to the environment, non-plastic populations accumulate deleterious traits significantly more often than plastic populations. Overall this chapter identified how the evolution of one trait, phenotypic plasticity, shifts the dynamics of further evolution by drastically reducing the impact of environmental changes. 96 Next, in Chapter 3, I quantified how the likelihood of evolving associative learning (i.e., the potentiation of associative learning) changed over the course of four case study lineages. To do this, I employed “analytical replay experiments”, restarting evolution from various points along each lineage. Across the four lineages, I found substantial single-step increases in potentiation, ranging from 36 to 64 percentage points. I then analyzed the individual mutations that increased potentiation, finding a variety of dynamics in these potentiating mutations. In this chapter, I demonstrated that digital evolution can be used to perform studies on potentiation dynamics that are intractable in living organisms. In this case I used replay experiments to track down individual potentiating mutations. I then extended this work in Chapter 4, analyzing the potentiation dynamics of 50 new lineages that were able to evolve associative learning. I found that the large single-step potentiation increases in Chapter 3 were not a fluke. While the distribution of all observed single-step potentiation increases was centered around zero, the maximum per-replicate increase was significant for all 50 replicates. Indeed, looking at the distribution of maximum per-step potentiation increase across the 50 replicates, I found a range of [20, 88], a mean of approximately 42, and a median of 38 percentage points. I also replayed ten lineages that did not evolve learning, and in each I found significant single- step increases in the potentiation of the most likely behavior at the end of evolution. Finally, I found that in 44 of 50 replicates the largest potentiation gain was caused by a single mutation, and all but one of these mutations were either beneficial or neutral, with 15 mutations causing no change to the organism’s phenotype. Chapter 5 pivots to using replay experiments and potentiation to illustrate the adaptive mo- mentum effect in a simple digital model. The adaptive momentum framework posits that disequilib- rium, such as a selective sweep, can temporarily increase evolutionary exploration. By quantifying the potentiation of crossing a deleterious fitness valley, I show that early mutations into the valley drastically increase potentiation when the population is in disequilibrium, yet have no detectable effect when the population is in equilibrium. By performing replays with the organisms in the population shuffled at random before starting, I show that population structure plays a critical role in this scenario. This work also demonstrates how replaying population snapshots instead of clonal populations can improve the granularity of the potentiation dynamics, and in fact is especially relevant to the dynamics of adaptive momentum. 97 All together, these chapters 1) advance our understanding of the role that history plays in evolution 2) showcase the power of replay experiments in finding critical points in a population’s history and 3) demonstrate the advantage of using digital models to test and refine methodologies in evolutionary biology before applying them to natural organisms in the lab. 6.2 Insights for wet lab studies While conducting the research outlined in this dissertation, I have had time to reflect on how the outcomes of these studies – all of which are digital – can be transferred to natural organisms and wet lab studies of their evolution. This question is, by far, the most common question I receive from evolutionary biologists who do not work with digital organisms themselves. As an initial summary, you can consider these digital studies as a rapid form of prototyping and benchmarking. These results are valid, and should be considered when trying to understand the role of history in evolution. However, my overall goal for my ongoing digital research is two-fold: First, I wish to refine the methodologies behind techniques such as replay experiments. Studies leveraging these techniques are costly; evolving microbial population can take months or years for a single experiment. By trying these techniques at a larger scale, I hope to identify time-saving optimizations, identify possible pitfalls, and develop meaningful analyses. This way, future wet lab studies can conduct similar studies more efficiently and confidently. For example, Chapters 3 and 4 have demonstrated that a two-phase replay experiment can be used to find potentiating steps while saving substantial amounts of time. Second, I hope to provide a baseline that future studies – be they in vitro or in silico – can compare against. The need for these benchmark data is evident in this dissertation. Upon completion of Chapter 3, while I was confident in the experimental setup and the evolution system, I was surprised and a mildly concerned by how large the potentiation increases were. This result was my original motivation behind conducting the scaled-up experiments for Chapter 4; to ensure that observing such chances was not a probabilistic anomaly. Unfortunately, conducting replay replicates at the single-mutation resolution for that many replicates, is currently impossible in wet lab systems. As such, this work can serve as the first large-scale benchmark, allowing for comparisons until such large studies are possible to conduct with natural organisms. Over the course of my PhD, I have shifted from considering myself mainly a “computer scientist” to primarily a “computational evolutionary biologist”. With this mindset, the goal in my research 98 has become to facilitate a synergistic interplay between wet lab experimental evolution and digital evolution. Wet lab researchers are limited in what experiments they can conduct due to time, money, and experimental control. Digital evolution researchers are limited in what their studies can say about evolution in the natural world. By capitalizing on the strengths of each discipline, we can continue to cover the challenges of the other. 6.3 Future work We will first consider the obvious extensions to this work. Foremost, while the chapters of this dissertation dive deep into the role that history plays in evolution, I have analyzed only three experimental setups (two related environments in Avida and a simple one-dimensional integral landscape). We clearly need to expand this work to other types of organisms and environments. My long-term hope is for adventurous and determined wet lab evolutionary biologists to conduct more studies in this vein. However, I have barely begun to scratch the surface of digital evolution models. As such, here I discuss two axes we should vary: the digital evolution model and the task or environment within it. We can also vary the parameters of the system (e.g., the mutation rate, population size, etc), which I discuss in detail below. There exist dozens of mature digital evolution, evolutionary computation, and similar software models in use today. One particular avenue of exploration is to see how potentiation dynamics change between these different models. For example, Chapters 3 and 4 study the evolution of associative learning in Avida using a path following task. Would the potentiation of path-following associative learning in Markov Brains (Hintze et al., 2017) or PushGP (Spector and Robinson, 2002) follow similar trends to those observed in Avida? The solutions would certainly look quite different, but future experiments are needed to empirically compare the potentiation dynamics. Even if we limit ourselves to studies within Avida, potentiation dynamics may drastically differ across environments. Would we see large increases in potentiation of optimal plasticity in the plasticity environment from Chapter 2 or in the potentiation of the logic trait EQU in the classic “logic 9” environment (Ofria and Wilke, 2004)? Comparison across these environments is favorable: we could use roughly the same instruction set (barring environment-specific instructions), population size, mutation operators, etc. I find it likely, however, that potentiation dynamics would still differ. As discussed in Chapter 4, I hypothesize that potentiation is influenced by the optimization needed after the first versions of a behavior appear, which is possible, for example, with suboptimal 99 plasticity. In the “logic 9” environment, some logic tasks are building blocks of others. It is possible their potentiation is thus highly correlated, but it is also possible that the potentiation for a simple task actually decreases when it appears, as it actualizes in a way that makes it likely to be lost as the more complex tasks evolve. Below I detail more specific ideas that have arisen during the creation of this thesis. 6.3.1 On the multiplicative nature of potentiation Chapters 3 and 4 identify “potentiating mutations” that caused the largest increases in po- tentiation for a given lineage. However, this is only one interpretation of the “largest increase” in potentiation, focusing on the absolute percentage point change. We could, however, shift to a relative view of potentiation. To illustrate the difference in these two values, imagine we have two mutations. The first mutation increases potentiation from 10% to 40%, while the second increases potentiation from 40% to 80%. Under our current framing, the second mutation is our largest potentiating mutation, as it increases potentiation by 40 percentage points (compared to 30 for the first mutation). However, an alternate framing is that the first mutation quadrupled the probability of evolving our focal trait, while the second mutation only doubled the probability. To begin contemplating the ramifications of using absolute vs relative potentiation change, we can consider the simple fitness landscape in Figure 6.1. If we assume a scenario with a low mutation rate and strong selection pressure, since fitness increases from the root node to the leaves, evolutionary theory dictates that all populations should evolve to a leaf node. Depending on the starting node, the population can encounter branching nodes over the course of evolution. Since each branching node has two neighboring nodes with higher yet equal fitness, we expect the population to go down each branch with 50% probability. Here we consider the potentiation of the population evolving to reach the star node. In this example, potentiation for the star node from nodes between node B and the star are all approximately 100%. Potentiation at the node B is expected to be 50%, as the population will either evolve down the left or right branch. Similarly, potentiation is expected to be 25% at node A and 12.5% at the root node. Thus, in terms of absolute potentiation differences along a trajectory from the root to the target node, potentiation will increase most moving from node B to the left at a 50% increase. However, if we instead look at relative increases, moving off of the root, node A, and node B are all the same, doubling potentiation. This difference is demonstrated in Figure 6.2 100 Figure 6.1: A hypothetical fitness landscape in the form of a binary tree. Each node represents a particular genotype, and edges represent available mutations. Branch points, which have two higher-fitness neighbors, are shown larger than other nodes. The root node is shown in red, and two nodes are marked (A and B) for the in-text example. We can consider potentiation relative to the star, which can be reached with a series of mutations that use the left option at each branch point. Considering these differences as absolute or relative are both valid approaches, in the same way that other measures such as epistasis can be measured either additively or multiplicatively. In Chapters 3 and 4 I took an absolute approach as I wanted to analyze the overall potentiation of associative learning and how it changed. While the “decision points” encountered by evolution are interesting, they were not my main focus. What did prove relevant, however,is that because those experiments were conducted in Avida, I was limited to the resolution of my data. I strained the computation resources available in order to do 50 replays for each potentiation measurement, meaning that I had a base resolution of 2% intervals, before even considering noise. A 5% to 10% increase can be significant and impactful, but the number of replicates needed to determine that difference with a high degree of statistical confidence were computationally infeasible. Future work should consider both approaches however, as large relative increases in potentiation might happen much earlier in the population’s history and might provide additional insight. 101 AB12345678910111213Fitness Figure 6.2: Empirically measured potentiation of nodes from the fitness landscape in Figure 6.1. Subplot A shows potentiation for each node on the left side of the tree. Potentiation for a node is calculated as the percentage of replicates (out of 10,000) started with a full population of 100 organisms on that node that evolve to the target node after 500 generations of roulette selection. Subplot B shows the absolute potentiation change when moving to each node, calculated as the dif- ference between potentiation measured from the current position and the previous position. Subplot C shows the relative potentiation change, calculated as the potentiation measured at the current position divided by potentiation measured at the previous position. 102 0255075100RootABTargetPotentiationA0255075100RootABTargetAbsolutepotentiation changeB1.001.251.501.752.00RootABTargetReplayed nodeRelativepotentiation changeC 6.3.2 Understanding factors that influence potentiation In Chapter 3, I detailed the inner mechanics of the digital organisms and how the potentiating mutations paved the way for the evolution of learning. However, which dynamics factor into poten- tiation are still poorly understood. For example, my initial hypothesis was that most potentiating mutations would be deleterious or neutral, as beneficial mutations are likely to get selected and thus unlikely to influence the overall probability that the focal trait evolves. Chapter 4 provided strong counter evidence, with 49 of 50 potentiating mutations being either beneficial or having no significant fitness effect. This raises the question: what other factors are at play in potentiation? Three factors I want to draw attention to are the size of the mutational neighborhood, the mutation rate, and the size of the population. Combined with the topography of the fitness land- scape, these factors dictate the likelihood that a given mutation will appear. Specifically, as the number of neighbors increases, the mutation rate decreases, or the population size decreases, it becomes less likely that any given mutation will ever occur. As such, even the mutation with the highest possible fitness can be potentiating, as there is a chance it never appears and thus selection cannot act upon it. Additionally, even if a highly beneficial mutation appears, there is always the chance that it is lost to genetic drift. Future studies should examine how varying each of these parameters in isolation changes the potentiation dynamics. In designing the experiment in Chapter 3, I chose to use Avida partially because I was unsure how much complexity was needed to observe interesting potentiation changes. Chapter 5 has demonstrated that this complexity is not required. While future work should observe potentiation in other complex systems, potentiation should also be studied in the simplest possible systems to conduct controlled studies like the ones mentioned here. 6.3.3 Improving evolutionary computation This dissertation is focused on improving our understanding of evolution as a process and the refinement of techniques to be applied back to wet lab studies. However, these techniques and ideas can also be applied to the problem-solving field of evolutionary computation (EC). In order to improve efficiency and problem-solving ability, EC practitioners are constantly refining their evolving systems. Even beyond the initial design phase, most systems have a collection of parameters and operators to fine-tune the system for the particular task at hand. Optimizing the 103 design decisions and parameter values of these systems is a continuous effort in EC research. How might a better understanding of historical contingency and potentiation improve our un- derstanding in these systems? Let us imagine that we want to test the effects of adding new mutational operators to our EC system. A proper first step would be to run a two treatment exper- iment: one batch of experiments that have access to these new operators and a control treatment that does not. With enough replicates, we should see if our changes improve the problem solving success of our system. Indeed, we could dive further and test these new operators in isolation or in combination, to disentangle which operators improve our success. To consider the benefit of adopting a history-based mindset, we must switch from asking if these new mutational operators improve our problem solving success to asking how the new operators improve our evolutionary search. We can, and often do, make inferences about what events might have been critical to a replicate’s success. For example, we might identify a gene duplication event occurring shortly before the focal trait evolves. However, only by experimentally testing the effect of accumulated history can we verify which specific events were pivotal in the population’s ultimate success and which were mere coincidence. In our example, the gene duplication mentioned above might have helped, but it also may have followed another mutation whose importance is harder to detect. It is this interplay of dynamics that we aim to clarify. By leveraging these techniques, we may be able to nail down interactions between constituent parts, such as our mutation operators, that are beneficial and that we want to encourage via our design choices and parameters settings. However, with this increased analytic power comes an increase in computational requirements. Techniques such as replay experiments have the potential to deliver critical insights into EC systems, but they are too computationally expensive to conduct regularly. As such, future work should aim to identify cases where these techniques are most useful, so practitioners can leverage them when they are the correct tool for the job. 6.3.4 Quantifying evolvability Generally, analytic replay experiments have focused on the potentiation of beneficial traits. This leads to a simple observation: if the focal trait is beneficial, an increase in its potentiation is likely to correspond to an increase in the long-term fitness of the lineage, depending on the alternative trajectories that could have been taken. Though we lack an agreed upon definition of evolvability (Pigliucci, 2008), here I use the definition of Payne and Wagner (2019): “the ability of 104 a biological system to produce phenotypic variation that is both heritable and adaptive”. Thus, I argue that an increase in average long-term fitness constitutes an increase in evolvability. I am not the first to combine the ideas of evolvability and historical contingency. Woods et al. (2011) were among the first to leverage analytic replay experiments, and they used these replays to test the long-term evolvability of two clades of E. coli. In the original replicate, the clade that eventually won had lower fitness at the time of sampling. Replay experiments started from that original point demonstrated that the “eventual winner” clade consistently evolved higher fitness despite its lower initial fitness – it was more evolvable. In conducting multiple replay experiments, I have realized that these systems are ripe for further studies in evolvability. We must first consider the concept of “evolvability-enhancing muations” as introduced by (Wagner, 2023). Evolvability-enhancing mutations are defined as mutations whose one-step mutational neighborhood has a higher average fitness than the one-step neighborhood of the wild type. As noted by the author, we could also define these evolvability-enhancing mutations using a two-step, three-step, or larger mutational neighborhood. Increasing the range of the mutational neighborhood would overcome issues such as evolvability-enhancing mutations that lead to local fitness peaks, limiting long-term fitness. Unfortunately, performing a N-step mutant analysis becomes intractable on most fitness land- scapes for, at most, N > 2. Thankfully, we can make a simplification: what if we only consider the mutational trajectories that are most likely? This is exactly what replay experiments do; each replay replicate samples the fitness landscape that is reachable from the starting condition in a given amount of time. While we are only sampling a fraction of the possible trajectories, the sample should be representative of what will evolve from the starting condition. Finally, conducting these studies requires little to no additional work. If a replay experiment is conducted to observe the potentiation dynamics of a lineage for a particular trait, the only additional step is to also evaluate fitness at the end of each replay replicate. In Chapter 4, I had to quantify fitness in order to determine if the replay replicates evolved learning. As such, these data already contain information on evolvability. Unfortunately, a full analysis of these data are out of scope for this dissertation. A preliminary examination is shown in Figure 6.3, exhibiting the long-term fitness of the replays from two lineages. One lineage, which had only a 20 percentage point increase in potentiation, sees little change in the long-term fitness at the potentiating step. The second lineage, 105 Figure 6.3: Two examples of evolvability data from the targeted replays of Chapter 4. A boxplot is shown for each depth replayed in the targeted replays, corresponding to the average fitness at the end of each of the 50 replicates replayed from that depth. The red boxplot shows the potentiating step. Two lineages are shown, one with a small potentiation increase (20 percentage points) and little change in long-term fitness, and another with a large potentiation increase (88 percentage points) and a stark increase in long-term fitness with the potentiating step. which had the largest single-step potentiation increase (88 percentage points), sees a substantial increase in long-term fitness. When we consider these results through the lens of potentiation, we see that the higher-fitness trajectories were always present, potentiating mutations are simply 106 Smallest potentiation increaseLargest potentiation increase010203040500.050.100.150.050.100.15Depth (relative to start of window)Average fitness biasing evolution toward these high-fitness areas, and thus, increasing evolvability. Future work should further explore these dynamics; my hypothesis is that the increase in potentiation and the increase in long-term fitness should be correlated in systems where the potentiation trait being measured represents a highly beneficial adaptation. 107 BIBLIOGRAPHY Ahlmann-Eltze, C. and Patil, I. (2021). Ggsignif: Significance Brackets for ’Ggplot2’. Allaire, JJ., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., and Iannone, R. (2020). Rmarkdown: Dynamic Documents for R. Allen, M., Poggiali, D., Whitaker, K., Marshall, T. R., and Kievit, R. A. (2019). Raincloud plots: A multi-platform tool for robust data visualization. Wellcome Open Research, 4:63. Ancel, L. W. (2000). Undermining the Baldwin Expediting Effect: Does Phenotypic Plasticity Accelerate Evolution? Theoretical Population Biology, 58(4):307–319. Barrett, R. and Schluter, D. (2008). Adaptation from standing genetic variation. Trends in Ecology & Evolution, 23(1):38–44. Barton, N. H. (2000). Genetic hitchhiking. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 355(1403):1553–1562. Beaumont, H. J. E., Gallie, J., Kost, C., Ferguson, G. C., and Rainey, P. B. (2009). Experimental evolution of bet hedging. Nature, 462(7269):90–93. Bedhomme, S., Lafforgue, G., and Elena, S. F. (2013). Genotypic but not phenotypic historical contingency revealed by viral experimental evolution. BMC Evolutionary Biology, 13(1):46. Blattner, F. R., Plunkett III, G., Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., et al. (1997). The complete genome sequence of Escherichia coli K-12. science, 277(5331):1453–1462. Blount, Z. D., Barrick, J. E., Davidson, C. J., and Lenski, R. E. (2012). Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature, 489(7417):513–518. Blount, Z. D., Borland, C. Z., and Lenski, R. E. (2008). Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proceedings of the National Academy of Sciences, 105(23):7899–7906. Blount, Z. D., Lenski, R. E., and Losos, J. B. (2018). Contingency and determinism in evolution: Replaying life’s tape. Science, 362(6415):eaam5979. Bohm, C., Ragusa, V. R., Ofria, C., Lenski, R. E., and Adami, C. (2024). Reduced selection during sweeps lead to adaptive momentum on rugged landscapes. bioRxiv : the preprint server for biology. Boyer, S., Hérissant, L., and Sherlock, G. (2021). Adaptation is influenced by the complexity of en- vironmental change during evolution in a dynamic environment. PLOS Genetics, 17(1):e1009314. Braught, G. and Dean, A. (2007). The effects of learning on the roles of chance, history and In Proceedings of the 3rd Australian Conference on adaptation in evolving neural networks. Progress in Artificial Life, ACAL’07, pages 201–211, Berlin, Heidelberg. Springer-Verlag. Bryson, D. M. (2012). The Evolutionary Potential of Populations on Complex Fitness Landscapes. PhD thesis, Michigan State University. 108 Bundy, J. N., Ofria, C., and Lenski, R. E. (2021). How the footprint of history shapes the evolution of digital organisms. Preprint, Evolutionary Biology. Burke, E. K., Newall, J. P., and Weare, R. F. (1998). Initialization Strategies and Diversity in Evolutionary Timetabling. Evolutionary Computation, 6(1):81–103. Buskirk, S. W., Peace, R. E., and Lang, G. I. (2017). Hitchhiking and epistasis give rise to cohort dy- namics in adapting populations. Proceedings of the National Academy of Sciences, 114(31):8330– 8335. Canino-Koning, R., Wiser, M. J., and Ofria, C. (2016). The Evolution of Evolvability: Changing Environments Promote Rapid Adaptation in Digital Organisms. In Proceedings of the Artificial Life Conference 2016, pages 268–275, Cancun, Mexico. MIT Press. Canino-Koning, R., Wiser, M. J., and Ofria, C. (2019). Fluctuating environments select for short-term phenotypic variation leading to long-term exploration. PLOS Computational Biol- ogy, 15(4):e1006445. Canty, A. and Ripley, B. D. (2019). Boot: Bootstrap R (S-plus) Functions. Card, K. J., LaBar, T., Gomez, J. B., and Lenski, R. E. (2019). Historical contingency in the evo- lution of antibiotic resistance after decades of relaxed selection. PLOS Biology, 17(10):e3000397. Card, K. J., Thomas, M. D., Graves, J. L., Barrick, J. E., and Lenski, R. E. (2021). Genomic evolu- tion of antibiotic resistance is contingent on genetic background following a long-term experiment with Escherichia coli. Proceedings of the National Academy of Sciences, 118(5):e2016886118. Chadwick, W. and Little, T. J. (2005). A parasite-mediated life-history shift in Daphnia magna. Proceedings of the Royal Society B: Biological Sciences, 272(1562):505–509. Chan, B. K., Sistrom, M., Wertz, J. E., Kortright, K. E., Narayan, D., and Turner, P. E. (2016). Phage selection restores antibiotic sensitivity in MDR Pseudomonas aeruginosa. Scientific Re- ports, 6(1):26717. Chevin, L.-M. and Lande, R. (2010). When do adaptive plasticity and genetic evolution prevent ex- tinction of a density-regulated population? Evolution; international journal of organic evolution, 64(4):1143–1150. Chevin, L.-M., Lande, R., and Mace, G. M. (2010). Adaptation, Plasticity, and Extinction in a Changing Environment: Towards a Predictive Theory. PLoS Biology, 8(4):e1000357. Childs, D. Z., Metcalf, CJE., and Rees, M. (2010). Evolutionary bet-hedging in the real world: Em- pirical evidence and challenges revealed by plants. Proceedings of the Royal Society B: Biological Sciences, 277(1697):3055–3064. Clune, J., Ofria, C., and Pennock, R. T. (2007). Investigating the Emergence of Phenotypic Plas- ticity in Evolving Digital Organisms. In Almeida e Costa, F., Rocha, L. M., Costa, E., Harvey, I., and Coutinho, A., editors, Advances in Artificial Life, volume 4648, pages 74–83. Springer Berlin Heidelberg, Berlin, Heidelberg. Covert III, A. W., Lenski, R. E., Wilke, C. O., and Ofria, C. (2013). Experiments on the role of deleterious mutations as stepping stones in adaptive evolution. Proceedings of the National Academy of Sciences, 110(34):E3171–E3178. 109 Crispo, E. (2007). The Baldwin effect and genetic assimilation: Revisiting two mechanisms of evolutionary change mediated by phenotypic plasticity. Evolution; international journal of organic evolution, 61(11):2469–2479. Cummins, E. A., Snaith, A. E., McNally, A., and Hall, R. J. (2021). The role of potentiating mutations in the evolution of pandemic Escherichia coli clones. European Journal of Clinical Microbiology & Infectious Diseases. Darwin, C. (1859). On the Origin of Species by Means of Natural Selection. Murray, London. Dolson, E., Lalejini, A., Jorgensen, S., and Ofria, C. (2020). Interpreting the Tape of Life: Ancestry- Based Analyses Provide Insights and Intuition about Evolutionary Dynamics. Artificial Life, 26(1):58–79. Douglas, S. M., Chubiz, L. M., Harcombe, W. R., and Marx, C. J. (2017). Identification of the potentiating mutations and synergistic epistasis that enabled the evolution of inter-species coop- eration. PLOS ONE, 12(5):e0174345. Dunlap, A. S. and Stephens, D. W. (2014). Experimental evolution of prepared learning. Proceedings of the National Academy of Sciences, 111(32):11750–11755. Dunn, A. M., Hogg, J. C., Kelly, A., and Hatcher, M. J. (2005). Two cues for sex determination in Gammarus duebeni : Adaptive variation in environmental sex determination? Limnology and Oceanography, 50(1):346–353. Ferguson, A. J. (2023). Supplemental material for chapters 3 and 4. Zenodo and Github. https: //doi.org/10.5281/zenodo.7953692. Ferguson, A. J. (2024). Supplemental material for chapter 5. Zenodo and Github. https://doi.org/ 10.5281/zenodo.11507982. Ferguson, A. J. and Ofria, C. (2023). Potentiating Mutations Facilitate the Evolution of Associative Learning in Digital Organisms. In ALIFE 2023: Ghost in the Machine: Proceedings of the 2023 Artificial Life Conference. MIT Press. Fisher, R. A. (1919). The Correlation between Relatives on the Supposition of Mendelian In- heritance. Earth and Environmental Science Transactions of The Royal Society of Edinburgh, 52(2):399–433. Flores-Moya, A., Rouco, M., García-Sánchez, M. J., García-Balboa, C., González, R., Costas, E., and López-Rodas, V. (2012). Effects of adaptation, chance, and history on the evolution of the toxic dinoflagellate Alexandrium minutum under selection of increased temperature and acidification. Ecology and Evolution, 2(6):1251–1259. Forsman, A. (2015). Rethinking phenotypic plasticity and its consequences for individuals, popula- tions and species. Heredity, 115(4):276–284. Fry, B. G., Vidal, N., Norman, J. A., Vonk, F. J., Scheib, H., Ramjan, S. F. R., Kuruppu, S., Fung, K., Blair Hedges, S., Richardson, M. K., Hodgson, W. C., Ignjatovic, V., Summerhayes, R., and Kochva, E. (2006). Early evolution of the venom system in lizards and snakes. Nature, 439(7076):584–588. 110 Ghalambor, C. K., Angeloni, L. M., and Carroll, S. P. (2010). Behavior as phenotypic plasticity. In Westneat, D. and Fox, C. W., editors, Evolutionary Behavioral Ecology, pages 90–107. Oxford University Press, New York, NY. Ghalambor, C. K., Hoke, K. L., Ruell, E. W., Fischer, E. K., Reznick, D. N., and Hughes, K. A. (2015). Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature, 525(7569):372–375. Ghalambor, C. K., McKay, J. K., Carroll, S. P., and Reznick, D. N. (2007). Adaptive versus non-adaptive phenotypic plasticity and the potential for contemporary adaptation in new envi- ronments. Functional Ecology, 21(3):394–407. Gibert, P., Debat, V., and Ghalambor, C. K. (2019). Phenotypic plasticity, global change, and the speed of adaptive evolution. Current Opinion in Insect Science, 35:34–40. Gibson, G. and Dworkin, I. (2004). Uncovering cryptic genetic variation. Nature Reviews Genetics, 5(9):681–690. Gifford, D. R., Furió, V., Papkou, A., Vogwill, T., Oliver, A., and MacLean, R. C. (2018). Identifying and exploiting genes that potentiate the evolution of antibiotic resistance. Nature Ecology & Evolution, 2(6):1033–1039. Goldsby, H. J., Knoester, D. B., Ofria, C., and Kerr, B. (2014). The Evolutionary Origin of Somatic Cells under the Dirty Work Hypothesis. PLoS Biology, 12(5):e1001858. Gould, S. J. (1990). Wonderful Life: The Burgess Shale and the Nature of History. WW Norton & Company. Gould, S. J. and Lewontin, R. C. (1979). The spandrels of San Marco and the Panglossian paradigm: A critique of the adaptationist programme. Proceedings of the royal society of London. Series B. Biological Sciences, 205(1161):581–598. Grabowski, L. M., Bryson, D. M., Dyer, F. C., Ofria, C., and Pennock, R. T. (2010). Early Evolution of Memory Usage in Digital Organisms. In ALIFE, pages 224–231. Gullberg, E., Cao, S., Berg, O. G., Ilbäck, C., Sandegren, L., Hughes, D., and Andersson, D. I. (2011). Selection of resistant bacteria at very low antibiotic concentrations. PLoS pathogens, 7(7):e1002158. Gunnell, G. F. and Simmons, N. B. (2005). Fossil Evidence and the Origin of Bats. Journal of Mammalian Evolution, 12(1):209–246. Gupta, A., Zaman, L., Strobel, H. M., Gallie, J., Burmeister, A. R., Kerr, B., Tamar, E. S., Kishony, R., and Meyer, J. R. (2022). Host-parasite coevolution promotes innovation through deformations in fitness landscapes. ELife, 11:e76162. Gupta, A. P. and Lewontin, R. C. (1982). A Study of Reaction Norms in Natural Populations of Drosophila pseudoobscura. Evolution; international journal of organic evolution, 36(5):934. Harrell Jr, F. E. (2020). Hmisc: Harrell Miscellaneous. Harrower, M. and Brewer, C. A. (2003). ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps. The Cartographic Journal, 40(1):27–37. 111 Hendry, A. P. (2016). Key Questions on the Role of Phenotypic Plasticity in Eco-Evolutionary Dynamics. Journal of Heredity, 107(1):25–41. Heyduk, K., Moreno-Villena, J. J., Gilman, I. S., Christin, P.-A., and Edwards, E. J. (2019). The genetics of convergent evolution: Insights from plant photosynthesis. Nature Reviews Genetics, 20(8):485–493. Hintze, A., Edlund, J. A., Olson, R. S., Knoester, D. B., Schossau, J., Albantakis, L., Tehrani-Saleh, A., Kvam, P., Sheneman, L., and Goldsby, H. (2017). Markov brains: A technical introduction. arXiv preprint arXiv:1709.05601. Huey, R. B., Hertz, P. E., and Sinervo, B. (2003). Behavioral Drive versus Behavioral Inertia in Evolution: A Null Model Approach. The American Naturalist, 161(3):357–366. Jochumsen, N., Marvig, R. L., Damkiær, S., Jensen, R. L., Paulander, W., Molin, S., Jelsbak, L., and Folkesson, A. (2016). The evolution of antimicrobial peptide resistance in Pseudomonas aeruginosa is shaped by strong epistatic interactions. Nature Communications, 7(1):13002. Kashtan, N., Noor, E., and Alon, U. (2007). Varying environments can speed up evolution. Pro- ceedings of the National Academy of Sciences, 104(34):13711–13716. Kassambara, A. (2021). Rstatix: Pipe-friendly Framework for Basic Statistical Tests. Kawecki, T. J., Lenski, R. E., Ebert, D., Hollis, B., Olivieri, I., and Whitlock, M. C. (2012). Experimental evolution. Trends in Ecology & Evolution, 27(10):547–560. Keller, S. R. and Taylor, D. R. (2008). History, chance and adaptation during biological invasion: Separating stochastic phenotypic evolution from response to selection. Ecology Letters, 11(8):852– 866. Kimura, M. (1968). Evolutionary rate at the molecular level. Nature, 217:624–626. King, J. L. and Jukes, T. H. (1969). Non-Darwinian Evolution: Most evolutionary change in proteins may be due to neutral mutations and genetic drift. Science (New York, N.Y.), 164(3881):788–798. Kortright, K. E., Chan, B. K., Koff, J. L., and Turner, P. E. (2019). Phage Therapy: A Renewed Approach to Combat Antibiotic-Resistant Bacteria. Cell Host & Microbe, 25(2):219–232. Kruskal, W. H. and Wallis, W. A. (1952). Use of Ranks in One-Criterion Variance Analysis. Journal of the American Statistical Association, 47(260):583–621. Lahti, D. C., Johnson, N. A., Ajie, B. C., Otto, S. P., Hendry, A. P., Blumstein, D. T., Coss, R. G., Donohue, K., and Foster, S. A. (2009). Relaxed selection in the wild. Trends in Ecology & Evolution, 24(9):487–496. Lalejini, A. and Ferguson, A. J. (2021a). Data for chapter 2. Available via the Open Science Foundation. https://osf.io/sav2c/. Lalejini, A. and Ferguson, A. J. (2021b). Supplemental material for chapter 2. Zenodo and Github. https://doi.org/10.5281/zenodo.4642704. Lalejini, A., Ferguson, A. J., Grant, N. A., and Ofria, C. (2021). Adaptive phenotypic plasticity stabilizes evolution in fluctuating environments. Frontiers in Ecology and Evolution, page 550. 112 Lalejini, A. and Ofria, C. (2016). The Evolutionary Origins of Phenotypic Plasticity. In Proceedings of the Artificial Life Conference 2016, pages 372–379, Cancun, Mexico. MIT Press. Lalejini, A. and Ofria, C. (2018). Evolving Reactive Agents with SignalGP. In The 2018 Conference on Artificial Life, pages 368–369, Tokyo, Japan. MIT Press. Lande, R. and Arnold, S. J. (1983). The Measurement of Selection on Correlated Characters. Evolution; international journal of organic evolution, 37(6):1210. Lenski, R. E., Ofria, C., Collier, T. C., and Adami, C. (1999). Genome complexity, robustness and genetic interactions in digital organisms. Nature, 400(6745):661–664. Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003). The evolutionary origin of complex features. Nature, 423(6936):139–144. Lenski, R. E., Rose, M. R., Simpson, S. C., and Tadler, S. C. (1991). Long-term experimental evo- lution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. The American Naturalist, 138(6):1315–1341. Leroi, A. M., Bennett, A. F., and Lenski, R. E. (1994). Temperature acclimation and competitive fitness: An experimental test of the beneficial acclimation assumption. Proceedings of the National Academy of Sciences, 91(5):1917–1921. Levis, N. A. and Pfennig, D. W. (2016). Evaluating ‘Plasticity-First’ Evolution in Nature: Key Criteria and Empirical Approaches. Trends in Ecology & Evolution, 31(7):563–574. Li, Y. and Wilke, C. O. (2004). Digital Evolution in Time-Dependent Fitness Landscapes. Artificial Life, 10(2):123–134. Losos, J. B., Jackman, T. R., Larson, A., de Queiroz, K., and Rodríguez-Schettino, L. (1998). Science, Contingency and determinism in replicated adaptive radiations of island lizards. 279(5359):2115–2118. Mayr, E. (1983). How to carry out the adaptationist program? The American Naturalist, 121(3):324–334. McGrath, K. M., Russell, S. J., Fer, E., Garmendia, E., Hosgel, A., Baltrus, D. A., and Kaçar, B. (2024). Fitness benefits of a synonymous substitution in an ancient EF-Tu gene depend on the genetic background. Journal of Bacteriology, 206(2):e00329–23. McGregor, S., Vasas, V., Husbands, P., and Fernando, C. (2012). Evolution of Associative Learning in Chemical Networks. PLOS Computational Biology, 8(11):e1002739. Mery, F. and Kawecki, T. J. (2002). Experimental evolution of learning ability in fruit flies. Pro- ceedings of the National Academy of Sciences, 99(22):14274–14279. Meyer, J. R., Dobias, D. T., Weitz, J. S., Barrick, J. E., Quick, R. T., and Lenski, R. E. (2012). Repeatability and contingency in the evolution of a key innovation in phage lambda. Science, 335(6067):428–432. Misevic, D., Ofria, C., and Lenski, R. E. (2010). Experiments with Digital Organisms on the Origin and Maintenance of Sex in Changing Environments. Journal of Heredity, 101(Supplement 1):S46–S54. 113 Moreno, M. A., Yang, C., Dolson, E., and Zaman, L. (2024). Trackable Agent-based Evolution Models at Wafer Scale. Muller, H. J. (1932). Some Genetic Aspects of Sex. The American Naturalist, 66(703):118–138. Nahum, J. R., West, J., Althouse, B. M., Zaman, L., Ofria, C., and Kerr, B. (2017). Improved adaptation in exogenously and endogenously changing environments. In Proceedings of the 14th European Conference on Artificial Life ECAL 2017, pages 306–313, Lyon, France. MIT Press. Nakazawa, M. (2019). Fmsb: Functions for Medical Statistics Book with Some Demographic Data. Neuwirth, E. (2014). RColorBrewer: ColorBrewer Palettes. Ofria, C., Bryson, D. M., and Wilke, C. O. (2009). Avida: A Software Platform for Research in Computational Evolutionary Biology. In Komosinski, M. and Adamatzky, A., editors, Artificial Life Models in Software, pages 3–35. Springer London, London. Ofria, C. and Wilke, C. O. (2004). Avida: A software platform for research in computational evolutionary biology. Artificial life, 10(2):191–229. Oke, K. B., Cunningham, C. J., Quinn, T. P., and Hendry, A. P. (2019). Independent lineages in a common environment: The roles of determinism and contingency in shaping the migration timing of even- versus odd-year pink salmon over broad spatial and temporal scales. Ecology Letters, 22(10):1547–1556. Paaby, A. B. and Rockman, M. V. (2014). Cryptic genetic variation: Evolution’s hidden substrate. Nature Reviews Genetics, 15(4):247–258. Paenke, I., Sendhoff, B., and Kawecki, T. J. (2007). Influence of Plasticity and Learning on Evolution under Directional Selection. The American Naturalist, 170(2):E47–E58. Payne, J. L. and Wagner, A. (2019). The causes of evolvability and their evolution. Nature Reviews Genetics, 20(1):24–38. Pennock, R. T. (2007). Models, simulations, instantiations, and evidence: The case of digital evolution. Journal of Experimental & Theoretical Artificial Intelligence, 19(1):29–42. Pigliucci, M. (2006). Phenotypic plasticity and evolution by genetic assimilation. Journal of Exper- imental Biology, 209(12):2362–2367. Pigliucci, M. (2008). Is evolvability evolvable? Nature Reviews Genetics, 9(1):75–82. Pontes, A. C., Mobley, R. B., Ofria, C., Adami, C., and Dyer, F. C. (2020). The evolutionary origin of associative learning. The American Naturalist, 195(1):E1–E19. Price, T. D., Qvarnström, A., and Irwin, D. E. (2003). The role of phenotypic plasticity in driving genetic evolution. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1523):1433–1440. R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Ratcliff, W. C., Denison, R. F., Borrello, M., and Travisano, M. (2012). Experimental evolution of multicellularity. Proceedings of the National Academy of Sciences, 109(5):1595–1600. 114 Rice, W. R. (1989). Analyzing Tables of Statistical Tests. Evolution; international journal of organic evolution, 43(1):223. Santos-Lopez, A., Marshall, C. W., Haas, A. L., Turner, C., Rasero, J., and Cooper, V. S. (2021). The roles of history, chance, and natural selection in the evolution of antibiotic resistance. eLife, 10:e70676. Schaum, C. E. and Collins, S. (2014). Plasticity predicts evolution in a marine alga. Proceedings of the Royal Society B: Biological Sciences, 281(1793):20141486. Schlichting, C. D. (2008). Hidden Reaction Norms, Cryptic Genetic Variation, and Evolvability. Annals of the New York Academy of Sciences, 1133(1):187–203. Schlichting, C. D. and Wund, M. A. (2014). Phenotypic Plasticity and Epigenetic Marking: An Assessment of Evidence for Genetic Accommodation. Evolution; international journal of organic evolution, 68(3):656–672. Schmidt, M. D. and Lipson, H. (2009). Incorporating expert knowledge in evolutionary search: A study of seeding methods. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO ’09, pages 1091–1098, New York, NY, USA. Association for Computing Machinery. Schwander, T. and Leimar, O. (2011). Genes as leaders and followers in evolution. Trends in Ecology & Evolution, 26(3):143–151. Simões, P., Fragata, I., Seabra, S. G., Faria, G. S., Santos, M. A., Rose, M. R., Santos, M., and Matos, M. (2017). Predictable phenotypic, but not karyotypic, evolution of populations with contrasting initial history. Scientific Reports, 7(1):913. Smith, C. E., Smith, A. N. H., Cooper, T. F., and Moore, F. B.-G. (2022). Fitness of evolving bacterial populations is contingent on deep and shallow history but only shallow history creates predictable patterns. Proceedings of the Royal Society B: Biological Sciences, 289(1982):20221292. Smith, J. M. and Haigh, J. (1974). The hitch-hiking effect of a favourable gene. Genetical Research, 23(1):23–35. Spector, L. and Robinson, A. (2002). Genetic Programming and Autoconstructive Evolution with the Push Programming Language. Genetic Programming and Evolvable Machines, 3(1):7–40. Taylor, T., Bedau, M., Channon, A., Ackley, D., Banzhaf, W., Beslon, G., Dolson, E., Froese, T., Hickinbotham, S., Ikegami, T., McMullin, B., Packard, N., Rasmussen, S., Virgo, N., Agmon, E., Clark, E., McGregor, S., Ofria, C., Ropella, G., Spector, L., Stanley, K. O., Stanton, A., Timperley, C., Vostinar, A., and Wiser, M. (2016). Open-Ended Evolution: Perspectives from the OEE Workshop in York. Artificial Life, 22(3):408–423. Taylor, T. and Hallam, J. (1998). Replaying the tape: An investigation into the role of contingency in evolution. In Proceedings of Artificial Life VI, pages 256–265. MIT Press, Cambridge, MA. Travisano, M., Mongold, J. A., Bennett, A. F., and Lenski, R. E. (1995). Experimental tests of the roles of adaptation, chance, and history in evolution. Science, 267(5194):87–90. Turner, C. B., Blount, Z. D., and Lenski, R. E. (2015). Replaying evolution to test the cause of extinction of one ecotype in an experimentally evolved population. PLoS One, 10(11):e0142050. 115 Van den Bergh, B., Swings, T., Fauvart, M., and Michiels, J. (2018). Experimental Design, Popula- tion Dynamics, and Diversity in Microbial Experimental Evolution. Microbiology and Molecular Biology Reviews, 82(3):e00008–18, /mmbr/82/3/e00008–18.atom. Vignogna, R. C., Buskirk, S. W., and Lang, G. I. (2021). Exploring a local genetic interaction network using evolutionary replay experiments. Molecular biology and evolution, 38(8):3144–3152. Wagenaar, D. A. and Adami, C. (2004). Influence of Chance, History, and Adaptation on Digital Evolution. Artificial Life, 10(2):181–190. Wagner, A. (2023). Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nature Communications, 14(1):3624. Wagner, R. A. and Fischer, M. J. (1974). The String-to-String Correction Problem. Journal of the ACM, 21(1):168–173. Wennersten, L. and Forsman, A. (2012). Population-level consequences of polymorphism, plasticity and randomized phenotype switching: A review of predictions. Biological Reviews, 87(3):756–767. West-Eberhard, M. J. (2003). Developmental Plasticity and Evolution. Oxford University Press. West-Eberhard, M. J. (2005). Developmental plasticity and the origin of species differences. Pro- ceedings of the National Academy of Sciences, 102(Supplement 1):6543–6549. West-Eberhard, M. J. (2008). Phenotypic Plasticity. In Jørgensen, S. E. and Fath, B. D., editors, Encyclopedia of Ecology. Elsevier. Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., and Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43):1686. Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., and Dunnington, D. (2020). Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. Wickham, H., François, R., Henry, L., and Müller, K. (2022). Dplyr: A Grammar of Data Manip- ulation. Wickham, H. and Girlich, M. (2022). Tidyr: Tidy Messy Data. Wickham, H. and Seidel, D. (2020). Scales: Scale Functions for Visualization. Wilcoxon, F. (1992). Individual Comparisons by Ranking Methods. In Breakthroughs in Statistics, pages 196–202. Springer New York, New York, NY. Wilke, C. O. (2020). Cowplot: Streamlined Plot Theme and Plot Annotations for Ggplot2. Wilke, C. O. and Adami, C. (2002). The biology of digital organisms. Trends in Ecology & Evolution, 17(11):528–532. Wilke, C. O., Wang, J. L., Ofria, C., Lenski, R. E., and Adami, C. (2001). Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature, 412(6844):331–333. 116 Winger, B. M., Auteri, G. G., Pegan, T. M., and Weeks, B. C. (2019). A long winter for the Red Queen: Rethinking the evolution of seasonal migration. Biological Reviews, 94(3):737–752. Woods, R. J., Barrick, J. E., Cooper, T. F., Shrestha, U., Kauth, M. R., and Lenski, R. E. (2011). Second-order selection for evolvability in a large Escherichia coli population. Science, 331(6023):1433–1436. Wund, M. A. (2012). Assessing the Impacts of Phenotypic Plasticity on Evolution. Integrative and Comparative Biology, 52(1):5–15. Xie, Y. (2020). Bookdown: Authoring Books and Technical Documents with R Markdown. Yedid, G., Ofria, C. A., and Lenski, R. E. (2008). Historical and contingent factors affect re- evolution of a complex feature lost during mass extinction in communities of digital organisms. Journal of evolutionary biology, 21(5):1335–1357. Zheng, J., Payne, J. L., and Wagner, A. (2019). Cryptic genetic variation accelerates evolution by opening access to diverse adaptive peaks. Science (New York, N.Y.), 365(6451):347–353. Zimmer, C. and Emlen, D. J. (2013). Evolution: Making Sense of Life. Roberts and Company Publishers, Greenwood Village, CO. 117 APPENDIX Plots of potentiation over time Figure 4: Potentiation over time plots from Chapter 4. Left panes show exploratory replays, with a green rectangle indicating the 50-step window with the largest increase in potentiation. Right panes show targeted replays, with the blue rectangle highlighting the largest single-step potentiation increase. Plots are ordered by the maximum single-step potentiation difference. 118 ExploratoryTargetedRank 1Max diff: 20Rank 2Max diff: 20Rank 3Max diff: 22Rank 4Max diff: 22Rank 5Max diff: 24Rank 6Max diff: 24Rank 7Max diff: 26Rank 8Max diff: 26Rank 9Max diff: 26Rank 10Max diff: 26−2000−1500−1000−5000010203040500255075100025507510002550751000255075100025507510002550751000255075100025507510002550751000255075100Depth relative to learning Depth in selected windowPotentiation Figure 5: Potentiation over time plots from Chapter 4. Left panes show exploratory replays, with a green rectangle indicating the 50-step window with the largest increase in potentiation. Right panes show targeted replays, with the blue rectangle highlighting the largest single-step potentiation increase. Plots are ordered by the maximum single-step potentiation difference. 119 ExploratoryTargetedRank 11Max diff: 26Rank 12Max diff: 28Rank 13Max diff: 28Rank 14Max diff: 28Rank 15Max diff: 28Rank 16Max diff: 30Rank 17Max diff: 32Rank 18Max diff: 32Rank 19Max diff: 32Rank 20Max diff: 34−1250−1000−750−500−250002550751000255075100025507510002550751000255075100025507510002550751000255075100025507510002550751000255075100Depth relative to learning Depth in selected windowPotentiation Figure 6: Potentiation over time plots from Chapter 4. Left panes show exploratory replays, with a green rectangle indicating the 50-step window with the largest increase in potentiation. Right panes show targeted replays, with the blue rectangle highlighting the largest single-step potentiation increase. Plots are ordered by the maximum single-step potentiation difference. 120 ExploratoryTargetedRank 21Max diff: 36Rank 22Max diff: 38Rank 23Max diff: 38Rank 24Max diff: 38Rank 25Max diff: 38Rank 26Max diff: 38Rank 27Max diff: 42Rank 28Max diff: 44Rank 29Max diff: 46Rank 30Max diff: 46−1500−1000−500002550751000255075100025507510002550751000255075100025507510002550751000255075100025507510002550751000255075100Depth relative to learning Depth in selected windowPotentiation Figure 7: Potentiation over time plots from Chapter 4. Left panes show exploratory replays, with a green rectangle indicating the 50-step window with the largest increase in potentiation. Right panes show targeted replays, with the blue rectangle highlighting the largest single-step potentiation increase. Plots are ordered by the maximum single-step potentiation difference. 121 ExploratoryTargetedRank 31Max diff: 48Rank 32Max diff: 48Rank 33Max diff: 50Rank 34Max diff: 50Rank 35Max diff: 50Rank 36Max diff: 52Rank 37Max diff: 52Rank 38Max diff: 52Rank 39Max diff: 54Rank 40Max diff: 56−2000−1500−1000−5000010203040500255075100025507510002550751000255075100025507510002550751000255075100025507510002550751000255075100Depth relative to learning Depth in selected windowPotentiation Figure 8: Potentiation over time plots from Chapter 4. Left panes show exploratory replays, with a green rectangle indicating the 50-step window with the largest increase in potentiation. Right panes show targeted replays, with the blue rectangle highlighting the largest single-step potentiation increase. Plots are ordered by the maximum single-step potentiation difference. 122 ExploratoryTargetedRank 41Max diff: 56Rank 42Max diff: 56Rank 43Max diff: 56Rank 44Max diff: 60Rank 45Max diff: 62Rank 46Max diff: 66Rank 47Max diff: 68Rank 48Max diff: 72Rank 49Max diff: 82Rank 50Max diff: 88−600−400−200002550751000255075100025507510002550751000255075100025507510002550751000255075100025507510002550751000255075100Depth relative to learning Depth in selected windowPotentiation Figure 9: Potentiation over time plots from Chapter 4 for replayed replicates that did not originally evolve learning. Lines show the potentiation of each possible behavior. One replicate is shown per row, and rows are ordered by final evolved behavior. Acronyms: BH: bet-hedged / bet hedging; EC: error correction. Note that seed 3,440 evolved bet-hedged error correction, even though it was a very low probability at the latest replay point (0 of 50 replay replicates). 123 Seed 2892BH learningSeed 3998BH learningSeed 3804ECSeed 907ECSeed 2491BH ECSeed 3440BH ECSeed 2808Mixed BHSeed 2994Mixed BHSeed 3220Low activitySeed 896Low activity0200040000255075100025507510002550751000255075100025507510002550751000255075100025507510002550751000255075100Lineage depthPotentiationBehaviorLearningBet−hedged learningMixed bet hedgingError correctionBet−hedged error correctionLow activity