ON THE CONSTRUCTIVE POWER OF ECOLOGY IN OPEN-ENDED EVOLVING

SYSTEMS

By

Emily L. Dolson

A DISSERTATION

Submitted to

Michigan State University

in partial fulﬁllment of the requirements

for the degree of

Computer Science - Doctor Of Philosophy

Ecology, Evolutionary Biology, and Behavior - Dual Major

2019

ON THE CONSTRUCTIVE POWER OF ECOLOGY IN OPEN-ENDED EVOLVING

SYSTEMS

ABSTRACT

By

Emily L. Dolson

Ecology is a powerful force for unlocking the full potential of evolution. Ecological in-

teractions create feedback loops that promote diversiﬁcation, both in natural ecosystems

and in more applied evolutionary computation frameworks. However, we currently lack a

strong theoretical framework that predicts how a given ecological community will evolve.

Such a framework would allow us to better understand and anticipate change in evolving

systems and facilitate the harnessing of ecology as a tool for guiding evolution. In this work,

I develop tools to begin the development of such theory in computational systems. Using

these tools, I show that diﬀerent mechanisms for creating ecology (e.g. spatial structure and

varied competition schemes) produce radically diﬀerent community structures and evolution-

ary outcomes. I explore the implications of these diﬀerences in the context of evolutionary

computation and biology.

Copyright by
EMILY L. DOLSON
2019

This dissertation is dedicated to my parents, who made this work possible by fostering my

love of science for the past almost 28 years.

iv

ACKNOWLEDGEMENTS

There are a vast number of people to whom I am immensely grateful for their help and

support over the course of my Ph.D. First, I thank my parents, Mark and Laura Dolson,

and my best friend, Elana Bogdan, for being there for me through the ups and downs of

research, for constantly helping me reﬁne my ideas by demanding to hear about them, and

for forcing me to have some balance in my life by whisking me oﬀ on exciting adventures.

Next, I thank my advisor, Charles Ofria, for his guidance on my research, academia, and

life. I cannot imagine a better mentor. If I am so fortunate as to have my own grad students

one day, they will beneﬁt dramatically from the fact that I had such a good role model.

Moreover, I am grateful to Charles for creating such a supportive lab environment; I thank

all of the members of the Devolab for their help, research suggestions, and for being nice

people who I enjoy being around. In particular, I extend my thanks to Anya Vostinar, Alex

Lalejini, and Mike Wiser who have been fantastic friends and collaborators. I also thank the

rest of my committee members, Chris Adami, Bill Punch, and Gideon Bradburd, for their

feedback and guidance on my research.

Additionally, I thank Joshua Nahum (who gets his own paragraph because he belongs

in both of the previous two) for making sure I eat and sleep, being a constant source of

support and happiness, and reading the entirety of this dissertation. I also thank our ferrets,

Racetrack, Crashdown, Ghost, and Daenerys, for breaking me out of many a writing block.

As I have learned about what I can do to increase diversity among the scientists in my

ﬁeld, it has become increasingly apparent to me that the experiences that played the largest

role in getting me to become a computer scientist were not, as I had previously thought,

the result of chance. They were the result of a lot of people putting in a lot of thought

and eﬀort to make me feel welcome in this ﬁeld. I will be eternally grateful to those people,

who include: Jane Orbuch, Amy Graves, and the Swarthmore College Computer Science

v

Department (particularly Lisa Meeden).

vi

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xi

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.1 Why use digital evolution to understand eco-evolutionary dynamics in biology?
1.1.1 Digital Evolution Systems are Individual-based . . . . . . . . . . . . .
1.1.2 Digital Evolution Systems can be Generalized . . . . . . . . . . . . . .
1.1.3 Digital Evolution Systems have Realistic, yet Tractable Fitness

Landscapes

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 Previous eco-evolutionary dynamics research using digital evolution . . . . . .
1.2.1 Competition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2.2 Facilitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2.3 Predation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2.4 Parasitism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2.5 Perturbations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2.6 Interaction networks
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3 Eco-evolutionary dynamics in evolutionary computation . . . . . . . . . . . .
1.3.1 Limited resources and Eco-EA . . . . . . . . . . . . . . . . . . . . . . .
1.3.2 Complexifying environments . . . . . . . . . . . . . . . . . . . . . . . .
1.4 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5.1 Part 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5.2 Part 2: Understanding eco-evolutionary dynamics in biology . . . . . .
1.5.3 Part 3: Harnessing eco-evolutionary dynamics for evolutionary

computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1.5.4 Part 4: Developing tools to identify and understand complex digital

ecologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Chapter 2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1 Study systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.1 NK Landscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.2 Real-valued function optimization . . . . . . . . . . . . . . . . . . . . .
2.1.2.1 The N-dimensional box problem . . . . . . . . . . . . . . . . .
2.1.2.2 Niching Competition Benchmark Problems . . . . . . . . . . .
2.1.3 Linear genetic programming . . . . . . . . . . . . . . . . . . . . . . . .
2.1.3.1 ScopeGP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.3.2 Avida . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Selection schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 Tournament selection . . . . . . . . . . . . . . . . . . . . . . . . . . . .

vii

1
4
5
5

7
9
9
11
14
16
17
19
20
25
26
27
27
27
28

28

29

31
31
31
32
32
33
33
35
35
37
38

2.2.2 Fitness sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.3 Lexicase selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.4 Eco-EA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Statistical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Code Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Chapter 3 Spatial resource heterogeneity increases diversity and

evolutionary potential

. . . . . . . . . . . . . . . . . . . . . . . . . .
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.1 Evolutionary potential
. . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.2 Quantifying environmental heterogeneity . . . . . . . . . . . . . . . . .
3.2 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.1 Niche stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.2 Eﬀect of spatial heterogeneity on phenotypic diversity . . . . . . . . . .
3.2.3 Spatial heterogeneity and evolutionary potential . . . . . . . . . . . . .
3.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3.1 Niche stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3.2 Fixed layout experiments . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3.3 Randomized patch experiments
. . . . . . . . . . . . . . . . . . . . . .
3.3.4 Code and Data Availability . . . . . . . . . . . . . . . . . . . . . . . .

Chapter 4 Spatial resource heterogeneity creates local hotspots of

evolutionary potential

. . . . . . . . . . . . . . . . . . . . . . . . . .
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 Biological Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.1 Experimental design . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.2 Statistical Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.3 Code and data availability . . . . . . . . . . . . . . . . . . . . . . . . .
4.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.4.1 Existence of evolutionary hotspots . . . . . . . . . . . . . . . . . . . . .
4.4.2 Potential drivers of hotspots . . . . . . . . . . . . . . . . . . . . . . . .
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Chapter 5 Ecological Theory Provides Insights into Evolutionary

Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 Ecological theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.2 Empirical ecological techniques
. . . . . . . . . . . . . . . . . . . . . .
5.1.2.1 Phylogenetic analysis . . . . . . . . . . . . . . . . . . . . . . .
5.1.2.2 Interaction networks
. . . . . . . . . . . . . . . . . . . . . . .
5.2 Ecology in EC systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.1 Tournament selection . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.2 Fitness sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

viii

38
39
39
41
42

43
43
46
47
48
48
51
54
60
62
62
63
64
65

66
66
68
71
71
71
74
75
75
76
79

82
82
85
87
88
89
90
90
91

5.2.3 Lexicase selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.4 Eco-EA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3 Empirical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3.1 Interaction networks
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3.2 Phylogenetic analysis of evolved populations . . . . . . . . . . . . . . .
5.3.3 Code availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .

93
95
96
97
97
98
99
99
5.4.1 Interaction networks
99
5.4.2 Phylogenetic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

Chapter 6 Applying ecological principles to genetic programming

. . . . . 105
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.2.1 Conﬁguration details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.2.2 Statistical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.2.3 Code availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.4 Conclusions and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

Chapter 7 The MODES Toolbox: Measurements of Open-ended Dynamics

7.3 Metrics

in Evolving Systems

. . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.2.1 Evolutionary Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.2.2 Prior work using MODES . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2.3 Applying MODES to biology . . . . . . . . . . . . . . . . . . . . . . . . 124
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.3.1 Overarching techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
. . . . . . . . . . . . . . . . . . . . . . . . 126
7.3.1.1 Filtering out noise
7.3.1.2 Identifying meaningful sites in genomes . . . . . . . . . . . . . 129
7.3.1.3 Determining boundedness
. . . . . . . . . . . . . . . . . . . . 130
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
. . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.3.2.1 Change Metric
7.3.2.2 Novelty Metric
. . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3.2.3 Complexity Metric . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3.2.4 Ecological Metric . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.4 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.4.1 NK Landscape
7.4.2 Avida
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.4.3 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.4.4 Statistical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.4.5 Code Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

7.3.2 MODES Metrics

ix

7.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.5.1 Change Metric
7.5.2 Novelty Metric
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.5.3 Complexity Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.5.4 Ecology Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

Chapter 8 Interpreting the Tape of Life: Ancestry-based Analyses Provide

8.2.2 Phylogeny Metrics

Insights and Intuition about Evolutionary Dynamics . . . . . . . 149
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.2 Metrics
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.2.1 Lineage Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8.2.1.1 Lineage Length . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.2.1.2 Mutation Accumulation . . . . . . . . . . . . . . . . . . . . . 155
8.2.1.3 Phenotypic Volatility . . . . . . . . . . . . . . . . . . . . . . . 156
8.2.1.4 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . 156
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.2.2.1 Depth of Most-Recent Common Ancestor . . . . . . . . . . . . 158
8.2.2.2 Phylogenetic Richness
. . . . . . . . . . . . . . . . . . . . . . 159
8.2.2.3 Phylogenetic Divergence . . . . . . . . . . . . . . . . . . . . . 159
8.2.2.4 Phylogenetic Regularity . . . . . . . . . . . . . . . . . . . . . 159
8.3 Visualizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.3.1 State Sequence Visualizations . . . . . . . . . . . . . . . . . . . . . . . 161
. . . . . . . . . . . . . . . . . . . . . . . . 161
8.3.2 Fitness Landscape Overlays
8.3.3 Phylogenetic Trees
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.3.4 Muller Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4.1 Niching Competition Benchmark Problems . . . . . . . . . . . . . . . . 165
8.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.5.1 Niching Competition Benchmark Problems . . . . . . . . . . . . . . . . 167
8.5.2 Avida
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Chapter 9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
9.2.1 Evolutionary Computation . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.2.2 Biology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

x

LIST OF TABLES

Table 3.1: Regression table for model predicting phenotypic diversity,

measured as Shannon entropy of viable organisms on the last measured
update before EQU evolved (or the last update if EQU never evolved),
from spatial heterogeneity, as measured by Shannon entropy of niches
in the environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Table 3.2: Regression table for linear model predicting entropy of the
environment from the mean and skew (and their interaction) of the
number of resource types per cell. . . . . . . . . . . . . . . . . . . . . .

Table 3.3: Logistic regression table for linear model predicting whether
or not EQU evolved from phenotypic diversity, mean resource types
per cell, and skew of resource types per cell, as well as all interactions
between these variables.

. . . . . . . . . . . . . . . . . . . . . . . . . .

54

59

60

Table 7.1: Table combining all previously described classes of

evolutionary dynamics as measured with evolutionary
activity statistics. For each class, I show the response of all
quantities measured for evolutionary activity statistics and in these
proposed metrics (novelty and diversity should behave equivalently
between the two systems). Note that I expect bounded evolutionary
activity to imply bounded complexity, as any scenario in which
complexity is growing without bound should imply that evolutionary
activity is too. Question marks indicate that the value of a given
metric is not speciﬁed in the description for a class of evolutionary
activity. Higher-numbered classes are generally believed to fall further
along the continuum of open-endedness than lower-numbered classes.
In principle, classes 4b and 4c could each be further split into
sub-classes based on whether complexity is bounded or unbounded.
Likewise, classes 1 and 2 could be further sub-divided based on
whether change is 0 or positive. In the absence of further data on the
behavior of real-world systems, it is unclear how helpful such increased
precision would be.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

xi

LIST OF FIGURES

Figure 2.1: The ﬁtness landscapes used in this experiment: A)

Himmelblau, B) Six-humped Camel Back, C) Shubert, and D)
Composition Function 2. Interactive versions available at https:
//emilydolson.github.io/fitness_landscape_visualizations. . .

Figure 3.1: Spatial arrangement of phenotypes at the end of each

experiment in worlds containing patches of two diﬀerent
resources, NOT and ORN, each radius 14. Patch locations are
indicated by circles - gold represents the NOT resource, green
represents the ORN resource, and blue represents both resources
overlapping. The color of each cell in the grid indicates the most
common phenotype in that cell across the 30 replicates. A: Patch
centers are 0 cells apart, creating complete overlap. B: Patch centers
are 14 cells apart, leading to three coexisting phenotypes. C: Patch
centers are 30 cells apart, resulting in complete patch separation and
no phenotype capable of both functions.

. . . . . . . . . . . . . . . . .

Figure 3.2: Proportion of populations (out of 30) at each inter-patch

distance that contained organisms performing both functions
(NOT and ORN) after 200,000 updates (approximately 4,000
generations). Error bars represent estimates of the standard deviation
under the assumption that the underlying pattern resembles a Poisson
distribution. Number of cells in the area where the patches overlap is
speciﬁed in parenthesis for each distance. . . . . . . . . . . . . . . . . .

xii

34

49

50

Figure 3.3: Shannon entropy (representing both phenotypic Shannon
diversity and environmental entropy) across resource patch
radii. The blue line shows environmental entropy at each radius
(mean and standard deviation, calculated from 100 randomly
generated environments at each radius). Environmental entropy peaks
at patch radius 18. In environments with patch radii less than this
peak (see sub-ﬁgures a and b), resources are more spatially isolated;
conversely, in environments with patch radii greater than this peak,
resources increasingly overlap (see sub-ﬁgure c). The high-radius
extreme represents a homogeneous control environment (see sub-ﬁgure
d), as all patches overlap and completely ﬁll the world. Final
phenotypic diversity at each radius condition tested in Avida is shown
by the boxplots (n=30 for each radius; boxes show 25%, 50%, and 75%
quantiles, with whiskers extending up to 1.5*IQR). Radii that do not
share a letter have signiﬁcantly diﬀerent distributions of phenotypic
diversity (pairwise two-sided Wilcoxon tests with Bonferroni
adjustment, p<0.05). Sub-ﬁgures a-d show representative
environments for the radius sizes indicated. Phenotypic diversity
appears to peak at a patch radius somewhere between 13 and 23, an
interval which includes the peak in environmental entropy.

. . . . . . .

Figure 3.4: Shannon entropy (representing both phenotypic Shannon

diversity and environmental entropy) across diﬀerent
resource inter-patch distances. The blue line shows environmental
entropy at each distance (mean and standard deviation, calculated
from 100 randomly generated environments at each distance; note that
the standard deviation is small and hard to see). Environmental
entropy peaks at inter-patch distance 21 (sub-ﬁgure c). In
environments with inter-patch distances greater than this peak,
resources are more spatially isolated (sub-ﬁgure d); conversely, at
inter-patch distances lower than this peak, resources increasingly
overlap (sub-ﬁgures a and b). Distance 0 represents a homogeneous
control environment, as all resources overlap completely. Boxplots
show ﬁnal phenotypic diversity at each level tested in Avida (n=30 for
each treatment; boxes show 25%, 50%, and 75% quantiles, with
whiskers extending up to 1.5*IQR). Distances that do not share a
letter have signiﬁcantly diﬀerent distributions of phenotypic diversity
(pairwise two-sided Wilcoxon tests with Bonferroni adjustment,
p<0.05). Sub-ﬁgures a-d show representative environments for the
inter-patch distances indicated. Phenotypic diversity appears to peak
at an inter-patch distance somewhere between 14 and 21, an interval
which includes the peak in environmental entropy. . . . . . . . . . . . .

52

53

xiii

Figure 3.5: Results from randomly generated environments. A)

Scatterplot of environmental entropy vs phenotypic Shannon diversity
across diﬀerent randomly generated environments (Method 3)
(n=500). Line represents the linear model shown in Table 1, and
shaded area represents 95% conﬁdence interval for the slope. B) and
C) are two examples of randomly generated environments.

. . . . . . .

Figure 3.6: Proportion of populations that evolved EQU across diﬀerent
patch radius sizes (n=30 per treatment). Error bars are estimates of
standard deviation based on the square root of the count of
populations evolving EQU for each condition. Radii that do not share
a letter have signiﬁcantly diﬀerent probabilities of evolving EQU
(pairwise two-sided Fisher’s exact tests with Bonferroni adjustment,
p<0.05). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Figure 3.7: Proportion of populations that evolved EQU across diﬀerent

inter-patch distances (n=30 per treatment). Error bars are
estimates of standard deviation based on the square root of the count
of populations evolving EQU for each condition. Inter-patch distances
that do not share a letter have signiﬁcantly diﬀerent probabilities of
evolving EQU (pairwise two-sided Fisher’s exact tests with Bonferroni
adjustment, p<0.05). . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Figure 4.1: Hotspot location technique. A) An example point pattern,

generated from 100 runs of Avida within a single environment. This
pattern corresponds to the most complex trait, EQU. Each point
represents the location in space of the ﬁrst organism within one of the
100 runs that was able to perform EQU. B) Kernel intensity map
generated from the point pattern in part A. C) Regions within that
kernel intensity map that are substantially more intense than observed
in the Monte Carlo simulations (i.e. the hotspot). . . . . . . . . . . . .

Figure 4.2: Results of performing the test for spatial clumping on the

point pattern from Figure 4.1A. Observed values of the test
statistic (black line) are well above the values observed in the Monte
Carlo simulations (shaded area), indicating a clumping in this pattern.
The red line indicates the theoretically derived expected value for a
random pattern.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xiv

55

57

58

72

73

Figure 4.3: Hotspots across all traits and all environments. Each panel
represents a diﬀerent environment. Background colors represent the
diﬀerent combinations of resources that are present in each location;
each color represents a diﬀerent combination, with colors assigned such
that higher hue values roughly correspond to more complex
environments (either in number of resources or complexity of traits
associated with resources). Each polygon color represents the outline
of hotspots for the trait speciﬁed in the legend. Note that only
hotspots signiﬁcant at an alpha-level of .00001 are shown. This
conservative value was necessary to be certain that hotspots are
present despite the large number of hypothesis tests performed.
However, in practice, it likely translates to an underestimation of the
true size of many of the hotspots. . . . . . . . . . . . . . . . . . . . . .

Figure 4.4: Quantile-quantile plot comparing the distribution observed
diversity percentiles for the most complex trait (EQU) to a
uniform distribution (Oldford, 2016). The shaded area represents
the 95% conﬁdence interval for where I would expect this line to fall if
these distributions were identical. Since the line is mostly within this
area, I conclude that the distribution of ranks does not deviate
substantially from a uniform distribution. This ﬁnding suggests that
there is nothing special about the level of diversity in the regions
where traits are more likely to evolve. Results for other complex traits
are similar.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Figure 4.5: Spatial paths taken by the ﬁrst lineages to evolve the most
complex trait (EQU) from ﬁve arbitrarily-selected replicates
from each environment. The path from each replicate is colored a
diﬀerent shade of gray. As in Figure 4.3, background colors represent
the set of resources in each part of the environment. I) shows the
homogeneous control environment.

. . . . . . . . . . . . . . . . . . . .

Figure 5.1: Conceptual illustration of phylogenetic diversity metrics.

The letters around the outside represent the full set of extant taxa and
the circle in the middle indicates their most recent common ancestor.
Branching points indicate the locations of intermediate taxa (note that
in biology these have to be inferred, but in EC we have perfect
information). The three panels indicate the three diﬀerent facets of
phylogenetic diversity: richness, divergence, and regularity.
Reproduced from (Tucker et al., 2017). . . . . . . . . . . . . . . . . . .

74

78

81

88

xv

Figure 5.2: Example interaction network with three nodes. Red arrows

denote harmful interactions and blue arrows denote beneﬁcial
interactions. Line width and color darkness indicate magnitude of
eﬀect. In this example, A reduces B’s ﬁtness a lot, B reduces A’s
ﬁtness a little, and B reduces C’s ﬁtness a lot. A increases C’s ﬁtness a
lot. A plausible mechanism for this positive interaction is that, by
harming B, A reduces B’s harmful eﬀect on C. . . . . . . . . . . . . . .

Figure 5.3: Depiction of a Lexicase decision tree. A, B, and C are

test-cases, and the internal nodes represent the order in which they are
picked. Each possible path through the tree leads to a diﬀerent leaf
node (“island”). The individuals on that “island” are the ones that have
the potential to be selected for reproduction. . . . . . . . . . . . . . . .

Figure 5.4: The probability of long-term survival under lexicase

selection across diﬀerent population sizes (S) and lengths of
evolutionary time (G).

. . . . . . . . . . . . . . . . . . . . . . . . .

90

93

95

Figure 5.5: Interaction networks for the same community under four

diﬀerent selection schemes: tournament selection, ﬁtness sharing,
Eco-EA, and lexicase selection. a) Shows the networks for a single
population. Red edges indicate harmful interactions, blue edges
indicate beneﬁcial interactions. Edge width denotes interaction
strength. An interactive version of this chapter is available (Dolson,
2018a). b) Shows boxplots summarizing the number of beneﬁcial and
harmful interactions observed under each selection scheme across 100
randomly generated populations.

. . . . . . . . . . . . . . . . . . . . . 100

Figure 5.6: Phenotypic and phylogenetic diversity over time for each
problem. Shaded area is the bootstrapped 95% conﬁdence interval
around the mean. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Figure 5.7: Phenotypic and phylogenetic diversities in ﬁtness sharing

across various values of sharing threshold (σshare) in the context of the
Collatz problem. Shaded areas are bootstrapped 95% conﬁdence
intervals around the mean. . . . . . . . . . . . . . . . . . . . . . . . . . 103

Figure 6.1: Impact of good and bad advice on Eco-EA and Lexicase.

Heat maps for each algorithm show the success rate of that algorithm
in the presence of varying quantities of good and bad advice. Note
that tournament selection is not actually capable of receiving hints; it
is presented here as a control. Each cell in the heat maps represents
the proportion of runs (out of 10) that successfully found the optimal
solution to the 10-dimensional box problem.

. . . . . . . . . . . . . . . 114

xvi

Figure 7.1: Relationships between the metrics. Originally published in

(Dolson et al., 2015). Solid lines with arrows indicate metrics which
are prerequisites for other metrics.

. . . . . . . . . . . . . . . . . . . . 117

Figure 7.2: An illustrative example of how I ﬁlter components for

persistent lineages. At time point A, the purple lineage has proven
to be persistent and therefore the original component from A-t will be
considered meaningful. Similarly, the green and blue lineages persist to
time point A+t and so the original green and blue components will be
considered meaningful as they existed at time point A.

. . . . . . . . . 126

Figure 7.3: Amount of change at over time in varying NK Landscape

environments. Fitness sharing increases the amount of change in the
population over time. Conversely, a routinely changing environment
leads to spikes in change that quickly drop as the population converges
again. Shaded region represents a bootstrapped 95% conﬁdence
interval around the mean.

. . . . . . . . . . . . . . . . . . . . . . . . . 139

Figure 7.4: Rain-cloud plot of change at ﬁnal generation across NK

Landscape treatments. Environmental conditions that increase the
amount of change at the ﬁnal time point include: increasing the
mutation rate, decreasing the population size, and implementing
ﬁtness sharing. Black circle and line indicate mean and bootstrapped
95% conﬁdence interval.

. . . . . . . . . . . . . . . . . . . . . . . . . . 140

Figure 7.5: Rain-cloud plot of change at ﬁnal generation across multiple

population sizes and ﬁlter lengths in Avida in the empty
environment. Labels along the top indicate population size. Black
circle and lines indicate mean and bootstrapped 95% conﬁdence
interval. Horizontal bar indicates change = 1, the expected average
change in the empty environment.

. . . . . . . . . . . . . . . . . . . . 140

Figure 7.6: Change over time across diﬀerent environments and

population sizes in Avida. Note that y axes have diﬀerent scales.
In general, change is much higher in the Logic-9 environment. Filter
length, t, is equal to population size.

. . . . . . . . . . . . . . . . . . . 142

Figure 7.7: Amount of novelty over time in NK Landscape with varying

mutation rate. The novelty metric measures the number of
completely new meaningful genomes that have lineages that persisted
since the previous time point. As the mutation rate increases, more
novelty is continuously produced. However, at all mutation rates the
novelty decreases over time. Shaded region represents a bootstrapped
95% conﬁdence interval around the mean.

. . . . . . . . . . . . . . . . 143

xvii

Figure 7.8: Rain-cloud plot of novelty at ﬁnal generation across NK

Landscape treatments. . Black circle and lines indicate mean and
bootstrapped 95% conﬁdence interval. At the ﬁnal time point, little
meaningful novelty is found in the baseline populations. Only
increasing the mutation rate increases novelty, implying that other
conditions with high change are simply promoting cycling between a
ﬁxed set of genotypes.

. . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Figure 7.9: Rain-cloud plot of complexity at ﬁnal generation across NK

Landscape treatments. Black circle and lines indicate mean and
bootstrapped 95% conﬁdence interval. Note that I have excluded the
high N condition from this graph because it throws oﬀ the axes. Most
of the treatments reach the maximum complexity allowed by the
genome length (20 or 100) and cannot continue to increase. High
mutation rate and smaller populations decrease the ﬁnal complexity
achieved by the populations on average.

. . . . . . . . . . . . . . . . . 145

Figure 7.10: Complexity in Avida over time across diﬀerent

environments and population sizes. Note that y axes have
diﬀerent scales. In general, complexity appears to continue increasing
in the Logic-9 environment, whereas it drops and then stabilizes in the
empty environment. Filter length, t, is equal to population size.

. . . 146

Figure 7.11: Rain-cloud plot of ecology at ﬁnal generation across NK
Landscape treatments. Black circle and lines indicate mean and
bootstrapped 95% conﬁdence interval. Fitness sharing is the only
condition that reliably produces ecology.

. . . . . . . . . . . . . . . . . 147

Figure 8.1: Four methods of representing a lineage. This example lineage

has accumulated three mutations (one reverse mutation and two
substitutions) and gone through three distinct phenotypes. In (A),
each state along the lineage represents a single individual; lineage
length is the number of generations spanned by the lineage (eight). In
(B), states represent the sequence of genotypes along the lineage,
reducing lineage length to four. In (C) states represent the sequence of
phenotypes along the lineage; lineage length is the number of times a
diﬀerent phenotype is expressed (three). In (D), states are a particular
phenotypic characteristic; here, lineage length is two.

. . . . . . . . . . 153

Figure 8.2: Schematic representation of state sequence visualization.

Colors indicate diﬀerent states. The optional legend along the side can
be used to indicate any information relevant to understanding the
drivers of the state changes.

. . . . . . . . . . . . . . . . . . . . . . . . 160

xviii

Figure 8.3: A lineage drawn on top of a ﬁtness landscape . . . . . . . . . . 161

Figure 8.4: The same phylogeny depicted in two diﬀerent ways. A)

Depicts a standard phylogeny, where nodes represent taxa and edges
indicate parent-child relationships. In this phylogeny, node position
along the time axis represents time of birth. Nodes are colored to
distinguish taxa that are extant in the present (blue) from extinct taxa
(black). Note that, because this is a pruned phylogeny, all leaf nodes
are currently extant. Edges can be colored to convey whatever
information is most useful. B) Depicts the same phylogenetic tree but
with boxes indicating taxon lifespans in place of circular nodes.

. . . . 164

Figure 8.5: A Muller plot depicting the same phylogeny as the

visualizations in Figure 8.4. The red region represents the root
node. As it gradually goes extinct, the proportion of the ﬁgure it takes
up gradually diminishes. Its three oﬀspring are shown in purple, green,
and blue.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

Figure 8.6: A close-up on two adjacent peaks in the Shubert function

ﬁtness landscape. Lineages are depicted as paths fading from white
to black over evolutionary time. The lineages shown here evolved
under a mutation rate of 0.01. A) Was evolved using a tournament size
of 2, whereas B) was evolved using a tournament size of 16. These
ﬁgures neatly illustrate how increased tournament size keeps the
lineage near the tops of the peaks.

. . . . . . . . . . . . . . . . . . . . 168

Figure 8.7: Values of example metrics across diﬀerent mutation rates for
each of the four problems. All lineage-based metrics are calculated
on the lineage of the ﬁttest organism at the ﬁnal time point;
population-level means behaved similarly. All experiments shown here
used a tournament size of 4. Circles are medians, vertical lines show
inter-quartile range, and shaded area is a bootstrapped 95% conﬁdence
interval around the mean. Note that both axes are on log scales. . . . . 169

Figure 8.8: Values of example metrics across diﬀerent tournament sizes
for each of the four problems. All experiments shown here used a
mutation rate of 0.001. All lineage-based metrics are calculated on the
lineage of the ﬁttest organism at the ﬁnal time point; population-level
means behaved similarly. Circles are medians, vertical lines show
inter-quartile range, and shaded area is a bootstrapped 95% conﬁdence
interval around the mean. Note that both axes are on log scales. . . . . 170

xix

Figure 8.9: Values of example metrics across diﬀerent mutation rates for

each of the four problems under a diversity-preserving
selection regime, Eco-EA. All lineage-based metrics are calculated
on the lineage of the ﬁttest organism at the ﬁnal time point;
population-level means behaved similarly. All experiments shown here
used a tournament size of 4. Circles are medians, vertical lines show
inter-quartile range, and shaded area is a bootstrapped 95% conﬁdence
interval around the mean. Note that both axes are on log scales. . . . . 171

Figure 8.10: A sampling of informative lineage metrics from Avida

calculated after approximately 10,000 generations. Mean pairwise
distance is a measurement of phylogenetic divergence; higher values
imply greater phylogenetic diversity. MRCA generation is the
generation at which the current most recent common ancestor
occurred (lower numbers mean it was longer ago). Phenotypic
volatility is the number of changes in phenotype along the dominant
(most plentiful) lineage. Unique phenotypes is the count of unique
phenotypes that occurred along the dominant lineage.

. . . . . . . . . 173

Figure 8.11: Muller plot from a representative run of Avida in the

limited resources environment. This plot shows the population
dynamics of diﬀerent phenotypes over 8,000,000 updates (a unit of
time in Avida that is proportional to the number of CPU cycles used
thus far). Only phenotypes that make up at least 5% of the population
at at least one time point are shown. Note the region in the middle of
the plot where we can see repeated selective sweeps that are contained
to a single niche.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

Figure 8.12: Mutations along the dominant lineage across treatments in
Avida, calculated after approximately 10,000 generations. The upper
left shows the total count of mutations of all types. The other plots
show the quantity of each mutation type as a proportion of the total.

. 175

Figure 8.13: State sequence visualization of phenotypes along the

dominant lineage in the changing environment condition. The
key along the left indicates the environment’s state over time. Each
column is one replicate. From this visualization, we can see that most
successful lineages are able to adapt to a change in the environment
relatively quickly. We can also see that there is a fair amount of
variation in the precise timings of these changes and whether they
occur for every change of the environment. . . . . . . . . . . . . . . . . 176

xx

Chapter 1

Introduction

Over the past decade, there has been an increasing awareness of the profound impact

that ecological and evolutionary dynamics have on each other (Schoener, 2011; Hendry,

2016). Ecological interactions shape the underlying ﬁtness landscape that is traversed by

evolving populations. At the same time, evolutionary forces continually adjust the compo-

sition of organisms that make up that ecosystem. As such, ever-shifting ﬁtness landscapes

and ﬂuctuating ecosystems continually fuel each other’s change.

At the heart of these dynamics is diversity. Evolution requires diversity within a group

for selection to act on; low-diversity populations typically have low evolutionary potential

(Lavergne and Molofsky, 2007; McDonald and Linde, 2002; Walker and Ofria, 2012). Eco-

logical communities are, by deﬁnition, diverse groups of species and one of the fundamental

questions in ecology is how so many species can stably coexist; communities that are more

diverse tend to be more stable, productive, and resilient (Hooper et al., 2005; Tilman et al.,

2014). In both ecology and evolution, there are some processes that should clearly reduce di-

versity and some that should promote – or at least preserve – it. Improved understanding of

these processes is critical to creating theory to predict how biological systems will respond to

perturbations, particularly as humans continue to drastically change the environment around

us, and our attempts at remediation often backﬁre due to ill-formed theoretical foundations.

1

Diversity is also of critical importance to evolutionary computation, a machine learning

technique that draws inspiration from biology with the goal of ﬁnding diverse and eﬀective

solutions to challenging computational problems. The core idea behind evolutionary algo-

rithms is that a population of potential solutions is maintained, where the highest quality

ones are replicated (replacing lower-quality solution) and varied (typically through mutation

or crossover). As a result, better solutions evolve. However, non-diverse populations can get

stuck on good, but suboptimal solutions (local ﬁtness peaks) from which all paths to the

global optimum require deleterious mutations, a phenomenon is called premature conver-

gence. Generating and maintaining diversity, in essence, is seen as a technique for increasing

the amount of exploration in a system and avoiding premature convergence. While evolu-

tionary computation researchers have developed a wide range of techniques for promoting

diversity, the ﬁeld currently lacks a theoretical framework for understanding how they relate

to each other or predicting which would be most appropriate in a given scenario.

Criss-crossing biological and computational areas of research is the work of scientists

studying artiﬁcial life. In this ﬁeld, researchers create artiﬁcial lifelike constructs in the form

of computer programs, robots, or synthetic organisms. These systems make it possible to ad-

dress questions that would be intractable in natural systems. In particular, artiﬁcial systems

make it possible to perform experiments at spatiotemporal scales that would otherwise be

infeasible in the lab or ﬁeld. As a result, scientists have used artiﬁcial life to understand the

hallmarks of living systems in nature, such as long-term ecosystem development, open-ended

evolutionary dynamics, and the origins of complex traits and behaviors. Complex ecologies

appear to be a critical driver of each of these phenomena, helping to generate novelty that

gets incorporated back into the system.

Untangling eco-evolutionary dynamics has proven challenging across all of these ﬁelds.

These dynamics are governed by a web of feedback loops, stochastic events, and emergent

properties. Often simply adding a property, such as spatial structure, to a system will com-

pletely change the outcome. To add further complications, these processes often play out

2

over vast spatiotemporal scales, making them diﬃcult to manipulate in a controlled exper-

iment. As such, digital systems are a valuable tool for understanding the behavior of these

complex systems (Adami, 2002). They allow us to untangle feedback loops by performing

experiments in which we disable some processes. For instance, sometimes it would be use-

ful to separate the eﬀects of ecological and evolutionary dynamics from each other so that

we can understand what the exact drivers of a result are; such understanding is important

for predicting how our results map to other systems, in which the relative rates of these

two processes may be diﬀerent. In a wet lab or ﬁeld system, such an experiment is eﬀec-

tively impossible to carry out cleanly. Another obstacle to understanding eco-evolutionary

dynamics is that evolution experiments often require thousands of generations. Even for

rapidly-reproducing organisms like bacteria, it takes most of a year just to reach 2000 gen-

erations. Similarly, collecting ecological data at a high enough resolution to answer many

questions is often incredibly labor-intensive.

These problems can be alleviated by performing parallel digital evolution experiments.

Since digital systems give us perfect control over our experimental design, tasks such as

isolating ecological and evolutionary dynamics are possible; we can simply disable mutations

and prevent any evolution from occurring, leaving only the purely ecological eﬀects. Such

controls make it possible for us to delve into mechanisms that cannot be addressed in other

systems. Digital systems also generally enable evolution to happen substantially faster than

natural systems. In Avida (discussed below), for example, 2,000 generations take less than

24 hours. This speed makes digital evolution a particularly valuable tool for performing

exploratory digital experiments to develop theory that can then be tested in a natural system

once we have determined the precise questions that it is worth investing the time to study

in the lab or ﬁeld.

3

1.1 Why use digital evolution to understand eco-

evolutionary dynamics in biology?

A wide range of digital systems for studying evolution exist. Perhaps the most popular

of these systems, and the one that I will use here, is the Avida Digital Evolution Platform

(Ofria and Wilke, 2004). Avida has all of the properties that I will argue are critical for the

study of open-ended eco-evolutionary dynamics. Each organism in Avida is a self-replicating

computer program, with a genome made up of computer instructions. Experiments generally

begin by seeding the population with a single program containing a sequence of code that

copies each instruction in its genome into its oﬀspring. Every time an instruction is copied,

there is a small probability that it will mutate to be a diﬀerent instruction. Organisms in

Avida inhabit a lattice of cells, with each cell containing at most one organism. When an

organism is done copying its program, the resulting oﬀspring is placed in an adjacent cell,

overwriting any organism previously occupying that cell. This dynamic creates competition

for space, which incentivizes organisms to copy themselves as fast as possible to reduce the

chance of being overwritten before they reproduce. Organisms can increase their reproduc-

tive rate either by evolving to be more eﬃcient or by evolving to perform computational

tasks that the experimenter has associated with a bonus to CPU speed. When organisms

successfully complete these tasks, they receive bonus CPU cycles each update relative to the

other members of the population, meaning that they are able to execute more code per unit

time. Since the Avida system has inheritance, variation, and selection (some organisms may

have more oﬀspring in their lifetime than others), these populations of digital organisms can

evolve by natural selection.

Note that the conditions that are necessary and suﬃcient for evolution are not suﬃcient

for the formation of ecological communities. Such communities require that there be multiple

potential strategies for survival (i.e. niches) and that it be possible for groups of organisms

using diﬀerent strategies to coexist. There are a vast number of ways to add this property

4

to a system, many of which will be discussed later in this chapter.

We argue that digital evolution systems have a number of properties that make them

well-suited to the task of studying eco-evolutionary dynamics:

1.1.1 Digital Evolution Systems are Individual-based

Because the digital evolution systems that I am working with are instantiations of

evolution (as opposed to merely simulating evolutionary dynamics), they are necessarily

individual-based. Each member of the population is an individual with some amount of

agency, allowing for a broad range of survival traits to evolve. Experiments are conducted by

modifying the rules governing the behavior of individuals or their interactions with each other

and the environment. This approach is in contrast to systems that represent the population

at a more abstract level, such as a Gaussian distribution of trait values.

In these non-

individual-based systems, experiments are more often performed by changing the equations

that govern how the population changes over time. These two approaches generally target

diﬀerent levels of abstraction. Because digital evolution systems try to avoid introducing

any preconceptions about the range of potential outcomes that could result from a given

set of starting conditions, individual-based models are more appropriate in cases where the

cumulative impact of individual behavior can lead to complex and nuanced population-level

eﬀects. Most open questions in eco-evolutionary dynamics fall squarely into this category, as

they largely concern the large-scale trends that result from simple rules governing interactions

between a large number of individuals.

1.1.2 Digital Evolution Systems can be Generalized

The underlying genetic representation of organisms in most digital evolution systems

are computer programs, making them fundamentally diﬀerent from DNA-based organisms.

While this fact means that digital evolution is generally not a good ﬁt for modeling any

speciﬁc system on earth, it also provides a valuable framework for producing insights that

5

generalize across systems (Maynard Smith, 1992; Kawecki et al., 2012; Ofria and Wilke, 2004;

MacPherson and Gras, 2016). Traditionally, general inferences about biology are drawn by

observing the same behavior across multiple model systems (e.g. Drosophila, E. coli, and

Arabadopsis). The more distinct these systems are from each other the better, as any

diﬀerences reduce the chance that the observed results were caused by an idiosyncrasy of a

speciﬁc subset of the phylogeny of life on earth. However, all of life on earth shares some

common ancestry (Maynard Smith, 1992). Thus, even results that can be replicated across

many model systems are not necessarily unavoidable outcomes of evolution that will always

be true. If we observe a pattern in both digital and more natural systems, however, that is

strong evidence that it will be true across most evolving systems.

The problem of generalizability has historically been even more challenging to overcome

in ecology than in evolutionary biology, as evolution’s set of three necessary and suﬃcient

conditions provides a useful starting point for building ground-up conceptual models. As a

result, ecologists are often forced to take a more top-down approach, trying to understand

the complex ecosystems that exist on earth. This work is critical, but the process of ﬁnding

general trends across ecosystems is arduous, and attempting to understand the mechanisms

driving those trends is even more so. We can speed this process up by starting with compar-

isons to a digital system. This technique will allow us to more quickly determine whether

results are likely to generalize before putting in the substantial work that would be required

to conduct an comparable ﬁeld study. More importantly, if we do not observe the same re-

sults in biology as we do in digital evolution, that is an indication that there is an important

factor at play that we have not noticed. Such disparities will help us identify mistaken pre-

conceptions and either help us build more realistic digital environments or point us toward

critical areas where ecological theory may need further development.

6

1.1.3 Digital Evolution Systems have Realistic, yet Tractable Fit-

ness Landscapes

Since evolution is a key component of eco-evolutionary dynamics, it is important to

consider the adaptive landscape in which that evolution is occurring. Fitness landscapes, a

common idea in evolutionary biology, map the ﬁtnesses associated with various genotypes

and the ease of mutating between those genotypes (Wright, 1932). When visualized in three

dimensions, values of two traits are generally arranged along the x- and y-axes, with the

ﬁtness corresponding to each of these values shown as a surface on the z-axis. This surface

generally has peaks that we expect the population to climb over time, and valleys that we

expect will be challenging to cross. In reality, ﬁtness landscapes exist in many more than

three dimensions, which may impact the validity of the intuitions they provide (Gavrilets

and Vose, 2005), but they are generally believed to be a useful mental model nonetheless.

The ﬁtness landscape on which a population is evolving plays a large role in determining the

population’s adaptive trajectory, often in relatively complex ways. Having a well-understood

ﬁtness landscape is critical, or at least helpful, for most research on evolutionary theory. The

ﬁtness landscape determines which mutations will be beneﬁcial, how easy it is to arrive at

the most ﬁt phenotype, and, to some extent, the strength of selection – factors that are often

critical to evolution research. When we study eco-evolutionary dynamics, we essentially layer

the complexity of ecology on top of this landscape. Thus, we can maximize our chances of

understanding why we observe the results that we do by studying eco-evolutionary dynamics

in systems that are already well-understood in the absence of ecology. For instance, we may

want to study which ecotypes are best at invading a given niche. If we already know that a

given ecotype is a common stepping stone on the way to the adaptation associated with the

niche, we have a much better sense of what results we should be expecting to see. Thus, if

we observe diﬀerent results, we will be able to recognize that there is something unexpected

going on there and explore it in more depth.

It would be easy to create ﬁtness landscapes that we understand well because they are

7

so simple. However, ﬁtness landscapes in biology tend to be relatively complex, and some

evolutionary dynamics are driven by this complexity. Thus, in order to investigate these

dynamics, we need a somewhat complex ﬁtness landscape. There are a variety of techniques

for creating complex ﬁtness landscapes for computational systems. Many of these techniques

have been studied enough that we have a reasonably good intuition for the type of ﬁtness

landscapes that they create. NK models are a prime example (Kauﬀman and Levin, 1987;

Ochoa et al., 2008). However, these ﬁtness landscapes are generally either static, or being

changed by some abiotic, exogenous factor. Thus, by deﬁnition, there is no ecology. Most of

ecology can be thought of as a ﬁtness landscape that is largely based on the other members

of the population. As such, ﬁtness landscapes in systems with ecology are subject to con-

tinuous endogenous change. This dynamic nature can make such ﬁtness landscapes rather

unwieldy. Every action that any organism takes changes the ﬁtness landscape in which all

other organisms are evolving. These changes last for an indeﬁnite length of time. As a re-

sult, attempting to mathematically characterize these ﬁtness landscapes would be incredibly

challenging. Nevertheless, these ever-shifting ﬁtness landscapes are a useful mathematical

construct for the purposes of conceptualizing the complexity of adaptive dynamics in these

ecosystems. In research on eco-evolutionary dynamics, this biotically-mediated ﬁtness land-

scape is often layered on top of a static ﬁtness landscape representing abiotic conditions.

The static underlying ﬁtness landscape of Avida has been studied extensively under

various conﬁgurations; this is true of other digital evolution systems as well, but for simplicity

the rest of this section will focus on Avida. For research that requires a more complex

ﬁtness landscape, Avida includes the option to reward organisms with extra CPU cycles

for completing various computational tasks. The most widely-used tasks are the set of all

eight two-input Boolean logic functions in addition to the logical not function (hereafter

referred to as logic-9). The ﬁtness landscape of the logic-9 environment is well-studied, as

it has been used in a large number of experiments (many of which are described later in

this chapter). Moreover, it has some important structural similarities to ﬁtness landscapes

8

in which biological organisms are evolving, in that the simpler tasks are somewhat related

to the more complex tasks. While evolving the ability to do the simpler tasks is useful in

its own right, the simpler tasks can also serve as building blocks for the more complex tasks

(Lenski et al., 2003). Thus, Avida meets the criteria of having a complex-but-well-studied

static ﬁtness landscape making it well-suited for use in studying eco-evolutionary dynamics

in a digital system.

Avida also makes it easy to incorporate ecology by layering on a ﬁtness landscape based

on biotic interactions. This layering is simplest in the case of competitive interactions. As

previously mentioned, digital organisms in Avida can evolve to perform various computa-

tional tasks. Akin to evolving the ability to metabolize a new resource in biology, evolving a

new task in Avida allows for the exploitation of a novel resource. By default, the resources

associated with tasks in Avida are unlimited. However, we can introduce competitive inter-

actions between organisms by limiting the quantity of these resources. This scarcity creates

negative frequency-dependent selection as organisms compete for the resources. Any time an

organism uses a resource it consumes a portion of it, making the ability to use that resource

less valuable or all organisms. Avida also includes a number of other mechanisms by which

organisms can inﬂuence each other’s ﬁtness landscapes, such as parasitism, predation, and

cooperation.

1.2 Previous eco-evolutionary dynamics research using

digital evolution

1.2.1 Competition

While the ﬁrst major discoveries to come out of digital evolution research fell squarely

into the category of pure evolutionary theory(Wilke et al., 2001; Lenski et al., 2003), simple

ecological dynamics were soon explored. Cooper and Ofria demonstrated that limiting the

9

quantity of resources available was suﬃcient to evolve a stable and diverse ecosystem(Cooper

and Ofria, 2003). They set up this experiment by associating each of the tasks from the logic-

9 environment with a limited resource. Every time an organism performed the task and

received the reward, it used up some of the resource, resulting in a situation where digital

organisms performing tasks is eﬀectively equivalent to biological organisms metabolizing

chemical resources. This arrangement created negative frequency dependence; tasks being

performed by fewer organisms were more valuable, just as inhabiting any underutilized niche

in an ecosystem is advantageous.

In prior experiments, where resources were unlimited,

some phenotypes in the population were simply more ﬁt than the others; the ﬁtnesses did

not change over time. Layering the shifting biotic ﬁtness landscape on top of the static abiotic

landscape created pressure for adaptive radiation and subsequent stabilizing selection on the

diﬀerent ecotypes.

Chow et al. expanded on this work, exploring the continuum from environments with

very scarce resources to environments with resources so plentiful as to be eﬀectively unlim-

ited(Chow et al., 2004). They found that ecotypic diversity peaks at intermediate levels of

resource availability. Resource availability, in this context, is roughly analogous to primary

productivity; it determines the carrying capacity of each niche. Previous ﬁeld research had

found that species richness tends to be highest in environments with intermediate productiv-

ity, but there had been a lack of consensus over which properties of environments were driving

this eﬀect. Chow et al. replicated this ﬁnding in Avida, purely by varying the amounts of

each resource that were present in the environment. This result demonstrated that limited

resources alone are suﬃcient to explain the productivity-diversity relationship. This abil-

ity to isolate potential drivers of a phenomenon and determine which ones are suﬃcient to

produce a given result is a major strength of individual-based digital systems.

These initial digital eco-evolutionary dynamics papers tell a compelling story about how

evolution guides the long-term assembly of ecosystems. However, ecology also eﬀects long-

term evolutionary trends. Walker and Ofria explored how continued evolutionary innovation

10

plays out diﬀerently within ecological communities assembled at diﬀerent levels of produc-

tivity (Walker and Ofria, 2012). Previous research on the connection between population

diversity and evolutionary potential suggested that there is generally a positive correlation

between these two variables. This ﬁnding makes sense, as a more diverse population is nec-

essarily more spread out across the ﬁtness landscape, increasing the probability that some

members of the population will be close to an adaptive peak. However, Walker and Ofria

found that the relationship between diversity and evolutionary potential is more complicated;

evolutionary potential peaked at a diﬀerent level of productivity than diversity did. This

observation is an example of the diﬃculty of predicting eco-evolutionary feedbacks a priori.

Digital systems give us the ﬁne-grained control necessary to ﬁnd such deviations from our

expectations. In chapter 3, I present some follow-up work to these ﬁndings.

1.2.2 Facilitation

So far, I have discussed ecosystems assembled purely on the basis of competitive in-

teractions. While these model ecosystems are useful for gaining basic insights into eco-

evolutionary dynamics, ecosystems in nature involve a wide variety of diﬀerent types of

interactions among organisms. These include various forms of facilitation, that is species

making it easier for other species to survive.

Perhaps the simplest of facilitation interactions is cross-feeding (also called syntrophy),

wherein organisms of one ecotype feed oﬀ of byproducts generated by organisms of a dif-

ferent ecotype. Yet, even such a simple interaction can have unpredictable eco-evolutionary

dynamics. Johnson and Wilke built an incredibly simple cross-feeding ecosystem in Avida

by providing two resources, each of which created the other as a byproduct when it was me-

tabolized (Johnson and Wilke, 2004). This type of environment produces an even stronger

form of stabilizing selection than occurs in the purely competitive interactions in the logic-

9 environment, as increased competition for one resource is combined with a decrease in

the production of that resource (assuming a ﬁxed population size). The result is a pair of

11

ecotypes that exhibit either Lotka-Volterra-like oscillatory dynamics or a more stable equi-

librium. Which one of these regimes occurs appears to be largely determined by stochastic

factors. More complex hierarchical cross-feeding food webs have also been constructed in

Avida (Yedid et al., 2008, 2009, 2012).

Cross-feeding is a relatively simple form of facilitation as it is not susceptible to cheating.

Cooperation, in which cheating is a possibility, poses a more challenging evolutionary prob-

lem. Nevertheless, cooperative species interactions play a critical role in many ecosystems

(Goudard and Loreau, 2008; Frederickson et al., 2005). Understanding what evolutionary

pressures can lead to such scenarios is critical to understanding this component of ecosystem

assembly, but is, again, challenging to address in vitro or in vivo. Digital evolution has, in

many cases, facilitated a deeper exploration of the intricate mechanisms that allow for the

evolution of cooperative behaviors.

For example, based on inclusive ﬁtness theory we would expect that genes that code

for altruistic behavior should be more evolutionarily successful if they more eﬀectively direct

that altruistic behavior at other individuals with the same gene. Clune et al. tested this

hypothesis by providing various altruistic instructions that could be included in organisms’

genomes by mutation (Clune et al., 2010). When organisms executed these instructions,

they would lose some of their CPU cycles and a neighboring organism would receive some

number of CPU cycles. The neighbor that received this beneﬁt was determined by the speciﬁc

altruistic instruction that was used. The available instructions fell along a continuum of how

accurately they could target the beneﬁt at other altruistic organisms. The least accurate

(other than the controls) was kin targeting, which donated the CPU cycles to a parent or

oﬀspring. Slightly more accurate was similarity targeting, in which the organism receiving

the CPU cycles had to have a genome that was similar to the donating organism (edit distance

below a certain threshold). Lastly, there was greenbeard targeting, named for the idea that

altruistic organisms can display visual markers (like a green beard) that is tied to their

altruism and thus provides a reliable signal to other organisms (Hamilton, 1964; Dawkins,

12

1976). With this method, only other organisms that have previously executed the greenbeard

altruism instruction can recieve beneﬁt from an organism executing it. Theoretically, this

greenbeard approach should be the most accurate form of altruism targeting, as, in the Avida

implementation, it literally allows organisms to help only those that express the same gene.

Surprisingly, however, while similarity targeting was favored over kin targeting, greenbeard

targeting was not selected over the other forms. Further experimentation revealed that the

key diﬀerence was that greenbeard targeting did not provide a way to distinguish mildly

altruistic organisms from highly altruistic organisms (those that executed the instruction

repeatedly). This result may explain the fact that most greenbeard altruism observed in

nature tends to relate to binary rather than continuous forms of altruism.

Indeed, when

the greenbeard instruction would only beneﬁt other organisms exhibiting the same level of

altruism, it did become the preferred mechanism.

Some traits, such as releasing a chemical into the surrounding environment, do not have

a speciﬁc other organism being targeted. For these traits to be evolutionarily favored, there

must be some mechanism for ensuring that their beneﬁt primarily goes to kin.

In many

cases, this guarantee can come from living in a suﬃciently spatially-structured environment,

such that the majority of neighboring organisms are kin. Another option is quorum-sensing,

wherein altruistic organisms signal their presence to each other, allowing them to base their

actions on the number of kin that are present (Diggle et al., 2007). Beckmann and McKinley

demonstrated that digital organisms in Avida are capable of evolving quorum sensing if

allowed to pass messages to each other (Beckmann and McKinley, 2009). In collaboration

with biologists, they were then able to use this system to understand how bacteria might

evolve resistance to eﬀorts to interfere with their quorum sensing (Beckmann et al., 2012).

Vostinar (née Johnson) et al. expanded on this research, ﬁnding that not only is the evolution

of quorum sensing in Avida possible, it also increases the range of conditions in which altruism

will evolve and the extent to which it is beneﬁcial (Johnson et al., 2014). This work provided

the basis for them to build a more precisely targeted digital system to directly model quorum

13

sensing in a species of bacteria (Vostinar et al., 2018).

1.2.3 Predation

Predation is a fundamentally diﬀerent interaction than any that I have discussed so far,

and a critical component of many ecological communities in nature. There is an asymmetry in

predator-prey relationships that is not present with competition or cross-feeding (one species

consumes the other, not the other way around), which results in a system with fundamentally

diﬀerent behavior (Johnson and Wilke, 2004). Studying predation is complicated by the fact

that most predators need to be able to move in order to reach enough prey to sustain

themselves. As such, predation research in Avida generally allows organisms to execute

instructions that cause them to move around the world as well as instructions that give

them sensory information about the landscape, making for far more complex (and thus

slower) experimental worlds. Prey organisms consume abiotic resources from their cells in

the environment and store them (initial research on prey foraging behavior in the absence of

predators was conducted by Walker (Walker, 2011)). Predators can attack prey and receive

a portion of their stored resources by executing a speciﬁc instruction while facing them.

Because of the huge impact that allowing movement has on the strategies evolved by

digital organisms, a lot of predation research in Avida thus far deals with the evolution of

movement strategies and sensory behaviors. As it turns out, these behaviors profoundly

aﬀect and are aﬀected by the presence of predators. Wagner, et al. compared the behavior

of prey evolved with and without a co-evolving population of predators and found that

predators force the prey to explore a much wider space of strategies (Wagner et al., 2013).

They also were able to evolve to make much better use of their sensory capabilities, which

in turn made them better at foraging as well. I hypothesize that the fact that ecosystems

often include a variety of related-but-diﬀerent survival pressures, many of which consistently

become more challenging as a population adapts to them, makes them particularly eﬀective

at promoting the evolution of more complex organisms.

14

O’Donnell et al. expanded on some of these questions by investigating properties of

prey communities that make them better at evolving to avoid novel predators (O’Donnell

et al., 2014). They looked at two diﬀerent factors: the amount of standing genetic variation

in a prey population, and the evolutionary history of that population (i.e. whether the

lineages had been exposed to diﬀerent predators in the past). Surprisingly, they found that

standing genetic variation had very little impact on the extent to which the prey population

was able to evolve resistance to predators; it had a slight positive eﬀect among populations

without predators in their evolutionary history. Evolutionary history, on the other hand,

had a dramatic eﬀect. Despite the fact that the predators used in the experimental phase

were from a completely diﬀerent population than the ones that the prey had co-evolved with,

prey populations that had historically been exposed to predators were substantially more

likely to evolve resistance to novel predators.

Further research on the speciﬁc techniques used to avoid predators has addressed the

evolution of mimicry and prey grouping. Mimicry requires a fascinating and complex set

of ecological interactions: a poisonous prey species that signals its toxicity to potential

predators, a predator species that responds to this signal, and an additional non-poisonous

prey species that mimics the signal so that predators will avoid it as well. This set-up creates

an interesting form of density-dependence, where the eﬃcacy of the mimicry decreases as the

relative abundance of the mimic species increases. Lehmann et al. created such an ecosystem

in Avida, and studied the scenarios under which it is stable (Lehmann et al., 2014). They

found that the degree to which the toxin harms predators and the accuracy of the mimicry

are both important variables in the evolution of a stable population of mimics. The more

harmful the toxin, the greater the density of mimics the system can support.

Prey grouping is a more complex predator avoidance strategy, as it requires some degree

of coordination among prey organisms. Biswas et al. used Avida to untangle the mechanisms

that drive the evolution of such behavior (Biswas et al., 2014). Two diﬀerent potential

mechanisms had previously been proposed: dilution and predator confusion. Dilution is the

15

simple observation that being in a group can lower an individual’s chances of being selected by

predators for attack, while predator confusion is the idea that predators may have trouble

visually perceiving individual prey to hunt if those prey are in a group. To study this

phenomenon, the authors observed the extent to which prey organisms evolved grouping

behavior under various conditions. While the presence of predators did signiﬁcantly increase

prey grouping behavior, the eﬀect was no diﬀerent when predators’ visual sensors were

inhibited. This result suggests that the dilution eﬀect is suﬃcient to evolve prey grouping

behavior, and that the more complex explanation of predator confusion is unnecessary.

1.2.4 Parasitism

There is increasing evidence that parasites play a critical role in many ecological commu-

nities (Laﬀerty et al., 2006). Conveniently, parasites ﬁt neatly into Avida. In fact, parasites

evolved naturally and unexpectedly in Avida’s precursor, Tierra (Ray, 1994). Since the

presence of parasites in Tierra led to various side eﬀects that made for much more complex

dynamics, Avida implements parasites more intentionally. Like normal organisms in Avida,

parasites are a sequence of simple computer code instructions. Unlike normal organisms,

parasites cannot survive in a cell on their own.

Instead, they inhabit the same cell as a

normal organism and steal its CPU cycles (i.e. when the host organism should have been

allowed to execute an instruction, the parasite instead executes one from its own genome).

For most parasite experiments in Avida, the normal organisms (hosts) are required to per-

form at least one task in order to replicate. Parasites are able to infect hosts that perform

the same computational tasks as them. Thus, there is pressure for hosts to evolve the ability

to perform new tasks and lose the ability to perform the old tasks. This set-up promotes the

generation and maintenance of diversity (Zaman et al., 2011), and ultimately leads the hosts

to evolve to be more complex than they would otherwise have been (Zaman et al., 2014).

The parasitism-mutualism continuum has been addressed in other digital systems as

well. For example, Symbulation is a platform that is designed to simplify the study of

16

these topics by reducing the complexity introduced in other systems (Vostinar and Ofria,

2019). Notably, Symbulation was recently used to demonstrate that spatial structure has

an entirely diﬀerent impact on host-parasite co-evolution than it does on other forms of

potentially-cooperative co-evolution (Vostinar and Ofria, 2019).

1.2.5 Perturbations

Most of our most pressing concerns about ecosystems on Earth stem from the fact that

many of them are currently facing novel perturbations, often human-driven. Biological data

tend to be too limited to allow us to fully understand how an ecological system will be

aﬀected by particular perturbations. Digital systems like Avida, on the other hand, cannot

easy model a speciﬁc, real-world ecosystem.

In conjunction, however, they can help us

dramatically improve our expectations about the results of a particular perturbation in a

natural environment.

The most dramatic types of perturbations an ecosystem can experience are mass extinc-

tion events. Mass extinctions are divided into two categories: press and pulse (Erwin, 1998).

Press extinctions are those that are induced by a prolonged stressor, to which adaptation is

theoretically possible. In Avida, press extinctions can be induced by lowering resources to

low levels (Yedid et al., 2008). Pulse extinctions, on the other hand, are those that occur

too quickly and too intensely for populations to adapt to. In Avida, pulse extinctions can

be induced by killing a high percentage of the population (Yedid et al., 2009).

Part of the reason that extinctions can be so damaging to ecosystems is that most eco-

logical communities are highly interdependent (Roopnarine et al., 2007). While communities

evolved via adaptive radiation in the logic-9 environment would almost certainly change if

they lost a member, they lack the level of interdependence that a food web creates. As

such, understanding the impact of mass extinctions on an ecological community requires us

to assemble a more complex community. Most of the research on mass extinctions in Avida

has used a complex cross-feeding environment that functions similarly to a trophic pyra-

17

mid (Yedid et al., 2008, 2009, 2012). In this environment, simpler tasks produce byproduct

resources that can be metabolized by performing more complex tasks. Thus, as in most

ecological communities, species at higher trophic levels (which are often more complex) are

dependent on species at lower trophic levels.

In a press extinction, this dependency means that the most complex ecotypes are gen-

erally lost from the population (Yedid et al., 2008). If the amount of resources entering the

lowest trophic levels is decreased, less of the byproduct resources can by produced, with a

cascading eﬀect across all levels. This deﬁcit percolates up through the trophic pyramid, re-

sulting in minimal resource available for the highest level. What are the long-term of eﬀects

of such episode on a community? Yedid et al. studied the process of regaining complex traits

after the press event is over, and found that ecological communities that contained ecotypes

with the ability to perform the most complex trait were more likely to re-evolve it than

communities that never had such functionality (Yedid et al., 2008). While the mechanisms

behind this dynamic were complex and varied, many of them are also present in biological

ecosystems, suggesting that this trend might be general.

However, a follow-up study by Yedid et al. showed that recovery from press extinctions

still takes a long time, particularly at higher trophic levels. Recovery from pulse extinctions,

on the other hand, occurs much more rapidly (Yedid et al., 2009). This tendency is believed to

be the result of press extinctions providing an alternative selection regime that encourages

the loss of more complex traits in favor of simpler traits that will more reliably yield a

reward. The authors also observed some parallels between the results of this experiment and

observations from the fossil record, further supporting the idea that these same dynamics may

occur in biological ecosystems. Further analysis of the phylogenies of populations subjected

to these perturbations revealed that stronger perturbations, particularly press events, results

in a substantial loss of phylogenetic diversity (Yedid et al., 2012).

Similar research conducted in the standard logic-9 environment (with only a single

trophic level) suggests that the use of multiple trophic levels in Yedid et al’s work were

18

important to their results. Communities evolved in logic-9 and then subjected to extreme

bottlenecks (eﬀectively strong pulse extinctions) were able to regain their original levels of

consumption of each resource relatively quickly, in contrast to the delayed recovery of the

most complex task in the cross-feeding set-up (Olson et al., 2013). Interestingly, although the

post-bottleneck communities fulﬁlled all of the same functions as the original communities,

the ecotypes present often partitioned resources very diﬀerently. This result reﬂects the fact

that there are a wide variety of sets of ecotypes that are able to stably co-exist in the logic-9

environment.

A much more mild form of perturbation is a localized disruption of some regions within

the environment, but not others. Such a scenario is commonly observed in nature in the

form of anthropogenically fragmented ecosystems. Often, governments will set aside some

regions as biological reserves, where the native ecosystem will, theoretically, be undisturbed.

Making such choices determines the spatial structure that populations in this environment

will have going forward, a property known to have a dramatic eﬀect on evolutionary dynamics

(Tomassini, 2005). While this topic had received a great deal of attention from ecologists,

there is a relative lack of knowledge of how evolutionary considerations should inform such

decision-making. In chapter 3, I use Avida to preliminarily investigate this question (Dolson

et al., 2016b).

1.2.6

Interaction networks

All of the ecological forces mentioned above can, of course, be combined. Such studies

can quickly become too complex to understand the results of, so it is important to take

care in experimental design. However, a number of interesting directions for research on

eco-evolutionary dynamics concern the network of interspeciﬁc interactions that deﬁnes an

ecological community. For such lines of research, combining diﬀerent drivers of ecology in

a single experiment is necessary. A number of ideas for approaching such questions have

been laid out by Fortuna et al. (Fortuna et al., 2013), and some are already being put into

19

practice, with promising results (Strona and Laﬀerty, 2016).

1.3 Eco-evolutionary dynamics in evolutionary computa-

tion

Evolutionary computation researchers have devoted substantial attention to understand-

ing how to promote coexistence among lineages exploring diﬀerent regions of a ﬁtness land-

scape (Goldberg et al., 1987; Mahfoud, 1995; Mouret and Doncieux, 2009; Pugh et al., 2015).

Promoting diversity in an evolving population is important for EC because it reduces prema-

ture convergence on suboptimal ﬁtness peaks while still encouraging both exploration and

exploitation. However, some types of diversity facilitate ﬁnding global optima better than

other types. For example, a high mutation rate generates more new genotypes, but this

increased exploration tends to be localized to a single region of the ﬁtness landscape and

sacriﬁces ﬁne-tuned exploitation of promising prospective solutions; the population plows

ahead exploring rather than reﬁning the solutions already found.

Even amongst more advanced diversity-promoting approaches, some produce forms of

diversity that are spread across the ﬁtness landscape in ways that appear to be more or less

conducive to solving a given problem. For example, Mouret and Doncieux found that diver-

sity in agent behavior is much more predictive of success in evolutionary computation than

diversity in neural network architecture (Mouret and Doncieux, 2009). Similarly, Walker

and Ofria observed diﬀerent evolutionary potential at equivalent diversity levels under dif-

ferent environmental conﬁgurations in Avida (Walker and Ofria, 2012). Although we have a

high-level idea of why many techniques promote diversity, the mechanistic details by which

these populations spread across the ﬁtness landscape are less well understood. For example,

ﬁtness sharing promotes diversity via negative density dependence (Goldberg et al., 1987).

Does this diversity represent qualitatively diﬀerent portions of the ﬁtness landscape than, for

example, lexicase selection (Spector, 2012)? The answer to this question depends on subtle

20

diﬀerences in the evolutionary pressures that these diﬀerent selection schemes place on the

evolving population.

Because techniques for promoting diversity rely on creating ﬁtness interactions between

individuals in the population (beyond basic competition for space in the next generation),

they are, by deﬁnition, creating simple ecologies. Ecologists have developed rigorous theory

to predict how ecological communities change over time and are exploring their interactions

with evolutionary dynamics. A particularly popular area of study concerns the conditions

under which long-term stable coexistence between diﬀerent species is possible (Pacala and

Tilman, 1994; Chesson, 2000; Chase and Leibold, 2003; Letten et al., 2017). Such a the-

oretical framework can be applied to understand stable coexistence in EC populations as

well, ultimately allowing us to determine dynamic properties of the system. Of particular

interest, such theory should facilitate the prediction of which pathways can simultaneously

be traversed along a given ﬁtness landscape. By understanding this interaction between

diversity maintenance strategies and the ﬁtness landscape, we can predict which algorithmic

techniques will be the most eﬀective a priori. These insights should apply both to choosing

an appropriate diversity maintenance technique and to setting its parameters. Additionally,

an improved mechanistic understanding of why existing algorithms work should facilitate

building more eﬀective variations of those algorithms.

A number of highly eﬀective evolutionary-computation techniques owe their success to

ecological dynamics. In particular, I will focus on ﬁtness sharing, Eco-EA, and lexicase

selection.

Fitness sharing, one of the ﬁrst ecology-based diversity maintenance approaches, is by far

the simplest of these techniques (Goldberg et al., 1987). Under this regime, the ﬁtness of each

individual is reduced in proportion to the number of similar individuals in the population.

As such, constant pressure to diversify is imposed.

Eco-EA is an approach, originally built on to of Avida, to solving complex problems

by associating limited resources with simpler challenges that can be used as stepping stones

21

(Goings and Ofria, 2009; Goings, 2010; Goings et al., 2012). These challenges can be sub-

problems of the larger problem, individual test cases, or other related tasks that may be

used as components of a global solution. Resources remain plentiful until a solution to the

corresponding sub-problem is discovered. When a population ﬁrst solves a sub-problem, the

large amount of resource dramatically boosts the ﬁtness of individuals that use it. As the

number of individuals using the resource increases, it become less valuable. As long as there

is a cost to attempting to use each resource, negative frequency dependence fosters small,

but stable sub-populations capable of solving each sub-problem.

Lexicase selection (Spector, 2012) has proven extraordinarily successful in genetic pro-

gramming (Helmuth et al., 2015; Helmuth and Spector, 2015). In lexicase selection, a large

number of test cases are used as criteria for evaluation. Each time an organism has to be

selected to propagate into the next generation, the test cases are applied in a random order,

successively ﬁltering out all but the most ﬁt candidates. Once a single candidate remains (or

all of the test cases have been applied and a random individual is chosen from the ﬁnal set) it

is replicated into the next generation and the process is repeated. The fact the the ordering

of these test cases continually changes means that solutions successful at diﬀerent subsets of

tasks are all able to co-exist. Although lexicase selection has traditionally been used with

test cases, there should be no reason it cannot be used with multiple ﬁtness functions instead

to provide ecologically-mediated hints. In this way, any hint that could be given to a system

like Eco-EA could just as easily be provided to lexicase selection as well.

From an ecological perspective, lexicase selection creates pressure for the population to

diversify into diﬀerent niches that are based on building blocks to a more complex problem.

In order for an organism to be evolutionarily successful, it must be one of the best at at least

one of the test cases/ﬁtness functions. In most cases, it must be among the best at many

of them (probably a related group). Thus, organisms compete with others within the niches

created by each test case/ﬁtness function. Eco-EA also promotes intra-niche competition,

but in a subtly diﬀerent way. Limited resources are used up as organisms receive ﬁtness

22

bonuses, meaning that they compete with other organisms beneﬁting from the same bonus.

Thus, those with higher ﬁtness will competitively exclude less ﬁt organisms that depend too

much on the same resources (Chesson, 2000). Natural systems with limited resources also

promote intra-niche competition, but in a subtly diﬀerent way. Limited resources are used

up as organisms consume them, meaning that competition occurs per-resource. Thus, those

more capable of consuming multiple resources will competitively exclude less ﬁt organisms

that depend too much on those same resources (Chesson, 2000).

This distinction between competition in lexicase selection and in systems with limited

resources has a few implications. First, limited resources allow for generalists that do not

excel at any tasks but are decent at many. It is unclear whether this ﬂexibility opens up

additional pathways through the ﬁtness landscape. Second, lexicase selection always exerts

selective pressure to improve on all test cases/ﬁtness functions, whereas the pressure exerted

by limited resources can be very weak (or even non-existent) if a niche is already occupied.

This dynamic may make lexicase selection less robust to receiving bad advice about how

to solve a problem than other mechanisms for providing ecologically-mediated hints. Third,

lexicase selection automatically incentivizes organisms that excel at uncommon combinations

of test cases/ﬁtness functions. Ideally, we will be able to ﬁnd a hybrid approach to providing

ecologically-mediated hints that combines limited resources with lexicase selection to achieve

the best of both worlds.

MAP-Elites is another successful algorithm that leverages ecological dynamics and user

input (Mouret and Clune, 2015). In MAP-Elites, the user chooses axes along which they

believe variation will be important. MAP-Elites then breaks the hyperspace deﬁned by these

axes into discrete regions. Each region can be occupied by at most one solution. When a

new candidate solution is created, it is assessed on each axis and the corresponding bin is

located. If the new candidate solution has a higher ﬁtness than the previous occupant of that

bin, it replaces that occupant. Otherwise, it is tossed out. Evolution proceeds by choosing

occupants of various bins in the search space and mutating them to create new candidate

23

solutions.

The sub-division of axes for MAP-Elites explicitly partitions the search space into niches

in a way that has clear parallels to choosing tests to provide to lexicase selection or hints to

associate with limited resources. The lack of directionality in MAP-Elites, however, makes it

distinct from lexicase selection. At face value, it might seem to make it diﬀerent from Eco-EA

as well. Certainly this intuition is true to some extent - hints provided as limited resources

do have directionality to them. However, the foundation of limited resources is negative

frequency-dependence which, in addition to reducing the risk of the entire population being

led astray, has the important eﬀect of promoting diversity along the axis of the hint. This

diversity goes beyond merely allowing a portion of the population to drift; niche partitioning

should force the population to spread out along the axis in question to produce meaningful

diversity that may be helpful in solving the overall problem.

There are a wide variety of other ecologically-inspired strategies that have proven to

be eﬀective for maintaining diversity in evolutionary algorithms. These largely fall into

four categories: niching/speciation (e.g. (Goldberg et al., 1987; Stanley and Miikkulainen,

2004)), parent selection (e.g.

(De Jong, 1975; Mahfoud, 1992)), dividing the population

into subpopulations (e.g.

(Hornby, 2006; Hu et al., 2005)), and adjusting the objective

function to favor diversity and/or novelty (e.g. (Mouret and Doncieux, 2009)). Of these

approaches, only niching/speciation consistently promotes stable coexistence of diﬀerent

types of strategy. Even among niching/speciation strategies, most emphasize a diversity

of phenotypes rather than the diversity of evolutionary building blocks that Eco-EA and

Lexicase Selection promote. While solutions built on diﬀerent building blocks may often

exhibit diﬀerent phenotypes, they may also arrive at similar phenotypes despite taking very

diﬀerent paths through the ﬁtness landscape. These diﬀerent paths will likely result in

underlying diﬀerences in genetic architecture that may inﬂuence which behaviors are easy

for a lineage to evolve next. As such, I suspect that diversity of building blocks will promote

greater evolutionary potential than other kinds of population diversity.

24

1.3.1 Limited resources and Eco-EA

The original formulation of limited resources that led to ecologically-mediated hints

arose in work by Cooper and Ofria where they demonstrated that limited resources are

suﬃcient to evolve stable branching of diﬀerent ecotypes (Cooper and Ofria, 2003). The

resulting ecological community was incredibly simple, with competition being the only form

of interaction between organisms. However, I hypothesize that these simple ecologies (with

meaningful diﬀerences between niches) were better positioned to solve complex problems

than populations in which diversity was promoted by other means.

Goings and Ofria initially tested a more applied form of limited resources in Eco-EA,

using a toy bitstring matching problem, with the goal of maintaining a population of diverse

solutions (Goings and Ofria, 2009). In this experiment, resources (either limited or unlim-

ited) were associated with diﬀerent bitstrings that could be matched to varying extents by

members of the population. When resources were unlimited, the population converged on

matching a single one of these strings. Limiting the resources, however, produced consis-

tent subpopulations specialized on each string. The negative frequency dependent selection

imposed by the limited resources caused the diﬀerent bitstrings to stably coexist within the

population. These results held when the task was to match a more general pattern of bits,

rather than an exact string.

Following this initial conceptual test, Goings et al. applied Eco-EA to a complex real-

world problem: generating models for the behavior of sensor nodes in a ﬂood warning network

(Goings et al., 2012). Not only did Eco-EA successfully evolve a diversity of models that

satisﬁed the constraints of the problem, but the models evolved by Eco-EA were better

than a comparison algorithm at continuing to diversify when transferred to an environment

without limited resources. These results demonstrate the strength of Eco-EA at evolving a

diversity of solutions to complex problems.

Eco-EA is just one example of the beneﬁts that making more thoughtful and intentional

use of ecology can provide. Here, I deﬁne ecological dynamics as interactions between mem-

25

bers of the population that aﬀect ﬁtness. I propose that such dynamics provide a variety

of beneﬁts, some of which have begun to be put in to practice and some of which have not.

The most obvious of these eﬀects is the beneﬁt to diversity alluded to above.

1.3.2 Complexifying environments

In nature, the network of interactions within a community grows increasingly complex

over evolutionary time, as the evolution of new species creates new niches for other species

to evolve into. In turn, new types of interaction emerge (such as parasitism, mutualism, and

predation). This gradual complexiﬁcation has two implications that may be important for

evolutionary computation: 1) The gradual increase in complexity of the biotic environment

facilitates the continual production of more complex niches and thus the evolution of more

complex traits, and 2) Ecological communities as a whole can often perform functions that

no single species could perform alone.

The niches inhabited by various members of an ecological community usually bear com-

monalities. While they are not identical, the ability to occupy one niche is often a building

block for the ability to occupy a diﬀerent one (a clear example of this eﬀect in nature is

metabolic pathways for metabolizing various resources). When new niches appear, they are

closely enough related to existing niches that it’s plausible that existing populations will

evolve to inhabit them. Without the gradual feedback loop of increasing complexity that

eco-evolutionary feedbacks enable, evolving the ability to exploit newly established niches

would be incredibly improbable.

Community-level functions represent a diﬀerent mechanism for solving complex prob-

lems. For example, a forest, collectively, is able to store solar energy in sugar molecules,

ﬁx nitrogen, take in various nutrients from the soil, and use these nutrients and energy to

power mobile, decision-making agents. Of course, these functions are each performed by

diﬀerent species in the community. While it may be possible to evolve a single species that

carries out all of these tasks, it is worth considering the possibility that it is easier to evolve

26

a community that carries them out collectively. The fact that such communities seem to be

so common in nature is certainly suggestive of this possibility. Such an approach would have

a variety of potential beneﬁts. For example, it would promote more functional modularity, a

trait that is believed to be critical for evolutionary potential. Moreover, the closely interde-

pendent species found in a complex ecological community are a likely precursor to egalitarian

major transitions, which are believed by many to be a critical step towards evolving complex

species. Lastly, as ecological communities can eﬀectively be thought of as a set of subroutines

running in parallel, they lend themselves easily to evolving parallel programs, somewhat akin

to Holland’s bucket brigade algorithm (Holland, 1985) and more recent advances in learning

classiﬁer systems.

1.4 Thesis Statement

The central thesis of this dissertation is that ecology plays a critical role in determining

the outcome of evolution and is necessary to unlock its full constructive power. Thus, we

can use ecology as a tool to intentionally guide evolution.

1.5 Contributions

This dissertation is organized into four parts:

1.5.1 Part 1: Introduction

In this chapter (1), I provide a high level introduction and literature review of concepts

that are relevant across chapters. Chapter 2 describes all study systems and evolutionary

computation approaches used across this dissertation.

27

1.5.2 Part 2: Understanding eco-evolutionary dynamics in biology

Nature exhibits a striking range of eco-evolutionary dynamics. To understand how they

might be harnessed in other evolutionary contexts, it is important to understand the forces

the generate and maintain diverse ecological communities. To this end, I begin by furthering

the current understanding of eco-evolutionary dynamics in biology. Speciﬁcally, the two

chapters in this section deal with the impact of spatially heterogeneous environments on

eco-evolutionary dynamics. Spatial heterogeneity is an important driver of eco-evolutionary

dynamics that is relatively easily to manipulate. Thus, from the perspective of using ecology

as a tool for manipulating evolutionary outcomes, spatial heterogeneity is a logical entry

point. Within this section, chapter 3 examines the ecological communities formed by spatially

heterogeneous environments, showing that certain kinds of heterogeneity promote diversity

and increased probability of evolving complex traits. Chapter 4 then delves into the speciﬁcs

of this later ﬁnding; I show that environments contain local regions where evolving a given

complex trait is more likely. Although I do not isolate a precise mechanism, this work

provides strong support for the idea that we can be build landscapes to promote desirable

outcomes.

1.5.3 Part 3: Harnessing eco-evolutionary dynamics for evolution-

ary computation

In this section I show that many of the best breakthroughs in evolutionary computation

have come from adding ecology to existing algorithms and demonstrate the power of further

incorporating ecology into evolutionary algorithms. This work functions as a case study on

the value of ecology for promoting evolutionary innovation. Moreover, it lays the ground-

work for future use of evolutionary computation as a model system for experimenting with

approaches to controlling eco-evolutionary dynamics.

Speciﬁcally, in chapter 5, I demonstrate that theory and techniques from ecology can be

28

used to improve our understanding of and predictive capacity for evolutionary computation.

I show that that these two systems are governed by the same underlying laws (and are even

in some cases mathematically identical). Using this new theory, I explore the functional

diﬀerences between similar-seeming algorithms, generating hypotheses to explain why they

behave diﬀerently.

I extend this work in chapter 6 with a proposal to use ecology to facilitate a new ap-

proach to evolutionary computation. For the same reasons that humans are notoriously bad

at creating ﬁtness functions, attempts to provide evolutionary algorithms with “hints” (e.g.

domain knowledge) about how to solve a given problem tend to be ineﬀective. Intentionally

creating corresponding ecological niches, however, may be a more robustly successful ap-

proach to providing such guidance. In this chapter, I demonstrate that Eco-EA can be used

in this way; it is able to make use of good advice without being harmed by bad advice. This

property is in contrast to other selection schemes. Thus, in addition to being a promising

direction for algorithm development, this work furthers our understanding of the practical

diﬀerences between the ecological communities created by diﬀerent selection schemes.

1.5.4 Part 4: Developing tools to identify and understand complex

digital ecologies

Making the case that ecology is critical to evolution requires comparing systems that

have more or less ecology. In chapter 7, I present an approach to identifying the presence

of ecology and quantifying the number of occupied niches in a system. This work is part of

a larger project to develop a suite of metrics which aid in identifying a range of hallmarks

for “open-ended evolution” (i.e. evolution that displays the same large-scale properties as

evolution in nature). Of course, simply being able to identify the presence or absence of

various dynamics is not suﬃcient for understanding the mechanisms that produced them.

Thus, in chapter 8, I present an additional suite of metrics and visualizations aimed at

quantifying the underlying evolutionary dynamics that are occurring in a system. Together,

29

these metrics open up a world of possibilities for advancing the study of eco-evolutionary

dynamics in digital systems.

30

Chapter 2

Methods

2.1 Study systems

To understand how the role of ecology generalizes across evolutionary substrates, I use

a range of experimental systems: NK landscapes, real-valued function optimization, and two

versions of linear genetic programming. Across all systems, reproduction was asexual.

2.1.1 NK Landscapes

An NK model uses two parameters, N and K, to randomly generate a ﬁtness landscape

(Kauﬀman and Levin, 1987). N speciﬁes the number of sites in the genome, each of which

is a 0 or a 1. The ﬁtness landscape speciﬁes the eﬀect of a given value at a given site on

the ﬁtness of the bit-string organism. This ﬁtness eﬀect depends on the values at the K

subsequent adjacent sites. As such, K tunes the ruggedness of the landscape; low values

of K produce smooth landscapes with few peaks, whereas high values produce landscapes

with many peaks. I use NK models in chapter 7 to test my intuitions about evolutionary

processes, because they are a well-understood system for studying general questions about

evolutionary dynamics.

31

2.1.2 Real-valued function optimization

In real-valued function optimization, a population of vectors of ﬂoating-point numbers

is evolved to ﬁnd the optimum of a function. These problems have easy-to-visualize ﬁtness

landscapes, making them another useful domain for testing intuitions about evolutionary

processes. I used two types of problem in this context:

2.1.2.1 The N-dimensional box problem

The N-dimensional box problem is a toy problem that I use in chapter 6 to explore the

possibility of using ecological niches to provide “hints” to an evolutionary algorithm about

how to solve a problem. In the version used here, solutions exist within a 10-dimensional

box. All sides of the box have a length of 1. Candidate solutions in the population are

sequences of 10 ﬂoating point numbers between 0 and 1, representing a point within the box.

The goal is to ﬁnd the origin (i.e. the sequence [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]). This problem

could be trivially solved by using the inverse Euclidean distance between a point within the

box and the origin as the ﬁtness function. To simulate a more challenging problem, I use

the following ﬁtness function:

f itness =



.01

1(cid:115) 10(cid:80)

i=1

(cid:115)
(cid:115)

10(cid:80)
10(cid:80)

i=1

x2
i

i=1

x2
i > .1

i ≤ .1
x2

(2.1)

In this function, inverse Euclidean distance from the origin is the ﬁtness only when

Euclidean distance from the origin is less than 0.1 (which, given the high dimensionality of
the space, represents 2.5× 10−13 of the possible positions). For all points in the box that are
farther away from the origin, ﬁtness is 0.01. I used 0.01 rather than 0 as the base ﬁtness to

ensure the Eco-EA’s ﬁtness multipliers would have an eﬀect. The cut-oﬀ of 0.1 was chosen

such that it is unlikely for evolution to solve the problem without hints. Using this ﬁtness

function creates a quintessential "needle in a haystack" problem, where the ﬁtness landscape

32

is ﬂat except for one incredibly tall and thin peak. Such problems are generally considered to

be among the most challenging for evolutionary computation to solve, as they do not allow

for incremental improvement.

2.1.2.2 Niching Competition Benchmark Problems

The GECCO Competition on Niching Methods is an annual competition that evaluates

the performance of evolutionary algorithms on multi-niche problems (i.e. those with multiple

optima). Entrants compete on a range of real-valued optimization functions (Li et al., 2013).

I select a subset of these functions for use here: Himmelblau, Shubert, Composition Function

2, and Six-Humped Camel Back. For each test problem, the X and Y coordinates oﬀered

by a given organism are translated by the function into a ﬁtness value. Because of their

low dimensionality, it is possible to fully visualize each problem’s actual ﬁtness landscape,

a property I take advantage of in chapter 8. I use the implementations of these problems

at https://github.com/mikeagn/CEC2013 (C++ for ﬁtness calculations during evolution,

Python for post-hoc analysis). Figure 2.1 shows the ﬁtness landscapes deﬁned by each of my

four chosen test problems. Each organism’s genome consisted of two ﬂoating point numbers

that deﬁned its position in the ﬁtness landscape.

2.1.3 Linear genetic programming

In linear genetic programming, an individual is a computer program composed of a

sequence of instructions. The program receives input and must produce a corresponding

output. In this work, I use two diﬀerent linear genetic programming systems: one (ScopeGP)

developed with the goal of solving problems via evolutionary computation and one (Avida)

developed with the goal of being a model system for better understanding biology.

33

Figure 2.1: The ﬁtness landscapes used in this experiment: A) Himmelblau, B) Six-
humped Camel Back, C) Shubert, and D) Composition Function 2.
Interactive versions
available at https://emilydolson.github.io/fitness_landscape_visualizations.

34

2.1.3.1 ScopeGP

ScopeGP is a linear genetic programming representation designed around the use of

modular subroutines called “scopes.” Making eﬃcient use of modular subroutines has long

been thought to be important to facilitating the evolution of genetic programs that solve

complex problems. ScopeGP is designed to facilitate the evolution of modular and reusable

code while minimally aﬀecting the way in which traditional linear genetic programs are

organized (i.e., linear sequences of instructions). The instruction set includes a number of

instructions that automatically create programming structures like loops and subroutines.

ScopeGP is described in more detail in (Dolson et al., 2018c).

I use ScopeGP in the context of solving three diﬀerent problems:

• Squares: in this task, which was designed to be quickly-solvable, programs must take

a number as input and return its square as output.

• Collatz: a problem taken from Helmuth et. al’s general program synthesis benchmark
suite (Helmuth and Spector, 2015) that is known to be challenging. Programs receive

a number as input and must return the number of iterations that it takes the a Collatz

sequence starting from that number to converge.

• Dow chemical challenge: a real-world symbolic regression problem White et al.
(2013) in which programs must predict details about the outcome of an industrial

chemical synthesis process based on a wide range of measurements.

For all of these problems, programs are evaluated based on their performance on a suite

of test-cases.

2.1.3.2 Avida

The Avida Digital Evolution Platform (Ofria and Wilke, 2004) is a well-established

artiﬁcial life system that has been used to study a wide range of evolutionary dynamics

35

(Lenski et al., 2003; Goldsby et al., 2012; Zaman et al., 2014; Dolson et al., 2016b). Avida’s

track record and popularity make it a good platform both for testing intuitions about new

analytical techniques and for conducting experiments aimed at improving our understanding

of biology.

Avida is a world of self-replicating linear computer programs competing for space in

a 60 x 60 toroidal lattice. Organisms in Avida replicate asexually by copying themselves

instruction-by-instruction and dividing; copy and divide operations, however, are not perfect

and can result in mutated oﬀspring. When an organism successfully replicates, the result-

ing oﬀspring is placed randomly within the parent’s Moore neighborhood (the surrounding

8 cells), replacing that location’s former occupant. Thus, there is selection pressure for

organisms to replicate quickly before being copied over by others.

Improvements in ﬁtness (i.e. reproductive output) can either come from optimizations

to the code of the genome (increasing its eﬃciency) or from gaining the ability to take in

resources by performing boolean logic operations. Resources give programs extra CPU cy-

cles, allowing them to copy themselves faster than competitors. Each of the nine two-input

Boolean logic functions allows organisms to use a diﬀerent resource, with more complex func-

tions corresponding to more valuable resources. The presence and value of these resources

can be adjusted to create a wide range of environments.

I use the following environments in this work:

• Minimal: This environment contains no resources. As such, selection pressure is

entirely focused on optimizing the eﬃciency at which organisms can self-replicate.

• Logic-9: The environment contains the nine resources associated with non-trivial one-
and two-input Boolean logic functions: NAND, NOT, OR-NOT, AND, OR, AND-

NOT, NOR, XOR, and EQUALS (for more information on these logic functions in

Avida see Lenski et al., 2003).

• Limited resources: This environment contains the same resources as Logic-9. How-

36

ever, in this environment each resource is limited in quantity. When organisms use

resources, they deplete them, creating pressure to use resources that are not used by

other members of the population. Unlike the previous two conﬁgurations, the limited

resources environment contains multiple niches and support stable ecologies (Cooper

and Ofria, 2003; Chow et al., 2004).

• Changing: In this environment, the rewarded functions cycle between two opposing
states: ENV-NAND (where NAND is rewarded and NOT is punished) and ENV-NOT

(where NOT is rewarded and NAND is punished). ENV-NAND and ENV-NOT are

conﬁgured such that no phenotype can be optimal across both environment states. To

achieve maximal ﬁtness in either ENV-NAND or ENV-NOT, organisms must perform

only the focal function.

• Spatially heterogenous: In this class of environments, resources are geographically
limited, creating an environment in which diﬀerent traits are advantageous in diﬀerent

spatial regions.

I create these environments by placing circular patches of diﬀerent

resources within the world. By allowing these patches to overlap, I can insure the

presence of a wide variety of niches.

2.2 Selection schemes

For all experiments performed outside of Avida (in which selection is implicit), a means of

selecting individuals to reproduce is required. In much of this work, I explore the diﬀerences

amongst four selection schemes: tournament selection (the control, which does not create

ecology), ﬁtness sharing (Goldberg et al., 1987), lexicase selection (Spector, 2012), and Eco-

EA (Goings and Ofria, 2009).

37

2.2.1 Tournament selection

As tournament selection is one of the most popular selection techniques and does not

create an ecology, I use it as a control. In its simplest form, tournament selection has one

distinct parameter, T , the number of individuals to randomly pick from the population (with

replacement) for each tournament. The ﬁttest individual in the tournament is then chosen

to reproduce. An independent tournament is run to ﬁll each position in the population. T

controls strength of selection where higher tournament sizes correspond to strong selection

and lower tournament sizes correspond to weak selection (Blickle and Thiele, 1995). A

tournament size of one is equivalent to no selection pressure (i.e., every organism in the

population has an equal chance of being selected to reproduce).

2.2.2 Fitness sharing

Fitness sharing was an early use of ecology to promote diversity in EC (Goldberg et al.,

1987). In ﬁtness sharing, the ﬁtness of every member of the population is reduced in pro-

portion to the density of similar individuals (i.e. they “share” their ﬁtness). Speciﬁcally,

the extent to which each organism needs to share its ﬁtness is calculated by summing the

sharing equation, sh(d), over the population:

1 − (

0

sh(d) =

d

σshare

)α

d < σshare
d ≥ σshare

(2.2)

In this equation, d is the distance (deﬁned in terms of genotype or phenotype) between

the individual that is having its ﬁtness calculated and the other member of the population

that it is being compared to. α is a parameter that tunes the relationship between the

similarity of individuals and how much they compete with each other. σshare (the sharing
threshold) speciﬁes how similar individuals need to be to compete with each other at all.

For each individual, Sh(d) is summed across the population and that individual’s ﬁtness is

38

divided by the sum.

2.2.3 Lexicase selection

In lexicase selection, solutions are evaluated on a large number of criteria (Spector,

2012). Traditionally, these criteria are test cases, but other types of ﬁtness functions are also

eﬀective (Dolson et al., 2018a). To choose an individual to reproduce, the selection criteria

are applied to the population in a random order. Only the individuals that perform best on

each criterion move on to be evaluated on the next, until only one individual remains. In

case of a perfect tie, the winner is selected randomly from the remaining options.

Lexicase selection is highly eﬀective at solving challenging problems in genetic pro-

gramming (Helmuth and Spector, 2015; Helmuth et al., 2015) while maintaining diverse

populations (Helmuth et al., 2016a). Prior work on the population dynamics of lexicase

selection has noted the surprising prevalence of “hyperselection events,” occasions in which a

single individual is selected as a parent for the vast majority of the next generation (McPhee

et al., 2016b). Given lexicase selection’s success at maintaining diversity, the fact that such

events occurred at all was unexpected. Further research, however, has suggested that these

events are not important to lexicase selection’s success at solving challenging problems and

maintaining diversity (Helmuth et al., 2016b). Additional case studies of phylogenetic trees

generated by lexicase selection suggest that lexicase selection promotes the existence of sub-

populations that are focused on a speciﬁc selection criterion (McPhee et al., 2016a).

2.2.4 Eco-EA

Eco-EA (Goings and Ofria, 2009; Goings, 2010; Goings et al., 2012) is an ecology-inspired

approach to maintaining diversity with respect to various ﬁtness criteria. Each criterion is

associated with a limited resource, that only confers ﬁtness beneﬁt in proportion to the

amount remaining. Each resource ﬂows into the environment at some rate, I, and ﬂows out

at some rate, O. Here, we use an inﬂow rate, I, of 100, and an outﬂow rate, O, of .01. Thus,

39

100 units of each resource enter the environment over the course of each generation, and 1%

of the total quantity of each resource exits the environment at the end of each generation.

Resources enter the environment at a constant rate as ﬁtnesses are evaluated, minimizing

stochastic eﬀects from the order in which the population is evaluated.

A few other parameters are necessary for Eco-EA. First, there is Cf, the consumption
fraction. Just as no organism in an ecosystem in nature is capable of consuming all of

a resource in that environment, no solution in Eco-EA is. Cf speciﬁes what fraction of
the total quantity of resource any individual solution consumes. Next, there is m, the

maximum amount of resource an individual is capable of consuming at any one time. For

all experiments, I use Cf = .0025 and m = 5 to maintain consistency with previous Eco-EA
research (Goings, 2010). Additionally, there is the c, the cost of attempting to use a hint. In

order to create negative frequency dependence, there must be a cost to attempting to use a

hint when too many other members of the population are also attempting to use it. In most

scenarios, there is an implicit cost to attempting to use a hint, stemming from the trade-oﬀs

inherent in choosing to take one action over another.

In a more traditional evolutionary

computation setting, however, there is no such implicit cost. Without an explicit cost, there

will not be negative frequency-dependence. Since negative-frequency dependence is a core

element of Eco-EA, I impose a cost of c = 1. Lastly, if there is a cost to attempting to

use a hint, then we also must deﬁne the range within which a solution is considered to be

attempting to use that hint. This range is the niche width, n, which is set to 0.2 for all

of these experiments. Since the maximum score for any hint function is 1, this means that

solutions must score a minimum of 0.8 on a hint function in order to potentially get a reward

or pay a cost from it.

In order to calculate the ﬁtness impact of a resource, we need two more pieces of in-

formation. The ﬁrst is R, the current amount of the relevant resource present. The second

is s, the score on the associated ﬁtness criterion, which is squared to incentivize even small

increases in performance on the criterion. The amount of resource successfully used, A, is

40

calculated with the equation:

0

A =

min((s2 ∗ Cf ∗ R) − c, m)

s < 1 − n
s ≥ 1 − n

(2.3)

A is subtracted from the current amount of resource in the environment, and is used to

update the base ﬁtness of the current organism with the equation:

f itness = f itness ∗ 2A

(2.4)

Thus, successfully using more than c resource will multiply the solution’s ﬁtness by a number

greater than one, whereas using less than c resource will multiply the solution’s ﬁtness by a

number less than one.

These calculations are performed for all resources for all members of the population to

determine resource-adjusted ﬁtness for all candidate solutions. Tournament selection with a

tournament size of two is then performed on the population based on these ﬁtness values.

To ensure that the impact of hints doesn’t wash out small ﬁtness gains on the main ﬁtness

function, every generation created with Eco-EA also contains a copy of the individual from

the previous generation with the highest base ﬁtness.

2.3 Statistical methods

Except where otherwise noted, all statistics are performed using the R statistical com-

puting language (R Core Team, 2017) and plots are made with ggplot2 package (Wickham,

2016). Statistical signiﬁcance for diﬀerences among groups is assessed using Kruskal-Wallis

tests followed by post-hoc pairwise Wilcoxon rank-sum tests accompanied by Bonferroni

corrections for multiple comparisons.

41

2.4 Code Availability

Avida is open source and freely available at https://github.com/devosoft/avida. All

systems other than Avida were built using the Empirical library (Ofria et al., 2018), and

the metric suites developed in chapters 8 and 7 are implemented there. The library is

header only, and designed to be as easy to integrate into existing systems as possible. Code

reliability is ensured with a suite of unit tests automatically run when code is added. As

a proof of this concept, the Avida experiments presented in chapters 8 and 7 are carried

out using a lightly modiﬁed version of Avida that incorporated this implementation of the

metrics (Bryson et al., 2018).

42

Chapter 3

Spatial resource heterogeneity increases

diversity and evolutionary potential

This chapter is adapted from a pre-print on bioarxiv (Dolson et al., 2017).

3.1 Introduction

A central problem in ecology is understanding the mechanisms that promote species

coexistence and richness in natural ecosystems (Hutchinson, 1961; Chesson, 2000). There is

mounting evidence that spatial heterogeneity (the lack of homogeneity across space) is an im-

portant driver of biological diversity (Stein et al., 2014). A common hypothesis, here termed

the positive heterogeneity-diversity hypothesis, states that even seemingly small amounts of

spatial heterogeneity promote diversity by creating new spatially-restricted niches for special-

ist species to inhabit, thereby reducing competition for limited resources (Pianka, 1966a,b;

Stein et al., 2014). This hypothesis is supported by many cases in which spatial hetero-

geneity clearly inﬂuences the dynamics of species coexistence, allowing for greater species

diversity than in homogeneous environments (Pacala and Tilman, 1994; Jankowski et al.,

2014; Barnett and Beisner, 2007; Tilman, 1977; Harpole and Tilman, 2007).

Contrary to the positive heterogeneity-diversity hypothesis, some recent studies have

43

found evidence that diversity should peak at intermediate levels of spatial heterogeneity (Al-

louche and Kadmon, 2009; Allouche et al., 2012; Haller et al., 2013). These results are driven

by two related factors: 1) the area-heterogeneity tradeoﬀ and 2) decreasing contiguity of any

given habitat.

The area-heterogeneity tradeoﬀ is an observation that increasing the number of unique

regions within an environment must decrease the average area spanned by each, if total

geographic area is held constant (Allouche et al., 2012).

In thinking through the eﬀects

of such a decrease, it is important to consider whether these regions represent: A) funda-

mental niches - the full range in which a species is capable of surviving, or B) preferred

habitat - a set of conditions to which a species may be particularly well-optimized to out-

compete others. Research suggesting that the area-heterogeneity tradeoﬀ leads to a peaked

heterogeneity-diversity relationship has so far only considered heterogeneity in fundamental

niches for species of interest (Allouche and Kadmon, 2009; Allouche et al., 2012). Such a

scenario is akin to dynamics theorized to occur in harsh climates, in which environmental

gradients impose strict constraints on where a species can and cannot survive. The amount

of space that is within a species’ fundamental niche places a cap on its maximum population

size. Since smaller populations have an increased risk of stochastic extinction, species whose

fundamental niches occupy less area (relative to the ideal amount of area they need) are

at greater risk of being lost from the community. These extinctions are believed to drive

decreased diversity at high levels of heterogeneity. However this eﬀect should not hold if we

instead focus on heterogeneity of preferred habitats. If all species are capable of surviving

anywhere in the environment, any stochastic extinction will allow for subsequent range ex-

pansion by other species and a corresponding drop in their risk of stochastic extinction. This

process leaves an unexploited niche for future speciation. These factors should balance out

such that, at worst, diversity remains constant as heterogeneity increases (this hypothesis

is supported by the exploration of niche width in (Allouche et al., 2012)). This scenario is

more akin to dynamics observed in the tropics, where species’ realized niches are thought

44

to be constrained primarily by competition. While these two scenarios are both important

to understand, and likely represent extrema of a continuum, here we will focus on preferred

habitat heterogeneity.

Some metrics of environmental heterogeneity, such as edge-density, deﬁne more frag-

mented environments as more heterogeneous. In a system that deﬁnes heterogeneity along

only a single environmental gradient, high values of these metrics can be achieved only if

the environment is highly fragmented (as in (Haller et al., 2013)). Similar to the area-

heterogeneity tradeoﬀ, this decreased patch contiguity increases the likelihood of stochastic

extinctions, reducing population diversity at high levels of heterogeneity. However, when

niches are deﬁned along more than one dimension, it is possible to increase heterogene-

ity without meaningfully increasing fragmentation. Fragmentation is an important issue to

study, but it is only one component of heterogeneity. For simplicity, here we will assume

that patches of any one resource are large enough to support a stable population, although

intersections where multiple patches overlap may not be.

The development of a general theory of the impact of spatial heterogeneity on diver-

sity has been inhibited by the fact that the causal mechanism, populations evolutionarily

diversifying into new spatial niches, occurs over a long time scale. As evolution is the origin

of biological diversity, a thorough understanding of the eﬀects of spatial heterogeneity on

diversity requires consideration of the associated evolutionary dynamics. Evolution occurs

on a broad temporal scale, while the relevant ecological dynamics occur on a broad spatial

scale. Dealing with either of these scales in a biological experiment is challenging, and their

combination is eﬀectively intractable. For this reason, we turn to a Avida to perform in silico

experiments with digital organisms. In digital evolution, rapid generation times and trans-

parent data allow us to perform large-scale experiments while perfectly controlling resource

distributions and measuring organismal phenotypes.

Previous studies in Avida have found that introducing negative frequency dependence

via resource competition is suﬃcient to generate and maintain diverse phenotypes through

45

adaptive radiation into multiple niches (Chow et al., 2004; Cooper and Ofria, 2003).

In

this study, we explore the eﬀect of introducing a spatial component to this resource com-

petition. Each organism in Avida occupies one cell in a two-dimensional grid. We create

spatial heterogeneity by varying the set of resources available in each grid cell. Speciﬁcally,

we place circular patches of resources in the environment. Since many patches will overlap,

this placement technique creates many new spatial niches. We expect that subsets of the

population will adapt to be able to use locally available resources. This process is closely

related to ecological speciation (Schluter, 2009), but, as speciation is challenging to deﬁne

for large and complex populations of asexual organisms such as ours, we will focus instead on

the diversity of distinct phenotypes. Prior research on speciation in spatially heterogenous

environments has demonstrated that some types of heterogeneity can indeed promote evolu-

tionary branching (Haller et al., 2013); that study found a peak in diversity at intermediate

levels of heterogeneity, which we expect is the result of focusing on heterogeneity along a

single environmental gradient.

3.1.1 Evolutionary potential

Here focus on short-term evolutionary potential, which we deﬁne as the probability that

a mutated oﬀspring possesses new beneﬁcial adaptations. It should be clear that the diversity

of a community will be entangled with the combined evolutionary potential of its constituent

populations: On one hand, diversity generally increases evolutionary potential (Lavergne

and Molofsky, 2007; McDonald and Linde, 2002), since a community that is more spread

out across a genotypic or phenotypic space is more likely to contain an organism that is

a short mutational distance away from a given feature. There is also substantial empirical

evidence from the ﬁeld of evolutionary computation that diversity is important to ﬁnding

globally optimal solutions when using evolution as a problem-solving tool (De Jong, 1975;

Goldberg et al., 1987; Lehman and Stanley, 2008; Mouret and Doncieux, 2009). Evolutionary

computation, as a context, allows us to sensibly talk about how “useful” diversity is as a whole

46

and which types of diversity are most eﬀective toward achieving the desired goal (Mouret

and Doncieux, 2009). This concept can also be translated back to biological evolution in

contexts where we are interested in the evolution of a speciﬁc feature or set of features (such

as increased complexity, or recovery of community traits lost due to a severe disturbance).

Thus, understanding how to promote the evolutionary generation of diversity has important

practical applications. Conversely, greater evolutionary potential facilitates the evolutionary

generation of diversity, as populations that are able to adapt rapidly can take advantage of

new niches more quickly.

A previous study in Avida on the relationship between the availability of resources

and the resulting phenotypic diversity found that evolutionary potential peaks when all re-

sources are intermediately plentiful. Interestingly, diversity peaks at lower resource levels

than evolutionary potential (Walker and Ofria, 2012). These results are inconclusive, how-

ever, as to whether a direct relationship exists between diversity and evolutionary potential,

and highlight the complexity of the relationship between diversity, resource availability, and

evolutionary potential. Here, we attempt to clarify the causal relationships between these

factors.

3.1.2 Quantifying environmental heterogeneity

Many metrics have been proposed to quantify spatial heterogeneity, but most fall into

two categories: composition and spatial conﬁguration (McGarigal, 2006). Composition met-

rics ignore the relative positions of resources, and instead deal with what types of spatial

niches are present in an ecosystem. For example, commonly used composition metrics in-

clude richness (a count of how many distinct niches exist), evenness (do all niches cover

the same amount of space?), and composites of the two such as Shannon entropy. For the

purposes of this study, we use Shannon entropy of the number of cells in which each set of

resources is present as a metric of environmental composition. This metric balances both

richness and evenness and is a generally accepted metric of spatial heterogeneity (Forman

47

and Godron, 1986).

Spatial conﬁguration metrics, on the other hand, deal with arrangements of niches

in space and include patch shape complexity, core area, contrast, and isolation. There is

no single metric that can accurately summarize spatial conﬁguration—for any given study

it is important to identify which variables are meaningful. Here, we track the extent to

which single-resource patches overlap to form multi-resource spatial niches. Because we

deﬁne an organism’s phenotype as the set of resources that it is able to consume, a spatial

niche containing more resource types will be hospitable to a wider variety of phenotypes.

Moreover, because of the tendency of simpler functions in Avida to facilitate the evolution

of more complex ones, niches with more types of resources will be easier for populations to

adapt to. Thus, spatial resource overlap is the primary conﬁguration metric we track here.

This overlap is captured by the distribution created by counting the number of resources in

each cell of an environment. We will discuss overlap in terms of the mean, variance, skew,

and kurtosis of this distribution.

3.2 Results and Discussion

3.2.1 Niche stability

Our goal is to isolate the eﬀects of heterogeneity from the eﬀects of fragmentation. To

do so, we must ﬁrst determine the minimum patch size that would be required to stably

support populations specialized to live in those regions.

Likewise, we need to identify the minimum overlap between two patches to ensure the

resulting spatial niche is similarly stable. We measured these threshold sizes empirically

by carrying out two series of experiments in which the world contained only two circular

resource patches. In the ﬁrst set of experiments, we tested all possible pairs of resources at

three levels of overlap: complete, half-way, or none (Fig 3.1). For each region, we deﬁned

the optimal phenotype as one that performed a set of functions that perfectly matched the

48

Figure 3.1: Spatial arrangement of phenotypes at the end of each experiment in
worlds containing patches of two diﬀerent resources, NOT and ORN, each radius 14.
Patch locations are indicated by circles - gold represents the NOT resource, green represents
the ORN resource, and blue represents both resources overlapping. The color of each cell
in the grid indicates the most common phenotype in that cell across the 30 replicates. A:
Patch centers are 0 cells apart, creating complete overlap. B: Patch centers are 14 cells
apart, leading to three coexisting phenotypes. C: Patch centers are 30 cells apart, resulting
in complete patch separation and no phenotype capable of both functions.

available resources.

The optimal phenotype for the spatial niche at the intersection of the two resource

patches evolved consistently when the organisms needed to perform suﬃciently simple func-

tions (Fig 3.1). For example, the optimal phenotype evolved 66% of the time across all

30 replicates within all pairwise combinations of the resource associated with the simplest

function and the resources associated with the ﬁve next simplest functions. The probability

of evolving the optimal phenotype at the intersection within a given resource pair varied

from a low of 30% to a high of 100%, suggesting that it is easier for organisms to evolve

some pairs of functions than others.

This outcome is likely the result of one of the valuable functions serving as a building

block for the other function (Lenski et al., 2003). Such interaction eﬀects illustrate the

complex evolutionary dynamics that spatially heterogeneous environments can cause.

In our second round of experiments, we chose a single pair of resources, but tested all

possible amounts of overlap. If resource patches have suﬃciently small overlap, spatial niches

49

aNoneNOTORNBothbcFigure 3.2: Proportion of populations (out of 30) at each inter-patch distance
that contained organisms performing both functions (NOT and ORN) after 200,000
updates (approximately 4,000 generations). Error bars represent estimates of the standard
deviation under the assumption that the underlying pattern resembles a Poisson distribution.
Number of cells in the area where the patches overlap is speciﬁed in parenthesis for each
distance.

should not be able to maintain a phenotypes capable of both functions. While phenotypes

can extend beyond their target niche, their ﬁtness will decrease in neighboring niches. As

spatial niche size decreases, the probability that the matching phenotype will be lost from

the population due to drift increases. As expected, the probability of the two-resource niche

being occupied by organisms that could perform both valuable functions decreased with

niche size. The conﬁguration with 21 spaces between the centers of the circles, translating

to a niche size of 50 cells, was the smallest at which this probability was greater than 50%

(Fig. 3.2).

50

3.2.2 Eﬀect of spatial heterogeneity on phenotypic diversity

Having established that suﬃciently large overlaps between patches create new niches, we

can now move on to more complex environments with more patches and more heterogeneity.

We used three diﬀerent techniques to arrange patches in the environment.

In the ﬁrst,

we anchored patches in ﬁxed positions and controlled spatial heterogeneity by adjusting

patch size (Fig. 3.3).

In the second, we gave all patches the same radius and controlled

heterogeneity by setting the distance between their center points (Fig. 3.4).

In the third, we randomly selected the number, size, and location of patches in the

environment, resulting in a variety of spatial heterogeneity levels (Fig. 3.5) (for details,

see methods). The resource associated with the most complex logic function, EQU, was

consistently available. We tracked the evolution of EQU as a proxy for the evolutionary

potential of the population. However, the evolution of EQU should diminish the importance

of spatial heterogeneity. To account for this confounding factor, we measured the diversity

for each replicate as the phenotypic diversity of viable organisms in the population at the

last time point prior to the evolution of EQU. For replicates in which EQU never evolved,

we used phenotypic diversity from the ﬁnal time point. To ensure that the presence of these

two types of replicates did not bias the results, we repeated all analyses using data from the

ﬁnal time point for all replicates, and did not ﬁnd any substantive diﬀerence.

Average phenotypic Shannon diversity diﬀered signiﬁcantly across resource patch sizes

and proximities (Size: Kruskal-Wallis test, Chi-squared = 237.05, p< 2.2E-16, Fig. 3.3;

Distance: Kruskal-Wallis test, Chi-squared = 104.0322, p < 2.2E-16, Fig. 3.4). Both size

and distance experiments showed trends in phenotypic diversity qualitatively similar to the

trends in spatial heterogeneity. The locations of the peaks in diversity overlapped with the

peaks in environmental entropy (see Fig. 3.3 and Fig. 3.4).

The third experimental treatment involved continuous rather than categorical variation

in spatial heterogeneity, and we used a simple linear model regressing phenotypic Shannon

diversity on environmental entropy to analyze the results (Table 1, 3.5). We also generated

51

Figure 3.3: Shannon entropy (representing both phenotypic Shannon diversity
and environmental entropy) across resource patch radii. The blue line shows envi-
ronmental entropy at each radius (mean and standard deviation, calculated from 100 ran-
domly generated environments at each radius). Environmental entropy peaks at patch radius
18. In environments with patch radii less than this peak (see sub-ﬁgures a and b), resources
are more spatially isolated; conversely, in environments with patch radii greater than this
peak, resources increasingly overlap (see sub-ﬁgure c). The high-radius extreme represents a
homogeneous control environment (see sub-ﬁgure d), as all patches overlap and completely
ﬁll the world. Final phenotypic diversity at each radius condition tested in Avida is shown
by the boxplots (n=30 for each radius; boxes show 25%, 50%, and 75% quantiles, with
whiskers extending up to 1.5*IQR). Radii that do not share a letter have signiﬁcantly diﬀer-
ent distributions of phenotypic diversity (pairwise two-sided Wilcoxon tests with Bonferroni
adjustment, p<0.05). Sub-ﬁgures a-d show representative environments for the radius sizes
indicated. Phenotypic diversity appears to peak at a patch radius somewhere between 13
and 23, an interval which includes the peak in environmental entropy.

52

abcdFigure 3.4: Shannon entropy (representing both phenotypic Shannon diversity
and environmental entropy) across diﬀerent resource inter-patch distances. The
blue line shows environmental entropy at each distance (mean and standard deviation, cal-
culated from 100 randomly generated environments at each distance; note that the standard
deviation is small and hard to see). Environmental entropy peaks at inter-patch distance
21 (sub-ﬁgure c).
In environments with inter-patch distances greater than this peak, re-
sources are more spatially isolated (sub-ﬁgure d); conversely, at inter-patch distances lower
than this peak, resources increasingly overlap (sub-ﬁgures a and b). Distance 0 represents a
homogeneous control environment, as all resources overlap completely. Boxplots show ﬁnal
phenotypic diversity at each level tested in Avida (n=30 for each treatment; boxes show
25%, 50%, and 75% quantiles, with whiskers extending up to 1.5*IQR). Distances that do
not share a letter have signiﬁcantly diﬀerent distributions of phenotypic diversity (pairwise
two-sided Wilcoxon tests with Bonferroni adjustment, p<0.05). Sub-ﬁgures a-d show repre-
sentative environments for the inter-patch distances indicated. Phenotypic diversity appears
to peak at an inter-patch distance somewhere between 14 and 21, an interval which includes
the peak in environmental entropy.

53

acbdy = Phenotypic Diversity

Estimate SE CI
Variable
Intercept
1.53 ***
Environmental Entropy 0.32 ***

0.07
0.02

(1.39, 1.67)
(0.28, 0.37)

Residual standard error: 0.71 on 498 DF
R2 = 0.28, F-statistic = 196.3 on 1 and 498 DF

Table 3.1: Regression table for model predicting phenotypic diversity, measured as
Shannon entropy of viable organisms on the last measured update before EQU evolved (or
the last update if EQU never evolved), from spatial heterogeneity, as measured by Shannon
entropy of niches in the environment.

models including patch overlap as a metric of spatial conﬁguration, but found no support for

retaining the additional terms. Based on this model, we estimate that phenotypic diversity

will increase by approximately one third of a unit, on average, for every unit that spatial

heterogeneity increases - a fairly substantial eﬀect.

Across the board, populations that evolved in spatially heterogeneous environments

were more diverse than those evolved in more homogeneous environments, regardless of how

that heterogeneity was generated. Increasing spatial heterogeneity partitions resources into

spatially structured areas, thereby creating new, unique niches into which organisms can

diversify and partially escape competition. This trend is in contrast to the peak in diversity

found at intermediate heterogeneity by some other experiments (Haller et al., 2013; Allouche

et al., 2012). We suspect that this result occurred because, in our experiments, increased

heterogeneity does not necessarily come at the cost of decreased fundamental niche size.

Our results agree with the many biological studies that have found a positive relationship

between phenotypic diversity and spatial heterogeneity (Stein et al., 2014).

3.2.3 Spatial heterogeneity and evolutionary potential

The proportion of replicates in which the most complex task (EQU) evolved had a

positive saturating response to increasing patch radius. As patch radius increased from low

54

Figure 3.5: Results from randomly generated environments. A) Scatterplot of en-
vironmental entropy vs phenotypic Shannon diversity across diﬀerent randomly generated
environments (Method 3) (n=500). Line represents the linear model shown in Table 1, and
shaded area represents 95% conﬁdence interval for the slope. B) and C) are two examples
of randomly generated environments.

55

abc(6-10) to intermediate values (15-24), populations evolved the EQU function signiﬁcantly

more frequently. (Fig. 3.6). Evolutionary potential reached a maximum saturating response

such that we were not able to detect a signiﬁcant diﬀerence between the probabilities of EQU

evolving in trials with medium and high patch radii (> 24) after a Bonferroni correction

(Fig. 3.6).

The inter-patch distance experiment highlighted the importance of patch overlap relative

to spatial entropy in evolving EQU. At low distances between patch centers (0-14), resource

patch overlaps and evolutionary potential were at their highest levels (Fig. 3.7). At interme-

diate to large inter-patch distances, evolutionary potential decreased, as did the intersection

between patches. Larger overlaps have multiple resources available, allowing populations to

build up to more complicated functions with greater ease than than in conditions in which

patches overlap less (Lenski et al., 2003).

The common trend from the ﬁrst two experiments is that the amount of overlap between

resource patches (i.e. the number of diﬀerent resources present within a spatial niche) has a

stronger impact on evolutionary potential than diversity. To disentangle the eﬀect of resource

overlap (our spatial conﬁguration metric) and spatial heterogeneity (our spatial composition

metric) on evolutionary potential, we built a linear model using the overlap as a predictor.

Recall from our discussion of Quantifying Heterogeneity that we measured overlap by looking

at the distribution of the number of resource types per cell in an environment, and taking its

mean, variance, skew, and kurtosis. Ultimately, these statistics capture similar information

about the environment to that captured by spatial entropy, as demonstrated by the high

R2 value of the linear model predicting environmental entropy from the mean and skew of

number of resources available per cell (see Table 2). Variance and kurtosis were dropped from

the ﬁnal model because of the high colinearity between mean and variance and between skew

and kurtosis. Thus, we did not include spatial entropy in the overall model of evolutionary

potential, as it would have added largely redundant information.

We used a logistic regression to predict the probability of evolving the EQU function

56

Figure 3.6: Proportion of populations that evolved EQU across diﬀerent patch
radius sizes (n=30 per treatment). Error bars are estimates of standard deviation based
on the square root of the count of populations evolving EQU for each condition. Radii
that do not share a letter have signiﬁcantly diﬀerent probabilities of evolving EQU (pairwise
two-sided Fisher’s exact tests with Bonferroni adjustment, p<0.05).

57

Figure 3.7: Proportion of populations that evolved EQU across diﬀerent inter-
patch distances (n=30 per treatment). Error bars are estimates of standard deviation
based on the square root of the count of populations evolving EQU for each condition.
Inter-patch distances that do not share a letter have signiﬁcantly diﬀerent probabilities of
evolving EQU (pairwise two-sided Fisher’s exact tests with Bonferroni adjustment, p<0.05).

58

y = Spatial Entropy

Variable
Intercept
Mean resources/cell
Skew resources/cell
Mean * Skew

Estimate SE CI
4.27***
1.75***
0.24***
-2.62***

0.07
0.19
0.02
0.08

(4.12, 4.41)
(1.37, 2.12)
(0.19, 0.29)
(-2.77, -2.47)

Residual standard error: 0.36 on 496 DF
Adjusted R2 = 0.93, F-statistic = 2320 on 3 and 496 DF

Table 3.2: Regression table for linear model predicting entropy of the environment
from the mean and skew (and their interaction) of the number of resource types per cell.

given phenotypic diversity and environmental heterogeneity (Table 3). As environmental

descriptors, we used mean number of resources per cell, skew of the number of resources

per cell, and their interaction. We included phenotypic diversity (before EQU evolved) as

the ﬁnal predictor in the model of evolutionary potential, due to its positive association

with spatial heterogeneity. As expected, this model suggests that diversity is important to

evolutionary potential. Every additional unit of phenotypic diversity increases the odds of

EQU evolving by approximately 4.3 times. However, this eﬀect is dwarfed by the eﬀect of

mean resources per cell (i.e. overlap). Increasing the average number of resources per cell by

1 (e.g. adding an additional resource patch that covers the entire grid) would increase the

odds of EQU evolving by approximately 148.8 times. The interaction between diversity and

mean resources per cell has an eﬀect of a similar magnitude. This interaction suggests that

either A) environments with more overlap amplify the beneﬁts of diversity to evolutionary

potential, B) environments with more overlap promote the evolution of diversity that is more

useful to evolving EQU than the diversity promoted by other environments, or C) both.

Ultimately, the amount of overlap between single-resource patches is the strongest driver

of the evolutionary potential of EQU, with global phenotypic diversity playing an important

secondary role. This impact of diversity may be responsible for the saturating evolutionary

potential curves seen in the ﬁrst two experiments; during the plateaus in these curves, overlap

is decreasing while diversity is increasing. Given the strength of the synergies between

59

y = EQU Evolution

Variable
Intercept
Phenotypic Diversity
Mean resources/cell
Skew resources/cell
Diversity * Mean
Diversity * Skew
Mean * Skew
Diversity * Mean * Skew 2.65**

Estimate SE CI
1.30***
1.46***
5.00***
0.74***
4.38***
0.81**
1.70

0.17
0.22
0.84
0.21
0.93
0.26
0.81
1.01

(0.97, 1.66)
(1.05, 1.90)
(3.40, 6.71)
(0.32, 1.13)
(2.60, 6.25)
(0.30, 1.32)
(-0.00, 3.21)
(0.64, 4.62)

Null deviance: 651.08, 499 DF, Residual deviance: 549.88, 492 DF

Table 3.3: Logistic regression table for linear model predicting whether or not
EQU evolved from phenotypic diversity, mean resource types per cell, and skew of resource
types per cell, as well as all interactions between these variables.

functions in Avida, it is possible that the eﬀect of overlap in this system is greater than it

would be in other contexts. These results support the idea that it is possible to select for

diversity that is more or less useful to evolving a speciﬁc feature.

3.2.4 Conclusion

We have presented evidence supporting the positive diversity-heterogeneity hypothesis.

We found that population diversity increases in response to higher levels of spatial hetero-

geneity, rather than peaking at an intermediate level. Further, we have shown that this

increase in diversity leads to higher evolutionary potential of a complex function. How-

ever, the spatial conﬁguration of the landscape in which a population is evolving can have a

much stronger eﬀect on evolutionary potential than diversity alone. The presence of a syner-

gistic interaction between diversity and spatial conﬁguration provides preliminary evidence

that landscape conﬁguration can promote diversity that is more or less useful to evolving

a given trait. These ﬁndings help to solidify the growing consensus that there is a causal

relationship between spatial heterogeneity and population diversity. Moreover, by exploring

these changes in diversity in an evolutionary context, we push toward a more mechanistic

understanding of spatial heterogeneity and diversity.

60

However, a number of open questions remain. Most notably, we only considered di-

versity as phenotypic Shannon diversity within individual populations. It is possible that

other measures of diversity, such as beta-diversity across populations, would have diﬀerent

relationships to spatial heterogeneity. Similarly, the only metric of spatial heterogeneity that

we explored was Shannon entropy. There are many alternative metrics, particularly metrics

of spatial composition (see Quantifying Heterogeneity), that may have interesting subtleties

in their relationship to diversity. Additionally, allowing sexual recombination, motility, or

the evolution of dispersal distance could produce substantially diﬀerent results.

Our results have implications beyond ecology; for example these same dynamics can

be harnessed to improve computational problem solving with evolutionary algorithms. A

common approach to solving hard problems in these systems is to ﬁrst evolve solutions

to easier related problems and use them as building blocks (Lenski et al., 2003; Bongard

and Hornby, 2010; Grabowski and Magaña, 2014). Spatially heterogeneous environments

can be created by rewarding population members that solve a range of simpler problems

in diﬀerent parts of a spatially structured population. Recent research on an evolutionary

algorithm that creates spatial heterogeneity via host-parasite co-evolution supports the idea

that this approach may be useful (Harper, 2012). Thus, it may be possible to use the results

of our study to create environments that are particularly conducive to solving problems

with evolutionary computation. Such an approach would synergize well with previously

discovered positive impacts of spatial structure on diversity maintenance in evolutionary

computation (Whitley et al., 1998; Cantú-Paz and Goldberg, 2003).

Understanding drivers of diversity, particularly with regard to their impact on evolution-

ary potential, also has important practical implications for preserving biodiversity in natural

systems. Landscapes world-wide are experiencing new disturbance regimes and there is cur-

rently little theory predicting how ecosystems will respond (Arnell, 2004; Galloway et al.,

2004). Anthropogenic disturbances tend to be a homogenizing force (McKinney, 2006),

but this eﬀect can be mitigated by urban planning if we understand the evolutionary and

61

ecological dynamics at play. Given the theorized importance of biodiversity to ecosystem

stability and function, knowing how spatial heterogeneity can aﬀect diversity across diﬀerent

ecosystems will be critical to achieving such goals (Craine et al., 2013; Wilmers and Getz,

2005). Thus, going forward, it will be important to solidify our understanding of the factors

aﬀecting the shape of the relationship between spatial heterogeneity and diversity.

3.3 Experimental Design

Speciﬁc details for experimental treatments are described below. For each, we performed

30 replicates and initialized all conﬁguration settings to Avida defaults (unless otherwise

speciﬁed) with the exception of copy mutations, which we set to a probability of 0.0025 per

instruction copied. We allowed each replicate to evolve for 100,000 updates. An update is the

amount of time it take for the average organism in the population to execute 30 instructions;

100,000 updates translates to approximately 30,000 generations on average.

We recorded phenotypic Shannon diversity of organisms and the number of populations

that had at least one organism that performed the most complex logic function at the end of

the experiment. We deﬁne an organism’s phenotype as the set of logic functions it performs.

We performed all statistics in R version 3.0.1 (Wickham, 2016; R Core Development Team,

2013). For each experimental condition, we ran a large number of replicates relative to the

amount of variation in results, ensuring suﬃcient power to detect reasonably-sized eﬀects.

Because each replicate had a unique random number generator seed, all data points were

independent and, as such, the assumptions for all statistical tests were met.

3.3.1 Niche stability

For the niche stability experiments, each circle had a radius of 14 cells. We set up the

patches to be perfectly on top of each other in the complete-overlap condition (0 cell distance

between centers), 14 cells between centers for halfway overlap, and 30 cells for no overlap.

62

We tested these conditions in a full-factorial design with 30 replicates of all potential pairwise

combinations of resources, for a total of 192 treatments.

We used the two simplest logic functions in the functional niche-size assessment experi-

ments because both functions consistently evolved in all of the two-circle overlap experiments

where they were present. These patches also had a radius of 14, and between 0 and 30 cells

between their centers. To ensure that niches created in this experiment were ecologically sta-

ble, we allowed these replicates to continue for an additional 100,000 update "ecology phase"

with no mutations. We measured niche stability as the frequency with which intermediate

phenotypes evolved and persisted until the end of the experiment.

3.3.2 Fixed layout experiments

To systematically measure the eﬀect of spatial heterogeneity on diversity, we ran two

sets of experiments with 16 resource patches (two for each resource type) placed at pre-

determined locations in the environment. In the patch radius experiment, we varied the size

of each circular patch, while in the inter-patch distance experiment, we varied the spacing

between ﬁxed-sized patches. To select treatment values for each of these variables, we wrote

a simulation in Python that created random environments with the speciﬁed parameters

and measured their heterogeneity (available at https://github.com/emilydolson/resource-

heterogeneity). The results of these runs produced the data for the curves in ﬁgures 3 and 4.

We used these curves to select values of each variable that would produce a range of levels

of environmental heterogeneity. Across both sets of experiments, the most complex logic

function was valuable everywhere to ensure that it would be maintained in the population

long enough to be measured if it evolved. We designed many treatments within each experi-

ment (detailed below) and ran 30 replicates for each treatment. While the positions of each

resource patch were consistent within a treatment, we randomized the type of resource in

each patch across replicates.

In the patch-radius experiment, we manipulated spatial resource heterogeneity by chang-

63

ing the radius sizes of the single-resource patches. We placed patch centers 12 spaces apart.

Based on the results of the Python simulation, we used patch radii of 6, 8, 10, 15, 20, 24, 30,

36, 48, and 59 as our experimental conditions (see Fig 3.3). Changing patch radii introduces

a potentially confounding variable: the overall amount of resources in the environment. To

control for this additional factor, we ran another experiment in which we manipulated spatial

resource heterogeneity by changing the distances between the resource patches while holding

patch radius constant. At each inter-patch distance, we set resource patches to a constant

size (radius 14) in a 115 x 115 non-toroidal world. We manipulated distances between patch

centers across treatments (using distances 0, 3, 7, 10, 14, 17, 21, 24, or 29) (see Fig 3.4).

This condition had a constant quantity of each resource, but introduced its own constraints

on the types of environments that could exist at each level of heterogeneity. In combination

with each other, however, these two sets of experiments address the eﬀect of heterogeneity

in a more comprehensive way.

We used Kruskal-Wallis tests to determine if the distribution of phenotypic Shannon

diversity varied signiﬁcantly across treatments. Subsequently, we determined where average

diversity peaked using post-hoc pairwise Wilcoxon tests with a Bonferroni correction for

multiple comparisons. We used Fisher’s exact tests to determine if the number of populations

that performed the the most complex logic function by the end of the experiment changed

across distance treatments. We performed additional Fisher’s exact tests to determine where

evolutionary potential peaked.

3.3.3 Randomized patch experiments

In the two prior experiments, it is hard to disentangle the speciﬁc eﬀects of patterned

resource overlap from the more general eﬀects of heterogeneity, as these variables covaried.

To sort out these factors, we placed a random number of patches of random sizes in ran-

dom locations in the environment. Using the Python simulation from the previous section

to determine parameter ranges that would produce the widest variety of levels of spatial

64

heterogeneity, we tested 500 environmental conﬁgurations, consisting of 4-124 patches with

radii between 6 and 35. Ultimately, the total number of resource patches in an environment

was the primary determiner of its heterogeneity; varying patch sizes and placements achieved

variation in patch overlap within a given level of heterogeneity.

Because these experiments produced continuous rather than categorical data, we used

linear regression models in R to determine the factors inﬂuencing phenotypic diversity and

spatial heterogeneity. For the logistic regression predicting evolution of EQU, we used a

generalized linear model. We narrowed down potential models using AIC; we then evaluated

and trimmed factors with high collinearity until we found the minimally adequate model.

We also ﬁt a model that included total sum of resources across all cells, but there was

a curvilinear relationship between total sum of resources and spatial entropy that rendered

the factors largely redundant and added little explanatory power. We identiﬁed a substantial

outlier, in the form of a single run in which the population went extinct, but removing it

had negligible eﬀect on our statistical analyses, so it was included.

3.3.4 Code and Data Availability

These experiments were carried out using Avida version 2.12.4. Data and scripts used to

generate diﬀerent environmental conditions and analyze data are available via Zenodo with

DOI 10.5281/zenodo.162981 (Dolson et al., 2016a).

65

Chapter 4

Spatial resource heterogeneity creates

local hotspots of evolutionary potential

This chapter is adapted from a paper that appeared in the 2018 Artiﬁcial Life Conference

(Dolson and Ofria, 2017).

4.1 Introduction

Understanding the evolutionary processes that facilitate the evolution of complex traits

is at the heart of many open questions in artiﬁcial life, evolutionary computation, and evolu-

tionary biology. Numerous studies have suggested that spatially complex environments can

drive the evolution of complex traits (Doebeli and Dieckmann, 2003; Wagner et al., 2013;

Dolson et al., 2017), but it is unknown whether this result is a global pattern that is only

visible at the scale of the entire environment, or whether it is driven by local spatial dynam-

ics. If sub-regions exist that disproportionately promote the evolution of speciﬁc adaptive

traits (evolutionary hotspots), they would be a valuable phenomenon to understand. Un-

derstanding what factors produce evolutionary hotspots should help us design environments

that are more likely to evolve complex traits. This ability would also enable us to build

more interesting artiﬁcial life systems, more eﬀective evolutionary algorithms, and biological

66

environments that promote desirable evolutionary outcomes.

Where should we expect to ﬁnd evolutionary hotspots? Instinctively, we might expect

a trait to arise only in regions where it is beneﬁcial. However, selection can only favor traits

once they appear, not promote that initial appearance, so other aspects of the environment

must play a critical role. At the other extreme, hotspots seem unlikely in a homogeneous

environment with no spatial structure. However, few environments in nature are truly ho-

mogeneous, and almost all have some form of spatial structure. Even among digital systems,

once we consider the fact that any kind of interaction between neighboring individuals in-

troduces spatial heterogeneity, few are truly spatially homogeneous. Thus, understanding

if and how evolutionary potential varies across space could be important for understanding

evolutionary dynamics in a variety of systems.

Why should we expect evolutionary hotspots to exist? To better understand this idea,

it is useful to consider Wright‘s ﬁtness landscape metaphor (Wright, 1932). At its simplest,

a ﬁtness landscape is a three-dimensional surface representing two traits (x and y) with

varying values, and the ﬁtness of an organism having a given combination of those trait

values (z). When there are gradients in this space, we expect populations to climb to a

ﬁtness peak over time. Typically, populations will then get stuck at the peak of a local

gradient, regardless of whether it is the highest location in the landscape. As such, the

topography of the local ﬁtness landscape plays a pivotal role in determining what traits will

evolve in a population. A wide variety of factors can inﬂuence what part of a ﬁtness landscape

a population ends up in at a given point in time (Wilke et al., 2001). I predict that spatial

variation across the environment is one such factor. A spatially heterogeneous environment

can be conceptualized as a network of diﬀerent ﬁtness landscapes, one for each location in

space. However, in a dynamic, interdependent ecological community, these ﬁtness landscapes

shift over time. Some local landscapes will tend to guide portions of the population toward

regions of mutational space from which new traits are more easily accessible.

Of course, this network of ﬁtness landscapes may not guide evolving populations in a

67

consistent direction. While it is possible that landscapes could align to perfectly guide a

lineage through the most challenging steps in the evolution of a complex trait, this condition

seems unlikely. We might just as easily expect that the lack of a constant landscape will

only serve to increase stochasticity, rather than having a predictable eﬀect. Perhaps the

ﬁtness landscapes in adjacent locations even work at cross-purposes, with each guiding the

population in opposite directions. Given this uncertainty, the primary goal of this chapter will

be to determine whether hotspots actually occur with suﬃcient frequency and strength to be

worthy of further consideration. Addressing this question requires vast numbers of repetitions

of experimental evolution in complex spatial environments. Conducting such an experiment

in a wet lab system would not be worth this immense eﬀort until additional theory has been

developed. Once we have generated concrete hypotheses about what we would expect to

see in biology, more targeted follow-up experiments in a wet lab system may be worthwhile.

Here, I lay the groundwork for such theory by performing experiments in a digital system,

where I will be able to precisely control the spatial distribution of environmental conditions

and measure the drivers behind any spatial variation in evolutionary potential that I detect.

Here, I focus on a simple environment to avoid adding unnecessary complications to

an already complex question. To this end, I use sessile, asexual organisms in an unchang-

ing abiotic environment where diﬀerent sets of traits are advantageous in diﬀerent regions.

Importantly, these traits are related to each other. This set-up is roughly analogous to a

wide variety of scenarios within biology, ranging from bacteria surviving on a surface treated

with a range of antibiotics (similar to (Baym et al., 2016)) to plants adapting to survive on

diﬀerent soil types within a region.

4.2 Biological Background

Biologists have traditionally studied patterns of diversiﬁcation rather than patterns

of novel trait evolution (but see (Baym et al., 2016)). These processes have important

68

distinctions from each other; diversiﬁcation usually requires either physical separation of

sub-populations or conﬂicting selection pressures, whereas novel trait evolution does not.

Moreover, they are maximized by diﬀerent conditions (Walker and Ofria, 2012). Neverthe-

less, diversiﬁcation and novel trait evolution are closely related, as diversity is thought to

increase evolutionary potential (Rouzic and Carlborg, 2008). The most well-studied biologi-

cal pattern pointing to spatial variation in evolutionary processes is the latitudinal diversity

gradient: a consistent pattern of high biodiversity near the tropics, and progressively less

toward the poles (Hillebrand, 2004). While it remains impossible to perform controlled evo-

lution experiments on a planetary-scale, progress has been made on untangling the drivers of

this pattern. Most importantly for the purposes of this chapter, evidence is mounting that

diversiﬁcation rates are higher in the tropics (Mittelbach et al., 2007). The drivers of this

variation are still debated, but many hypothesized drivers are related to spatial dynamics

within the environment, most notably the idea that the tropics may contain more environ-

mental gradients, barriers to gene ﬂow, and other spatial properties that promote speciation

(Moritz et al., 2000; Doebeli and Dieckmann, 2003). Although these dynamics are more com-

plex than those addressed here, particularly given that many rely on sexual reproduction,

they suggest that spatial variation in evolutionary processes has broad relevance.

In related work, conservation biologists have sought to locate regions with an elevated

intensity of evolutionary processes. This research stemmed from an increasing recognition

that preserving the processes that generate diversity is at least as important as preserving

existing diversity (Moritz, 2002; Ferrière et al., 2004). In accordance with this motivation,

these studies tend to focus on small regions of land, making them more directly comparable

to the research presented here. At a global scale, certain regions simply lack the genetic

background for a given trait to have a chance of evolving (e.g. a novel heat tolerance

mechanism seems unlikely to evolve in the arctic). Our question centers on more localized

regions that are spatially heterogeneous with regard to the extent to which various related

traits are useful. The results from conservation biology suggest that variation in evolutionary

69

potential on such a local scale is plausible; even within relatively small regions, these studies

have demonstrated substantial spatial variation in phylogenetic diversity (Forest et al., 2007;

Vandergast et al., 2008).

Most of the processes that are hypothesized to be involved in the formation of hotspots

of phylogenetic diversity are based on insights from simpler systems, most notably island

groups.

Islands represent clearly delineated spatial patterns and processes, making them

an excellent context in which to build up insights about how eco-evolutionary dynamics will

play out across space. For example, islands are known to promote diversiﬁcation via adaptive

radiation (Losos, 2010), and the formation of new islands within a system has substantial

impacts on the assembly of ecological communities over evolutionary time (Gillespie, 2004).

Island systems are also a particularly pronounced example of variation in spatial structure.

Spatial population structure has a profound eﬀect on how evolution will proceed; more

structured populations (i.e. those with more constraints on where oﬀspring end up relative

to their parents) are generally more diverse and more likely to reach the highest peak in a

ﬁtness landscape (Tomassini, 2005; Nahum et al., 2015). The theory of island biogeography

makes various shorter timescale predictions (MacArthur and Wilson, 1967). More recently,

biologists have begun to translate theory developed for island systems to more continuous

land masses, often by treating regions of a species’ preferred habitat as "islands" (also called

patches) within a matrix of less suitable habitat (Forman, 1995). Many have argued this

approach is likely an oversimpliﬁcation (McGarigal et al., 2009; Franklin and Lindenmayer,

2009). A better understanding of the relationship between the dynamics observed in island

systems and the dynamics that occur in more continuously varying environments will likely

be important to understanding the spatial evolutionary dynamics that create evolutionary

hotspots.

70

4.3 Methods

4.3.1 Experimental design

I arbitrarily selected eight of the 500 randomized environments from 3 for further study

without examining them beforehand. In each environment, I performed 100 replicate runs

of Avida. Each replicate was run for 100,000 updates (approximately 2,000 generations). I

also performed 100 replicate runs of a homogeneous control condition where all resources

were globally available. From these results, I extracted the coordinates of the grid cell in

which each trait ﬁrst appeared within each run. While a trait may evolve independently

multiple times within a run of Avida, including these subsequent evolutions would introduce

various biases into the dataset (especially when traits were lost and then re-evolved). Thus,

I consider only the ﬁrst organism ever to display a given trait within a replicate, before

selection has an opportunity to play a role.

These extracted (x,y) coordinates translate into 81 separate point patterns (one for each

of the 9 traits across each of the 8 environments plus the control) (see Figure 4.1A): one for

each of the nine diﬀerent traits in each of the eight environments. Each pattern contains a

maximum of 100 points. Many patterns contain fewer points, as it is not guaranteed that

every trait will evolve in every replicate. Because each replicate is completely independent

of each other replicate, the only relationship between points in the same pattern is that they

evolved in the same environment.

4.3.2 Statistical Approach

To determine whether or not diﬀerent regions have diﬀerent evolutionary potential, I

need to determine whether the point pattern is signiﬁcantly diﬀerent from complete spatial

randomness. To make it possible to appropriately correct our statistics for multiple com-

parisons, I performed the test for spatial randomness in two steps: 1) a test to see whether

each pattern, as a whole, was random, and 2) a follow-up test to determine which regions of

71

Figure 4.1: Hotspot location technique. A) An example point pattern, generated from
100 runs of Avida within a single environment. This pattern corresponds to the most complex
trait, EQU. Each point represents the location in space of the ﬁrst organism within one of
the 100 runs that was able to perform EQU. B) Kernel intensity map generated from the
point pattern in part A. C) Regions within that kernel intensity map that are substantially
more intense than observed in the Monte Carlo simulations (i.e. the hotspot).

non-random patterns had signiﬁcantly more points than expected by chance.

For the overall randomness test, I calculated a test statistic for various distances, r. For

each point in the pattern, I counted the number of other points falling within r distance

units of it. I then averaged this number across all points to get our test statistic for that

value of r. For each point pattern, I generated a range of expected results under the null

hypothesis of complete spatial randomness by running 100 Monte Carlo simulations where I

randomly placed the number of points in the actual data across an area of the same size. If

the value of the test statistic calculated from our observed data was greater than the highest

value from the Monte Carlo simulations for more than 5% of the r values, I concluded that

the observed pattern was substantially more clumped than I would expect to see by chance

(see Figure 4.2). A test statistic below the range of those observed for the Monte Carlo

simulations would suggest that the observed data was more uniform than I would expect

by chance; such a result would be unexpected for our experiments. Note for those familiar

with spatial statistics: this statistical test is functionally equivalent to Ripley’s K function

(Ripley, 1981). However, since Ripley’s K function assumes ﬁrst-order spatial stationarity

72

Figure 4.2: Results of performing the test for spatial clumping on the point pattern
from Figure 4.1A. Observed values of the test statistic (black line) are well above the values
observed in the Monte Carlo simulations (shaded area), indicating a clumping in this pattern.
The red line indicates the theoretically derived expected value for a random pattern.

.

and the data is exclusively the result of ﬁrst-order trends, I do not expect to see the same

patterns traditionally observed in Ripley’s K function.

For those point patterns that were signiﬁcantly more clumped than I would expect by

chance, I performed a follow-up test to determine the location of the hot spot. I calculated

a kernel intensity surface (essentially a heat map of point density across the environment)

for each of these point patterns using the spatstat library (Baddeley et al., 2015) (see Figure

4.1B). Then, to determine whether the patterns within the data were signiﬁcantly more

intense than I expect from a random process, I performed a Monte Carlo hypothesis test

with 100,000 simulations. I simulated patterns generated under the assumption of complete

spatial randomness (our null hypothesis) by randomly selecting points from a uniform 60x60

lattice grid approximating the lattice of cells in Avida. For each point pattern, I selected the

number of points observed in the original pattern. Kernel intensity values were created for

each of these simulations. For each cell in the grid, I compared the experimentally-derived

73

Figure 4.3: Hotspots across all traits and all environments. Each panel represents a
diﬀerent environment. Background colors represent the diﬀerent combinations of resources
that are present in each location; each color represents a diﬀerent combination, with colors
assigned such that higher hue values roughly correspond to more complex environments
(either in number of resources or complexity of traits associated with resources). Each
polygon color represents the outline of hotspots for the trait speciﬁed in the legend. Note
that only hotspots signiﬁcant at an alpha-level of .00001 are shown. This conservative value
was necessary to be certain that hotspots are present despite the large number of hypothesis
tests performed. However, in practice, it likely translates to an underestimation of the true
size of many of the hotspots.

intensity to the distribution of simulated intensity values. Cells in which our observed value

was higher than 99.999% of the simulated values were labeled as hotspots of evolutionary

potential for a given trait. This signiﬁcance threshold (p < .00001) was selected to ensure

all results are signiﬁcant after a sequential Bonferroni correction for multiple comparisons

across all 3600 grid cells.

4.3.3 Code and data availability

Scripts used to generate environment ﬁles are available via Zenodo with DOI

10.5281/zenodo.162981. Statistical analyses were performed using the spatstat and qqtest

74

R libraries (Baddeley et al., 2015; Oldford, 2016). Data and scripts used in this project are

available at https://github.com/emilydolson/evo_in_space.

4.4 Results and Discussion

4.4.1 Existence of evolutionary hotspots

Out of the 81 point patterns, 43 diﬀered signiﬁcantly from random. All 43 of these

non-random patterns showed signiﬁcant clumping of points. For 40 of these 43 non-random

patterns, the kernel density map had at least one region of higher intensity than I would

expect to see by chance (hotspot) (see Figure 4.3). It is possible that the three patterns

without hotspots instead had areas of signiﬁcantly decreased evolutionary potential. Al-

ternatively, these patterns may contain hotspots that were too weak to be detected by the

highly conservative threshold for signiﬁcance. Every test environment had hotspots for at

least two traits, while none of the point patterns from the spatially homogeneous control

displayed signiﬁcant clumping, suggesting that these hotspots are a product of the spatially

heterogeneous environment. These results provide clear evidence that hotspots of evolution-

ary potential not only exist but are relatively common; some regions of the environment

promote the evolution of new traits to a greater extent than others.

In most environments, there appears to be strong overlap in hotspots for the evolution

of diﬀerent traits (see Figure 4.3). However, there are some environments in which this is

not the case, and even the environments with the neatest overlap have some traits with non-

overlapping hotspots. Some amount of overlap is to be expected, for three reasons: 1) If any

trait is easier to evolve in a given region, then lineages living in that region are more likely

to be successful as a result of having evolved the trait, giving them an older evolutionary

history, 2) many traits serve as building blocks for other traits, so possessing some traits

is an indicator that a lineage is in a part of the ﬁtness landscape from which other traits

are easier to reach, and 3) some traits are close to each other within the ﬁtness landscape

75

(for example, the two most complex traits, XOR and EQU, which often have hotspots in

nearly identical locations). The fact that the overlap is not universal suggests that the local

environment has eﬀects on the likelihood of evolving a trait that go above and beyond the

ability of lineages to perform other traits.

4.4.2 Potential drivers of hotspots

Now that I have established that a spatial pattern exists in the locations where traits

ﬁrst evolve, the obvious next question is: what causes this pattern? Since all traits have

potential to serve as building blocks for each other, the simplest explanation would be that

locations with more resources have higher evolutionary potential. Note that because I am

considering only the ﬁrst mutation to confer each trait, the resource for that trait cannot

exert selective pressure.

To test this explanation, I performed a multi-variate linear regression attempting to

predict the value of each position in the kernel intensity map from a series of boolean variables

indicating which resources were present in a cell. I tried such regressions on various subsets of

the data: a single trait within a single environment, a single trait across environments, and all

traits across all environments. Models built from data on a single trait in a single environment

varied wildly in the percent of variation that they explained, but many explained a high

percentage (R2 values ranged from .11 to .95). However, little (if any) of this explanatory

power generalizes across environments. Resources that increase evolutionary potential for a

given trait in one environment (as indicated by a regression coeﬃcient signiﬁcantly greater

than 0) may decrease it in a diﬀerent environment (as indication by a regression coeﬃcient

signiﬁcantly less than 0). Linear models calculated across environments were unable to

reconcile these diﬀering patterns, explaining minimal variation in evolutionary potential (R2

values ranged from .03 to .13). A similar set of models in which the predictor variables

were the distance to each resource was similarly ineﬀective. While high degrees of spatial

autocorrelation in the kernel intensity maps mean that the assumption of independence is

76

violated for these linear models, such a reduction in degrees of freedom should only inﬂate the

apparent variation explained. The failure of this simple mechanism to explain our observed

results suggests that more complex processes are driving these eﬀects. Note: Interestingly,

there does appear to be a correlation between hotspot location and the pink background

color in Figure 4.3. This suggests that the color selection algorithm may be making useful

generalizations that are worth further exploration. Nevertheless, it does not appear to be

a complete explanation (not all pink regions are hotspots and not all hotspots are in pink

regions) and further exploration will be necessary to determine what factors are driving the

correlation.

Many hotspots appear to overlap with borders between diﬀerent combinations of re-

sources. Such regions likely have elevated local diversity, as diﬀerent phenotypes will be

better adapted to the diﬀerent regions. As discussed above (see Background), there is rea-

son to believe that diversity leads to increased evolutionary potential. Thus, variation in

local diversity is another possible driver of evolutionary hotspots. To test this hypothesis, I

calculated local diversity across the environment for time points immediately before a novel

trait ﬁrst appeared. Speciﬁcally, I measured local diversity as the Shannon entropy of phe-

notypes within the 25 cells in the 5x5 patch around a given focal cell.

I performed this

calculation for each cell in the environment at a given time point, producing a map of local

diversities. Within each of these maps, I calculated a percentile for each of the 3600 local

diversity values, based on the proportion of sites that had a lower diversity.

Next, I located the cell in which a novel trait was about to evolve and recorded its

percentile. If diversity has no impact on the probability of a new trait evolving, I would

expect all percentile values to be equally likely (i.e.

follow a uniform distribution). If, on

the other hand, new traits are more likely to evolve in areas with elevated local diversity, I

would expect high percentiles to be overrepresented.

To test this hypothesis, I compared the distribution of observed percentiles to a uniform

distribution. For the simplest traits, low percentiles were overrepresented due to the presence

77

Figure 4.4: Quantile-quantile plot comparing the distribution observed diversity
percentiles for the most complex trait (EQU) to a uniform distribution (Oldford,
2016). The shaded area represents the 95% conﬁdence interval for where I would expect
this line to fall if these distributions were identical. Since the line is mostly within this
area, I conclude that the distribution of ranks does not deviate substantially from a uniform
distribution. This ﬁnding suggests that there is nothing special about the level of diversity
in the regions where traits are more likely to evolve. Results for other complex traits are
similar.

of populations that contained large swaths of area populated by minimally evolved organisms

with none of the traits. However, among the ﬁve most complex traits, I found no substantial

deviation from a uniform distribution (see Figure 4.4). This result suggests that traits did

not evolve more frequently in more diverse regions. While it is possible that this lack of

an eﬀect is the result of the speciﬁc diversity metric used, it strongly suggests that local

diversity is not an important driver of hotspot location. This result may also cast doubt on

the idea that hotspots of diversity are hotspots of all evolutionary processes, but I would

need to examine phylogenetic diversity to fully address that point.

A third potential driver of evolutionary hotspots is the temporal series of local conditions

78

that lineages experience over evolutionary time as each oﬀspring is born in a slightly diﬀerent

location than its parent. Perhaps some sequences of environments are more conducive to the

evolution of certain traits than others. To explore this idea, I located the ﬁrst organism with

each trait across all replicates and traced the spatial path of their lineages. I then overlaid

these paths on the corresponding environment (see Figure 4.5). While a thorough statistical

analysis of the variation in paths is beyond the scope of this chapter, I do see some qualitative

patterns. Indeed, some traits display consistent patterns that are not seen in paths generated

from the control data. These patterns include the middle portions of paths, indicating that

not just the ending conditions are likely to matter. For instance, in the three environments

in which most of the hotspots are overlapping and in the center (D, E, and G in Figure

4.5), it is rare for any lineage that has ever gone through the lower right corner to be the

ﬁrst one to evolve the most complex task (EQU). This pattern is almost certainly related

to the fact that all replicates are seeded with a single organism in the upper left corner; in

these environments, successful lineages go directly to the hotspots and don’t leave. In other

environments, however, successful lineages appear to travel through diﬀerent sequences of

regions (see Figure 4.5F, as an example). While more research is required to understand

the mechanisms behind such patterns, this eﬀect is the most promising explanation for the

existence of evolutionary hotspots.

4.5 Conclusions

I have demonstrated the existence of spatial variation in evolutionary potential across a

heterogeneous environment. Research on what speciﬁc aspects of a local region are responsi-

ble for elevated evolutionary potential is ongoing. The failure of obvious explanations, such

as resource presence/absence and local diversity, to predict the location of hotspots suggests

that the actual drivers are complex and nuanced. Preliminary investigations of the sequence

of locations that a lineage passes through suggest that this avenue is promising for further

79

exploration; there appear to be broad commonalities among paths of lineages that evolved

the same trait in the same environment.

Understanding the drivers of spatial variation in evolutionary potential would give us a

valuable tool for controlling evolutionary trajectories. The most immediate applications are

in evolutionary computation. Many evolutionary algorithms vary the ﬁtness landscape over

time, starting with easy problems and working up to more challenging problems (Hornby,

2006; Ovaska et al., 2009). This approach often requires hand-tuning of the diﬀerent problems

and when they appear in the environment. Varying the ﬁtness landscape across space instead

could achieve the same outcome. If we can purposefully build spatial networks of ﬁtness

landscapes to promote the evolution of a solution to the problem, that would make such an

approach even more powerful.

In the longer term, once our ﬁndings have been replicated in a biological system, I hope

to use these same concepts to promote the evolution of biological traits of interest. For

example, we may be able to engineer habitats that increase the chances of species adapting

to climate change through evolutionary rescue. Conversely, we may also be able to arrange

environments to inhibit the evolution of undesirable traits, such as antibiotic resistance in

bacteria. Overall, this line of research presents a variety of promising applications.

80

Figure 4.5: Spatial paths taken by the ﬁrst lineages to evolve the most complex
trait (EQU) from ﬁve arbitrarily-selected replicates from each environment. The
path from each replicate is colored a diﬀerent shade of gray. As in Figure 4.3, background col-
ors represent the set of resources in each part of the environment. I) shows the homogeneous
control environment.

81

Chapter 5

Ecological Theory Provides Insights into

Evolutionary Computation

This chapter is adapted from a pre-print on PeerJ Computer Science (Dolson et al.,

2018d).

5.1 Introduction

Evolution and ecology are fully entwined in nature (Kokko et al., 2017). As such, evo-

lutionary theory and ecological theory need to build upon each other to realize their full

potential. Here, I argue that ecological theory is similarly important in the context of the

theory of evolutionary algorithms. We can better understand much of what happens in evo-

lutionary algorithms if we reformulate our analyses with ecological theory in mind. In partic-

ular, concepts such as diversity maintenance, species co-existence, niches, and spatial eﬀects

are suddenly at the forefront of concerns in such a perspective. While the No-Free-Lunch

theorem states that there is no single best algorithm for solving all possible optimization

problems (Wolpert and Macready, 1997), progress is possible because real-world problems

have patterns that can be exploited, which tend to cluster into groups with similar properties.

Algorithm can solve problems from across these classes by simultaneously exploring qual-

82

itatively diﬀerent paths through the solution space. Such a technique eﬀectively produces

multiple “species” of solutions that coexist in an ecological community. Thus, by leveraging

ecological theory, we should be able to use ﬁrst principles to design evolutionary algorithms

that take maximal advantage of this property and mitigate the practical ramiﬁcations of the

No-Free-Lunch theorem.

There are two main sets of tools from ecology that I expect can be helpful for analyzing

evolving EC communities: mathematical theory and empirical techniques to evaluate that

theory. Ecologists use mathematical theory to make a priori predictions about the fate of

natural communities. However, translating these predictions to work with EC systems re-

quires careful attention to implicit assumptions about nature that may or may not carry over.

For example, ecologists can safely assume that organisms inhabit a ﬁnite three-dimensional

Euclidean space. A more substantial problem is that most ecological theory assumes that

evolution is too slow to be relevant and attempting to introduce evolution can create messy

feedback loops (although eco-evolutionary dynamics research is bridging this gap (Hendry,

2016)). Another important limitation is that equations in ecology generally calculate the

average or expected behavior of a system. Such results can be misleading in highly contin-

gent processes, such as evolution, where rare mutations can redirect a population into a new

region of the ﬁtness landscape. Nonetheless, ecological theory is a useful starting point for

predictions. As such, I will lay groundwork here for using it to guide development of EC

systems.

Empirical measurements complement ecological theory, allowing biologists to assess pre-

dictions and reﬁne theoretical frameworks. Furthermore, empirical measurements allow us

to uncover general patterns even in situations where theory is lacking (Dolson et al., 2018b).

To this end, ecologists have developed a toolbox of techniques to evaluate the diversity of a

community. Some of these metrics, such as richness (number of unique species) and Shannon

diversity (entropy) are already used in EC. Other valuable metrics have not yet been adopted

in EC. Phylogenetic diversity (Winter et al., 2013), for example, can assess the extent to

83

which an algorithm is maintaining independent subpopulations that are exploring distinct

regions of the ﬁtness landscape vs. organisms that only recently diverged from a common

ancestor (Dolson et al., 2018b). This distinction can impact how useful the population diver-

sity is likely to be for adaptive evolution. We can also use empirical ecological measurements

to assess hypothesized mechanisms for diﬀerent diversity maintenance techniques. For exam-

ple, ecologists often build graphs representing the pairwise interactions between community

members; the topology of these graphs signals how the community is likely to change over

time (Fontaine et al., 2011). We can do the same for interacting communities in EC.

In the rest of this Introduction, I will provide additional background on ecological theory

and the selection schemes that I will be examining. In the rest of this contribution, I will

examine four illustrative selection schemes in the light of ecological theory. Our goal is

to develop unifying principles for understanding the ecological dynamics inherent in an EC

system and predicting its resulting adaptive potential. I intend for this theoretical framework

to provide powerful insights about the dynamics in EC systems and lay the groundwork for

easier application of ecological ideas in the future.

Box 1: Glossary

Biological ﬁtness (WWW ): An individual’s expected number of oﬀspring.

Equalizing forces: Forces that increase the similarity of ﬁtnesses in a population.

Interaction network: A weighted directed graph describing interactions between mem-

bers of a population.

Phylogenetic diversity: The amount of diversity in ancestry represented within a

population.

Phylogeny: The graph of parent-child relationships in a population.
R∗:R∗:R∗: The lowest quantity of a limiting resource for which a given species’ population
growth rate is positive. In an EC context, can be thought of as the lowest quantity of a

84

limiting resource for which attempting to use that resource increases ﬁtness.

Stable coexistence: A scenario in which a group of species are expected to coexist

indeﬁnitely.

Stabilizing forces: Forces that increase the ﬁtness of rare species.

5.1.1 Ecological theory

One way that ecological theory can inform EC is to identify conditions under which

diﬀerent types of organisms can or cannot stably coexist. The ability to predict coexistence

dynamics will inform us about types of lineages that can simultaneously explore the ﬁtness

landscape. As such, determining coexistence criteria is a primary focus of this chapter.

Early theory on coexistence in ecology focused on competition for resources that are

both limited in quantity and limiting of the growth rate of species that rely on them. The
most important value for determining coexistence in this context is a species’ R∗R∗R∗, the resource
availability level at which the species’ population growth rate is 0. If the current resource
availability is less than a species’ R∗, that population will shrink, reducing the utilization
of the resource and (in the absence of other species) increasing its availability. Conversely,
if the current resource availability is greater than the R∗ for a species, its population will
grow. Assuming no other ecological factors are aﬀecting a species abundance, its population
size will stabilize with a resource abundance equal to R∗.

In the simplest case of species competing for a single resource, the species with the
lowest R∗ should out-compete the others (Grover, 1997). That species’ population will
continue to increase, depleting the resource until it reaches the species’ R∗; meanwhile,
the resource availability will dip below the other species’ R∗, causing their populations to
decrease until they ﬁnally go extinct, having been out-competed. Adding additional resource

types introduces the potential for stable coexistence among multiple species if each species

is a better competitor for a diﬀerent resource and consumes more of that resource (Chase

85

and Leibold, 2003) (summarized in (Letten et al., 2017)). Note that I use the word “species”

here to be consistent with ecology; the same insights apply to other taxonomic units, such

as phenotypes or genotypes. In most cases, these are more appropriate taxonomic units in

the context of EC.

This resource-mediated coexistence eﬀect is an instance of a broader rule: species can

coexist if individuals of each species compete with each other more than they compete with

individuals of other species (Chesson, 2000). This rule works because it forces species to be

self-limiting, creating negative frequency dependent dynamics where each additional mem-

ber of a species reduces the competitive ability of other individuals of the same species

(Adler et al., 2007). The magnitude of diﬀerence between interspeciﬁc (between species)

and intraspeciﬁc (within species) competition that is required to enable long-term stable

coexistence is determined by the diﬀerence in the ﬁtness of the two species. If one species is

dramatically more ﬁt, it can drive the other species to extinction even with limited competi-

tion (Chesson, 2000). Ecologists draw a distinction between “stabilizing” dynamics, which

alter the ratio of interspeciﬁc competition and intraspeciﬁc competition, and “equalizing”

dynamics which alter the diﬀerence in ﬁtness between two species (Adler et al., 2007). In

the absence of any stabilizing dynamics, equalizing dynamics will lead to an unstable equi-

librium; even if two species have identical ﬁtness, one should eventually drift to extinction.

Stabilizing dynamics, on the other hand, actively correct any deviation from equilibrium.

Note that in biology, “ﬁtness” is strictly a measurement of reproductive output. This def-

inition is in contrast to the externally-deﬁned “ﬁtness functions” used in EC to determine

competitive advantage.

In more heterogeneous environments, there is another mechanism for species coexistence:

spatial segregation. Such segregation can arise due to a variety of factors. The simplest of

these factors is the presence of physical barriers that inhibit movement between regions.

In biology, physical barriers play an important role in facilitating diversiﬁcation (Faith,

1992; Dolson et al., 2016b), and these beneﬁts carry over to EC in the context of island

86

models (Tomassini, 2005). Spatial segregation can also be brought about by environmental

conditions. For example, one region might require tolerance to extreme heat, while another

might require tolerance to acidic soil. In this case, species would only compete when both

could inhabit the same regions. The resulting traversal (over evolutionary time) of multiple

environments has a profound eﬀect on the traversal of the ﬁtness landscape (Dolson and

Ofria, 2017). An even weaker form of spatial segregation occurs when species are capable of

surviving in range of environments but are best optimized to a speciﬁc region (Dolson et al.,

2017).

What do these coexistence dynamics mean for EC? First, when choosing parameters,

we should consider the circumstances under which they promote coexistence. When two

independent lineages are traversing the ﬁtness landscape, what genotypic or phenotypic

diﬀerences are required for them to both stably persist? Second, when designing selection

schemes, we should determine how many lineages we want to co-exist and how frequently

we want lineages to turn over.

If it is costly to stochastically lose established lineages,

we should consider including stabilizing dynamics in addition to or in place of equalizing

dynamics. Finally, we should give careful thought to the metaphors that we use to compare

EC to biological populations; determining whether a given selection scheme is more akin to

competition for resources, spatially segregated habitats, or something else all-together will

make it much easier to draw parallels and gain a deeper understanding. If we can show that

a system in EC is isomorphic to a system in biology, we can rapidly import the vast wealth

of insights from biological research into how that system behaves.

5.1.2 Empirical ecological techniques

Empirical ecologists seek to ﬁnd general patterns in complex, messy data. While data

from EC may be less noisy than ﬁeld data, the interconnections within any ecological com-

munity are still complex. Ecologists have developed many tools for extracting meaning from

this complexity, which may be equally helpful in trying to understand the ﬁne-scale dynam-

87

Figure 5.1: Conceptual illustration of phylogenetic diversity metrics. The letters
around the outside represent the full set of extant taxa and the circle in the middle indicates
their most recent common ancestor. Branching points indicate the locations of intermediate
taxa (note that in biology these have to be inferred, but in EC we have perfect informa-
tion). The three panels indicate the three diﬀerent facets of phylogenetic diversity: richness,
divergence, and regularity. Reproduced from (Tucker et al., 2017).

ics of EC systems. Here I focus on two approaches: phylogenetic analysis and interaction

networks.

5.1.2.1 Phylogenetic analysis

Phylogenetic analysis refers to a suite of metrics that are used to quantify the topology

of a populations’ phylogeny (ancestry tree) (Winter et al., 2013). In biology, these trees

generally need to be inferred from extant species, but in EC we can record the actual tree as

it forms. Many of these methods measure the amount of evolutionary history discernible in a

population. For example, evolving sub-populations that have stably coexisted for a long time

will reﬂect deeper evolutionary history than a single population that is all descended from

a recent common ancestor. Even if the latter population contains many unique phenotypes,

they will all be relatively close to each other on the ﬁtness landscape. As such, the population

is likely exploring only one basin of attraction within the ﬁtness landscape at a time, limiting

the rate of adaptation. Phylogenetic diversity provides more direct evidence about the

eﬃcacy of diversity maintenance techniques than the more commonly used measures of

genotypic and phenotypic diversity.

Techniques for analyzing the topology of phylogenies can be split into three categories

(Tucker et al., 2017) (see Figure 5.1 for an illustration of each):

1. Richness: the total quantity of evolutionary history represented

88

2. Divergence: how spread out the population is in phylogenetic space

3. Regularity: how evenly the population is divided across evolutionary space

Here, I focus on richness and divergence, as our goal is to analyze selection schemes, which

regularity does not oﬀer clear insight into. A number of metrics quantify phylogenetic rich-

ness and phylogenetic divergence (summarized in (Winter et al., 2013; Tucker et al., 2017)).

Here, I measure phylogenetic richness with the original phylogenetic diversity metric (simply

referred to as “phylogenetic diversity” (Faith, 1992)), as the count of nodes in the minimal

spanning tree connecting each extant genotype to the most recent common ancestor of all

extant genotypes. I measure divergence as the mean pairwise distance between genotypes in

the population (Webb and Losos, 2000).

Note that in most cases biologists implicitly assume that trees are rooted at the most

recent common ancestor of all taxonomic units being compared. This assumption is un-

avoidable in the context of biology, because phylogeny reconstruction techniques cannot

make inferences about anything preceding the most recent common ancestor.

In EC, we

have the full history. However, including it would not add additional information; phyloge-

netic diversity would increase by a constant for each member of the population and mean

pairwise distance would not change. Note that in this chapter I calculate these metrics on a

per-genotype basis, but they can also be calculated per-individual or per-phenotype.

5.1.2.2

Interaction networks

Since ecological communities are collections of interacting organisms, drawing a graph

representing the network of pairwise interactions is a useful technique for understanding

them (Fontaine et al., 2011). The topology of this graph illustrates the selective pressures

that organisms place on each other (see Figure 5.2). Ecologists often generate these networks

from knowledge of the species involved such as the fact that two species use the same limited

resource and thus must compete with each other. Of course, in EC we can directly measure

89

the ﬁtness of each member of the population with and without each other member present

to identify interactions.

5.2 Ecology in EC systems

In this section, I discuss how each of my chosen EC selection schemes can be intuitively

mapped onto ecology. I believe that these metaphors will facilitate a two-way ﬂow of ideas

between ﬁelds. With the aid of these metaphors, I consider how to calculate two ecologically-

important factors: an individual’s expected number of oﬀspring (i.e. biological ﬁtness, W )

and the conditions under which stable coexistence is possible. The former is critical for

calculating interaction networks within ecological communities (see Section 3.1) and the

latter provides insight into not only the amount of diversity that a selection scheme promotes,

but how the population is situated across the ﬁtness landscape.

5.2.1 Tournament selection

Individuals in tournament selection have no interaction beyond competition for space

in the next population. Since this competition aﬀects all individuals equally, tournament

selection approximates a condition where there is no ecological interaction. Nature lacks

comparable situations, as ecology is nearly ubiquitous outside of carefully controlled ex-

Figure 5.2: Example interaction network with three nodes. Red arrows denote harm-
ful interactions and blue arrows denote beneﬁcial interactions. Line width and color darkness
indicate magnitude of eﬀect. In this example, A reduces B’s ﬁtness a lot, B reduces A’s ﬁt-
ness a little, and B reduces C’s ﬁtness a lot. A increases C’s ﬁtness a lot. A plausible
mechanism for this positive interaction is that, by harming B, A reduces B’s harmful eﬀect
on C.

90

perimental environments (although evolutionary theorists do often make this simplifying

assumption). The absence of ecology in tournament selection means that stable coexistence

is impossible in the long term.

Calculating the biological ﬁtness, W , of an individual in tournament selection requires

determining the proportion of the population with a lower ﬁtness score than that individual’s,

pless. Based on this number, the probability of the focal individual winning a tournament
can be calculated. We must also account for ties based on pequal, the proportion of the entire
population with a ﬁtness score equal to that of the focal individual (including that individual

itself, since I select tournament members with replacement). The ﬁtness of an individual

can be calculated with the following equation:

(cid:18)(cid:0)pequal + pless

(cid:1)T − pT

S ×

(cid:19)

× pself
pequal

(5.1)

less

This equation ﬁnds the probability of an individual with ﬁtness equal to that of the focal

individual winning a tournament by ﬁnding the probability of a selecting a tournament that

contains no individuals with a higher ﬁtness. By multiplying this value by the probability

that the winning individual is the focal individual and the number of selection events per

generation (S), the focal individual’s expected number of oﬀspring can be calculated. T

is the tournament size and pself is the probability of selecting the focal individual from the
population (generally

population size).

1

5.2.2 Fitness sharing

Fitness sharing operates on the basic ecological assumption that individuals compete

more against others that they are more similar to. The most intuitive ecological scenario

to compare ﬁtness sharing to is one where a population consumes a single, continuously-

varying resource. In cases where the distance function is calculated over a discrete number

of dimensions (e.g. Euclidean distance between two vectors of equal length), the resource

91

can be thought of as varying over the same number of dimensions. An analogous situation

in ecology is that of Darwin’s ﬁnches (Schluter et al., 1985). Many of these ﬁnches eat seeds,

which vary along dimensions such as size and shape. Each species of ﬁnch has a beak that

is best adapted to eat seeds near a target size and shape. Over evolutionary time, the ﬁnch

species have partitioned the space of possible beak morphologies into stable niches specialized

on diﬀerent seed types. In the context of ﬁtness sharing, I expect the population to partition

the space of possible phenotypes into stable niches. The theory of limiting similarity suggests

that these niches should be somewhat separated from each other in genotypic/phenotypic

space (Pacala and Tilman, 1994). Fitness sharing results suggest that these niches are often

associated with peaks in the ﬁtness landscape (Goldberg et al., 1987).

Deb and Goldberg have already established the criteria for stable coexistence in ﬁtness

sharing (Deb and Goldberg, 1989), which I ﬁnd to be mathematically identical to Chesson’s

predictions under modern coexistence theory (Chesson, 2000; Chesson and Kuang, 2008). For

the algebra demonstrating this equivalence, see supplemental information. This coexistence

criterion can be summarized by the following equation:

ρ ≤ k1
k2

≤ 1
ρ

(5.2)

where k1 and k2 are the ﬁtness function scores of the genotypes being evaluated and ρ is
the niche count between these two genotypes, as calculated by Equation 2.2. I chose these

symbols to be consistent with Chesson’s framework (Chesson, 2000). Note that in Chesson’s

framework, ρ generalizes to be the amount of niche overlap between the two individuals. The

intensity of stabilizing dynamics (i.e. negative frequency dependence) can be calculated as
1− ρ. The smaller ρ is, the greater the diﬀerence in ﬁtness values it is capable of stabilizing.
The biological ﬁtness, W , of individuals in ﬁtness sharing can be calculated by modifying

ﬁtness scores based on Equation 2.2 and then applying Equation 5.1 based on the adjusted

scores.

92

Figure 5.3: Depiction of a Lexicase decision tree. A, B, and C are test-cases, and the
internal nodes represent the order in which they are picked. Each possible path through the
tree leads to a diﬀerent leaf node (“island”). The individuals on that “island” are the ones
that have the potential to be selected for reproduction.

5.2.3 Lexicase selection

In ecological terms, lexicase selection creates a vast number of niches nested within each

other. While it is tempting to use a resource metaphor to understand the competition within

these niches, the concept of resources does not clearly map onto lexicase selection. Whereas

using a resource harms all other individuals that use that resource, improving on a selection

criterion in lexicase selection only harms a (usually small) subset of the population. Instead,

I argue that population structure is a more apt metaphor.

Imagine N ! islands of equal size, where N is the number of selection criteria. Each island

corresponds to a single potential ordering of selection criteria and can only be inhabited by

the individuals that are best at that ordering (see Figure 5.3). This arrangement is analogous

to situations in nature where niches are deﬁned exclusively by an organism’s ability to survive

a set of harsh abiotic conditions. Being better able to survive these conditions increases the

number of oﬀspring an individual can have. Over time, genotypes that are better at surviving

in a given set of conditions competitively exclude those that are worse at surviving there.

This competitive exclusion happens more rapidly in lexicase selection than it usually would

93

in biology, but the principle is the same.

While multiple genotypes can theoretically inhabit the same island if they are pheno-

typically identical, this coexistence will be unstable due to the lack of stabilizing dynamics

to increase the populations of rare genotypes on an island. Eventually, all but one should

stochastically go extinct. Furthermore, much as an island in nature can experience a ran-

dom catastrophic event, (e.g., a volcanic eruption), the stochastic nature of lexicase selection

means that in every generation there is a chance that an island will get unlucky and not

be selected. The probability of such an event progressively increases as the proportion of
the population living on each island ( population size

) decreases. When the total population

N !

size is less than the number of islands, genotypes must inhabit multiple islands to survive.

Thus, stable coexistence in lexicase selection depends on whether a genotype’s range (i.e.

the proportion of islands it can occupy) is large enough to survive in the long term.

The proportion of islands a genotype can occupy is equivalent to its probability of

selection. Just as species in nature face greater risk of stochastic extinction if their range

(the area they inhabit) is too small, so too do genotypes in lexicase selection. Details for

calculating this quantity are described in equation 4 of (La Cava et al., 2018). To convert the

probability of selecting an individual under lexicase selection to biological ﬁtness, it simply

needs to be multiplied by the number of selection events that will occur:

W = S × P

(5.3)

where S is population size (the number of selection events per generation) and P is

the proportion of islands occupied. What does this condition mean for stable coexistence in

lexicase selection? The chances of a genotype with ﬁtness W surviving for G generations in

a population of size S are given by the equation:

P (survival) = (1 − (1 − W )S)G

(5.4)

94

Figure 5.4: The probability of long-term survival under lexicase selection across
diﬀerent population sizes (S) and lengths of evolutionary time (G).

Plotting this function for various parameter values, we see that there is generally a cut-oﬀ

where the probability of survival rises abruptly from approximately 0 to approximately 1 (see

Figure 5.4). The value at which the transition occurs is eﬀectively the minimal percentage

of islands that a genotype must occupy in order to be expected to survive in the long term.

This value is closely related to the concept of a minimum viable population in biology, and

is useful to consider when making decisions about how large of a population size to use.

5.2.4 Eco-EA

Eco-EA is analogous to a traditional resource competition scenario in ecology. All

individuals occupy the same region of space and compete with each other by targeting the

same resources. The threshold amount of a resource in the environment needed by each

individual to beneﬁt from using it can be calculated:

R∗ =

cost

Cf ∗ score

(5.5)

where cost is the cost of performing the corresponding task, Cf is the fraction of the avail-
able resource consumed, and score is how well the individual performed the task (normalized
to 1). In terms of resource competition theory, this value is eﬀectively that individual’s R∗
for that resource.1 Note that a higher score reduces R∗, meaning that the individual can
1This deﬁnition is subtly diﬀerent from the traditional deﬁnition of R∗, which states that R∗ is the lowest
value of resource for which a species’ per capita population growth is not negative (Chase and Leibold, 2003).

95

beneﬁt from the task even if less of the resource is available, making it a better competitor
for that resource. Thus, as in natural ecosystems, the individual with the lowest R∗ can
out-compete individuals with a higher R∗.

If a group of individuals use completely diﬀerent resource types from each other, there

will be a high degree of stabilization, meaning they should coexist under a wide range of

ﬁtness diﬀerences. The more complex coexistence scenario occurs when two individuals

compete for the same resources. Ecologists distinguish resources that can be used inter-

changeably with each other (“substitutable resources”) from resources that cannot be used

interchangeably (“essential resources”). Resources in Eco-EA are substitutable, because an

equivalent ﬁtness gain could be achieved using either one (up to a point); one just requires

more resource to do so. In this scenario, two individuals can stably coexist as long as they
have lower R∗s for opposite resources, and each one consumes more of the resource it has a
lower R∗ for. Because both R∗ and the amount of resource consumed are determined by the
individual’s score on the task associated with the resource, the latter criterion will always
be met. Thus, as long as neither individual has the lowest R∗ for every resource, coexistence
should be possible. In EC terms, this requirement boils down to each individual being Pareto

dominant. As before, coexistence also requires that the individuals have somewhat similar

ﬁtnesses. The amount of stabilization (and thus the maximum ﬁtness diﬀerence) can be

calculated based on the resource consumption of each species (Letten et al., 2017).

Biological ﬁtness, W , can be determined in the same way as in ﬁtness sharing: ﬁrst,

calculate the adjusted ﬁtnesses based on resource use, then use Equation 5.1.

5.2.5 Summary

As we would expect, selection schemes that successfully maintain diversity allow for

long-term stable coexistence. For ﬁtness sharing and Eco-EA, the criteria are described by

Chesson’s coexistence theory (Chesson, 2000; Letten et al., 2017); to stably coexist, each
I frame R∗ in terms of beneﬁt rather than population growth because population growth in EC is generally
restricted by competition for space in the population.

96

species/genotype/phenotype must limit itself more than it limits others (as formalized in

Equation 5.2). In lexicase selection, biological ﬁtness (i.e. the proportion of “islands” the

species dominates) must be above a cutoﬀ (described in Figure 5.4).

Although Eco-EA and ﬁtness sharing have the same coexistence criteria, they have an

important diﬀerence from each other: in Eco-EA, competition occurs along multiple, poten-

tially orthogonal dimensions, whereas in ﬁtness sharing competition is mediated through a

single function that summarizes all aspects of an individual. This diﬀerence should result

in fewer, more intense pairwise competitive interaction in Eco-EA than in ﬁtness sharing

(although still not as few, or as intense, as those in lexicase selection). I predict that the

more focused interactions in Eco-EA and lexicase selection will promote forms of evolution-

ary divergence that are more useful for adaptive evolution. The requirement in Eco-EA that

coexisting individuals must be non-dominated suggests that individuals in Eco-EA should

fall roughly along a pareto front, similarly to individuals in lexicase selection. An impor-

tant distinction between these two selection schemes, however, is that lexicase selection’s

rigid population structure produces strong pressure for specialists, whereas generalists can

be successful in Eco-EA.

5.3 Empirical Methods

To conﬁrm and extend the intuition I developed in the previous section, I now empirically

investigate these selection mechanisms in the context of actual evolving populations.

5.3.1

Interaction networks

In Section 5.2, I predicted the competitive pressures that diﬀerent selection schemes

exert on populations. Here, I assess these predictions by drawing interaction networks for real

populations. For simplicity, I use a population containing 10 individuals with 5 integer traits

selected from a geometric distribution. Each trait represents a niche that these individuals are

97

competing to occupy, with higher numbers corresponding to higher competitive ability. For

lexicase selection, each trait is a selection criterion and the individual with the highest value

there wins. In Eco-EA, I added a resource associated with each trait and an individual’s

value for that trait deﬁnes its ability to use that resource. In ﬁtness sharing, distance is

measured as euclidean distance between sets of traits. The ﬁtness landscape is otherwise

ﬂat. I compare the same population across all selection schemes. To calculate the eﬀect that

a given individual, A has on another individual, B, I ﬁrst calculate the ﬁtness of B in the

presence of the whole population. Then I remove A from the population and recalculate the

ﬁtness of B. The diﬀerence between these two ﬁtnesses is the eﬀect of A on B. Note that

these ﬁtness values are biological ﬁtness, i.e. the expected number of oﬀspring, rather than

the ﬁtness produced by the ﬁtness function. For ﬁtness sharing and Eco-EA, I assume a

tournament size of 2 when making this calculation.

5.3.2 Phylogenetic analysis of evolved populations

What is the long-term eﬀect of these diﬀerent interaction network topologies? I arrive at

a ﬁrst-order approximation by analyzing the phylogenies of populations evolved under each

selection scheme. Across selection schemes, the behavior of a population depends on the part

of the ﬁtness landscape that it is currently exploring. If the entire population is climbing a

steep hill, all forms of diversity should be low, due to frequent selective sweeps. To assess the

eﬀect of such diﬀerences in ﬁtness landscapes, I compare phylogenetic diversity across three

diﬀerent genetic programming problems believed to have qualitatively diﬀerent ﬁtness land-

scapes: squares, Collatz, and the Dow chemical regresssion problem (see chapter 2). Using a

simple linear genetic programming representation (ScopeGP, descibed in chapter 2), I evolve

linear genetic programs for 1000 generations. I use 11 test cases for the squares problem and

100 test cases each for the Collatz problem and Dow chemical challenge. Phenotypes are the

vectors of inverse error for each test case (so higher scores are better). Fitness is calculated

as the sum of these vectors. For lexicase selection and Eco-EA, each test case corresponds

98

to a selection criterion or resource.

I evolved 30 populations for each selection scheme and each problem. Because of the

profound eﬀect of the sharing threshold parameter on the behavior of ﬁtness sharing, I also

performed 30 runs each of ﬁtness sharing with ﬁve diﬀerent sharing thresholds. Statistics

across selection schemes were calculated using the sharing threshold with the highest phy-

logentic diversity. For each run, I calculated a variety of metrics, including phenotypic

diversity (measured as Shannon entropy), genotypic diversity, phylogenetic diversity (Faith,

1992), and mean pairwise distance of genotypes in the phylogeny (Webb and Losos, 2000).

Across all runs, I used a genome length of 200 and a population size of 500. I applied mu-

tations to every new individual by randomly changing the code at up to three sites in the

genome (the speciﬁc number was selected from a uniform distribution). To simplify calcu-

lation of the phylogenetic metrics, I seeded each population with a single individual that

served as the common ancestor to all others. For Eco-EA, I used a resource inﬂow rate of

100, a cost of 1, and a consumption fraction for 0.0025.

5.3.3 Code availability

Data for the interaction network analysis was generated using a simulation written with

Python 3.6.3. Graphs were visualized using the networkx package (Hagberg et al., 2008).

All code for generating and analyzing the data presented in this chapter is open source and

available (Dolson, 2018a).

5.4 Results and Discussion

5.4.1

Interaction networks

Diﬀerent selection schemes create strikingly diﬀerent interaction networks (see Fig-

ure 5.5). Notably, lexicase selection’s rigid population structure leads to far fewer interactions

99

(a) Representative example networks.

(b) Summary over 100 randomly generated popu-
lations.

Figure 5.5: Interaction networks for the same community under four diﬀerent se-
lection schemes: tournament selection, ﬁtness sharing, Eco-EA, and lexicase selection. a)
Shows the networks for a single population. Red edges indicate harmful interactions, blue
edges indicate beneﬁcial interactions. Edge width denotes interaction strength. An interac-
tive version of this chapter is available (Dolson, 2018a). b) Shows boxplots summarizing the
number of beneﬁcial and harmful interactions observed under each selection scheme across
100 randomly generated populations.

100

Figure 5.6: Phenotypic and phylogenetic diversity over time for each problem.
Shaded area is the bootstrapped 95% conﬁdence interval around the mean.

of both types2 than under other schemes (pairwise Wilcoxon rank-sum test, p < .0001), and

ones that are predominantly negative. Eco-EA creates far more interactions, because individ-

uals can use diﬀerent resources to diﬀerent extents. Importantly, there are many beneﬁcial

interactions; if A competes with B and B competes with C, A can help C by suppressing

B. These interactions imply a stabilizing dynamic between A and C, because they indicate

that A is competing with C less than with itself. Additionally, the Eco-EA community has

many weak interactions, which can promote stabilizing dynamics community-wide (McCann,

2000). Lastly, in ﬁtness sharing, most individuals harm each other approximately the same

amount. The sharing threshold subtly aﬀects interaction strength.

5.4.2 Phylogenetic analysis

As hypothesized, phylogenetic diversity is low for all selection schemes on the square

problem (see Figure 5.6). This result is presumably due to ﬁtness diﬀerences so large that

2With one exception: I observed no beneﬁcial interactions in lexicase or tournament selection, so these

groups were not diﬀerent from each other.

101

stabilizing eﬀects are insuﬃcient to overcome them. In the context of such a steep evolu-

tionary gradient, such behavior is expected and often desirable. Due to negative frequency

dependence, phenotypic diversity remains high for ﬁtness sharing and Eco-EA, even after

most populations solved the problem (generally between 200 and 500 generations in). For

lexicase selection, on the other hand, it initially increases and then drops rapidly as the

populations converge on the solution.

Results from the Collatz problem support our hypothesis that selection schemes with

more restrictions on which individuals compete promote phylogenetic diversity (see Figure

5.6). Lexicase selection and Eco-EA did not have signiﬁcantly diﬀerent ﬁnal phylogenetic

diversity (Wilcoxon rank-sum test, p=0.31), but all other pairs of selection schemes did

(Wilcoxon rank-sum tests, p < .05). Results for mean pairwise distance were similar, sug-

gesting that lexicase selection and Eco-EA (and to a lesser extent ﬁtness sharing) do promote

the coexistence of divergent branches.

Interestingly, phenotypic diversity does not correlate especially closely with phyloge-

netic diversity (see Figure 5.6). In particular, Eco-EA has relatively low phenotypic diver-

sity, despite its high phylogenetic diversity. Conversely, ﬁtness sharing has relatively high

phenotypic diversity despite its mid-range phylogenetic diversity. This result suggests that

whereas lexicase selection and ﬁtness sharing allow similar phenotypes to coexist, Eco-EA

forces them to converge. A potential explanation for this diﬀerence is that Eco-EA rewards

generalists substantially more than lexicase selection.

This discrepancy is even more pronounced in the context of selecting a sharing threshold

for ﬁtness sharing (see Figure 5.7). Phenotypic diversity is maximized at σshare = 1 (Wilcoxon
rank-sum tests, p < .005), whereas mean pairwise distance and phylogenetic diversity are

highest at σshare = 10 (Wilcoxon rank-sum tests, p < .005). This result emphasizes the fact
that phylogenetic diversity is meaningfully diﬀerent from phenotypic and genotypic diversity

in ways that can aﬀect choice of parameter values.

Results from the Dow problem (see Figure 5.6) illustrate the strong eﬀect that the

102

Figure 5.7: Phenotypic and phylogenetic diversities in ﬁtness sharing across various
values of sharing threshold (σshare) in the context of the Collatz problem. Shaded areas are
bootstrapped 95% conﬁdence intervals around the mean.

topology of the ﬁtness landscape has on which techniques are most eﬀective at maintaining

phylogenetic diversity.

In contrast to its high eﬃcacy on the Collatz problem, Eco-EA

maintains no more phylogenetic diversity than ﬁtness sharing and tournament selection on

the Dow chemical problem (Wilcoxon rank-sum tests, p = 1 for all). Lexicase selection,

on the other hand, maintains phylogenetic diversity on both of these problems (Wilcoxon

rank-sum tests, p < .005). Understanding what properties of the ﬁtness landscape account

for this diﬀerence is an important area of future research.

5.5 Conclusions

In this chapter, I have taken a high-level tour of tools that ecology can contribute to

EC. Using metaphors and mathematical theory from ecology, I have identiﬁed important dif-

ferences across four selection schemes. Importantly, I have shown mathematical equivalence

between rules governing coexistence in ecology and in EC. This framework should allow us

to directly translate insights between these ﬁelds.

Building on this conceptual understanding, I empirically measured the interactions that

describe the communities created by diﬀerent selection schemes. The topology of these in-

teraction networks supports our theoretical predictions, and provides insight into the mecha-

nisms by which these techniques maintain diversity. A more systematic study of interaction

103

networks under various selection schemes will further improve our understanding. I gained

further insight by exploring the long-term eﬀects of these selection schemes with phyloge-

netic metrics. Notably, I demonstrated that phylogenetic analysis metrics describe aspects

of the underlying dynamics of diversiﬁcation that are distinct from those described by geno-

typic and phenotypic diversity. This result holds across diﬀerent parameter choices for the

same selection scheme, and across diﬀerent selection schemes. Analyzing the phylogenies of

populations evolving on a wider range of ﬁtness landscapes has the potential to help predict

which parameters and selection schemes are most appropriate for which problems.

From our analysis thus far, I can conclude that lexicase selection and Eco-EA rely on

distinct underlying mechanisms to promote the evolution and maintenance of solutions that

are evolutionarily divergent. Lexicase selection creates a small number of intense competi-

tive interactions, whereas Eco-EA creates a larger number of weaker interactions that can

be either competitive or facilitative. As a result, lexicase selection generates many specialist

phenotypes, while Eco-EA supports fewer, more generalist phenotypes. Both of these selec-

tion schemes produce populations representing a wider diversity of evolutionary history than

those produced by ﬁtness sharing.

Ecological techniques have the power to revolutionize our understanding of diversity in

EC, and our work here has barely started. In the future, I plan to delve deeper into these

approaches, to import more ideas and statistical techniques from ecology, and to apply them

to an even wider range of selection schemes.

104

Chapter 6

Applying ecological principles to genetic

programming

This chapter is adapted from a chapter in the proceedings of Genetic Programming in

Theory and Practice XV (Dolson et al., 2018a).

6.1 Introduction

Natural evolution produces eﬀective solutions to complex problems, often well beyond

the ability of human engineers to duplicate. If we are to harness these natural evolutionary

dynamics, we must understand the full depth of how they function and why they are so

eﬀective. In this chapter, I explore how ecological factors promote more open-ended evolu-

tionary systems that have a greater potential to produce complex, dynamic, and practical

solutions to targeted problems. First I discuss why I believe that this approach can help

solve diﬃcult AI problems, then I review the current scientiﬁc understanding of ecological

dynamics of interest. Next, I present experiments that I have performed using Eco-EA and

lexicase selection to test the potential of ecologically-mediated hints, before discussing the

implications of these results, and ﬁnally laying out the next steps I plan to take in this line

of research.

105

6.1.1 Motivation

Many problems in machine learning center around creating artiﬁcial intelligence systems

that can resolve challenging problems that are traditionally solved by humans. Often, these

programs are written with the goal of mimicking the strategies employed by skilled humans.

When human strategies can be clearly articulated as a set of well-deﬁned rules, the process

of writing the AI can be straight-forward. However, humans tend to rely on intuition when

solving many types of problems. While human intuition is often eﬀective, it is challenging

to abstract into an algorithm that an AI can follow without missing important nuances.

Attempts to do so often produce rigid AIs that fail to appropriately adapt to situations that

are subtly diﬀerent than those that were originally expected. Such issues are exacerbated

for problems where early decisions determine what scenarios the AI will later encounter. For

example, in many board games, minor mistakes early on can drastically reduce success and

alter what options are available for the middle and end of the game.

This property—whereby early decisions shape what strategies are possible later—can

pose even more substantial obstacles to evolving AIs. If a good strategy is critical early on,

an AI without one would consistently lose. However if a good early strategy is not suﬃcient

for overall success, the selection pressure in favor of it will be weak (at best) relative to its

importance. Speciﬁcally, in any problem where many tasks must all be performed reasonably

well for ﬁtness to be non-zero, it is nearly impossible for evolution to get enough initial

traction to be successful. Taken, from another perspective, the ﬁtness landscape for such a

problem will be almost entirely ﬂat, with a small rugged region featuring steep peaks that

are challenging to even ﬁnd, let alone navigate. Previous attempts to make this landscape

more easily navigable by giving the evolving AIs "hints" based on strategies successfully

employed by humans have generally been ineﬀective due to the aforementioned diﬃculties

with abstracting human intuition.

I hypothesize that we can apply ecological dynamics to evolve models of human decision

making and reliably create human-competitive AIs. Here, I present our concepts in terms

106

of playing games, as this is an intuitive set of problems to think about, but the techniques

I propose should generalize to other categories of problems. The key is to have evolution be

responsible for the process of abstracting general strategy from human intuition. I believe

that we can achieve this goal by supplying data on human decisions from a wide variety

of scenarios and selecting for AIs that are capable of predicting the move that a given

human made in any given situation. This approach will allow selection to operate evenly

across early-, mid-, and late-game strategy, removing the issue of temporally compounded

mistakes. Furthermore, it eliminates the need for humans to be able to codify their strategies

into a concrete rule set.

Selecting for AIs that can predict human moves has a second, more fundamental beneﬁt.

Often, problems that are too complex for evolutionary computation to solve in isolation can

be solved if there is also a ﬁtness beneﬁt for solving simpler, related problems. These simpler

problems, often referred to as "building blocks", cause genomes within the population to

accumulate information that is relatively easy to repurpose into solving the actual problem

(Lenski et al., 2003), a technique that has proven to be eﬀective in evolutionary computation

(Bongard and Hornby, 2010). I hypothesize that AIs evolving to predict a human’s choices

will often do so by recreating the underlying building blocks that comprise that human’s

strategy—even if the human does not recognize it themself.

An obvious problem with this approach is that some humans may be poor at a given

problem. For example, predicting their moves in a game may be impossible because they

are eﬀectively random, or worse employing actively poor strategy that will lead the evolving

population down a maladaptive path.

Interestingly, some preliminary work on Eco-EA

suggests that it might be robust to bad advice (Goings and Ofria, 2009; Goings, 2010;

Goings et al., 2012). Resources associated with useful sub-tasks should produce evolutionary

building-blocks that get incorporated as parts of a more complex strategy and used in the

overall solution. Unhelpful or misleading sub-tasks, however, should be used by only a

small fraction of the population, resulting in a trivial slowdown as compared to running the

107

algorithm without the resources present.

The paradigm of supplying hints to an evolving algorithm without worrying about

whether they will be misleading opens up the potential to evolve solutions to highly complex

problem. For example, if we wanted to evolve a controller for a robot that would per-

form search-and-rescue, we would never get there if we started with random controllers and

only rewarded robots that successfully rescued people. We could, however, use ecologically-

mediated hints by adding a set of limited resources that each rewarded some component, such

as (1) moving to a target location, (2) exploring, (3) fully scanning an area, (4) identifying

dangers, (5) navigating around obstacles, (6) identifying trapped people, (7) moving toward

trapped people, (8) freeing trapped people, (9) moving with people, and (10) ﬁnding your

way back to safety. With all of these helpers as building blocks, it is easy to imagine that

a controller would eventually evolve that could occasionally ﬁnd and return people safely,

which would allow it to start getting rewarded by the primary (unlimited) ﬁtness function.

This approach uses ecologically-mediated hints to push the population to solve the problem

from multiple directions at once; before the whole problem is solved, some individuals might

be able to ﬁnd people, but then ignore them. Some will explore and then ﬁnd their own

way back. Others might be able to free people they ﬁnd, but not know what to do with

them next. Some evolutionary trajectories might be more likely than others to continue

to evolve toward the ﬁnal goal. Of course, some of the “hints” might be associated with

counter-productive tasks. If there were a limited resource associated with avoiding all dan-

gers (because a misguided programmer thought this would be helpful), the result might be

to make it less likely for the robot to ﬁnd trapped people. With ecologically-mediated hints,

however, only a small portion of the population (a single niche) would be rewarded for this

ability, so others would still plow ahead and complete the mission.

A key component of this approach is that in an ecology, niches drive the types of diverse

solutions that appear. If an organism is the ﬁrst to occupy a new niche, it must have some

traits associated with that niche, but is otherwise free from direct competition, allowing

108

it to sustain substantial loss of unrelated traits. This dynamic allows for many diﬀerent

pathways to coexist in a population, any of which can be followed to solve the high-level

problem, with the most successful strategies dominating multiple niches. In essence, these

co-existing niches facilitate the creation of a variety of building blocks leading to successively

more complex strategies.

Another beneﬁt to an eco-evolutionary approach is that ecological communities are, by

deﬁnition, at least somewhat diverse. While promoting diversity in evolutionary computation

has long been recognized as critical to avoiding the problem of premature convergence, most

existing mechanisms to promote diversity in evolutionary computation select for solutions

that are distinct from each other, regardless of other qualities (Goldberg et al., 1987). In

natural systems, however, diversity arises due to organisms ﬁlling niches, each requiring

speciﬁc phenotypic traits for success. Thus there is pressure for a diversity of functional

traits. Furthermore, new niches are continuously created in nature as organisms interact

with each other and modify both their physical and social environment. In problem domains

such as playing complex games, this diversity of solutions becomes even more important, as

there is no single, deterministic, best strategy. Instead, there are various strategies that are

eﬀective in diﬀerent situations and against diﬀerent opponents, just as there would be in an

ecological community.

6.2 Methods

I expect that ecologically-mediated hints provided via Eco-EA and lexicase selection

will display similar set of strengths. Both maintain a diverse population with respect to the

hints that are provided. However, as discussed above, I expect the use of limited resources

to be robust to hints that turn out to be counter-productive, while lexicase selection will

have diﬃculty escaping uncontested bad advice. To assess the accuracy of these hypotheses,

I evaluate them in the context of a proof-of-concept problem, the N-dimensional box, where

109

we can easily manipulate the quantity and quality of the hints.

The N-dimensional box problem is designed to be nearly impossible to solve without

hints. As such, we provide hints about the optimal value for each dimension. Since the

goal is to minimize all dimensions, a good (i.e. informative) hint would be to minimize an

individual dimension. Conversely, maximizing an individual dimension would be a bad (i.e.

misleading) hint, leading a population away from the goal. We provided various combinations

of good and bad hints to Eco-EA (as resources) and to lexicase selection (as test cases). As an

additional control, I ran an equivalent number of trials using unaltered tournament selection.

6.2.1 Conﬁguration details

For all experiments, I evolved a population of 5000 vectors containing 10 ﬂoating-point

numbers between 0.0 and 1.0 (inclusive) for 50,000 generations. The next population for

each generation was chosen using one of the three selection schemes being compared: lexicase

selection, Eco-EA, or tournament selection. To avoid giving Eco-EA an unfair advantage

due to its use of elitism (in a problem domain where that could only be beneﬁcial), I always

preserved a copy of the individual in the population with the highest ﬁtness. For selection

schemes requiring a tournament size (tournament selection and Eco-EA), I used a tournament

size of two.

I placed the individuals selected by each iteration of the selection scheme into the next

generation. Each site in each genome was mutated by adding a value randomly selected

from a Gaussian distribution centered at 0 with a standard deviation of .05. Subsequently,

I recombined each vector with a random other vector, using one-point crossover.

6.2.2 Statistical methods

To determine the eﬀects of good advice, bad advice, and selection scheme on the proba-

bility of solving the problem, I performed a logistic regression. The predictor variables were

the number of good hints, the number of bad hints, a boolean indicating whether lexicase

110

selection was used, and a boolean indicating whether Eco-EA was used. Tournament selec-

tion, being the control, was the base case to which all other conditions were compared. The

regression coeﬃcients for the interactions between good or bad hints and the selection type

were used as the test statistic for all statements about the eﬀect of a hint type on a selection

scheme.

Since some combinations of variables were able to perfectly separate successes from fail-

ures, which can bias the results of logistic regression, I used Firth’s bias reduction technique,

as implemented in the R package brglm (Kosmidis, 2017).

6.2.3 Code availability

The full source code for the experiments and analysis in this chapter is freely available

at https://github.com/emilydolson/eco-ea-box.

6.3 Results and Discussion

Both Eco-EA and lexicase selection make eﬀective use of good advice (see Figure 6.1).

Eco-EA solved the problem consistently when given at least four good hints. Lexicase selec-

tion did even better, solving the problem most of the time when given as few as two good

hints. Good hints enable both lexicase selection and Eco-EA to signiﬁcantly outperform

tournament selection (logistic regression, β = 1.3334 for lexicase, β = 1.8363 for Eco-EA,

p < .0001 for both).

These data strongly support our hypothesis that Eco-EA is essentially unaﬀected by

receiving bad hints; the regression coeﬃcient for the interaction between Eco-EA and bad

hints is not signiﬁcantly diﬀerent from 0 (logistic regression, β = 0.0633, p = .66). Lexicase

selection, on the other hand, is harmed dramatically by bad hints (logistic regression, β =
−10.6841, p < .0001).

Overall, our results illustrate the power of ecological approaches as a vehicle for providing

111

hints to evolutionary algorithms. I have provided a proof-of-concept that this technique can

make it possible for evolution to solve problems that would otherwise have been out of

reach. Moreover, I have clariﬁed the instances in which two speciﬁc ecological approaches,

Eco-EA and lexicase selection, are most appropriate; lexicase is best when all of the hints

are accurate, whereas Eco-EA is robust in scenarios where there may be some misleading

hints.

6.4 Conclusions and future work

Thus far, I have tested ecologically-mediated hints on only simple model problems as

a proof-of-concept, but I expect this approach to excel on complex problems where ﬁtness

functions do not provide a clear path from random starting conditions to meaningful solu-

tions. Artiﬁcial Intelligence is a perfect example of this type of problem. Problem-solving

strategies may require many diﬀerent components to coordinate to formulate plans. Board-

game-playing agents are a particularly accessible problem that has these properties: they are

complex, while still being experimentally tractable, and are often well-studied. They involve

clear measures of success, while allowing for multiple co-existing strategies. Most impor-

tantly, they are intuitive to humans who can provide “suggestions” for limited resources to

produce building blocks for complex strategies.

The next step will be to more thoroughly explore the parameter space in which providing

hints via Eco-EA and lexicase selection is eﬀective. In the process, I hope to gain insight

that will allow us to develop an approach that includes the best properties of both. Thus

far, I have explored the impact of good and bad hints that are orthogonal to each other.

However, most hints in real world problems will not be independent, and we expect that this

connection may inﬂuence the way Eco-EA and lexicase selection respond to them. Similarly,

I have not explored the impact of completely neutral hints, which may be harmful if they

are present in excessive quantities. That said, a resource is helpful only when it produces a

112

building block that turns out to be useful for a more complex goal, after which it is no longer

needed. As such, many resources would be helpful if they were around just long enough for

the associated niche to be ﬁlled as a stepping-stone to more complex niches.

If good advice is helpful and other advice is harmless, it should be possible to bootstrap

the solving of a problem by generating random hints. Each hint, once used, will remain for a

limited amount of time (providing an opportunity to be used as a building-block) before it is

removed and replaced by a new hint associated with a new randomly-determined behavior.

These random behaviors will usually be harmful and thus ignored, but any that turn out to

be helpful building blocks should be incorporated into the successful players.

Finally, I will use human problem-solving patterns to create niches for evolving “crowd-

sourced” AIs. While we believe transient resources will have some utility, most of those

building blocks are likely to be useless, merely using up computational time. A beneﬁt of

working with board games, however, is that I can collect a wealth of information about how

human players make decisions across a variety of situations. Rather than reward building

blocks that I identify from those logs, I can instead create limited resources that reward AI

players for consistently predicting the next move made by a human player. In other words,

we do not need to understand why a human player made a decision in order to reward an AI

for following the same type of strategy. If that player plays well, the building blocks produced

to mimic them should be generally useful, even for some other strategy types. Ultimately,

the biggest rewards will still come from winning games, so mimicking poor players should

have a minimal negative impact (as is usually the case with Eco-EA).

113

Figure 6.1: Impact of good and bad advice on Eco-EA and Lexicase. Heat maps for
each algorithm show the success rate of that algorithm in the presence of varying quantities
of good and bad advice. Note that tournament selection is not actually capable of receiving
hints; it is presented here as a control. Each cell in the heat maps represents the proportion
of runs (out of 10) that successfully found the optimal solution to the 10-dimensional box
problem.

114

Chapter 7

The MODES Toolbox: Measurements of

Open-ended Dynamics in Evolving

Systems

This chapter is adapted from a paper in the Artiﬁcial Life Journal (Dolson et al., 2019b).

7.1 Introduction

A central goal of the ﬁeld of artiﬁcial life is to build evolving systems that capture

the full range of dynamics found in natural systems. Such systems should be capable of

producing evolutionary outcomes such as sophisticated navigation behaviors, novel coopera-

tive strategies, complex ecosystems, and major evolutionary transitions, to name but a few.

Researchers seek such “open-ended” systems for a number of reasons:

1. For biologists, access to systems exhibiting complex and nuanced evolutionary processes

allows rapid experimentation and facilitates developing a deep intuition for underlying

mechanisms (Tenaillon et al., 2016).

2. For evolutionary computation researchers, insights from open-ended evolving systems

115

will allow researchers to break complexity barriers, expanding the classes of engineering

problems that evolutionary algorithms can solve (Hara and Nagao, 1999; Potter and

De Jong, 2000) and producing more general forms of evolved intelligence.

3. For artiﬁcial life researchers, it is concerning that there may be dynamics of funda-

mental importance to biology that artiﬁcial life systems do not exhibit. The existence

of such dynamics suggests that we are not building evolving systems as innovative as

those found in nature, be it due to limited memory, limited time, or simply an insuf-

ﬁcient understanding of the necessary components. Identifying these missing factors

should allow us to better understand life as it is and to better explore life as it could

be.

While various artiﬁcial life systems have reproduced individual dynamics – such as

the evolution of complex traits (Lenski et al., 2003), cooperative behaviors (Goldsby et al.,

2012), and coexistence of diverse ecotypes (Cooper and Ofria, 2003) – these accomplishments

have been in highly controlled circumstances. The overarching goal of open-ended evolution

research is to create a system where all of these dynamics emerge more organically, as in

nature. Additionally, replicating this process would provide substantial insights into our

own origins, including the evolution of human brains. Indeed, harnessing a more open-ended

set of evolutionary dynamics could help us spur breakthroughs in the evolution of general

artiﬁcial intelligence.

Open-ended evolution is a many-faceted concept. A number of patterns are considered

to be hallmarks of open-ended evolution (Taylor et al., 2016), most notably the continual

production of novelty (Lehman and Stanley, 2011; Banzhaf et al., 2016), unconstrained

increases in diversity (Bedau and Bahm, 1994), ongoing increases in complexity (Lenski

et al., 2003; Korb and Dorin, 2011), and shifts in individuality such as those often associated

with major transitions in evolution (Maynard Smith and Szathmary, 1997). There is a

growing consensus in the ﬁeld that all of these dynamics are important pieces of the open-

ended evolution puzzle (Taylor et al., 2016).

In addition, I, along with colleagues, have

116

Figure 7.1: Relationships between the metrics. Originally published in (Dolson et al.,
2015). Solid lines with arrows indicate metrics which are prerequisites for other metrics.

previously suggested that there is a ﬁfth necessary and even simpler dynamic: continuous

change in the information content of components in the population (Dolson et al., 2015).

These ﬁve properties of a system ﬁt into a hierarchy, as shown in Figure 7.1. For nov-

elty to exist, there must be some degree of change in the information within a population.

While this observation is trivially true, many evolutionary algorithms suﬀer from premature

convergence, the absence of non-trivial change. Thus, it remains an important prerequisite

to deﬁne and explicitly address. Similarly, complexity and diversity can only increase indef-

initely if novel members of the population continue to be generated. Finally, transitions in

individuality typically involve multiple organisms coming together into a single individual,

building from complex and diverse progenitors. All of these dynamics capture diﬀerent sub-

sets of interesting behavior that evolving systems might exhibit and I propose they are all

necessary (but perhaps not suﬃcient) in a fully open-ended system.

To draw conclusions about what factors of a system promote or inhibit these dynamics,

we need methods for measuring the extent to which each dynamic is present. Importantly,

these methods must be applicable across a wide variety of systems. Some progress has been

117

made toward this end with evolutionary activity statistics (Bedau et al., 1998; Channon,

2001), an approach to isolating and quantifying the adaptive component of an evolving

system, separating out the non-adaptive dynamics. Evolutionary activity statistics require

that the user decide on two things ahead of time: a deﬁnition for “components” (meaningful

individual pieces of a system) and a way of ﬁltering noise out of the system (typically by

contrasting with a shadow population that evolves with selective pressures turned oﬀ).

Thus far, components have needed to be deﬁned for each system on a case-by-case basis.

In artiﬁcial life systems, alleles or genotypes are typically used as components, while in the

fossil record, whole species were used as components (Bedau et al., 1998). This ﬂexibility to

choose diﬀerent components is valuable, as it allows for the study of open-ended evolution

at diﬀerent scales of organization. However, it also means that care must be taken when

comparing evolutionary activity statistics across systems. Here, I suggest a component

deﬁnition that should work for any system in which genomes are composed of elements that

collectively determine ﬁtness (see Identifying Meaningful Sites in the Genome). Note that,

although this component deﬁnition will not work with every system, I will suggest other

approaches with broader compatibility.

Due to the critical role of stochasticity in evolution, most evolving systems are noisy. In

order to make behavioral generalizations, we need a way to distinguish evolutionary signal

from this noise.

In the original description of evolutionary activity statistics, a speciﬁc

method was proposed for doing so: for each run of a system, there should be a corresponding

“shadow” run in which any outcome of selection is replaced with a random choice. Dynamics

observed in the shadow run can then be subtracted out from those in the main run. While

this control can be highly informative, it is challenging to implement in many systems and

requires researchers to be able to isolate all selective events in the system. For example,

when evolutionary activity statistics were applied to the fossil record, a diﬀerent ﬁlter had

to be used: the assumption that any species that was successful enough to have made it into

the fossil record was probably evolutionarily successful for a substantial amount of time. In

118

this chapter, I build on this idea to propose a ﬁlter for evolutionary activity that can be

more easily implemented across a variety of systems (see Filtering Out Noise).

Evolutionary activity statistics classify evolving systems based on how open-ended they

are. However, it is relatively easy to create a system that falls into the most open-ended

class while still failing to further our goals for open-ended evolution research or to match our

subjective understanding of what we would expect from a truly open-ended system (Maley,

1999). Indeed, there is debate over whether “open-endedness” is even quantiﬁable (Stanley

and Soros, 2016). Moreover, it is my opinion that most eﬀorts to deﬁne systems as either

open-ended or not have largely been unproductive; open-endedness is likely better thought

of as a continuum than as a binary. While there is much debate over what would constitute

a fully open-ended system, there is consensus in the ﬁeld that we are not particularly close to

building such a system1. My goal in this chapter is to extend evolutionary activity statistics

into easy-to-use diagnostic criteria that quantitatively measure key hallmarks of open-ended

evolution. I want researchers to be able to isolate the eﬀects of experimental settings on

these hallmarks, keeping such results relevant across experimental platforms. As such, I

hope to spur a more consistent and comparable march toward true open-endedness, adding

new metrics to this toolbox as the community reaches a consensus on the features that we

should promote.

In the rest of this chapter, I will introduce the Measurements of Open-ended Dynamics

in Evolving Systems (MODES) toolbox and explore the behavior of the metrics it contains

in the context of two evolving systems: NK-Landscapes (Kauﬀman and Levin, 1987) and

the Avida Digital Evolution Platform (Ofria and Wilke, 2004).

1This line of thought originally lead us to conceptualize the metrics described here in terms of possible
barriers a system might encounter that would prevent it from being open-ended (Dolson et al., 2015).
However, my attempts to measure these barriers align closely with dynamics that have since been identiﬁed
as hallmarks of open-ended evolution. Ultimately, these perspectives are two sides of the same coin and both
are useful frames through which to view open-endedness. For simplicity, I phrase this chapter in terms of
hallmarks rather than barriers.

119

7.2 Background

7.2.1 Evolutionary Activity

Evolutionary activity statistics attempt to quantify the extent to which adaptive dy-

namics are occurring in a population. In most applications, evolutionary activity has been

measured as the length of time that components exist in the population beyond what would

be expected in the absence of selection (Bedau et al., 1997, 1998; Channon, 2003). This value

was chosen because it translates easily across systems and represents a universal measure

of evolutionary “success”. In earlier work, a measure of selective sweeps was used instead of

component existence time (Bedau and Packard, 1992), but this metric could not be easily

generalized across systems.

Multiple facets of evolutionary activity are used in the interpretation of evolutionary

activity statistics: the activity of new components (Anew), the mean (or median) cumulative
activity of components in the population ( ¯Acum), and the diversity of components in the
population (D). Based on the long-term behavior of these values, systems that exhibit qual-

itatively similar dynamics have been grouped together into a class of evolutionary dynamics.

Initially, three possible classes were described: no evolutionary activity, bounded evolution-

ary activity, and unbounded evolutionary activity. Over time, additional classes have been

added to more precisely reﬂect the types of systems observed. For ease of referring to these

classes, table 7.1 merges together all prior additions to the original classiﬁcation system of

which we are aware.

According to the original formulation of evolutionary activity statistics, in order for a

system to be categorized among the most open-ended systems (originally class 3, now class

4) it must exhibit unbounded growth in summed evolutionary activity across all components

in the population (Bedau et al., 1997) (see Table 7.1). Technically, this growth could happen

either because of an unbounded increase in the number of components (diversity) or because

of an unbounded increase in the average evolutionary activity of components in the popu-

120

lation. The latter case was originally thought to not occur (Bedau et al., 1998), however

when such a case was observed, Channon suggested that class 3 open-ended dynamics should

be broken up into three subcategories. These subcategories depend on whether the growth

in evolutionary activity was driven by diversity, per-component evolutionary activity, or a

combination of both (Channon, 2001) (see Table 7.1).

In parallel, Skusa and Bedau reﬁned the classiﬁcation in a diﬀerent way (Skusa and

Bedau, 2003), inserting a new second class in which evolutionary activity was unbounded

but no novel components came into being (see Table 7.1). Such a situation involves purely

ecological dynamics. This observation may seem surprising at ﬁrst - shouldn’t unbounded

evolutionary activity involve adaptation? However, when evolutionary activity is measured

as the existence time of a component, evolutionary activity statistics draw no distinction

between stabilizing selection and selection favoring changes to the status quo (Channon,

2003). Thus, pressure for multiple eco-types to persist in their current form (i.e., ecology)

will show up as as evolutionary activity above and beyond that observed in the shadow run.

In fact, the presence of a single component under stabilizing selection will trivially

cause the mean evolutionary activity to increase indeﬁnitely; such a component will sit

in the population, increasing the population’s activity counter despite being quickly lost

from the shadow population. This behavior casts doubt on how we should interpret class

4b, as well. To remedy this concern, Channon suggested that we should look for unbounded

growth in median (rather than mean) per-component evolutionary activity (Channon, 2003).

This adjustment is a drastic improvement, but it still does not eliminate the possibility that

systems exhibiting class 4b evolutionary dynamics are not doing quite what we would expect.

If at least 51% of the components in the population are under stabilizing selection – as would

be expected in an ecological system – the rest of the population could still be behaving like

a class 3 system. While such a system would still be interesting for ecological studies, our

understanding of it would not be well-served by conﬂating it with systems that are exhibiting

open-ended adaptive evolution.

121

Median2

Evolutionary
Activity ( ˜Acum)

zero

unbounded
bounded
bounded
unbounded
unbounded

Class

1
2
3
4a
4b
4c

Anew
zero
zero

?
?

positive positive
positive positive
positive positive
positive positive

Change Novelty

(or Ecology)

Complexity

Diversity

(D)

Type of

evolutionary

dynamics

Described in

bounded
bounded
bounded
unbounded
bounded
unbouned

bounded
bounded
bounded
bounded

?
?

None

Uncreative
Bounded
Unbounded
Unbounded
Unbounded

Bedau et al, 1997

Skusa and Bedau, 2003

Bedau et al, 1997
Bedau et al, 1997
Channon, 2001
Channon, 2001

Table 7.1: Table combining all previously described classes of evolutionary dynamics as measured with evolu-
tionary activity statistics. For each class, I show the response of all quantities measured for evolutionary activity statistics
and in these proposed metrics (novelty and diversity should behave equivalently between the two systems). Note that I expect
bounded evolutionary activity to imply bounded complexity, as any scenario in which complexity is growing without bound
should imply that evolutionary activity is too. Question marks indicate that the value of a given metric is not speciﬁed in
the description for a class of evolutionary activity. Higher-numbered classes are generally believed to fall further along the
continuum of open-endedness than lower-numbered classes.
In principle, classes 4b and 4c could each be further split into
sub-classes based on whether complexity is bounded or unbounded. Likewise, classes 1 and 2 could be further sub-divided based
on whether change is 0 or positive. In the absence of further data on the behavior of real-world systems, it is unclear how
helpful such increased precision would be.

122122122

How can we know whether evolutionary activity is driven by stabilizing selection rather

than more interesting dynamics? If every component is experiencing directional (as opposed

to stabilizing) selection, the change metric I propose here should theoretically be comparable

to the number of components3. In contrast, if most of the population is under stabilizing

selection, the change metric should be very low.

Ultimately, the change metric (described in the next section) is in keeping with the

original evolutionary activity measurement, which sought to quantify the acquisition of new

genetic information (Bedau and Packard, 1992). For this reason, in this suite of metrics, I

replace the concept of evolutionary activity with change. I believe that this framing will be

easier to measure and interpret with little loss of information (although of course I encourage

the use of other measures of evolutionary activity where appropriate). My change metric

does have the downside of not being possible to usefully classify as bounded or unbounded.

Because I seek only to compare systems and identify progress toward higher levels of open-

endedness, this limitation should not be a problem.

7.2.2 Prior work using MODES

Soros (Soros, 2018) used a preliminary version of this framework (Dolson et al., 2015)

to study open-ended evolution in the artiﬁcial life system Chromaria. Agents in Chromaria

are colorful circles controlled by CPPNs that must ﬁnd a region of the world that matches

their color in order to reproduce. These agents can be classiﬁed into species based on their

patterns of coloration, and change and novelty can be assessed by measuring the emergence

of new species. Ecological interactions in Chromaria occur as a result of individuals planting

themselves in the world, which alters the color environment that subsequent agents must

navigate. Thus, Soros was able to measure the ecology of Chromaria through a series of
2The original formulation of evolutionary activity statistics used mean rather than median, but Channon
Channon (2003) makes a compelling argument for using median instead. Using median rather than mean
does not change any of the intuitions for how we expect this metric to behave and reduces the risk of
non-intuitive behavior due to outliers.

3However, the change metric may often be lower than the number of components because not every

component will change during every measurement period.

123

visual snapshots of the world, as well as by measuring the number of species that occur over

the course of a run. Lastly, she measured complexity in terms of the number of elements in

the CPPNs controlling the agents.

Using these MODES-inspired metrics, Soros investigated three hypothesized necessary

conditions for open-ended evolution: 1) some sort of minimal criterion (Soros and Stanley,

2014) must be met before reproduction, 2) when new types of individuals evolve, it should

create new ways to satisfy the minimal criterion, and 3) individuals should be responsible for

making decisions about how they interact with the world. By measuring hallmarks of open-

ended evolution under various controls that removed these conditions, Soros found strong

evidence that all of the conditions are indeed necessary for change and novelty (let alone

ecology and complexity) in Chromaria. These experiments perfectly illustrate the kind of

hypothesis-driven research that I hope a further formalization of these metrics will enable.

Additionally, they serve as an example of the range of approaches that can be taken to

translate these concepts between systems.

7.2.3 Applying MODES to biology

Since many hypotheses about open-ended evolution involve comparisons to the bio-

sphere, it is critical that MODES metrics are applicable not only to digital systems, but

are also relevant to experimental biological systems. To conﬁrm that they are, consider

how we would apply them to a well-studied wet-lab experiment. The Long-Term Evolution

Experiment (LTEE) (Lenski et al., 1991) is an exemplar of experimental evolution, con-

sisting of 12 populations of the bacteria E. coli, which have been evolving independently

for more than 60,000 generations (Good et al., 2017). As detailed in (Taylor et al., 2016),

the LTEE exhibits many hallmarks of open-ended evolution, including the criteria I propose

here. Because ﬁtness within the LTEE is best described by an unbounded power law function

(Wiser et al., 2013; Lenski et al., 2015), the system meets the change metric: populations

continue to change in non-trivial ways over time. Further, studies of individual populations

124

within the LTEE have shown numerous examples of the generation of novelty, including

exploration of new areas of the ﬁtness landscape (Tenaillon et al., 2016), repeated selec-

tive sweeps (Maddamsetti et al., 2015), and new diversity arising after such sweeps (Blount

et al., 2012). Toward the ecology metric, several populations within the LTEE demonstrate

frequency-dependent ﬁtness dynamics (Ribeck and Lenski, 2015; Rozen and Lenski, 2000;

Le Gac et al., 2012; Maddamsetti et al., 2015), which are necessarily cases of ecological in-

teractions. Included in these cases of frequency dependence is a special case (Blount et al.,

2008, 2012; Turner et al., 2015a) driven by cross-feeding and specialization on diﬀerent re-

sources (Turner et al., 2015b). Because each population in the LTEE descends from a single

ancestor present at the start of the experiment, all ecological complexity in any population

must have arisen during the course of the experiment, and thus satisﬁes the ecological met-

ric. The complexity metric is inherently harder to quantify in a biological system than in

a computational one, but recent large-scale genome sequencing from the LTEE (Tenaillon

et al., 2016) oﬀers the promise of being able to measure complexity at the genome level

over the course of the experiment. Because these metrics can theoretically be applied to a

well-studied and open-ended biological system, they can be used to compare dynamics in a

broad range of systems and enable the ﬁeld of artiﬁcial life to move forward in quantiﬁable

steps to open-ended evolution.

7.3 Metrics

Like the original evolutionary activity statistics (Bedau et al., 1997), the metrics I

present here can operate on a wide range of units. Often, these units will be genotypes or

phenotypes but in other cases they may by higher-level taxonomic groups, such as species.

To highlight this agnosticism, Bedau et. al referred to these units as “components”. I will

use the same terminology where applicable.

125

Figure 7.2: An illustrative example of how I ﬁlter components for persistent lin-
eages. At time point A, the purple lineage has proven to be persistent and therefore the
original component from A-t will be considered meaningful. Similarly, the green and blue
lineages persist to time point A+t and so the original green and blue components will be
considered meaningful as they existed at time point A.

7.3.1 Overarching techniques

I use two broad techniques to ensure that these metrics can focus on the most relevant

and meaningful information in an evolving population. Additionally, I describe a technique

for determining whether a metric is bounded or unbounded.

7.3.1.1 Filtering out noise

In any evolving population, mutations continually produce new maladapted components

that are then purged from the population via natural selection. All of the MODES metrics

assume that some form of ﬁltering has been applied before-hand, to prevent maladapted

components from overwhelming the hallmarks that I am measuring. Here, I describe a

persistence ﬁlter that I use in these experiments. However, shadow runs are also a viable

ﬁlter option, and there are likely further useful ﬁltering techniques that have not yet been

invented.

126

To focus only on the adaptive products of evolution, I limit my analysis to those com-

ponents whose descendants persist for a substantial number of generations. I refer to this

technique as a persistence ﬁlter. I mark each organism with a lineage ID at a given time

point A, as demonstrated in Figure 7.2 (where color indicates lineage ID). The lineage IDs

are passed on to oﬀspring for the next t generations, where t is a pre-determined number

of generations indicating the length of the ﬁltering process (hereafter referred to as ﬁlter

length). At time point A + t, I determine which components from the population at A have

descendants at A + t. At this point, those components are considered persistent; in the

example in Figure 7.2 the individuals at the bases of the green and blue lineages are con-

sidered persistent at time point A + t. These are the individuals that would be evaluated

in the MODES metrics. This ﬁltering leads to a delay in counting a component in a metric

until t generations later, but enables us to avoid an apparent increase in metrics due to drift

via mutation. For example, the red and orange components from time point A would not be

considered in these metrics because their lineages do not persist to time point A + t.

How large should t be? The correct value depends on our goals. If we are interested in

evolution on a shorter time scale, we may only want to ﬁlter out deleterious mutants, which

will likely only survive a few generations. In this case, a relatively small value of t should

suﬃce. Indeed, prior open-ended evolution research has used what is eﬀectively a persistence

ﬁlter with t = 1 as a supplemental ﬁltering technique (Channon, 2003). If we are interested

in a broader time scale, however, we may want to ﬁlter out neutral mutants too and measure

only adaptive evolution.

In this case, coalescence theory can inform the choice of t.

In

an asexual population without diversity-preserving forces, the population will periodically

“coalesce”, i.e. neutral clades will die out resulting in a new most-recent-common ancestor

of the current population. If we take a snapshot of such a population at any given point in

time and let the population continue evolving for long enough, a single individual from the

snapshot will eventually be a common ancestor of the entire extant population. I refer to

coalescence time here as the amount of time that this process takes, although it should be

127

noted that coalescence time is more commonly thought of retrospectively.

If we want to ﬁlter out all neutral mutants that do not go on to play a critical role in

evolution, it would be ideal to choose a value for t that falls above the expected distribution

for coalescence times. If we did so, then we could be conﬁdent that any individuals that made

it past the ﬁlter represented a meaningful part of the evolutionary history of the population.

If only a single individual makes it through the ﬁlter, that individual must be along the line

of descent for the entire population. Multiple individuals making it through the ﬁlter would

be evidence of ecological dynamics promoting their coexistence.

The median coalescence time for a well-mixed asexual population of N haploid indi-

viduals under no selective pressure is 2N generations (Fu and Li, 1999). Unfortunately, the

expected distribution of coalescence time is exponential, meaning that we would have to

choose a potentially impractically large value for t if we want to guarantee that it is rare to

get through the ﬁlter by chance. However, the presence of selective pressure dramatically

reduces expected coalescence time. Since most systems in which people study open-ended

evolution do have selective pressure of some form, in practice relatively low values of t are

still eﬀective ﬁlters.

For a meaningfully comparison across populations, we must ﬁlter them using consistent

values of t. We always expect ﬁlters with lower t values to let more individuals through,

and it is challenging to separate this eﬀect from changes in the underlying dynamics of the

system. Additionally, t must be measured in generations to ensure consistency in the amount

of ﬁltering that occurs. Researchers studying systems that use a diﬀerent time scale need

only calculate the average generations within the population to measure t.

In evaluating results, the ﬁeld should strive to use consistent values of t relative to

population size and be aware that, all else being equal, increasing selective pressure will

reduce the number of taxa that get through the ﬁlter.

This eﬀect brings up an important distinction between this ﬁltering technique and the

shadow run traditionally used with evolutionary activity statistics. Whereas shadow runs

128

ﬁlter out the eﬀect of neutral processes, the persistence ﬁlter does not entirely.

I view

this reduced ﬁltering primarily as an advantage – drift can be an important part of the

evolutionary process – but there may also be situations where it is undesirable. These

metrics are unable to distinguish between class 1 and 2 dynamics or between class 3 and 4b

dynamics (see Table 7.1), although they are able to distinguish between useful subcategories

within those classes (as discussed in the Change Metric section below).

7.3.1.2

Identifying meaningful sites in genomes

Because genomes are such a common unit of taxonomic organization to use as com-

ponents in open-ended evolution research, I present a technique for ﬁltering noise out of

genomes. Although this step is not necessary for using the MODES metrics, it will improve

the signal-to-noise ratio in a variety of common use cases and simplify the calculation of

complexity. I recommend its use where applicable.

While a genome may have descendants in t generations, if t is relatively small this

persistent genome may not be phenotypically diﬀerent from another persistent genome in

the population. To ensure that we are not separately counting genomes that diﬀer only

in non-coding regions, we use an additional ﬁlter in which we determine which sites in

the genome contain information about the environment. In calculating all of the following

metrics, we ﬁrst reduce the genome to its meaningful sites.

This approach can easily be extended to any system in which the genome is made up of

a set of elements that collectively determine ﬁtness. Whether or not a genomic position is

meaningful can be approximated by measuring the overall ﬁtness4 eﬀect of either removing it

or changing it to a null alternative that is known to not contribute information. If removing

the site resulted in a lower overall ﬁtness, that implies that the site contained information

(i.e. was meaningful). Conversely, if removing the site increased or had no eﬀect on ﬁtness

we can conclude that it is most likely not meaningful. We can then deﬁne a component as

4As deﬁned in the system being studied. If the system does not have a ﬁtness deﬁnition, average lifetime

reproductive output can be used.

129

the sequence of meaningful sites in a genome rather than the whole genome. By doing so,

we avoid treating functionally identical components as distinct.

When should we remove sites and when should we replace them with a null alternative?

A null alternative should be used in cases where changing the structure of the genome changes

the meaning of other sites. For example, in Avida it is critical that we replace instructions

with nulls rather than completely removing them because information can be encoded in the

number of instructions between two other instructions. A more accurate technique would be

to examine the ﬁtness eﬀect of substituting all possible alternative elements and calculate

the potential entropy at that site. When null substitutions are not possible, this technique

is an eﬀective method.

A caveat to this technique is that, in practice, there are interactions between sites.

By only knocking out a single site at a time, we miss these interactions. How to best

remedy this situation is an open question, as measuring all possible combined eﬀects is

computationally intractable. In many cases, measuring pairwise interactions is possible and

may be worthwhile. This issue will reduce the eﬃcacy of this approach at reducing noise,

because some functionally equivalent genomes will be classiﬁed as diﬀerent. When used for

calculating the complexity metric, it may cause fragile genomes to appear more complex

than robust ones.

Note that, although identifying informative sites can be computationally intensive, we

would need to do so anyway to calculate the complexity metric. As such, this additional

layer of ﬁltering is eﬀectively free.

7.3.1.3 Determining boundedness

In the design of these metrics, I have primarily focused on determining the eﬀect that

small changes to a system have on the extent to which that system exhibits hallmarks.

However, they can also be used to classify systems in much the same way that evolutionary

activity statistics do. As described in Table 7.1, this classiﬁcation requires determining

130

whether diversity is increasing without bound.

In addition, it would be informative to

determine whether complexity is growing without bound. In previous work, the deﬁnition

of boundedness in this context has been stated in terms of the limit of the suprememum of

diversity as time goes to inﬁnity (Bedau et al., 1998). While this is an excellent theoretical

deﬁnition, taking limits of empirical data as time goes to inﬁnity is generally not practical.

Previous applications of evolutionary activity statistics seem to determine boundedness based

on whether or not a line on a graph appears to be plateauing. This technique has the potential

to be misleading (Wiser et al., 2018).

Instead, I advocate the use of statistics to determine what mathematical model best

ﬁts the observed data. The pattern can then be classiﬁed as bounded or unbounded based

on the limit of the best-ﬁtting mathematical model. Such an approach has previously been

used to demonstrate that ﬁtness is following an unbounded growth pattern in a long-term

wet lab evolution experiment with E. coli (Wiser et al., 2013; Lenski et al., 2015).

7.3.2 MODES Metrics

7.3.2.1 Change Metric

Our ﬁrst metric focuses on whether the genetic makeup of the population is changing

in a non-trivial way. This metric will be above zero during adaptive evolution, including

situations where the population is returning to previous states, perhaps due to environmental

cycling. In the work presented here, I use a persistence ﬁlter (explained above) to ensure that

I mark a component as new only if its lineage full t generations. However, a diﬀerent ﬁltering

technique (e.g. a shadow run) could be used instead. For this comparison, I ﬁrst ﬁnd the

components from persistent lineages from generation A by determining which components

have descendants in generation A + t. In the example shown in Figure 7.2, the components

at the roots of the green and blue lineages would count as persistent. I then compare these

components to those found to have been from persistent lineages in the previous time point

(e.g. I would compare the roots of the blue and green lineages to the root of the purple lineage

131

in Figure 7.2). In this way, I create a sliding window to observe change in the population.

Note that the example in the ﬁgure assumes the resolution at which data are collected (i.e.

the number of generations between time points) is equivalent to the value of t, but this does

not need to be the case. It may be desirable to have a very long length of t but still gather

data frequently. In such a case, each time point is individually ﬁltered by looking ahead t

generations, but change is calculated by comparing the set of persistent taxa in the current

time point to the set of persistent taxa in the previous time point.

For those who ﬁnd it helpful, the change metric can be formalized with the following

equation:

0 if c ∈ F (cid:48)

1 otherwise

(cid:88)

c∈F

[h]change =

(7.1)

where F (cid:48) is the set of components from the previous time point that passed the ﬁlter,
F is the set of components from the current time point that passed the ﬁlter, and c is a

component in F .

While there is no change metric in the original conception of evolutionary activity statis-

tics, I expect that it will provide similar information to cumulative evolutionary activity

(Bedau et al., 1997). Change must be positive in systems exhibiting class 3 or higher evolu-

tionary dynamics, as these systems must all exhibit positive novelty. Class 1 systems may or

may not exhibit change; an evolving system that stagnates (e.g. many genetic algorithms)

would have zero change, whereas a completely neutral system where all change was caused

by drift would sometimes have a non-zero amount of change (depending on the value of t).

Class 2 systems would have non-zero change if they were cycling between ﬁxed states, but

not if they were purely the result of stabilizing selection.

132

7.3.2.2 Novelty Metric

The novelty metric measures how many components have evolved in the population

that have never been seen previously in the experiment. For this metric, I again ﬁlter

out components that do not have descendants in t generations, enabling us to focus on

meaningful novelty. As with change, I could have used a diﬀerent ﬁltering technique instead.

To measure novelty, I simply count how many components from persistent lineages have

never been in a previous time point’s persistent component pool. It is possible with this

metric for a component to evolve, but not persist, and therefore not be recorded in the

permanent history, but then evolve and persist at a later point and be counted as novel.

Once a component has been counted as novel, however, it is part of the permanent history

and will never be counted in the novelty metric again. Thus, while a component could be

delayed in being counted as novel, or not counted if it never persists, it will not be counted

twice. The novelty metric is functionally equivalent to Anew in evolutionary activity statistics
(Bedau et al., 1997).

The novelty metric can be formalized with the following equation:

(cid:88)

c∈F

novelty =

0 if c ∈ S

1 otherwise

(7.2)

where S is the set of all components that have ever passed the ﬁlter, F is the set of

components from the current time point that passed the ﬁlter, and c is a component in F .

7.3.2.3 Complexity Metric

The complexity metric measures the maximum complexity of any component found in

the entire population. There is still debate over how to best measure complexity and not all

approaches are viable for all component types. Here, I recommend an information-theoretic

approach which works well with the components described in the section on “Identifying

meaningful sites in genomes” above. Once the meaningful sites have been identiﬁed, they

133

can simply be counted to get a measurement of complexity; the value of the complexity

metric at a given time point is the highest observed count of informative sites across all

components in the population that make it through the ﬁlter. This metric is somewhat

crude and can be improved by using more advanced information-theoretic techniques where

all possible mutations are considered at each site. Ideally epistatic interactions between sites

would also be considered by measuring the ﬁtness eﬀects of knocking out combinations of

genes. Unfortunately, doing so is often not viable in practice.

There is no equivalent to the complexity metric in evolutionary activity statistics. How-

ever, as many believe growth in complexity to be an important hallmark of open-ended

evolution (Taylor et al., 2016), I feel it is a critical addition.

In particular, it would be

interesting to ﬁnd non-trivial systems that exhibit unbounded growth in complexity. I sus-

pect that such growth could only occur in systems exhibiting class 4b or 4c evolutionary

dynamics, as bounded evolutionary activity should imply bounded complexity (although the

converse is not true).

7.3.2.4 Ecological Metric

The ecological metric measures the amount of information in the population as a whole.

While components may not individually contain increasing amounts of information (as mea-

sured by the complexity metric), they could still be increasingly diverse and therefore contain

increased information collectively in the population. Ideally, we would measure this collec-

tive information by tracking the origin of each piece of information across all components in

the population and summing up the number of unique pieces of information. Unfortunately,

this approach is not computationally practical for many systems. As a proxy, we can look at

the diversity of post-ﬁlter genotypes (reduced to informative sites, where possible). Complex

ecologies in which multiple subsets of the population are using diﬀerent information about

the environment to survive are likely to be characterized by a relatively balanced distribu-

tion of individuals across the various successful phenotypes. Thus, I use Shannon entropy

134

(Shannon, 1948), a popular metric of diversity that also measures evenness, to measure the

diversity of the persistent genotypes and calculate the ecological metric. This metric is

equivalent to D in evolutionary activity statistics (Bedau et al., 1997).

The equation for Shannon entropy is:

ecology = −(cid:88)

c∈F

P (c) log2(P (c))

(7.3)

where c is a component in F , the set of components at the current time step that passed

the ﬁlter. P(c) is the proportion of F occupied by component c. This value gets higher when

the number of components in F increases and when the components occupy more equal

proportions of the population.

7.4 Experimental Design

I used two radically diﬀerent experimental systems in order to ensure both that these

metrics can be broadly applied and that they produce meaningfully consistent results. For

both systems, I used genomes as components.

7.4.1 NK Landscape

Our basic treatment used N = 20 (i.e., 20 bits in an individual) and K = 3 (the ﬁt-

ness contribution of each bit was inﬂuenced by three other bits). I used a population size

of 200 and a per site mutation rate of .05, with tournament selection and a tournament

size of 2. In addition to this baseline treatment, I tested the eﬀects of eight experimental

treatments: High K tests the eﬀect of a highly rugged landscape (K=10) where ﬁtness

is eﬀectively randomized whenever a mutation occurs. High N tests the eﬀect of longer

bit-string genomes (N=100; mutation rate was adjusted to 0.01 to keep the whole-genome

mutation rate consistent with the base condition), allowing for a higher potential complexity.

Low Mutation and High Mutation test the eﬀects of more extreme mutation rates (0.005

135

and 0.1 respectively); I expect the mutation rate to be important for ﬁnding new areas of

the ﬁtness landscape and thus the novelty metric. Small Pop and Large Pop vary the

population size (to 20 and 1000 respectively); in small populations I expect more drift in the

population, allowing more change, while in a large population I expect stronger selection and

consequently that a higher percentage of changes along the line of descent are beneﬁcial. Fi-

nally, I included two special treatments: in Oscillating Environment, the ﬁtness function

was toggled between two pre-deﬁned NK Landscapes every 500 generations, allowing us to

see the eﬀect of changing selective pressures where the populations was not able to stay on

a single peak. In Fitness Sharing organisms that were too similar to each other detracted

from each other’s ﬁtness, creating a pressure to explore multiple portions of the landscape

at the same time and, ideally, maintain a high diversity. I used the ﬁtness sharing equations

described by Goldberg and Richardson Goldberg et al. (1987), with a sharing threshold of

50 and an α of 1. For all experiments, I used a ﬁlter length (t) equal to the population size.

7.4.2 Avida

To understand how MODES metrics will behave in a full-featured artiﬁcial life system,

I tested them in Avida under a variety of scenarios. For all experiments, I used a well-mixed

population in order to speed up the expected rate of coalescence. All other parameters in

Avida were left at their default values. I ran experiments in both the minimal and Logic-9

environments.

Artiﬁcial life systems necessarily have constraints on the amount of time and memory we

can give them. It is important in open-ended evolution research to determine whether these

constraints are imposing practical limitations on the dynamics the system exhibits (Zaman,

2018). To do so, I ran experiments in each environment at three diﬀerent population sizes:

500, 1000, and 2000. In each condition, I ran 30 replicate runs of Avida.

Additionally, to understand how sensitive these metrics are to the choice of the ﬁlter

length (t) I conducted some additional experiments in the empty environment in which we

136

varied t. In general, since the Avida runs are so long, I aim to ﬁlter neutral mutants out with

the persistence ﬁlter, rather than just deleterious mutants. At each of the three population

sizes, we tried t values of 500, 1000, and 2000. To ensure that I always have data from a

ﬁlter length larger than population size, I also included a condition with a population size

of 2000 and a t of 4000.

7.4.3

Implementation Details

If not implemented with care, these metrics can become computationally intractable

in the context of the long experiments that open-ended evolution research often entails. In

particular, RAM requirements can become prohibitive. I provide a few high-level approaches

to mitigating these issues.

The largest memory cost is imposed by the novelty metric’s requirement that we keep

track of every taxon that has ever passed the persistence ﬁlter. Because we only need to

know when we encounter a repeat taxon (rather than storing an archive of all taxa we have

encountered), this cost can be dramatically reduced by using a Bloom ﬁlter (Bloom, 1970).

Although this approach does introduce a (tunable) risk of false negatives (i.e. miss-classifying

a novel taxon as not novel), this risk only makes the metrics more conservative.

The next largest cost is imposed by needing to keep track of the phylogeny over time.

In addition to standard phylogenetic pruning techniques (such as removing all taxa that do

not have extant descendants), all taxa that died before the current generation minus t can

safely be removed5. This optimization prevents the tree from growing without bound over

the course of the experiment.

Lastly, it is helpful to be aware that increasing t will reduce computational demands

by increasing the percentage of taxa that will be ﬁltered out. With these optimizations,

MODES metrics can be implemented with minimal overhead.

5An important caveat is that this approach will only work with a strictly increasing unit of time. In many
systems (including Avida) the average generation is not guaranteed to consistently increase. To support
such systems, the implementation of the metrics allows for time to be tracked using two units at once, one
corresponding to generation, and one that is guaranteed to be strictly increasing.

137

7.4.4 Statistical Methods

Eﬀect size measurements determine whether a treatment has a meaningful impact on

a variable. Because standard deviations varied wildly among conditions, I used Glass’s ∆

as a measure of eﬀect size (Hedges, 1981). As a general guideline for interpreting Glass’s

∆, a value of 0.2 is generally thought to be low while a value of 0.8 is generally thought to

be high (although this guideline is context dependent). Distributions of ﬁnal metric values

are visualized using rain-cloud plots (Allen et al., 2018). Statistical code and supplemental

statistical information is freely available (Dolson, 2018b).

7.4.5 Code Availability

All code used in this chapter is open source and freely available (Dolson, 2018b; Bryson

et al., 2018; Ofria et al., 2018).

7.5 Results and Discussion

To ensure that these metrics are capturing the dynamics that we want them to, I tested

them on a range of variants of the basic NK model and a range of conditions in Avida. The

preliminary results for each metric are presented here.

7.5.1 Change Metric

In the baseline and low mutation rate conditions for the NK Landscape, change is close

to 0 (see Figures 7.3 and 7.4), indicating that these metrics are capable of detecting the stag-

nation typical of many genetic algorithms. As shown in Figure 7.3, several environmental

changes increase the amount of meaningful change found in the NK Landscape popula-

tions over time. When organisms are forced to share ﬁtness between others with the same

genotype, the amount of change increases and remains higher than the baseline over time.

138

Figure 7.3: Amount of change at over time in varying NK Landscape environ-
ments. Fitness sharing increases the amount of change in the population over time. Con-
versely, a routinely changing environment leads to spikes in change that quickly drop as
the population converges again. Shaded region represents a bootstrapped 95% conﬁdence
interval around the mean.

Conversely, when the environment changes frequently, there is an initial spike of increased

change that quickly drops back down to the baseline value.

The majority of environmental conditions I tested in the NK Landscape system produced

dynamics over time qualitatively similar to the baseline treatment. In Figure 7.4, I show the

amount of meaningful change in populations at the ﬁnal time point in more environmental

conditions. A higher mutation rate leads to increased meaningful change (p < 0.0001,

Wilcoxon test; Glass’s ∆ = 0.80) because mutations are necessary to create any meaningful

change in this system. A smaller population size produces more meaningful change (p =

0.004, Wilcoxon test; Glass’s ∆ = 0.33) because a small population cannot hold as many

diﬀerent genomes at one time and therefore there are more genomes that can arise that are

diﬀerent than what is in the previous population. Finally, ﬁtness sharing produces increased

meaningful change (p < 0.0001, Wilcoxon test; Glass’s ∆ = 0.63) because it creates a

constant pressure for the population to adapt away from whatever is the current dominant

genotype.

In Avida, there is always at least a little change (see Figure 7.5. This observation is

139

Figure 7.4: Rain-cloud plot of change at ﬁnal generation across NK Landscape
treatments. Environmental conditions that increase the amount of change at the ﬁnal time
point include: increasing the mutation rate, decreasing the population size, and implement-
ing ﬁtness sharing. Black circle and line indicate mean and bootstrapped 95% conﬁdence
interval.

Figure 7.5: Rain-cloud plot of change at ﬁnal generation across multiple popula-
tion sizes and ﬁlter lengths in Avida in the empty environment. Labels along the
top indicate population size. Black circle and lines indicate mean and bootstrapped 95%
conﬁdence interval. Horizontal bar indicates change = 1, the expected average change in the
empty environment.

140

consistent with previous ﬁndings that ﬁtness in Avida increases indeﬁnitely (Wiser, 2015), as

an increase in ﬁtness implies both change and novelty. Based on coalescence theory, I would

expect change in the empty environment to usually be less than or equal to one, because it is

a single-niche environment. During each interval, either a ﬁtter genotype will arise and sweep

the population or the current ﬁttest genotype will remain dominant. Because the value of t

is not higher than the maximum expected coalescence interval, we should also expect to see

the occasional time point with change greater than one. These data are roughly consistent

with this expectation. In addition, there is a subtle downward trend in the change data,

likely due to the progressively increasing diﬃculty of ﬁnding beneﬁcial mutations.

As expected, increasing the ﬁlter length, t, decreases the amount of change observed

because fewer taxa are able to get through the ﬁlter (see Figure 7.5). In general, using a

value of t equal to population size seems to be an adequate ﬁlter. The conﬁdence interval

for the mean of these conditions always overlaps 1, indicating that a substantial amount

of ﬁltering is occurring. Using lower values of t begins to lead to substantial increases in

the variance of observed change. Using a higher value of t does further reduce noise, but

with diminishing returns. In the empty environment, population size does not appear to

have much impact on change, implying (unsurprisingly) that population size does not exert

pressure on evolutionary dynamics in this environment.

In the logic-9 environment, however, there is a slight increase in change as population

size increases, particularly early in the experiment (see Figure 7.6). Additionally, change

is generally a little higher in the logic-9 environment than the empty environment. This

distinction is an unexpected beneﬁt of using a value of t too low to guarantee coalescence.

Logic-9 is a single-niche environment, so if we chose a large enough value of t, we would

expect change to always be less than or equal to 1. However, logic-9 is a more complex

environment than the empty environment, which increases the odds that multiple lineages

will be able to keep evolutionary pace with other for a substantial amount of time. As such,

at the values of t I used, the change metric is able to reﬂect the fact that more is going on

141

Figure 7.6: Change over time across diﬀerent environments and population sizes
in Avida. Note that y axes have diﬀerent scales. In general, change is much higher in the
Logic-9 environment. Filter length, t, is equal to population size.

in the logic-9 environment than the empty environment.

While change is a metric often not considered in discussions of open-ended evolution,

these results show that the amount of meaningful change can reﬂect diﬀerences in the envi-

ronment and evolution of the populations and is likely a necessary dynamic for open-ended

evolution. The change metric responds in intuitive ways to variations in parameter settings,

suggesting that it is a reliable indicator of the dynamics I designed it to capture.

7.5.2 Novelty Metric

As shown in Fig 7.7, a higher mutation rate increases the amount of novelty generated

by the NK Landscape system. This result is to be expected, because more mutations make

it easier to cross ﬁtness valleys and otherwise traverse the ﬁtness landscape. Even at a high

mutation rate, novelty does start to decrease over time as the search space is explored.

I again found that the majority of treatments had a qualitatively similar trajectory over

time and therefore in Figure 7.8 I show only the ﬁnal novelty value. As predicted, the baseline

treatment has, with the exception of 1 replicate, stopped producing meaningful novelty by the

ﬁnal time point. Indeed, the only environment still reliably producing meaningful novelty

142

Figure 7.7: Amount of novelty over time in NK Landscape with varying mutation
rate. The novelty metric measures the number of completely new meaningful genomes that
have lineages that persisted since the previous time point. As the mutation rate increases,
more novelty is continuously produced. However, at all mutation rates the novelty decreases
over time. Shaded region represents a bootstrapped 95% conﬁdence interval around the
mean.

Figure 7.8: Rain-cloud plot of novelty at ﬁnal generation across NK Landscape
. Black circle and lines indicate mean and bootstrapped 95% conﬁdence
treatments.
interval. At the ﬁnal time point, little meaningful novelty is found in the baseline populations.
Only increasing the mutation rate increases novelty, implying that other conditions with high
change are simply promoting cycling between a ﬁxed set of genotypes.

143

is the high mutation rate condition (p = 0.003, Wilcoxon test; Glass’s ∆ = 0.3). High

mutation rates produce ongoing novelty by shifting the mutation-selection balance such that

drift is able to preserve novel lineages for longer than t. Notably, ﬁtness sharing does not

increase the ﬁnal novelty (p = 0.333, Wilcoxon test), presumably because it is promoting

cycling among previously-discovered genomes.

Like NK Lanscapes, Avida shows gradually declining novelty over time (although novelty

in Avida declines more slowly) (Dolson, 2018b). This trend presumably reﬂects the declining

availability of beneﬁcial mutations. Interestingly, the novelty and change graphs from Avida

look almost identical, implying that there is little cycling among previously discovered solu-

tions. This result, too, is consistent with the previously observed indeﬁnite ﬁtness increases

in Avida (Wiser, 2015) - if the population is always getting ﬁtter, genotypes from earlier in

the run would be unlikely to survive in the present.

These results highlight the power of the novelty metric to identify environments and pop-

ulations that have the potential to be open-ended due to the high number of new genotypes

being consistently discovered. Novelty is likely necessary, but not suﬃcient, for open-ended

evolution because if nothing new is being produced by a population, neither the complexity

nor the ecological metric can be non-zero.

7.5.3 Complexity Metric

In an NK Landscape populations, organisms cannot evolve to be more complex than

N. As a result, the complexity increases over time and then saturates. High mutation rate
(p < 0.0001, Wilcoxon test; Glass’s ∆ = −2.6) and small population size (p < 0.0001,
Wilcoxon test; Glass’s ∆ = −.23) reduce complexity because they shift mutation-selection
balance to make staying on a ﬁtness peak more challenging (see Figure 7.9). All other

treatments were able to achieve maximal complexity fairly reliably. Note that, due to the

rough nature of the complexity metric, this merely indicates the presence of an individual on

a ﬁtness peak. I cannot distinguish between higher and lower peaks. The high N treatment

144

Figure 7.9: Rain-cloud plot of complexity at ﬁnal generation across NK Landscape
treatments. Black circle and lines indicate mean and bootstrapped 95% conﬁdence interval.
Note that I have excluded the high N condition from this graph because it throws oﬀ the
axes. Most of the treatments reach the maximum complexity allowed by the genome length
(20 or 100) and cannot continue to increase. High mutation rate and smaller populations
decrease the ﬁnal complexity achieved by the populations on average.

was, unsurprisingly, able to attain a drastically higher complexity due to its increased upper

bound on complexity.

The NK Landscape results demonstrate that the complexity metric correctly identiﬁes a

system that is not able to continuously produce more complex solutions. Once the maximum

complexity allowed by the genome length is reached, no higher value is possible.

In Avida, the complexity metric reveals a stark diﬀerence between the empty and logic-9

environments. In the empty environment, there is a rapid rise in complexity followed by a

decrease and leveling out. This behavior is likely due to the strong pressure in Avida to

become a more eﬃcient self-replicator by optimizing code. Over time, simpler solutions are

selected for, all else being equal. In the logic-9 environment, on the other hand, there is an

ongoing upward trajectory in complexity. While logic-9 still rewards eﬃciency, algorithms

that can make maximally eﬃcient use of the tasks are complex. These results are consistent

with other measurements of complexity in Avida over time (Adami et al., 2000).

In the biosphere, complexity appears to be growing without bound (Korb and Dorin,

2011), although there is debate over the mechanisms behind this process. As such, building a

145

Figure 7.10: Complexity in Avida over time across diﬀerent environments and
population sizes. Note that y axes have diﬀerent scales. In general, complexity appears to
continue increasing in the Logic-9 environment, whereas it drops and then stabilizes in the
empty environment. Filter length, t, is equal to population size.

non-trivial system that exhibits such behavior is a worthwhile goal for open-ended evolution

research. Unbounded growth in complexity is only possible in a system with a suﬃciently

complex environment such that there is always new information to be integrated into the

genome.

7.5.4 Ecology Metric

Across both systems, the only condition that creates a multi-niche environment is the

ﬁtness sharing condition in NK Landscapes. Accordingly, that is the only condition in which I

observe ecology signiﬁcantly above the baseline (p < 0.0001, Wilcoxon test; Glass’s ∆ = 0.91)

(see Figure 7.11). Because ﬁtness sharing speciﬁcally rewards organisms with less common

genotypes, it promotes a stably high ecology value over time. This result demonstrates the

trade-oﬀ inherent in ﬁtness sharing because it leads to higher ecology at the expense of lower

complexity. Interestingly, in the Avida data, I see a consistent low level of ecology across

all conditions (Dolson, 2018b). I hypothesize that this slight increase over NK landscapes is

due to the noise introduced by Avida’s variable generation times.

Ecology is an incredibly powerful force in nature, leading to feedback cycles of ever-

146

Figure 7.11: Rain-cloud plot of ecology at ﬁnal generation across NK Landscape
treatments. Black circle and lines indicate mean and bootstrapped 95% conﬁdence interval.
Fitness sharing is the only condition that reliably produces ecology.

increasing diversity. Like complexity, diversity seems to be growing without bound in the

biosphere (Harmon and Harrison, 2015). Thus, it will be important to see what mechanisms

are important to promoting it in artiﬁcial life systems, too.

7.6 Conclusions

I have proposed a suite of metrics that quantify the presence of four generally-accepted

hallmarks of evolution. These metrics build on prior work with evolutionary activity statis-

tics and are largely compatible with them. Additionally, I have proposed techniques for

reducing noise in these statistics. By testing them on two very diﬀerent well-understood

evolutionary systems, I have demonstrated that these metrics respond in an intuitive way

to the dynamics these systems exhibit. Thus, these metrics should also be useful in un-

derstanding the extent to which novel systems exhibit hallmarks of open-ended evolution.

Moreover, we can use them to understand the impact of incremental changes to a system. By

breaking the seemingly monolithic problem of designing an open-ended evolutionary system

into smaller, measurable pieces, we facilitate improved use of the scientiﬁc method. One

of the primary goals in building an open-ended evolutionary system is to understand the

147

underlying components that are necessary to do so. By measuring the eﬀects that controlled

changes to a system have on this suite of metrics, we can more productively work toward

these goals.

Going forward, it will be interesting to see how a wider variety of artiﬁcial life systems

respond to the metrics in the MODES toolbox. In particular, further investigation on the

diﬀerences between using a shadow run as a ﬁlter and using the persistence ﬁlter described

here would be worthwhile. Ultimately, these two techniques capture suﬃciently diﬀerent

information that it may be valuable to use each in turn.

For a long time, the ﬁeld of open-ended evolution has been plagued by a lack of data

that can be meaningfully compared across systems. I believe that the MODES toolbox will

help remedy this problem by making useful metrics easily accessible. As new hallmarks of

open-ended evolution are identiﬁed and new techniques of reducing noise are developed, I

encourage contributions to the toolbox.

It may also be worthwhile to complement these

high-level metrics that screen for the presence of hallmarks of open-ended evolution with

a suite of lower-level metrics that provide more mechanistic insight into why hallmarks are

or are not being observed (Dolson et al., 2018b). By working together as a community of

users, researchers, and developers, we can dramatically increase the rate at which open-ended

evolution research progresses.

148

Chapter 8

Interpreting the Tape of Life:

Ancestry-based Analyses Provide

Insights and Intuition about

Evolutionary Dynamics

This chapter is adapted from a paper in the Artiﬁcial Life Journal (Dolson et al., 2019a).

8.1 Introduction

Evolution is a collective eﬀect of many smaller events such as replication, variation,

and competition that occur on a ﬁne-grained temporal scale. While evolution’s emergent

nature can be fascinating, it also presents challenges to studying the short-term mechanisms

that, in aggregate, govern long-term results. In computational evolutionary systems, we can

theoretically collect data to help untangle these mechanisms. In practice, however, the sheer

number of constituent events produce an overwhelming quantity of data. In response, I have

developed a standardized suite of diagnostic metrics to summarize short-term evolutionary

149

dynamics within a population by measuring lineages and phylogenies. Here, I describe these

metrics and provide experimental results to develop an intuition for what they can tell us

about evolution.

A lineage describes a continuous line of descent, linking parents and oﬀspring in an

unbroken chain from an original ancestor. A complete lineage can provide a post-hoc, step-

by-step guide to the evolution of an extant organism where each step involves replication

and inherited variation. Indeed, lineage analyses are a powerful tool for disentangling evo-

lutionary dynamics in both natural and digital systems; digital systems, however, allow for

perfect lineage tracking at a level of granularity that is impossible in modern wet lab exper-

iments. These data allow us to replay the tape of life in precise detail and to tease apart the

evolutionary recipe for any phenomenon of interest (McPhee et al., 2016b). In one notable

example, Lenski et al. used the lineage of an evolved digital organism in Avida to tease apart,

step by step, how a complex feature (the capacity to perform the equals logical operation)

emerged (Lenski et al., 2003).

Yet, tracking the full details of a single lineage, much less a population of lineages,

can be computationally expensive and will inevitably generate an unwieldy amount of data

that can be challenging to visualize or interpret (McPhee et al., 2016a). Summary statistics

can help alleviate these issues by enabling the user to focus on aggregate trends across a

population rather than needing to examine each individual’s lineage. The question is how

to eﬀectively summarize a path through ﬁtness space. One useful abstraction is to treat the

path as a sequence of states. Here, we primarily use phenotypes and genotypes as the states

in the sequence, but we could just as easily use some other descriptor of the lineage at a

given point in time. With this abstraction in hand, a few metrics are easily formalized: the

number of unique states, the number of transitions between states, and the amount of time

spent in each state. Additionally, we may care about how the transitions between states

happened. What mutations led to them? Were those mutations beneﬁcial, deleterious, or

neutral at the time? These mutations are particularly notable because they did not simply

150

appear brieﬂy, but stood the test of time, leaving descendants in the ﬁnal population. Here,

I explore a subset of these metrics that I expect will be broadly useful.

Whereas a lineage recounts the evolutionary history of a single individual, a phylogeny

details the evolutionary history of an entire population. Measurements that summarize

phylogenies can provide useful insight into population-level evolutionary dynamics, such

as diversiﬁcation and co-existence among diﬀerent clades. A variety of useful phylogeny

measurements have already been developed by biologists (Tucker et al., 2017). These mea-

surements tend to treat the phylogeny as a graph and make calculations about its topology.

Tucker et al. group them into three broad categories: assessments of the quantity of evolu-

tionary history represented by a population, assessments of the amount of divergence within

that evolutionary history, and assessments of the topological regularity of the phylogenetic

tree. Such measurements can help quantify the behavior of the population as a whole, pro-

viding insight into interactions between its members. Thus, they are useful indicators of the

presence of various types of eco-evolutionary dynamics.

Here, I present three types of lineage measurements and suggest adopting four phylogeny

measurements from biology; these are lineage length, mutation accumulation, phenotypic

volatility, depth of the most-recent common ancestor, phylogenetic richness, phylogenetic

divergence, and phylogenetic regularity. For each metric, I discuss its application and my

expectation for what it can tell us about evolution. I evaluate my intuition in two computa-

tional contexts: ﬁrst, on a set of two-dimensional, real-valued optimization problems under

a range of mutation rates and selection strengths and, second, on four qualitatively diﬀerent

environments in the Avida Digital Evolution Platform (a minimal control environment, an

environment that rewards solving nine Boolean logic functions, an environment with lim-

ited resources, and a simple changing environment). For simplicity, I restrict my attention to

asexually reproducing populations; however, I suggest techniques for extending these metrics

to sexual populations.

There are many visual analysis tools that operate on lineages and phylogenies. These

151

data visualizations can provide further insight into evolutionary dynamics within a run. In

this chapter, I discuss some approaches to visualizing lineages and phylogenies; I use these

tools to build intuition about the behavior of these metrics.

In addition to demonstrating a range of analysis tools that are useful to digital evolution

research, I intend for this work to begin a conversation within the artiﬁcial life community

about how we quantify, interpret, and compare observed evolutionary histories. There have

been extensive eﬀorts to improve our ability to represent and visualize both lineages and

phylogenies (Standish and Galloway, 2002; Burlacu et al., 2013; McPhee et al., 2016b,a;

Lalejini and Ofria, 2016), which are indispensable for building intuitions and qualitatively

understanding the dynamics embedded in a population’s evolutionary history. However, I

am unaware of eﬀorts to formalize a suite of quantitative lineage and phylogeny-based met-

rics for computational evolution. Lineage and phylogeny-based analyses in artiﬁcial life are

often a component of a larger set of experiment-speciﬁc analyses or are limited to qualita-

tive descriptions and/or visualizations. Well-deﬁned metrics not only provide valuable tools

for teasing apart evolutionary dynamics, they also allow us to move away from exclusively

qualitative descriptions of results toward a deeper quantitative understanding. This under-

standing, in turn, facilitates rigorous statistical analyses and hypothesis testing as well as

comparison of evolutionary dynamics across diﬀerent digital evolution systems.

8.2 Metrics

8.2.1 Lineage Metrics

Each of the three lineage metrics that I discuss — lineage length, mutation accumulation,

and phenotypic volatility — reduces a lineage to a linear sequence of states where each

state represents an individual or sequence of individuals that share a common genotypic

or phenotypic characteristic of interest; Figure 8.1 is given as a toy example to help guide

discussion of these metrics. While I limit my focus to three lineage metrics, this abstraction

152

Figure 8.1: Four methods of representing a lineage. This example lineage has accu-
mulated three mutations (one reverse mutation and two substitutions) and gone through
three distinct phenotypes. In (A), each state along the lineage represents a single individual;
lineage length is the number of generations spanned by the lineage (eight). In (B), states
represent the sequence of genotypes along the lineage, reducing lineage length to four. In (C)
states represent the sequence of phenotypes along the lineage; lineage length is the number
of times a diﬀerent phenotype is expressed (three). In (D), states are a particular phenotypic
characteristic; here, lineage length is two.

places lineages in a form suitable for a wide range of measurements, including the direct

application of many data mining techniques designed to operate over sequences such as

sequential pattern mining, trend analysis, et cetera (Han et al., 2011).

Only asexual lineages where genetic material is exclusively vertically transmitted can

be directly abstracted as a linear sequence of states. Sexual reproduction (and any form of

horizontal gene transfer) complicates matters signiﬁcantly as such lineages are more appro-

priately represented by trees rooted at the extant organism, branching for each contributor

of genetic material. For the metrics I present here, I limit discussion to asexual popula-

tions; however, I suggest three approaches for generalizing these lineage metrics to sexual

populations:

1. Build lineages based on sites in a genome. For genetic representations composed

153

of multiple constituent parts that are inherited atomically (i.e. sites in a genome),

the complication of sexual lineages can be avoided by tracking the lineages of these

parts. Genomic sites will only ever have a single parent, so their lineages are eﬀectively

asexual. An organism could then be viewed as a collection of lineages rather than a

single one.

2. Apply a lossy compression to reduce sexual lineages into linear sequences.

By modeling sexual reproduction events as asexual reproduction events, it is possible to

compress sexual lineages into linear sequences of states. One parent is designated to be

a part of the lineage and the genetic contributions of other parents are considered to be

sources of genetic variation (mutations). The primary downside to this approach is its

lossy-ness (i.e., the fact that it discards potentially important parentage information).

3. Extend the metrics to operate over non-linear (tree) structured lineages.

Alternatively, we can extend these metrics to operate over the more complex state

sequences that constitute the lineages of sexually reproducing organisms. One such

approach would be to consider all possible ancestor paths for an extant individual,

calculate a given metric for each of them and then average the resulting values together.

Assessing the eﬃcacy of these and potentially other approaches would be a useful line

of research to pursue in the future.

8.2.1.1 Lineage Length

Lineage length describes the number of states traversed by a lineage. If a state is deﬁned

as a single individual, lineage length is a count of the number of generations. Generation

count is most useful in systems where generational turnover is not ﬁxed, but instead de-

termined by the life history strategies of organisms. For lineages that span equal lengths

of time, more generations imply faster replication rates (e.g., r-selected lineage) while fewer

generations imply slower replication rates (e.g., K-selected lineage).

154

Lineage length becomes a more ﬂexible and informative metric if we consider more

abstract deﬁnitions of states along a lineage. We might measure lineage length where a

state represents a sequence of individuals that share a particular phenotypic or genotypic

characteristic. In these cases, lineage length only increases when the characteristic of interest

changes from parent to oﬀspring. For example, in an environment where organisms must

perform functions to be successful, we might deﬁne state as the set of functions performed by

an individual. In this scenario, lineage length would only increase when the set of functions

performed by an ancestor changes; sequential ancestors that perform the identical sets of

functions would be compressed into a single state in the sequence, even if other traits diﬀer.

8.2.1.2 Mutation Accumulation

Mutation accumulation deﬁnes a set of measurements that track mutational changes

across a lineage. These changes can be measured as the magnitude of the change (for real-

valued genomes) or as the total count of changes (for discrete-valued genomes). Mutation

eﬀects can also be tracked to gain insights about their distribution along a given lineage.

Measures of mutation accumulation along the lineages of successful individuals can help tease

apart the relative importance of diﬀerent types of mutational events when compared to what

is expected by chance.

In conjunction with collected ﬁtness information, the class of a mutation (e.g., ben-

eﬁcial, deleterious, or neutral) can also be tracked. Diﬀerent evolutionary conditions are

expected to cause diﬀerent distributions of mutations along a lineage (Barrick and Lenski,

2013); deviations revealed by measures of mutation accumulation can act as a barometer

for unexpected evolutionary dynamics. The number and magnitude of deleterious mutations

along a lineage can tell us both about the ruggedness of the ﬁtness landscape, and about a

lineage’s ability to cross ﬁtness valleys (Covert et al., 2013). Similarly, an elevated measure of

neutral mutations relative to beneﬁcial or deleterious mutations can suggest that the ﬁtness

landscape has neutral space that the lineage is spending most of its time drifting around.

155

8.2.1.3 Phenotypic Volatility

Phenotypic volatility addresses the rate at which phenotype changes as you move down a

lineage (although the same concept can be applied to speciﬁc phenotypic traits or other types

of state). In systems with discrete/categorical phenotypes, this can be measured by summing

the number of times the phenotype changes. A related but subtly diﬀerent measurement in

such systems is the number of unique phenotypes on a lineage. In most cases, these values

will be similar; a discrepancy would suggest that the lineage was cycling through a set of

phenotypes. Such behavior could, for example, be indicative of some form of evolutionary

bet-hedging (Beaumont et al., 2009).

In systems with continuous-valued phenotypes, a subtly diﬀerent approach is needed to

measure phenotypic volatility, because there are no discrete state transitions. Instead, we

can measure the overall variance in phenotype along a lineage. In some cases, it may be

desirable to smooth out the noise inherent in a real-valued phenotype. We can do so by

instead taking the variance of the moving average of ﬁtness, to more closely approximate the

idea of measuring phase transitions.

8.2.1.4 Summary Statistics

Each of these metrics can be calculated for each member of the population at each time

step. Doing so, however, would produce an amount of data so large that it would be diﬃcult

to make sense of. Instead, we need to come up with ways to generate useful summaries.

There are two main approaches to doing so: 1) choose a small number of representative

lineages from a given time point, or 2) collect summary statistics about the distribution of

metric values across the population.

A single lineage can be chosen by selecting the lineage of a representative organism

(either the most ﬁt or the most numerous). In populations where diverse strategies coexist,

this approach can be uninformative as any one lineage is unlikely to be representative of all

successful lineages. One alternative is to ﬁlter out lineages that do not have oﬀspring some

156

predetermined number of generations later as such lineages were likely not representative of

an important subset of the population. Still, any approach based on measuring only a subset

of lineages can be challenging to interpret when the current dominant lineage (or lineages)

is replaced with a diﬀerent one; such changes can introduce a discontinuity if the value is

being measured over time.

If graceful responses to changes in which lineage is dominant

are required, it can be advantageous to instead measure summary statistics (e.g., mean,

variance, and range) across the entire population.

In scenarios with frequent selective sweeps, the dominant lineage will likely be similar to

the average lineage, as most of the population will be closely related. When the population

contains more phylogenetic diversity, however, the dominant lineage may diﬀer from the

mean. Of course, the nature of such diﬀerences is likely informative about the evolutionary

dynamics occurring in the population.

8.2.2 Phylogeny Metrics

These metrics operate on entire phylogenies rather than single lineages within a popula-

tion, eliminating the need to identify a representative organism or lineage. Because they use

data from the entire population, phylogeny metrics can be more computationally expensive

to calculate than single lineage metrics. On the other hand, because most lineages tend

to share substantial history, phylogeny metrics can usually be calculated more rapidly than

full-population lineage metrics. Note that phylogenies can be constructed with regard to

any taxonomic level of organization, be it individual, genotype, phenotype, et cetera. Thus,

when I refer generally to items in a phylogeny, I will use the term taxa.

A standard technique for saving memory and time when working with phylogenies in

computational systems is to “prune" them, removing dead (extinct) branches. Since all of

the phylogeny metrics I discuss here are borrowed from natural systems (where we do not

have information about taxa without oﬀspring), they all are designed to work on pruned

phylogenies. Thus, for the remainder of this chapter, I will assume we are working with

157

pruned phylogenies.

In populations without ecological forces promoting coexistence, phylogenies should co-

alesce periodically, resulting in pruned lineages that mostly consist of a single path. When

there is strong selection, this coalescence should happen even more rapidly. Thus, phylo-

genies with topologies that deviate from that expectation are an indication of ecological

interactions within the population. The metrics discussed here can provide insight into the

nature of those interactions and their long-term evolutionary eﬀects (Dolson et al., 2018d).

As a result, they are often referred to as phylogenetic diversity metrics (Tucker et al., 2017).

An important distinction between phylogenies in natural versus computational systems

is that natural phylogenies are generally inferred from extant taxa, whereas computational

phylogenies are directly recorded. Inferred phylogenies do not contain internal nodes except

at branch points. They also do not contain history prior to the most recent common ancestor

(MRCA) of all extant organisms. For consistency, I exclude pre-MRCA taxa from these

analyses. However, I will not remove non-branching internal nodes, as these only serve to

make the phylogenies more informative.

Here I provide a high-level summary of phylogeny metrics that I expect will be partic-

ularly useful. For more metrics and more detail on all of these metrics, see (Winter et al.,

2013; Tucker et al., 2017).

8.2.2.1 Depth of Most-Recent Common Ancestor

The depth of the MRCA (i.e., the number of steps it is from the original ancestor) is

an informative metric and is easy to calculate. A recent MRCA implies frequent selective

sweeps and less long-term stable coexistence between clades. Measuring the frequency with

which the MRCA changes (i.e., the number of coalescence events) can also be informative,

as some conditions can inﬂate the length of the lineage relative to other conditions without

actually increasing the frequency of selective sweeps. This scenario is particularly likely when

the population size is changing over time. A downside to the depth of MRCA as a metric is

158

that any population that does have a stable ecology will likely never change its MRCA after

the very beginning of evolution (which at least allows us to detect stable coexistence in the

population).

8.2.2.2 Phylogenetic Richness

Measurements of phylogenetic richness quantify the total amount of evolutionary history

contained in a set of taxa. The most traditional metric of phylogenetic richness is “Phyloge-

netic Diversity", which is calculated as the number of nodes in the minimum spanning tree

from the MRCA to all extant taxa (Faith, 1992). Another approach is to calculate the pair-

wise distances between all taxa and sum them (Tucker et al., 2017). A third approach is to

sum evolutionary distinctiveness, a measurement of a taxon’s evolutionary uniqueness (Isaac

et al., 2007), across all extant taxa (Tucker et al., 2017).

8.2.2.3 Phylogenetic Divergence

Measurements of phylogenetic divergence quantify how distinct the taxa in the popu-

lation are from each other and are often averaged across individual taxa. For example, one

option is to average the pairwise distances across all taxa in the population (Webb and Losos,

2000). Similarly, phylogenetic divergence can be calculated by averaging the evolutionary

distinctiveness across each taxon in the population.

8.2.2.4 Phylogenetic Regularity

Measurements of phylogenetic regularity quantify how balanced the branches are in

a phylogeny and are often the variances of values calculated for individual taxa. Just as

the mean of the pairwise distances between all taxa in the population is a measurement of

phylogenetic divergence, taking their variance is a measurement of phylogenetic regularity.

The same is true of the variance of evolutionary distinctiveness across the population.

159

Figure 8.2: Schematic representation of state sequence visualization. Colors indicate
diﬀerent states. The optional legend along the side can be used to indicate any information
relevant to understanding the drivers of the state changes.

8.3 Visualizations

Data visualization is a critical component of data analysis. Summary measurements

can be misleading (Anscombe, 1973), but we often have too much data to digest from purely

numerical values. Visualizations allow us to structure data in a way that is more intuitive

to the viewer. Through this lens, we can conﬁrm that our intuitions about the drivers of a

system’s behavior are correct. In this section, I will present some visualization techniques

that are useful for understanding lineages and phylogenies. These visualizations have the

power to substantially strengthen our intuitions about evolutionary systems. Later, I will

use these visualizations to ensure that these metrics are capturing the information intended.

The idea that any lineage can be abstracted into a sequence of states (see Figure 8.1) is

important for visualization as well as quantitative measurements. We must be careful about

choosing the right level of abstraction; using too low a level can obscure broader patterns,

whereas too high a level loses the pattern entirely. Furthermore, generating some of these

visualizations at all becomes intractable with too little abstraction.

160

Figure 8.3: A lineage drawn on top of a ﬁtness landscape

8.3.1 State Sequence Visualizations

A direct consequence of the fact that lineages are just sequences of states is the fact

that we can visualize these sequences (Lalejini and Ofria, 2016). Each sequence can be

represented as a series of stacked rectangles. The colors of these rectangles indicate which

state they represent and their heights represent how long the lineage stayed in that state (see

Figure 8.2). Placed side-by-side, these sequences provide an aggregate visual representation

of the behavior of lineages from a given run or experiment.

8.3.2 Fitness Landscape Overlays

In order to understand how a particular evolutionary algorithm performs on a given

problem, we can observe how that algorithm moves a population across the problem’s ﬁtness

landscape (Kauﬀman and Levin, 1987). A ﬁtness landscape can be thought of as a map

from solution representations into the quality of that solution (its ﬁtness) (Wright, 1932).

Traditionally, ﬁtness landscapes have most commonly been visualized as three-dimensional

surface plots illustrating the eﬀects of two continuously-varying traits on ﬁtness (e.g., Li

et al., 2013, or Figure 2.1 of this chapter). One of these traits is the x-axis, one is the y-axis,

and the ﬁtness conferred by that combination of values is the z-axis. Although I will focus

here on ﬁtness landscapes that can be fully depicted in three dimensions, it is important

161

to recognize that most real-world ﬁtness landscapes have far more dimensions. Various

dimensionality-reduction techniques can be applied to reduce them to three dimensions, but

doing so may be misleading (Gavrilets, 2010).

We can visualize the path that each lineage takes through the ﬁtness landscape, mapping

the x, y, and z (ﬁtness) coordinates of each ancestor of each member of the population (Virgo

et al., 2017). In some cases, it may also be helpful to overlay lineages on top of other spaces

(Dolson and Ofria, 2017; Mouret and Clune, 2015). Creating such a visualization entails

condensing a large quantity of information into a limited space. When projected onto two

dimensions, lineages can obscure and be obscured by parts of the ﬁtness landscape (and each

other).

Fortunately, we are entering an era where we no longer need to squeeze this information

into two dimensions, conﬁned to a computer screen or piece of paper. Over the past few

years, virtual reality technology has advanced to the point where it is a viable tool for data

visualization (van Dam et al., 2002; Olshannikova et al., 2015). To this end, I used the A-

Frame framework (A-Frame authors, 2018) to build a three-dimensional data visualization

that can be viewed in virtual reality (see Figure 8.6) (Dolson and Ofria, 2018).

A-Frame supports building 3D scenes and rendering them to a variety of platforms.

In the simplest case, the visualization is rendered in WebGL, allowing it to be viewed in

a standard web browser. Mouse interactions such as rotation make it possible to view the

visualization from all angles, and WebGL’s use of the graphics card allows it to render data-

rich visualizations. A-Frame also supports rendering the page with WebVR, allowing it to be

viewed using various virtual reality headsets. These platforms allow the user to explore the

data in three dimensions. For the data interpretation in this chapter, I used an Oculus Rift

to provide myself with ﬁne-grained control of which part of the visualization I was looking

at.

Scenes in A-Frame are built by layering components on top of each other. The base

component of this visualization is a three-dimensional surface plot of the ﬁtness landscape.

162

On top of this layer, I overlay a three-dimensional path from an individual that existed at

the end of an evolutionary run all the way back to its earliest ancestor, passing through the

locations of all intermediate ancestors along the way. A path can be added for each extant

member of a ﬁnal population to produce an entire overlaid phylogeny.

To make visualizations more interpretable, I introduced a color gradient along the lin-

eage; in the visualization presented here, I use a grayscale lineage drawn onto a colorful

landscape (see Figure 8.3). Lineages transition from white to black as evolutionary time

progresses, indicating when each portion of the lineage existed.

Our full visualization, complete with data, can be viewed on the web or using a virtual re-

ality headset at https://emilydolson.github.io/fitness_landscape_visualizations.

8.3.3 Phylogenetic Trees

Phylogenetic trees are a time-honored data visualization in biology. They depict the

parent-child relationships of taxa over the course of evolution (see Figure 8.4A). As with

phylogeny metrics, memory usage is a primary concern in constructing these visualizations,

as phylogenies can be very large. Applying pruning and choosing more abstract taxonomic

units (as discussed in the phylogeny metrics section) are the best ways to mitigate this

problem.

While simply drawing the phylogenetic tree is valuable in its own right, a lot of additional

information can be added to these visualizations. Nodes in the tree can be colored based

on features of the corresponding taxon, such as phenotype or ﬁtness. Edges can be colored

based on information about the transition between taxa, such as the types of mutations that

occurred or the direction of ﬁtness change. Note, though, that in large trees coloring nodes

and edges the same way often makes patterns easier to see.

Another way to convey additional information is in the placement of nodes. In biology,

the length of edge between nodes is generally used to indicate the amount of time that

elapsed between the origin of taxa. This convention also results in contemporaneous nodes

163

Figure 8.4: The same phylogeny depicted in two diﬀerent ways. A) Depicts a stan-
dard phylogeny, where nodes represent taxa and edges indicate parent-child relationships.
In this phylogeny, node position along the time axis represents time of birth. Nodes are
colored to distinguish taxa that are extant in the present (blue) from extinct taxa (black).
Note that, because this is a pruned phylogeny, all leaf nodes are currently extant. Edges can
be colored to convey whatever information is most useful. B) Depicts the same phylogenetic
tree but with boxes indicating taxon lifespans in place of circular nodes.

being placed at the same level of the tree; in essence, the tree has a time axis running from

top to bottom. In Artiﬁcial Life systems, we can take this idea a step further. Since we know

the complete time span that a taxon existed for, we can depict it on the phylogenetic tree

by plotting rectangles in place of circular nodes (see Figure 8.4B). The tops and bottoms of

these rectangles correspond to the birth and death times of the taxon along the time axis.

While this approach runs the risk of producing an overly-complicated plot, the boxes can

yield wildly diﬀerent intuitions than circles would.

164

8.3.4 Muller Plots

Muller plots show the abundance and ancestry of taxa in a population over time (Muller,

1932; Maddamsetti et al., 2015). The percentage of a population occupied by each taxon

at a given time point is depicted via a stacked-area plot – the vertical space of the plot

is occupied by a stack of colored polygons, each with height proportional to the relative

abundance of a corresponding taxon (see Figure 8.5). These polygons are connected over

time (as in a streamgraph), showing the changes in each taxon’s abundance. Additionally,

each taxon’s polygon is depicted as emanating from its ancestor’s polygon. Muller plots

in this chapter were made using the ggmuller package (Noble, 2017) for the R Statistical

Computing Language (R Core Team, 2017).

When should you use a Muller plot versus a phylogenetic tree? In general, Muller plots

are more useful when you care about the population dynamics of a relatively small number

of fairly abstract taxonomic units. Phylogenetic trees can accommodate a larger number

of less abstract taxonomic units and allow for a greater ﬂexibility in depicting evolutionary

patterns.

8.4 Experimental Design

I applied the metrics deﬁned above to four benchmark functions from the GECCO

Competition on Niching Methods (Li et al., 2013) and to four qualitatively diﬀerent envi-

ronments in Avida. Both of these evolutionary contexts are well understood and studied,

making them particularly well suited for building intuition about what this proposed suite

of ancestry-based metrics can tell us about evolutionary dynamics.

8.4.1 Niching Competition Benchmark Problems

For each test problem, I evolved populations of 1000 organisms under a range of mutation

rates and selection strengths for 5000 generations.

I initialized populations by randomly

165

Figure 8.5: A Muller plot depicting the same phylogeny as the visualizations in
Figure 8.4. The red region represents the root node. As it gradually goes extinct, the
proportion of the ﬁgure it takes up gradually diminishes. Its three oﬀspring are shown in
purple, green, and blue.

166

generating a number of organisms equal to the population size. I evolved populations under

tournament selection with ﬁve diﬀerent tournament sizes: one, two, four, eight, and sixteen.

Organisms selected to reproduce did so asexually. Values in an oﬀspring’s genome were

mutated by adding noise given by a normal distribution with a mean of 0; the ‘mutation

rate’ of a treatment deﬁned the standard deviation used to deﬁne this normal distribution

and was given as a proportion of the test problem’s domain. I prevented mutations from

causing a value to exceed the valid domain of the given problem. For each problem and

tournament size, I evolved populations at eight mutation rates: 1e-08, 1e-07, 1e-06, 1e-05,

1e-04, 1e-03, 1e-02, and 1e-01.

I also ran a second set of experiments using these benchmark problems to explore the

impact of ecological dynamics on this suite of ancestry-based metrics. For these experiments,

I generated a stable ecology using the Eco-EA algorithm as a selection technique (Goings

et al., 2012). In these test problems, I created niches associated with spatial locations across

the ﬁtness landscape. For all experimental conditions, I ran ten replicates, each with a

unique random number seed. This experiment is implemented using the Empirical library;

the implementation (and analysis scripts) is included in the supplemental material for the

paper this chapter is based on (Lalejini et al., 2018).

8.5 Results and Discussion

8.5.1 Niching Competition Benchmark Problems

Overall, my results were consistent with evolutionary theory. As mutation rate in-

creases, coalescence takes longer, as evidenced by the fact that the MRCA is farther back in

time at higher mutation rates (see Figure 8.7). Consequently, phylogenetic richness (as mea-

sured by phylogenetic diversity) is higher at high mutation rates. Phylogenetic divergence,

measured here as mean pairwise distance between taxa, is similarly higher at high mutation

rates. Evolutionary distinctiveness, being another measurement of phylogenetic divergence,

167

Figure 8.6: A close-up on two adjacent peaks in the Shubert function ﬁtness
landscape. Lineages are depicted as paths fading from white to black over evolutionary
time. The lineages shown here evolved under a mutation rate of 0.01. A) Was evolved using
a tournament size of 2, whereas B) was evolved using a tournament size of 16. These ﬁgures
neatly illustrate how increased tournament size keeps the lineage near the tops of the peaks.

behaved almost identically (Lalejini et al., 2018). Variance of evolutionary distinctiveness

and pairwise distance between taxa (phylogenetic regularity metrics) behaved similarly to

the phylogenetic divergence metrics. This pattern makes sense, as most phylogenetic diver-

gence on these landscapes will produce unbalanced phylogenetic trees. If there were stable

coexistence between multiple clades, we would expect to see a reduced correlation between

the phylogenetic divergence metrics and the phylogenetic regularity metrics. Increased mu-

tation rate also increases the number of deleterious steps taken, a logical consequence of

increasing mutation relative to strength of selection.

Similarly, increasing tournament size generally increases the rate of coalescence, as

higher tournament sizes correspond to stronger selection (see Figure 8.8). As a result,

all of the measurements of phylogenetic richness and divergence decrease as tournament

size increases. MRCA depth, on the other hand, increases, directly reﬂecting the increased

frequency of selective sweeps.

Surprisingly, there is no clear eﬀect of tournament size on the count of deleterious

steps along the dominant lineage (as evidenced by the fact that the conﬁdence intervals all

overlap). Values for all selection schemes and tournament sizes hover near 2500, meaning

that a deleterious step is taken in roughly half of the 5000 generations. This result is partially

168

Figure 8.7: Values of example metrics across diﬀerent mutation rates for each of
the four problems. All lineage-based metrics are calculated on the lineage of the ﬁttest
organism at the ﬁnal time point; population-level means behaved similarly. All experiments
shown here used a tournament size of 4. Circles are medians, vertical lines show inter-quartile
range, and shaded area is a bootstrapped 95% conﬁdence interval around the mean. Note
that both axes are on log scales.

169

Figure 8.8: Values of example metrics across diﬀerent tournament sizes for each
of the four problems. All experiments shown here used a mutation rate of 0.001. All
lineage-based metrics are calculated on the lineage of the ﬁttest organism at the ﬁnal time
point; population-level means behaved similarly. Circles are medians, vertical lines show
inter-quartile range, and shaded area is a bootstrapped 95% conﬁdence interval around the
mean. Note that both axes are on log scales.

170

Figure 8.9: Values of example metrics across diﬀerent mutation rates for each
of the four problems under a diversity-preserving selection regime, Eco-EA. All
lineage-based metrics are calculated on the lineage of the ﬁttest organism at the ﬁnal time
point; population-level means behaved similarly. All experiments shown here used a tourna-
ment size of 4. Circles are medians, vertical lines show inter-quartile range, and shaded area
is a bootstrapped 95% conﬁdence interval around the mean. Note that both axes are on log
scales.

171

an eﬀect of mutation rate; at the lowest mutation rate, there is a clear trend toward fewer

deleterious steps as tournament sizes increase (Lalejini et al., 2018). However, the eﬀect of

mutation rate on the relationship between tournament size and dominant deleterious steps is

complex, particularly for Composition Function 2 (Lalejini et al., 2018). These trends likely

share a common cause with the thresholding eﬀect evident in Figure 8.7, where the number

of deleterious steps along the dominant lineage abruptly climbs between mutation rates of
10−7 and 10−5 and remains relatively ﬂat over other mutation rates. Based on an inspection
of the 3D ﬁtness landscape visualizations, we can see that this is not an eﬀect of lineages

moving from peak-to-peak; at most mutation rates, they tend to remain on a single peak

(with the exception of adjacent peaks in the Shubert function; see Figure 8.6). Thus, we

can infer that this eﬀect is the result of a drift-like phenomenon where, at suﬃciently high

mutation rates, all members of the population are constantly somewhat displaced from their

local ﬁtness peak.

Having reinforced our intuition about these metrics in a simple system, we can now

expand them to a slightly more complex system. A large proportion of interesting short-

term evolutionary dynamics relate to interaction between individuals in the population (i.e.,

ecological dynamics). In particular, such interactions often promote the stable coexistence

of clades occupying diﬀerent niches. As such, it is important to establish a baseline for how

these metrics respond to ecological coexistence.

Indeed, the presence of stabilizing ecological dynamics substantially changes the values

I observe for most metrics (see Figure 8.9). Perhaps the least surprising of these is MRCA

depth is far lower than it was for tournament selection, reﬂecting the rarity of coalescence

events under these conditions. Consequently, phylogenetic diversity is higher, as the extant

population represents a greater amount of evolutionary history. Relatedly, mean pairwise

distance among extant taxa is higher in the presence of ecology, as clades in diﬀerent niches

continue to diverge. Interestingly, the relationship of many metrics (e.g., deleterious steps

and phylogenetic diversity) to mutation rate is reversed in the presence of ecology. Explaining

172

Figure 8.10: A sampling of informative lineage metrics from Avida calculated after
approximately 10,000 generations. Mean pairwise distance is a measurement of phylogenetic
divergence; higher values imply greater phylogenetic diversity. MRCA generation is the
generation at which the current most recent common ancestor occurred (lower numbers
mean it was longer ago). Phenotypic volatility is the number of changes in phenotype along
the dominant (most plentiful) lineage. Unique phenotypes is the count of unique phenotypes
that occurred along the dominant lineage.

the underlying mechanisms behind these distinctions is beyond the scope of this chapter, but

the ease with which the metrics identiﬁed their presence clearly indicates their power.

8.5.2 Avida

The data from Avida are also consistent with my expectations and additionally point

towards some interesting directions for future investigation. As predicted, the only environ-

ment that preserved phylogenetic diversity was the limited resources environment. Lacking

diversity-preserving dynamics, the other environments all repeatedly lose phylogenetic di-

versity due to selective sweeps (see Figure 8.10).

Looking at phenotypic volatility, I see that the changing environment condition has dra-

matically higher volatility than the others. This result is precisely what I hypothesized, as the

phenotype should change approximately every time the environment does (see Figure 8.10).

Indeed, from Figure 8.13, we can see that this predicted mechanism is accurate. The limited

173

Figure 8.11: Muller plot from a representative run of Avida in the limited re-
sources environment. This plot shows the population dynamics of diﬀerent phenotypes
over 8,000,000 updates (a unit of time in Avida that is proportional to the number of CPU
cycles used thus far). Only phenotypes that make up at least 5% of the population at at
least one time point are shown. Note the region in the middle of the plot where we can see
repeated selective sweeps that are contained to a single niche.

resources environment has lower phenotypic volatility than the changing environment, but

higher phenotypic volatility than the other two environments. This observation is consistent

with the fact that limited resources is also a changing environment; competitive pressures

shift with the composition of the population. This interpretation is corroborated by Muller

plots from the limited resources condition, which show frequent shifts in the abundance of

various phenotypes (see Figure 8.11). We can gain further insight by comparing phenotypic

volatility to the count of unique phenotypes along a lineage. Despite its high phenotypic

volatility, the changing environment condition produces lineages with relatively few unique

phenotypes. This ﬁnding makes sense, as the changing environment is only creating pres-

sure to toggle back and forth between two phenotypes. Limited resources, however, has a

high count of unique states relative to its phenotypic volatility. From this comparison, I can

conclude that the competition in this environment is creating pressure for the population to

explore new phenotypic states, rather than simply cycling among a ﬁxed set.

From the mutation accumulation data, we can begin to glean information about the

precise dynamics of adaptation (see Figure 8.12). Across the board, the limited resources

environment has more mutations than any other condition, which is consistent with the

174

Figure 8.12: Mutations along the dominant lineage across treatments in Avida,
calculated after approximately 10,000 generations. The upper left shows the total count
of mutations of all types. The other plots show the quantity of each mutation type as a
proportion of the total.

greater number of unique phenotypes explored. The changing environment has the sec-

ond greatest mutation accumulation, presumably due to the mutations that are required to

change phenotype as the environment changes. If we break down mutation accumulation

by type, we can see that approximately 75% of mutations along dominant lineages are sub-

stitution mutations. This value is dramatically higher than the approximately 7% that we

would expect to see by chance. Thus, this result suggests that substitution mutations are

more likely to prove helpful in these contexts over evolutionary time.

8.6 Conclusions

I have demonstrated that these analysis techniques conform to my intuitions in well-

understood systems. In more complex environments, they provide a quick way to identify

interesting behavior to study further. We believe they are now ready to be adopted more

broadly.

Our goals for this work are two-fold: 1) to suggest a set of analyses that will improve

175

Figure 8.13: State sequence visualization of phenotypes along the dominant lin-
eage in the changing environment condition. The key along the left indicates the
environment’s state over time. Each column is one replicate. From this visualization, we can
see that most successful lineages are able to adapt to a change in the environment relatively
quickly. We can also see that there is a fair amount of variation in the precise timings of
these changes and whether they occur for every change of the environment.

176

our capacity to quantitatively understand evolutionary histories in digital evolution experi-

ments, and 2) to spark a conversation in the computational evolution community about how

to quantify, interpret, and compare observed evolutionary histories. With feedback from the

community, I will expand this suite of lineage and phylogeny metrics, compiling accessible

descriptions and examples of each metric. More generally, I suggest that artiﬁcial life re-

searchers should keep phylogeny- and lineage-based analyses in mind when trying to answer

specialized questions as well.

177

Chapter 9

Conclusions

Just as evolution is a general purpose algorithm that applies across substrates, so is

ecology. I have illustrated this point by comparing ecological dynamics across a range of

digital systems. In the process, I have developed tools for further exploring, understanding,

and controlling eco-evolutionary dynamics. At the same time, I have addressed fundamental

questions across the ﬁelds of biology, evolutionary computation, and artiﬁcial life.

9.1 Contributions

This dissertation makes the following contributions:

• In chapter 3, I demonstrated that some types of spatial heterogeneity promote the
evolution of diversity and produce populations with elevated evolutionary potential. I

have proposed an explanation for why this ﬁnding diﬀers from the results of similar

studies.

• In chapter 4, I showed that the spatial conﬁguration of niches has a profound impact
on the way a population traverses a ﬁtness landscape. As a result, certain region

of a spatially heterogeneous environment can be hotspots of evolutionary potential.

This ﬁnding opens up the possibility of manipulating evolutionary trajectories via the

178

spatial conﬁguration of the abiotic environment.

• In chapter 5, I illustrated the power of importing ecological theory into evolutionary
computation. Indeed, I discovered that the same co-existence equation has been inde-

pendently derived in both ﬁelds, lending credence to the idea that it generalizes across

all ecological systems. Moreover, I used ecological techniques to clarify the relationship

between evolutionary algorithms that seem similar at face value but turn out to behave

very diﬀerently on a mechanistic level.

• In chapter 6, I presented a new approach to evolutionary computation that is made
possible by ecology-inspired techniques: ecologically-mediated hints. By encoding hints

in the form of niches, the user of an algorithm can create a ﬁtness gradient towards a

global solution without many of the pitfalls traditionally associated with attempts to

do so. Additionally, I explored the speciﬁc properties of Eco-EA that make it a good

candidate for this approach.

• In chapter 7, I developed a suite of measurements for identifying the presence of open-
ended evolutionary dynamics in an evolving system. Chief among these measurements

was an approach to recognizing the presence of ecology.

• In chapter 8, I described an additional suite of measurements that I hope to use in the

future to provide insight into the lower-level mechanics of the evolutionary process.

9.2 Future Directions

Thus far, I have transferred techniques amongst the ﬁelds of biology, evolutionary com-

putation, and artiﬁcial life to address open questions within these ﬁelds. However, I have

barely scratched the surface of what we can learn from work at this interface. The next step

is to go beyond transferring techniques between these ﬁelds and begin transferring ques-

tions. Evolutionary computation has fantastic potential as a model system for studying

179

eco-evolutionary dynamics in the way theoretical biologists traditionally do. Furthering our

understanding of how to build complex evolving systems, as artiﬁcial life researchers often

attempt to do, could expand the classes of problems that are solvable via evolutionary com-

putation. Although research on the drivers of diversity has historically been a question for

biologists rather than artiﬁcial life researchers, these drivers are likely a critical component

of creating open-ended evolution. Cross-cutting all of these ﬁelds is the question of how to

control evolutionary trajectories of entire ecological communities. This question is relevant

in artiﬁcial life (where it helps us build life-like systems), biology (where we often beneﬁt

from ecological communities being in certain target states), and evolutionary computation

(where it helps us solve problems).

9.2.1 Evolutionary Computation

My work has set the stage for a rapid bi-directional transfer of ideas between ecology

and evolutionary computation. The ﬁrst step in this process will be to continue importing

ecological theory as I have already done for modern coexistance theory in the context of ﬁt-

ness sharing, resource competition theory in the context of Eco-EA, and spatial structure in

the . Research in evolutionary computation often gets hung up on the eﬃcacy of an approach

at the expense of understanding why it works. Now that I have shown diversity main-

tenance techniques in evolutionary computation to be isomorphic to ecological

dynamics (Dolson et al., 2018d), I will use the decades of work that ecologists have put

into developing ecological theory to understand why these techniques work in evolutionary

computation.

In this phase, I will take stock of the current landscape of diversity maintenance tech-

niques as viewed through this new ecological lens, locating each in relation to the others.

Critically, I hypothesize that ecologically-similar diversity maintenance techniques will be-

have similarly to each other. I will test this hypothesis by observing the behavior of

diﬀerent diversity maintenance techniques on an array of benchmark problems

180

(Helmuth and Spector, 2015; White et al., 2013; Li et al., 2013). To characterize the behavior

of diversity maintenance techniques, I will use a combination of interaction networks and

evolutionary summary statistics that directly evaluate the underlying evolutionary behavior

of the population (Dolson et al., 2018b). Over the course of this phase, I will establish a

meaningful taxonomy of diversity maintenance techniques cross-referenced with their per-

formance on various problem classes.

I will also characterize properties of the benchmark problems using a pre-existing suite

of metrics (Bosman et al., 2018). Once I have both of these characterizations in place,

correlations between them will be apparent. The presence of such correlations would suggest

that diﬀerent ecological communities are better suited to solving diﬀerent problems. At this

point, practitioners of evolutionary computation will, for the ﬁrst time, have the ability to

choose the best diversity maintenance technique for a speciﬁc problem a priori.

This ability will tangibly improve the eﬃcacy of all branches of evolutionary computation.

In subsequent work, I will delve into the mechanisms that make diﬀerent techniques more

eﬀective on diﬀerent problems. As this work will be a direct attempt to predict the future

evolutionary state of an entire community, it will synergize closely with Aim 1.

9.2.2 Biology

This line of research also has important consequences for biological systems. There are

a wide variety of evolving ecological communities that it would be advantageous for us to

control the evolutionary trajectories of.

In the long run, I hope to apply my ﬁndings to

cancer, the gut microbiome, and macro-scale ecosystems.

My immediate future plans are to extend my research thus far to the context of can-

cer, as it is the most direct and easily-studied analog to my digital work. Because cancer

readily evolves resistance to new treatments, it would be incredibly valuable to control its

evolutionary trajectory. Although the goal in this line of research is to prevent the evolution

of complex traits (whereas my goal in other systems is to promote it), achieving this goal

181

requires answering the same underlying questions.

The abiotic environment of tumors is spatially heterogenous, resulting in the forma-

tion of various micro-environments to which cancer cells can adapt. Low-oxygen micro-

environments create a distinct ecological niche, the presence of which is associated with the

evolution of treatment resistance and other negative outcomes. The spatial conﬁguration

of these regions appears to play an important role in determining when treatment will and

will not be eﬀective (Scott et al., 2016). I will be extending my work on spatial resource

heterogeneity in chapters 3 and 4 and my work on spatial mortality heterogeneity (Dolson

et al., 2016b) to understand this phenomenon better. In the long run, this understanding

will help us design treatments that prevent cancer from evolving resistance. At the same

time, working with a non-digital system will allow me to conﬁrm that my results generalize

across evolutionary substrates.

Studying eco-evolutionary dynamics in the gut microbiome (and other microbiomes) is

likely more challenging than it is in cancer due because the ecological interactions are more

plentiful and diﬃcult to measure. Nevertheless, it is another ecological community evolving

in a spatially heterogeneous environment which it would be advantageous to control; there is

increasing evidence that certain gut microbial communities lead to better health outcomes

for humans. Thus, it represents a promising future direction. Once the general principles

behind harnessing the power of ecology to control evolution have been thoroughly untangled

and tested across a variety of more tractable experimental systems, they may be a valuable

tool for mitigating the collapse of larger-scale ecological communities as result of climate

change.

182

BIBLIOGRAPHY

183

BIBLIOGRAPHY

A-Frame authors (2018). A-Frame: a web framework for building virtual reality experiences.

https://github.com/aframevr/aframe.

Adami, C. (2002). Ab initio modeling of ecosystems with artiﬁcial life. Natural Resource

Modeling, 15(1):133–145.

Adami, C., Ofria, C., and Collier, T. C. (2000). Evolution of biological complexity. Proceed-

ings of the National Academy of Sciences, 97(9):4463–4468.

Adler, P. B., HilleRisLambers, J., and Levine, J. M. (2007). A niche for neutrality. Ecology

Letters, 10(2):95–104.

Allen, M., Poggiali, D., Whitaker, K., Marshall, T. R., and Kievit, R. (2018). Raincloud

plots: a multi-platform tool for robust data visualization. PeerJ Preprints, 6:e27137v1.

Allouche, O. and Kadmon, R. (2009). A general framework for neutral models of community

dynamics. Ecology letters, 12(12):1287–1297.

Allouche, O., Kalyuzhny, M., Moreno-Rueda, G., Pizarro, M., and Kadmon, R. (2012).
Area–heterogeneity tradeoﬀ and the diversity of ecological communities. Proceedings of
the National Academy of Sciences, 109(43):17495–17500.

Anscombe, F. J. (1973). Graphs in Statistical Analysis. The American Statistician, 27(1):17–

21.

Arnell, N. W. (2004). Climate change and global water resources: SRES emissions and

socio-economic scenarios. Global Environmental Change, 14:31–52.

Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial Point Patterns: Methodology and

Applications with R. CRC Press.

Banzhaf, W., Baumgaertner, B., Beslon, G., Doursat, R., Foster, J. A., McMullin, B.,
Melo, V. V. d., Miconi, T., Spector, L., Stepney, S., and White, R. (2016). Deﬁning
and simulating open-ended novelty: requirements, guidelines, and challenges. Theory in
Biosciences, 135(3):131–161.

Barnett, A. and Beisner, B. E. (2007). Zooplankton biodiversity and lake trophic state:

explanations invoking resource abundance and distribution. Ecology, 88(7):1675–1686.

Barrick, J. E. and Lenski, R. E. (2013). Genome dynamics during experimental evolution.

Nature Reviews Genetics, 14(12):827.

Baym, M., Lieberman, T. D., Kelsic, E. D., Chait, R., Gross, R., Yelin, I., and Kishony,
Science,

Spatiotemporal microbial evolution on antibiotic landscapes.

R. (2016).
353(6304):1147–1151.

184

Beaumont, H. J., Gallie, J., Kost, C., Ferguson, G. C., and Rainey, P. B. (2009). Experi-

mental evolution of bet hedging. Nature, 462(7269):90.

Beckmann, B. E., Knoester, D. B., Connelly, B. D., Waters, C. M., and McKinley, P. K.
(2012). Evolution of resistance to quorum quenching in digital organisms. Artiﬁcial life,
18(3):291–310.

Beckmann, B. E. and McKinley, P. K. (2009). Evolving quorum sensing in digital organisms.
In Proceedings of the 11th Annual conference on Genetic and evolutionary computation,
pages 97–104. ACM.

Bedau, M. A. and Bahm, A. (1994). Bifurcation structure in diversity dynamics. In Brooks,
R. and Maes, P., editors, Artiﬁcial Life IV: Proceedings of the Fourth International Work-
shop on the Synthesis and Simulation of Living Systems, pages 258–268, Cambridge, MA.
MIT Press.

Bedau, M. A. and Packard, N. H. (1992). Measurement of evolutionary activity, teleology,
and life. In Langton, I. C., Taylor, C., Farmer, D., and Rasmussen, S., editors, Artiﬁcial
Life II, volume X of Santa Fe Institute Studies in the Sciences of Complexity, pages 431–
461. Addison-Wesley, Redwood City, CA.

Bedau, M. A., Snyder, E., Brown, C. T., and Packard, N. H. (1997). A comparison of
evolutionary activity in artiﬁcial evolving systems and in the biosphere. In Husbands, P.
and Harvey, I., editors, Proceedings of the Fourth European Conference on Artiﬁcial Life,
ECAL97, pages 125–134, Cambridge, MA. MIT Press.

Bedau, M. A., Snyder, E., and Packard, N. H. (1998). A classiﬁcation of long-term evolu-
tionary dynamics. In Adami, C., Belew, R. K., Kitano, H., and Taylor, C. E., editors,
Artiﬁcial Life VI: Proceedings of the Sixth International Conference on Artiﬁcial Life,
pages 228–237, Cambridge, MA. MIT Press.

Biswas, R., Bryson, D., Ofria, C., and Wagner, A. (2014). Causes vs Beneﬁts in the Evolution
of Prey Grouping. The 2018 Conference on Artiﬁcial Life: A Hybrid of the European
Conference on Artiﬁcial Life (ECAL) and the International Conference on the Synthesis
and Simulation of Living Systems (ALIFE), pages 641–648.

Blickle, T. and Thiele, L. (1995). A mathematical analysis of tournament selection.

ICGA, pages 9–16. Citeseer.

In

Bloom, B. H. (1970). Space/time trade-oﬀs in hash coding with allowable errors. Commu-

nications of the ACM, 13(7):422–426.

Blount, Z. D., Barrick, J. E., Davidson, C. J., and Lenski, R. E. (2012). Genomic analysis of
a key innovation in an experimental Escherichia coli population. Nature, 489(7417):513–
518.

Blount, Z. D., Borland, C. Z., and Lenski, R. E. (2008). Historical contingency and the
evolution of a key innovation in an experimental population of Escherichia coli. Proceedings
of the National Academy of Sciences, 105(23):7899–7906.

185

Bongard, J. C. and Hornby, G. S. (2010). Guarding Against Premature Convergence While
Accelerating Evolutionary Search. In Proceedings of the 12th Annual Conference on Ge-
netic and Evolutionary Computation, GECCO ’10, pages 111–118, New York, NY, USA.
ACM.

Bosman, A. S., Engelbrecht, A. P., and Helbig, M. (2018). Progressive gradient walk for
neural network ﬁtness landscape analysis. In Proceedings of the Genetic and Evolutionary
Computation Conference Companion on - GECCO ’18, pages 1473–1480, Kyoto, Japan.
ACM Press.

Bryson, D., Baer, B., Ofria, C., Barrick, J., Vostinar, A., Goldsby, H., Zaman, L., Chandler,
C., Goings, S., Dolson, E., Vogel, M., Covert, A., Wright, G., Nahum, J., Misevic, D.,
Wagner, A., Rupp, M., Yilmaz, E., Pakanati, A., and Biswas, R. (2018). Code for using
modes metrics in avida. Available at https://doi.org/10.5281/zenodo.1439479.

Burlacu, B., Aﬀenzeller, M., Kommenda, M., Winkler, S., and Kronberger, G. (2013). Vi-
sualization of Genetic Lineages and Inheritance Information in Genetic Programming.
In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary
Computation, GECCO ’13 Companion, pages 1351–1358, New York, NY, USA. ACM.
event-place: Amsterdam, The Netherlands.

Cantú-Paz, E. and Goldberg, D. E. (2003). Are Multiple Runs of Genetic Algorithms Better
than One? In Cantú-Paz, E., Foster, J. A., Deb, K., Davis, L. D., Roy, R., O’Reilly, U.-M.,
Beyer, H.-G., Standish, R., Kendall, G., Wilson, S., Harman, M., Wegener, J., Dasgupta,
D., Potter, M. A., Schultz, A. C., Dowsland, K. A., Jonoska, N., and Miller, J., editors,
Genetic and Evolutionary Computation — GECCO 2003, number 2723 in Lecture Notes
in Computer Science, pages 801–812. Springer Berlin Heidelberg.

Channon, A. (2001). Passing the ALife test: activity statistics classify evolution in Geb
as unbounded. In Kelemen, J. and Sosík, P., editors, Advances in Artiﬁcial Life, pages
417–426, Berlin, Heidelberg. Springer Berlin Heidelberg.

Channon, A. (2003). Improving and Still Passing the ALife Test: Component-normalised
Activity Statistics Classify Evolution in Geb As Unbounded. In Standish, R. K., Bedau,
M. A., and Abbass, H. A., editors, Proceedings of the Eighth International Conference on
Artiﬁcial Life, ICAL 2003, pages 173–181, Cambridge, MA, USA. MIT Press.

Chase, J. M. and Leibold, M. A. (2003). Ecological Niches: Linking Classical and Contem-

porary Approaches. University of Chicago Press. Google-Books-ID: Ssmcl_ubQUQC.

Chesson, P. (2000). Mechanisms of Maintenance of Species Diversity. Annual Review of

Ecology and Systematics, 31:343–366.

Chesson, P. and Kuang, J. J. (2008). The interaction between predation and competition.

Nature, 456(7219):235–238.

Chow, S. S., Wilke, C. O., Ofria, C., Lenski, R. E., and Adami, C. (2004). Adaptive Radiation

from Resource Competition in Digital Organisms. Science, 305(5680):84–86.

186

Clune, J., Goldsby, H. J., Ofria, C., and Pennock, R. T. (2010). Selective pressures for
accurate altruism targeting: evidence from digital evolution for diﬃcult-to-test aspects of
inclusive ﬁtness theory. Proceedings of the Royal Society B: Biological Sciences.

Cooper, T. F. and Ofria, C. (2003). Evolution of stable ecosystems in populations of digital
organisms. In Standish, R. K., Bedau, M. A., and Abbass, H. A., editors, Artiﬁcial Life
VIII: Proceedings of the Eighth International Conference on Artiﬁcial life, pages 227–232,
Cambridge, MA. MIT Press.

Covert, A. W., Lenski, R. E., Wilke, C. O., and Ofria, C. (2013). Experiments on the role of
deleterious mutations as stepping stones in adaptive evolution. Proceedings of the National
Academy of Sciences, 110(34):E3171–E3178.

Craine, J. M., Ocheltree, T. W., Nippert, J. B., Towne, E. G., Skibbe, A. M., Kembel,
S. W., and Fargione, J. E. (2013). Global diversity of drought tolerance and grassland
climate-change resilience. Nature Climate Change, 3(1):63–67.

Dawkins, R. (1976). The Selﬁsh Gene. Oxford University Press, New York, british ﬁrst

edition edition.

De Jong, K. A. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems.

PhD Thesis, University of Michigan, Ann Arbor, MI, USA.

Deb, K. and Goldberg, D. E. (1989). An Investigation of Niche and Species Formation in
Genetic Function Optimization.
In Proceedings of the 3rd International Conference on
Genetic Algorithms, pages 42–50, San Francisco, CA, USA. Morgan Kaufmann Publishers
Inc.

Diggle, S. P., Griﬃn, A. S., Campbell, G. S., and West, S. A. (2007). Cooperation and

conﬂict in quorum-sensing bacterial populations. Nature, 450(7168):411–414.

Doebeli, M. and Dieckmann, U. (2003). Speciation along environmental gradients. Nature,

421(6920):259–264.

Dolson, E. (2018a). Code for Ecological Theory Provides Insights into Evolutionary Com-

putation at the time of preprint/submission to journal.

Dolson, E. (2018b). Data, code, and supplemental ﬁgures for the MODES toolbox. Available

at https://doi.org/10.5281/zenodo.2345140.

Dolson, E., Banzhaf, W., and Ofria, C. (2018a). Applying Ecological Principles to Genetic
Programming.
In Banzhaf, W., Olson, R. S., Tozier, W., and Riolo, R., editors, Ge-
netic Programming Theory and Practice XV, pages 73–88, Cham. Springer International
Publishing.

Dolson, E., Lalejini, A., Jorgensen, S., and Ofria, C. (2018b). Quantifying the tape of life:
Ancestry-based metrics provide insights and intuition about evolutionary dynamics. In
Ikegami, T., Virgo, N., Witkowski, O., Oka, M., Suzuki, R., and Iizuka, H., editors, The
2018 Conference on Artiﬁcial Life: A Hybrid of the European Conference on Artiﬁcial

187

Life (ECAL) and the International Conference on the Synthesis and Simulation of Living
Systems (ALIFE), pages 75–82, Cambridge, MA. MIT Press.

Dolson, E., Lalejini, A., and Ofria, C. (2018c). Exploring genetic programming systems with

MAP-Elites. PeerJ Preprints, 6:e27154v1.

Dolson, E. and Ofria, C. (2017). Spatial resource heterogeneity creates local hotspots of
evolutionary potential. In Proceedings of the European Conference on Artiﬁcial Life, pages
122–129, Lyon, France. MIT Press.

Dolson, E. and Ofria, C. (2018). Visualizing the tape of life: Exploring evolutionary his-
In Proceedings of the Genetic and Evolutionary Computation

tory with virtual reality.
Conference Companion, GECCO ’18, pages 1553–1559, New York, NY, USA. ACM.

Dolson, E., Perez, S., and Chaput, A. (2016a). emilydolson/resource-heterogeneity: Journal

Submission.

Dolson, E., Perez, S., Olson, R., and Ofria, C. (2017). Spatial resource heterogeneity increases

diversity and evolutionary potential. bioRxiv.

Dolson, E., Vostinar, A., and Ofria, C. (2015). What’s holding artiﬁcial life back from

open-ended evolution? The Winnower.

Dolson, E., Wiser, M. J., and Ofria, C. A. (2016b). The Eﬀects of Evolution and Spatial
Structure on Diversity in Biological Reserves. In Artiﬁcial Life XV: Proceedings of the
Fifteenth International Conference on Artiﬁcial Life, pages 434–440, Cancun, Mexico.
MIT Press.

Dolson, E. L., Banzhaf, W., and Ofria, C. (2018d). Ecological theory provides insights about

evolutionary computation. PeerJ Preprints, 6:e27315v1.

Dolson, E. L., Lalejini, A., Jorgensen, S., and Ofria, C. A. (2019a). Interpreting the tape of
life: Ancestry-based Analyses Provide Insights and Intuition about Evolutionary Dynam-
ics. Artiﬁcial Life. status: In Press.

Dolson, E. L., Vostinar, A. E., Wiser, M. J., and Ofria, C. A. (2019b). The MODES toolbox:

Measurements of Open-ended Dynamics in Evolving Systems. Artiﬁcial Life, 25(1).

Erwin, D. H. (1998). The end and the beginning: recoveries from mass extinctions. Trends

in Ecology & Evolution, 13(9):344–349.

Faith, D. P. (1992). Conservation evaluation and phylogenetic diversity. Biological Conser-

vation, 61(1):1–10.

Ferrière, R., Dieckmann, U., and Couvet, D. (2004). Evolutionary conservation biology.

Cambridge University Press.

Fontaine, C., Guimarães, P. R., Kéﬁ, S., Loeuille, N., Memmott, J., van der Putten, W. H.,
van Veen, F. J. F., and Thébault, E. (2011). The ecological and evolutionary implications
of merging diﬀerent types of networks. Ecology Letters, 14(11):1170–1181.

188

Forest, F., Grenyer, R., Rouget, M., Davies, T. J., Cowling, R. M., Faith, D. P., Balmford,
A., Manning, J. C., Proches, S., van der Bank, M., Reeves, G., Hedderson, T. A. J.,
and Savolainen, V. (2007). Preserving the evolutionary potential of ﬂoras in biodiversity
hotspots. Nature, 445(7129):757–760.

Forman, R. T. (1995). Land Mosaics: The Ecology of Landscapes and Regions. Cambridge

University Press.

Forman, R. T. T. and Godron, M. (1986). Landscape ecology. Wiley.

Fortuna, M. A., Zaman, L., Wagner, A. P., and Ofria, C. (2013). Evolving Digital Ecological

Networks. PLoS Comput Biol, 9(3):e1002928.

Franklin, J. F. and Lindenmayer, D. B. (2009). Importance of matrix habitats in maintaining

biological diversity. Proceedings of the National Academy of Sciences, 106(2):349–350.

Frederickson, M. E., Greene, M. J., and Gordon, D. M. (2005). Ecology: ‘Devil’s gardens’

bedevilled by ants. Nature, 437(7058):495–496.

Fu, Y.-X. and Li, W.-H. (1999). Coalescing into the 21st century: An overview and prospects

of coalescent theory. Theoretical Population Biology, 56(1):1–10.

Galloway, J. N., Dentener, F. J., Capone, D. G., Boyer, E. W., Howarth, R. W., Seitzinger,
S. P., Asner, G. P., Cleveland, C. C., Green, P. A., Holland, E. A., Karl, D. M., Michaels,
A. F., Porter, J. H., Townsend, A. R., and Vorosmarty, C. J. (2004). Nitrogen cycles:
past, present, and future. Biogeochemistry, 70:153–226.

Gavrilets, S. (2010). High-Dimensional Fitness Landscapes and Speciation.

In Pigliucci,
M. and Müller, G. B., editors, Evolution—the Extended Synthesis, pages 45–80. The MIT
Press.

Gavrilets, S. and Vose, A. (2005). Dynamic patterns of adaptive radiation. Proceedings of
the National Academy of Sciences of the United States of America, 102(50):18040–18045.

Gillespie, R. (2004). Community Assembly Through Adaptive Radiation in Hawaiian Spiders.

Science, 303(5656):356–359.

Goings, S. (2010). Natural niching: Applying ecological principles to evolutionary computa-

tion. Ph.D., Michigan State University, United States – Michigan.

Goings, S., Goldsby, H. J., Cheng, B. H., and Ofria, C. (2012). An ecology-based evolutionary

algorithm to evolve solutions to complex problems. Artiﬁcial Life, 13:171–177.

Goings, S. and Ofria, C. (2009). Ecological approaches to diversity maintenance in evolution-

ary algorithms. In IEEE Symposium on Artiﬁcial Life, 2009. ALife ’09, pages 124–130.

Goldberg, D. E., Richardson, J., and Grefenstette, J. J. (1987). Genetic algorithms with
sharing for multimodal function optimization. In Genetic algorithms and their applications:
Proceedings of the Second International Conference on Genetic Algorithms, pages 41–49,
Hillsdale, NJ. Lawrence Erlbaum.

189

Goldsby, H. J., Dornhaus, A., Kerr, B., and Ofria, C. (2012). Task-switching costs promote
the evolution of division of labor and shifts in individuality. Proceedings of the National
Academy of Sciences, 109(34):13686–13691.

Good, B. H., McDonald, M. J., Barrick, J. E., Lenski, R. E., and Desai, M. M. (2017). The

dynamics of molecular evolution over 60,000 generations. Nature, 551(7678):45–50.

Goudard, A. and Loreau, M. (2008). Nontrophic Interactions, Biodiversity, and Ecosystem

Functioning: An Interaction Web Model. The American Naturalist, 171(1):91–106.

Grabowski, L. and Magaña, J. (2014). Building on Simplicity: Multi-stage Evolution of
Digital Organisms. The 2018 Conference on Artiﬁcial Life: A Hybrid of the European
Conference on Artiﬁcial Life (ECAL) and the International Conference on the Synthesis
and Simulation of Living Systems (ALIFE), pages 113–120.

Grover, J. P. (1997). Resource Competition. Springer Science & Business Media.

Hagberg, A. A., Schult, D. A., and Swart, P. J. (2008). Exploring network structure, dynam-
ics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference
(SciPy2008), pages 11–15, Pasadena, CA USA.

Haller, B. C., Mazzucco, R., and Dieckmann, U. (2013). Evolutionary Branching in Complex

Landscapes. The American Naturalist, 182(4):E127–E141.

Hamilton, W. D. (1964). The genetical evolution of social behaviour. II. Journal of theoretical

biology, 7(1):17–52.

Han, J., Pei, J., and Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.

Hara, A. and Nagao, T. (1999). Emergence of the cooperative behavior using ADG; Auto-
matically Deﬁned Groups. In Banzhaf, W., Daida, J. M., Eiben, A. E., Garzon, M. H.,
Honavar, V. G., Jakiela, M. J., and Smith, R. E., editors, Proceedings of the 1st Annual
Conference on Genetic and Evolutionary Computation, volume 1, pages 1039–1046, San
Francisco, CA, USA. Morgan Kaufmann.

Harmon, L. J. and Harrison, S. (2015). Species diversity is dynamic and unbounded at local

and continental scales. The American Naturalist, 185(5):584–593.

Harper, R. (2012). Spatial Co-evolution: Quicker, Fitter and Less Bloated. In Proceedings
of the 14th Annual Conference on Genetic and Evolutionary Computation, GECCO ’12,
pages 759–766, New York, NY, USA. ACM.

Harpole, W. S. and Tilman, D. (2007). Grassland species loss resulting from reduced niche

dimension. Nature, 446(7137):791–793.

Hedges, L. V. (1981). Distribution theory for Glass’s estimator of Eﬀect size and related

estimators. Journal of Educational Statistics, 6(2):107–128.

190

Helmuth, T., McPhee, N. F., and Spector, L. (2016a). Eﬀects of Lexicase and Tournament
Selection on Diversity Recovery and Maintenance. In Proceedings of the 2016 on Genetic
and Evolutionary Computation Conference Companion, GECCO ’16 Companion, pages
983–990, New York, NY, USA. ACM. event-place: Denver, Colorado, USA.

Helmuth, T., McPhee, N. F., and Spector, L. (2016b). The Impact of Hyperselection on
Lexicase Selection. In Proceedings of the Genetic and Evolutionary Computation Confer-
ence 2016, GECCO ’16, pages 717–724, New York, NY, USA. ACM. event-place: Denver,
Colorado, USA.

Helmuth, T. and Spector, L. (2015). General Program Synthesis Benchmark Suite.

In
Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation,
GECCO ’15, pages 1039–1046, New York, NY, USA. ACM.

Helmuth, T., Spector, L., and Matheson, J. (2015). Solving Uncompromising Problems With

Lexicase Selection. IEEE Transactions on Evolutionary Computation, 19(5):630–643.

Hendry, A. P. (2016). Eco-evolutionary Dynamics. Princeton University Press. Google-

Books-ID: QMtIDAAAQBAJ.

Hillebrand, H. (2004). On the generality of the latitudinal diversity gradient. The American

Naturalist, 163(2):192–211.

Holland, J. H. (1985). Properties of the Bucket Brigade. In Proceedings of the 1st Inter-
national Conference on Genetic Algorithms, pages 1–7, Hillsdale, NJ, USA. L. Erlbaum
Associates Inc.

Hooper, D. U., Chapin, F. S., Ewel, J. J., Hector, A., Inchausti, P., Lavorel, S., Lawton,
J. H., Lodge, D. M., Loreau, M., Naeem, S., Schmid, B., Setälä, H., Symstad, A. J., Van-
dermeer, J., and Wardle, D. A. (2005). Eﬀects of Biodiversity on Ecosystem Functioning:
A Consensus of Current Knowledge. Ecological Monographs, 75(1):3–35.

Hornby, G. S. (2006). ALPS: The Age-layered Population Structure for Reducing the Prob-
lem of Premature Convergence. In Proceedings of the 8th Annual Conference on Genetic
and Evolutionary Computation, GECCO ’06, pages 815–822, New York, NY, USA. ACM.

Hu, J., Goodman, E., Seo, K., Fan, Z., and Rosenberg, R. (2005). The Hierarchical Fair
Competition (HFC) Framework for Sustainable Evolutionary Algorithms. Evolutionary
Computation, 13(2):241–277.

Hutchinson, G. E. (1961). The Paradox of the Plankton. The American Naturalist,

95(882):137–145.

Isaac, N. J. B., Turvey, S. T., Collen, B., Waterman, C., and Baillie, J. E. M. (2007).
Mammals on the EDGE: Conservation Priorities Based on Threat and Phylogeny. PLOS
ONE, 2(3):e296.

Jankowski, K., Schindler, D. E., and Horner-Devine, M. C. (2014). Resource Availability and
Spatial Heterogeneity Control Bacterial Community Response to Nutrient Enrichment in
Lakes. PLoS ONE, 9(1):e86991.

191

Johnson, A., Strauss, E., Pickett, R., Adami, C., Dworkin, I., and Goldsby, H. (2014).
More Bang For Your Buck: Quorum-Sensing Capabilities Improve the Eﬃcacy of Suicidal
Altruism. The 2018 Conference on Artiﬁcial Life: A Hybrid of the European Conference on
Artiﬁcial Life (ECAL) and the International Conference on the Synthesis and Simulation
of Living Systems (ALIFE), pages 120–128.

Johnson, T. J. and Wilke, C. O. (2004). Evolution of Resource Competition between Mutu-

ally Dependent Digital Organisms. Artiﬁcial Life, 10(2):145–156.

Kauﬀman, S. and Levin, S. (1987). Towards a general theory of adaptive walks on rugged

landscapes. Journal of Theoretical Biology, 128(1):11–45.

Kawecki, T. J., Lenski, R. E., Ebert, D., Hollis, B., Olivieri, I., and Whitlock, M. C. (2012).

Experimental evolution. Trends in Ecology & Evolution, 27(10):547–560.

Kokko, H., Chaturvedi, A., Croll, D., Fischer, M. C., Guillaume, F., Karrenberg, S., Kerr, B.,
Rolshausen, G., and Stapley, J. (2017). Can Evolution Supply What Ecology Demands?
Trends in Ecology & Evolution, 32(3):187–197.

Korb, K. B. and Dorin, A. (2011). Evolution unbound: releasing the arrow of complexity.

Biology & Philosophy, 26(3):317–338.

Kosmidis, I. (2017). brglm: Bias Reduction in Binary-Response Generalized Linear Models.

La Cava, W., Helmuth, T., Spector, L., and Moore, J. H. (2018). A Probabilistic and
Multi-Objective Analysis of Lexicase Selection and ε-Lexicase Selection. Evolutionary
Computation, pages 1–26.

Laﬀerty, K. D., Dobson, A. P., and Kuris, A. M. (2006). Parasites dominate food web links.

Proceedings of the National Academy of Sciences, 103(30):11211–11216.

Lalejini, A., Dolson, E., and Jorgensen, S. (2018).

stevenjson/ALife2018-Lineage: Final

Release for Paper. https://doi.org/10.5281/zenodo.1254061.

Lalejini, A. and Ofria, C. (2016). The Evolutionary Origins of Phenotypic Plasticity. In

Proceedings of the Artiﬁcial Life Conference 2016, pages 372–379. The MIT Press.

Lavergne, S. and Molofsky, J. (2007). Increased genetic variation and evolutionary potential
drive the success of an invasive grass. Proceedings of the National Academy of Sciences,
104(10):3883–3888.

Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E., and Schneider, D. (2012). Ecological
and evolutionary dynamics of coexisting lineages during a long-term experiment with
Escherichia coli. Proceedings of the National Academy of Sciences, 109(24):9487–9492.

Lehman, J. and Stanley, K. O. (2008). Exploiting Open-Endedness to Solve Problems

Through the Search for Novelty. In ALIFE, pages 329–336.

Lehman, J. and Stanley, K. O. (2011). Abandoning objectives: Evolution through the search

for novelty alone. Evol. Comput., 19(2):189–223.

192

Lehmann, K. D. S., Goldman, B. W., Dworkin, I., Bryson, D. M., and Wagner, A. P. (2014).
From Cues to Signals: Evolution of Interspeciﬁc Communication via Aposematism and
Mimicry in a Predator-Prey System. PLOS ONE, 9(3):e91783.

Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003). The evolutionary origin of

complex features. Nature, 423(6936):139–144.

Lenski, R. E., Rose, M. R., Simpson, S. C., and Tadler, S. C. (1991). Long-term experimental
evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. The
American Naturalist, 138(6):1315–1341.

Lenski, R. E., Wiser, M. J., Ribeck, N., Blount, Z. D., Nahum, J. R., Morris, J. J., Zaman,
L., Turner, C. B., Wade, B. D., Maddamsetti, R., Burmeister, A. R., Baird, E. J., Bundy,
J., Grant, N. A., Card, K. J., Rowles, M., Weatherspoon, K., Papoulis, S. E., Sullivan, R.,
Clark, C., Mulka, J. S., and Hajela, N. (2015). Sustained ﬁtness gains and variability in
ﬁtness trajectories in the long-term evolution experiment with Escherichia coli. Proceedings
of the Royal Society of London B: Biological Sciences, 282(1821).

Letten, A. D., Ke, P.-J., and Fukami, T. (2017). Linking modern coexistence theory and

contemporary niche theory. Ecological Monographs, 87(2):161–177.

Li, X., Engelbrecht, A., and Epitropakis, M. G. (2013). Benchmark Functions for CEC’2013
Special Session and Competition on Niching Methods for Multimodal Function Optimiza-
tion. Technical report, RMIT University, Australia.

Losos, J. (2010). Adaptive Radiation, Ecological Opportunity, and Evolutionary Determin-

ism. The American Naturalist, 175(6):623–639.

MacArthur, R. H. and Wilson, E. O. (1967). The Theory of Island Biogeography. Princeton

University Press.

MacPherson, B. and Gras, R. (2016). Individual-based ecological models: Adjunctive tools

or experimental systems? Ecological Modelling, 323:106–114.

Maddamsetti, R., Lenski, R. E., and Barrick, J. E. (2015). Adaptation, clonal interfer-
ence, and frequency-dependent interactions in a long-term evolution experiment with Es-
cherichia coli. Genetics, 200(2):619–631.

Mahfoud, S. W. (1992). Crowding and preselection revisited. Urbana, 51:61801.

Mahfoud, S. W. (1995). Niching methods for genetic algorithms. Urbana, 51(95001).

Maley, C. C. (1999). Four steps toward open-ended evolution. In Banzhaf, W., Daida, J. M.,
Eiben, A. E., Garzon, M. H., and Honavar, V. G., editors, Proceedings of the 1st Annual
Conference on Genetic and Evolutionary Computation, volume 2 of GECCO’99, pages
1336–1343, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Maynard Smith, J. (1992). Byte-sized evolution. Nature, 355(6363):772–773.

193

Maynard Smith, J. and Szathmary, E. (1997). The major transitions in evolution. Oxford

University Press.

McCann, K. S. (2000). The diversity–stability debate. Nature, 405(6783):228–233.

McDonald, B. A. and Linde, C. (2002). Pathogen population genetics, evolutionary potential,

and durable resistance. Annual Review of Phytopathology, 40(1):349–379.

McGarigal, K. (2006). Landscape Pattern Metrics. In Encyclopedia of Environmetrics. John

Wiley & Sons, Ltd.

McGarigal, K., Tagil, S., and Cushman, S. A. (2009). Surface metrics: an alternative to patch
metrics for the quantiﬁcation of landscape structure. Landscape Ecology, 24(3):433–450.

McKinney, M. L. (2006). Urbanization as a major cause of biotic homogenization. Biological

Conservation, 127(3):247–260.

McPhee, N. F., Casale, M. M., Finzel, M., Helmuth, T., and Spector, L. (2016a). Visualizing
genetic programming ancestries. In Proceedings of the 2016 on Genetic and Evolutionary
Computation Conference Companion, GECCO ’16 Companion, pages 1419–1426, New
York, NY, USA. ACM.

McPhee, N. F., Donatucci, D., and Helmuth, T. (2016b). Using Graph Databases to Ex-
plore the Dynamics of Genetic Programming Runs. In Genetic Programming Theory and
Practice XIII, Genetic and Evolutionary Computation, pages 185–201. Springer, Cham.

Mittelbach, G. G., Schemske, D. W., Cornell, H. V., Allen, A. P., Brown, J. M., Bush,
M. B., Harrison, S. P., Hurlbert, A. H., Knowlton, N., Lessios, H. A., McCain, C. M.,
McCune, A. R., McDade, L. A., McPeek, M. A., Near, T. J., Price, T. D., Ricklefs, R. E.,
Roy, K., Sax, D. F., Schluter, D., Sobel, J. M., and Turelli, M. (2007). Evolution and the
latitudinal diversity gradient: speciation, extinction and biogeography. Ecology Letters,
10(4):315–331.

Moritz, C. (2002). Strategies to protect biological diversity and the evolutionary processes

that sustain it. Systematic biology, 51(2):238–254.

Moritz, C., Patton, J. L., Schneider, C. J., and Smith, T. B. (2000). Diversiﬁcation of rain-
forest faunas: an integrated molecular approach. Annual review of ecology and systematics,
31(1):533–563.

Mouret, J.-B. and Clune, J. (2015).

Illuminating search spaces by mapping elites.

arXiv:1504.04909 [cs, q-bio]. arXiv: 1504.04909.

Mouret, J.-B. and Doncieux, S. (2009). Using Behavioral Exploration Objectives to Solve
Deceptive Problems in Neuro-evolution.
In Proceedings of the 11th Annual Conference
on Genetic and Evolutionary Computation, GECCO ’09, pages 627–634, New York, NY,
USA. ACM.

Muller, H. J. (1932). Some genetic aspects of sex. The American Naturalist, 66(703):118–138.

194

Nahum, J. R., Godfrey-Smith, P., Harding, B. N., Marcus, J. H., Carlson-Stevermer, J., and
Kerr, B. (2015). A tortoise–hare pattern seen in adapting structured and unstructured
populations suggests a rugged ﬁtness landscape in bacteria. Proceedings of the National
Academy of Sciences, 112(24):7530–7535.

Noble, R. (2017). ggmuller: Create Muller Plots of Evolutionary Dynamics. R package

version 0.5.2.

Ochoa, G., Tomassini, M., Vérel, S., and Darabos, C. (2008). A study of NK landscapes’
basins and local optima networks. In Proceedings of the 10th annual conference on Genetic
and evolutionary computation, pages 555–562. ACM.

O’Donnell, D. R., Parigi, A., Fish, J. A., Dworkin, I., and Wagner, A. P. (2014). The Roles
of Standing Genetic Variation and Evolutionary History in Determining the Evolvability
of Anti-Predator Strategies. PLOS ONE, 9(6):e100163.

Ofria, C., Dolson, E., Lalejini, A., Fenton, J., Jorgensen, S., Miller, R., Moreno, M. A.,
Stredwick, J., Zaman, L., Schossau, J., Gillespie, L., C G, N., and Vostinar, A. (2018).
Empirical.

Ofria, C. and Wilke, C. O. (2004). Avida: A software platform for research in computational

evolutionary biology. Artiﬁcial Life, 10(2):191–229.

Oldford, W. (2016). qqtest: Self Calibrating Quantile-Quantile Plots for Visual Testing.

Olshannikova, E., Ometov, A., Koucheryavy, Y., and Olsson, T. (2015). Visualizing Big
Data with augmented and virtual reality: challenges and research agenda. Journal of Big
Data, 2(1):22.

Olson, R., Mirmomeni, M., Brom, T., Bruger, E., Hintze, A., Knoester, D., and Adami, C.
(2013). Evolved digital ecosystems: Dynamic steady state, not optimal ﬁxed point. The
2018 Conference on Artiﬁcial Life: A Hybrid of the European Conference on Artiﬁcial
Life (ECAL) and the International Conference on the Synthesis and Simulation of Living
Systems (ALIFE), pages 126–133.

Ovaska, S. J., Sick, B., and Wright, A. H. (2009). Periodical switching between related
goals for improving evolvability to a ﬁxed goal in multi-objective problems. Information
Sciences, 179(23):4046–4056.

Pacala, S. W. and Tilman, D. (1994). Limiting Similarity in Mechanistic and Spatial
Models of Plant Competition in Heterogeneous Environments. The American Natural-
ist, 143(2):222–257.

Pianka, E. R. (1966a). Convexity, Desert Lizards, and Spatial Heterogeneity. Ecology,

47(6):1055–1059.

Pianka, E. R. (1966b). Latitudinal Gradients in Species Diversity: A Review of Concepts.

The American Naturalist, 100(910):33–46.

195

Potter, M. A. and De Jong, K. A. (2000). Cooperative coevolution: An architecture for

evolving coadapted subcomponents. Evolutionary Computation, 8(1):1–29.

Pugh, J. K., Soros, L. B., Szerlip, P. A., and Stanley, K. O. (2015). Confronting the Chal-
lenge of Quality Diversity. In Proceedings of the 2015 Annual Conference on Genetic and
Evolutionary Computation, GECCO ’15, pages 967–974, New York, NY, USA. ACM.

R Core Development Team (2013). R: A language and environment for statistical computing.

R Foundation for Statistical Computing, Vienna, Austria.

R Core Team (2017). R: A language and environment for statistical computing. R Foundation

for Statistical Computing, Vienna, Austria.

Ray, T. S. (1994). Evolution, complexity, entropy and artiﬁcial reality. Physica D: Nonlinear

Phenomena, 75(1):239–263.

Ribeck, N. and Lenski, R. E. (2015). Modeling and quantifying frequency-dependent ﬁtness

in microbial populations with cross-feeding interactions. Evolution, 69(5):1313–1320.

Ripley, B. D. (1981). Mapped Point Patterns. In Spatial Statistics, pages 144–190. John

Wiley & Sons, Inc.

Roopnarine, P. D., Angielczyk, K. D., Wang, S. C., and Hertog, R. (2007). Trophic network
models explain instability of Early Triassic terrestrial communities. Proceedings of the
Royal Society of London B: Biological Sciences, 274(1622):2077–2086.

Rouzic, A. L. and Carlborg, O. (2008). Evolutionary potential of hidden genetic variation.

Trends in Ecology & Evolution, 23(1):33–37.

Rozen, D. E. and Lenski, R. E. (2000). Long-term experimental evolution in Escherichia
coli. VIII. Dynamics of a balanced polymorphism. The American Naturalist, 155(1):24–35.

Schluter, D. (2009). Evidence for Ecological Speciation and Its Alternative. Science,

323(5915):737–741.

Schluter, D., Price, T. D., and Grant, P. R. (1985). Ecological Character Displacement in

Darwin’s Finches. Science, 227(4690):1056–1059.

Schoener, T. W. (2011). The Newest Synthesis: Understanding the Interplay of Evolutionary

and Ecological Dynamics. Science, 331(6016):426–429.

Scott, J. G., Fletcher, A. G., Anderson, A. R. A., and Maini, P. K. (2016). Spatial Metrics
of Tumour Vascular Organisation Predict Radiation Eﬃcacy in a Computational Model.
PLOS Computational Biology, 12(1):e1004712.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical

Journal, 27(3):379–423.

Skusa, A. and Bedau, M. A. (2003). Towards a comparison of evolutionary creativity in

biological and cultural evolution. Artiﬁcial Life, 8:233–242.

196

Soros, L. (2018). Necessary conditions for open-ended evolution. Electronic Theses and

Dissertations.

Soros, L. and Stanley, K. (2014). Identifying necessary conditions for open-ended evolution
through the artiﬁcial life world of Chromaria. In Sayama, H., Rieﬀel, J., Risi, S., Doursat,
R., and Lipson, H., editors, ALIFE 14: Proceedings of the Fourteenth International Con-
ference on the Synthesis and Simulation of Living Systems, pages 793–800, Cambridge,
MA. MIT Press.

Spector, L. (2012). Assessment of problem modality by diﬀerential performance of lexicase
selection in genetic programming: a preliminary report. In Proceedings of the 14th annual
conference companion on Genetic and evolutionary computation, pages 401–408. ACM.

Standish, R. K. and Galloway, J. (2002). Visualising Tierra’s tree of life using Netmap. In

ALife VIII Workshop proceedings, page 171.

Stanley, K. and Miikkulainen, R. (2004). Competitive coevolution through evolutionary

complexiﬁcation. J. Artif. Intell. Res. (JAIR), 21:63–100.

Stanley, K. O. and Soros, L. B. (2016). The role of subjectivity in the evaluation of open-
endedness. Presentation delivered in OEE2: The Second Workshop on Open-Ended Evo-
lution, at ALIFE 2016, in Cancún, Mexico, 4-8 July 2016.

Stein, A., Gerstner, K., and Kreft, H. (2014). Environmental heterogeneity as a universal
driver of species richness across taxa, biomes and spatial scales. Ecology Letters, pages
n/a–n/a.

Strona, G. and Laﬀerty, K. D. (2016). Environmental change makes robust ecological net-

works fragile. Nature Communications, 7:12462.

Taylor, T., Bedau, M., Channon, A., Ackley, D., Banzhaf, W., Beslon, G., Dolson, E., Froese,
T., Hickinbotham, S., Ikegami, T., McMullin, B., Packard, N., Rasmussen, S., Virgo, N.,
Agmon, E., Clark, E., McGregor, S., Ofria, C., Ropella, G., Spector, L., Stanley, K. O.,
Stanton, A., Timperley, C., Vostinar, A., and Wiser, M. (2016). Open-ended evolution:
Perspectives from the OEE workshop in York. Artiﬁcial Life, 22(3):408–423.

Tenaillon, O., Barrick, J. E., Ribeck, N., Deatherage, D. E., Blanchard, J. L., Dasgupta,
A., Wu, G. C., Wielgoss, S., Cruveiller, S., Médigue, C., Schneider, D., and Lenski, R. E.
(2016). Tempo and mode of genome evolution in a 50,000-generation experiment. Nature,
536:165.

Tilman, D. (1977). Resource Competition between Plankton Algae: An Experimental and

Theoretical Approach. Ecology, 58(2):338.

Tilman, D., Isbell, F., and Cowles, J. M. (2014). Biodiversity and Ecosystem Functioning.

Annual Review of Ecology, Evolution, and Systematics, 45(1):471–493.

Tomassini, M. (2005). Spatially Structured Evolutionary Algorithms: Artiﬁcial Evolution in
Space and Time. Natural Computing Series. Springer Berlin Heidelberg, Berlin, Heidel-
berg.

197

Tucker, C. M., Cadotte, M. W., Carvalho, S. B., Davies, T. J., Ferrier, S., Fritz, S. A.,
Grenyer, R., Helmus, M. R., Jin, L. S., Mooers, A. O., Pavoine, S., Purschke, O., Redding,
D. W., Rosauer, D. F., Winter, M., and Mazel, F. (2017). A guide to phylogenetic metrics
for conservation, community ecology and macroecology. Biological Reviews, 92(2):698–715.

Turner, C. B., Blount, Z. D., and Lenski, R. E. (2015a). Replaying evolution to test the
cause of extinction of one ecotype in an experimentally evolved population. PLoS ONE,
10(11):e0142050.

Turner, C. B., Blount, Z. D., Mitchell, D. H., and Lenski, R. E. (2015b). Evolution and
coexistence in response to a key innovation in a long-term evolution experiment with
Escherichia coli. bioRxiv.

van Dam, A., Laidlaw, D. H., and Simpson, R. M. (2002). Experiments in Immersive Virtual

Reality for Scientiﬁc Visualization. Computers & Graphics, 26(4):535–555.

Vandergast, A. G., Bohonak, A. J., Hathaway, S. A., Boys, J., and Fisher, R. N. (2008). Are
hotspots of evolutionary potential adequately protected in southern California? Biological
Conservation, 141(6):1648–1664.

Virgo, N., Agmon, E., and Fernando, C. (2017). Lineage selection leads to evolvability at
large population sizes. Proceedings of the European Conference on Artiﬁcial Life, pages
420–427.

Vostinar, A. E., Fenton, J., Waters, C., and Ofria, C. (2018). Signals in the Dark: What
factors select for the evolution of cooperation controlled by quorum sensing? The 2018
Conference on Artiﬁcial Life: A Hybrid of the European Conference on Artiﬁcial Life
(ECAL) and the International Conference on the Synthesis and Simulation of Living Sys-
tems (ALIFE), pages 651–658.

Vostinar, A. E. and Ofria, C. (2019). Spatial Structure Can Decrease Symbiotic Cooperation.

Artiﬁcial Life, pages 1–21.

Wagner, A. P., Zaman, L., Dworkin, I., and Ofria, C. (2013). Behavioral Strategy Chases

Promote the Evolution of Prey Intelligence. arXiv:1310.1369 [q-bio].

Walker, B. L. and Ofria, C. (2012). Evolutionary Potential is Maximized at Intermediate
Diversity Levels. In Proceedings of the 13th International Conference for Artiﬁcial Life,
pages 116–120, East Lansing, MI, USA. MIT Press.

Walker, J. C. (2011). The Evolution of Optimal Foraging Strategies in Populations of Digital
Organisms. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary
Computation, GECCO ’11, pages 203–210, New York, NY, USA. ACM.

Webb, C. O. and Losos, A. E. J. B. (2000). Exploring the Phylogenetic Structure of Ecological
Communities: An Example for Rain Forest Trees. The American Naturalist, 156(2):145–
155.

198

White, D. R., McDermott, J., Castelli, M., Manzoni, L., Goldman, B. W., Kronberger, G.,
Jaśkowski, W., O’Reilly, U.-M., and Luke, S. (2013). Better GP benchmarks: community
survey results and proposals. Genetic Programming and Evolvable Machines, 14(1):3–29.

Whitley, D., Rana, S., and Heckendorn, R. B. (1998). The Island Model Genetic Algorithm:
On Separability, Population Size and Convergence. Journal of Computing and Information
Technology, 7:33–47.

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.

Wilke, C. O., Wang, J. L., Ofria, C., Lenski, R. E., and Adami, C. (2001). Evolution of digital
organisms at high mutation rates leads to survival of the ﬂattest. Nature, 412(6844):331–
333.

Wilmers, C. C. and Getz, W. M. (2005). Gray Wolves as Climate Change Buﬀers in Yellow-

stone. PLoS Biol, 3(4):e92.

Winter, M., Devictor, V., and Schweiger, O. (2013). Phylogenetic diversity and nature

conservation: where are we? Trends in Ecology & Evolution, 28(4):199–204.

Wiser, M. J. (2015). An analysis of ﬁtness in long-term asexual evolution experiments. PhD

thesis, Michigan State University.

Wiser, M. J., Dolson, E. L., Vostinar, A., Lenski, R. E., and Ofria, C. (2018). The bound-
edness illusion: Asymptotic projections from early evolution underestimate evolutionary
potential. PeerJ Preprints, 6:e27246v2.

Wiser, M. J., Ribeck, N., and Lenski, R. E. (2013). Long-term dynamics of adaptation in

asexual populations. Science, 342(6164):1364–1367.

Wolpert, D. H. and Macready, W. G. (1997). No free lunch theorems for optimization. IEEE

Transactions on Evolutionary Computation, 1(1):67–82.

Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding, and selection in evo-
In Proceedings of the Sixth International Congress of Genetics, pages 356–366,

lution.
Brooklyn Botanic Garden, Brooklyn, NY.

Yedid, G., Ofria, C., and Lenski, R. (2009). Selective Press Extinctions, but Not Random
Pulse Extinctions, Cause Delayed Ecological Recovery in Communities of Digital Organ-
isms. The American Naturalist, 173(4):E139–E154.

Yedid, G., Ofria, C. A., and Lenski, R. E. (2008). Historical and contingent factors aﬀect
re-evolution of a complex feature lost during mass extinction in communities of digital
organisms. Journal of Evolutionary Biology, 21(5):1335–1357.

Yedid, G., Stredwick, J., Ofria, C. A., and Agapow, P.-M. (2012). A Comparison of the
Eﬀects of Random and Selective Mass Extinctions on Erosion of Evolutionary History in
Communities of Digital Organisms. PLOS ONE, 7(5):e37233.

199

Zaman, L. (2018). Investigating open-ended coevolution in digital organisms. In Ikegami,
T., Virgo, N., Witkowski, O., Oka, M., Suzuki, R., and Iizuka, H., editors, The 2018 Con-
ference on Artiﬁcial Life: A Hybrid of the European Conference on Artiﬁcial Life (ECAL)
and the International Conference on the Synthesis and Simulation of Living Systems (AL-
IFE), pages 258–259, Cambridge, MA. MIT Press.

Zaman, L., Devangam, S., and Ofria, C. (2011). Rapid host-parasite coevolution drives the
production and maintenance of diversity in digital organisms. In Proceedings of the 13th
annual conference on Genetic and evolutionary computation, pages 219–226. ACM.

Zaman, L., Meyer, J. R., Devangam, S., Bryson, D. M., Lenski, R. E., and Ofria, C. (2014).
Coevolution drives the emergence of complex traits and promotes evolvability. PLoS Biol,
12(12):e1002023.

200